COMPSCI 210: Assignment #3 - Computer Organization & C++ Programming

This assignment aims to give you some experience with C++ programming in the context of Computer Organization. The solutions should use the C++ programming language, and only command-line applications are required.

You may wish to study and understand the C++ example programs supplied to you in the lectures and tutorials before commencing this assignment. Some of the sample code supplied may give you a head start with the assignment.

We will be using an automated marking facility, so it is important that you strictly follow the input/output format specified in the assignment. If in doubt, please consult with the lecturer or tutor. A marking guideline for the assignment is supplied, and you must go through the guideline before submitting your solution.

We will be using the GNU compiler gcc on login.cs.auckland.ac.nz for marking. Therefore, you must ensure that your submissions compile and run on this system. Submissions that fail to compile will attract no marks.

An alternative representation for unsigned integers

In the first part of this course, you learnt how primitives types such as integers and floating-point numbers are represented in a computer.

This assignment is about an alternative representation for unsigned integers: uintvar, or variable-length unsigned integer.

Variable-length unsigned integers, or uintvar in short, are also known as variable-length quantities. Refer to the Wiki article on the topic for a historical note and current usage. The article also has an illustrative example.

A uintvar representation uses variable number of bytes to represent an unsigned integer. Only the lower 7 bits of a byte are used for the number. This means the maximum value a single byte can house is 127. Values greater than 127 bytes will need more than a byte to represent them.

The most-significant bit (MSB) of the byte is used as a continuation marker. If MSB is 0, that means no further bytes to follow, and an MSB of 1 means there is a byte to follow. This allows chaining an arbitrary number of bytes.

Representing a value as a uintvar: Write down the binary representation of the given value. Group them into 7-bit groups starting with the lowest significant bit. For example, 137 is 10001001. Grouped into 7-bit groups, this will be 1:0001001. For each group, except the last group, add an eighth bit at MSB and set it to 1. For the last group, add the eighth bit at MSB as before but set it to 0. Our 137 example, then becomes 10000001:00001001. In other words, the uintvar corresponding to 137 (89 in hex) has two bytes: 81 and 9, both in hex.

To decode a sequence of bytes representing a uintvar into an integer, we follow the reverse of the process described above: discard the MSB of each byte, and concatenate all the remaining 7 bits of each byte; find the integer value represented by the resulting binary number. Note that we need to ascertain that all but the last of the input bytes have their MSBs set to 1, while the last byte its MSB set to 0.

A uintvar is related to base-128 representation of an integer. For example, 137 in base-128 is 19. The only addition we have for uintvar is the presence of the continuation bit. You can use the techniques you learnt earlier in this course to represent values in, for example, hex to represent values in base-128. Once the value is in base-128, then the only addition required for uintvar is to set the continuation bits appropriately. Similarly, when you are decoing a uintvar, you can find the value represented by the lower 7 bits of each byte (this will be less than or equal to 127), and apply base-128 arithmetic to find the integer value represented by the uintvar.

You may also find this observation useful: when we have 7-bit values in a byte, setting the MSB amounts to adding 0x80 (or 128 in decimal) to the byte value.

Examples: The value 128 requires two bytes: 1000 0001 0000 0000. And value 137 will be: 1000 0001 0000 1001. The second bytes for the values 128 and 137 have their first bits set to 0 to indicate there are no more bytes to follow.

Part I: Representing unsigned integers as uintvars

In the first part of this assignment, you write a program that prints to standard output the uintvar representation of an unsigned integer.

The program should accept input from the command-line and print results to the standard output. For each unsigned integer supplied in the command-line the program prints the corresponding uintvar byte values. Note that the inputs in the command-line are assumed to be in decimal format.

(and nothing else). Here byte values (81, 0) make up the uintvar representation of 128 (decimal). And byte values (d7, 57) make up the uintvar representation of 11223 (decimal). Each byte value is printed out in hexadecimal format.

You may assume the inputs on the command-line will be valid, and therefore you do not need to check for input errors (such as running ./a.out 3.14159 hello 2.125 ).

You may assume the inputs on the command-line will be valid, and therefore you do not need to check for input errors (such as running ./a.out 3.14159 hello 2.125 ).

A sample solution for this part is available: a3ss_part1.s. You are required to try it out: compile the file on login.cs.auckland.ac.nz using the command g++ -o a3ss_part1 a3ss_part1.s and run the resulting executable a3ss_part1. Note that the option -o tells the compiler to create the executable in the supplied file name rather than the default a.out. If you omit -o a3ss_part1, then the executable will be called a.out.

Note that g++ is an alias to gcc -lstdc++. In other words, g++ invokes the GNU compiler gcc and instructs it to use the standard C++ library at link stage. Therefore g++ hello.cpp and gcc -lstdc++ hello.cpp are equivalent commands.

Part II: Finding the unsigned integer values represented by uintvars

This part of the assignment requires you to write a program that is the inverse of what you accomplished in Part I.

For each uintvar supplied in the command-line the program prints the corresponding integer values. Inputs in the command-line are assumed to be in hexadecimal representation. The output integer is in decimal representation.

Note that the input 813e5a2f above is terminated by byte 3e. The remaining bytes (i.e. 5a2f) have no relevance. Effectively 813e5a2f is the same as 813e.

Note that the input of 9a81 above is not valid since there is no terminating byte (i.e. there is no byte with the MSB set to 0). In such a case, we output 0 to mean an error.

You may assume the inputs on the command-line will be valid, and therefore you do not need to check for input errors (such as running ./a.out 3.14159 hello 2.125 ).

A sample solution for this part is available: a3ss_part2.s, and you are required to try this out.

Part III: Converting WBMP to PBM (Bonus)

The bonus part of the assignment is optional, and if this is completed, you get a bonus mark of 20% of the total contribution of this assignment.

The bonus part requires the solution to the main (i.e. non-optional) parts of the assignment, and thus you must only attempt to solve the bonus part after completing the main parts.

Wireless Bitmap (WBMP) is a monochromatic image file format used by older mobile devices. A number of sample WBMP images are available here. A tutorial handout on the format of WBMP is available.

Portable Bitmap (PBM) is a text-based file format for images. A number of sample monochromatic PBM images are available here.

In this part of the assignment, you are required to convert monochromatic images in the WBMP format to PBM format (of type P1).

To this end, you need to find out the width and height of the WBMP image and its bitmap data. You can get these by reading the content of the WBMP file. See the file input/output tutorial provided at cplusplus.com. Note that the width and height of the image in a WBMP file are represented as uintvars. Therefore, you need to use your solution from Part II.

You will need to use the C++ input file stream class ifstream to open the WBMP image file, and use ifstream's read method to read bytes from the file. An illustrative example of how this could be done is provided at cplusplus.com.

Note that you can redirect the output to a PBM file (e.g. ClockTower.pbm) which can then be viewed through a suitable viewer (such as SlowView on Windows).

A sample solution for this part is available: a3ss_part3.s, and you are required to try this out.

Submission of Assignments

Before submission, please ensure that you have gone through the supplied marking guideline.

Further Work

A string (i.e. a sequence of characters) has an arbitrary length. Find out how strings are stored and how the length is determined. Can we use the continuation-bit technique used in uintvars to store strings? Discuss your answer.

COMPSCI 210: Assignment #3 - Computer Organization & C++ Programming

An alternative representation for unsigned integers

Part I: Representing unsigned integers as uintvars

Part II: Finding the unsigned integer values represented by uintvars

Part III: Converting WBMP to PBM (Bonus)

Submission of Assignments

Further Work