Computer Science
Assignments
Assignment #1 Due: Tuesday 29 Apr 08, in class
You should do this assignment alone. Feel free to use any sources of information you can find--the goal is to learn as much as you can about parallel computing.
You may work with others, but the algorithms and programming should be your work.
The purpose of this assignment is to give you an opportunity to experience writing simple programs using shared memory.
This experience will provide a foundation for writing and/or running much more complex programs.
You may use any machine available to you that has at least four distinct processors. One such machine you may use is wintermute03.cs.auckland.ac.nz, a computer that has four cores, each with two strands (hardware threads).
You must use this machine only for 703 homework assignments or, with instructor's permission, for your course project. You have many choices in doing this assignment. You may use any language and any tools available on this computer.
Assignment 1 (5% of grade)
Write a shared-memory program that reads from a file a list of double-precision
floating point numbers and rearranges them. It then writes to the standard output(the
screen, most probably) the largest, smallest, and median values. Your program should
operate in parallel with P processors specified on the command line. You may want
to test your program with smaller files, but the benchmark data consists of 50 million
double-precision (8 byte) floating point numbers stored in binary format in a file to be made available.
The data is little-endian (since the processors are Intel Xeons) so be careful to convert if moving
to a big-endian machine. As a sanity check the first element in the array is: 0.0108401219279
You can find the file at: /var/tmp/floats
Do not try to copy the file--it is 400 MBytes! In addition to printing the three
values, you should also print out (a) the wall-clock time from when the file is
opened until it is assembled in main memory as a single array, and (b) the wall-clock
time from when the array is filled until the three values have been determined (but
not yet printed). You may want to run the experiments several times, because times
may vary based on other demand on the system. You should report only the best time.
The assignment is to determine from the data the three values of minimum, maximum, and median. There is a linear-time algorithm for determining median, and you are permitted (but not encouraged) to use this algorithm instead of sorting. You need not do a complete sort if you can determine these three numbers from the data by some other means.
As another sanity check, you could try your program on the file floats.small. This one has the same distribution as the other file but is only 4MBs. The stats for this file are as follows:-
first 0.0201262799209min 5.13395136603e-22
max 6.77587867453e+21
median 1.0026277229
length 500000
Important: this isn't the assignment file. Use it as a sanity check only if you want to.
What to Hand In
Turn this homework in on paper in class at the beginning of lecture on the due date (Tuesday 6May08) - This is the new date and it includes the extension. No late assignments will be accepted.
Describe the algorithm you used, how you parallelized the program and why you believe this is the best method. Provide printouts of your program using 1, 2, & 4 processors. These runs should use processor affinity to make sure that the strands are running on different processors. In addition, you should run the program with 8 strands, the full complement available on this processor.
Printout should include times, number of processors, and the answers. (M is the maximum number of processors/hardware threads you are able to use, ideally 16). Report some other result you have obtained developing and testing the program, e.g., the time for sorting a range of file sizes, or the results of sorting a partially sorted file. Explain why the result is interesting.
As an appendix, you should attach source code for all experiments you reported.
Hints
1. Use the following for CPU/Memory wintermute03 stats:
cat /proc/cpuinfo # CPUs/cores and their numbers
free -m # Total and free memory (in mbs)
top # Whether others might be using it
2. wintermute03 has four processors, each with two strands (hardware threads). Two strands running on a single processor never run twice as fast as a single strand. You should explore the difference, for example, between two strands running on the same processor and two strands running on different processors.
3. Things to google:
- General: cpu affinity, selection algorithm, quicksort
- Python: Numeric, python affinity 0.1.0
- Linux: linux, Compsci 215 Lectures
If you have any questions, please feel free to contact Fuad (fuad at cs.auckland.....)
-
Related Programmes