Data Structures and Algorithms

Workshop 1 - 1998

Familiarisation

The basic aims of this workshop are to

familiarise you with techniques to time an algorithm written in ANSI C and
experimentally confirm the time complexity of addition and searching operations for various data structures.

You will need to write a simple program to insert data into a tree, measuring the average time to add an item and to find a randomly chosen item in the tree.

Download and save into your directory from the download window:
1. Collection.h
2. Collection.c
3. tree_struct.c
4. tree_add.c
5. tree_find.c and
6. the test program tc.c.
Modify the collection implementation so that it uses a tree structure rather than the original array. Edit out the original structure, find and add methods and load in the new ones. Of course, you will be able to leave the specification of the Collection class untouched.
Compile into an executable:

gcc -o tc tc.c collection.c
Note that you will need to use an ANSI C compiler as the programs are written in ANSI C. On some Unix machines, cc only accepts the older "K&R C".
Run the program and verify that it runs as expected. Examine the test program listing to determine how it runs!
Now we want to find out how efficiently it runs:
1. Modify the test program so that it generates and then inserts a large number, n, of integers into the collection.
  Note that you will need to be careful about the choice of the set of numbers that you generate. Compare what happens to your times when you use the set of integers, 1 to n, to when you use the rand() function to generate a set of random numbers to add.
  Once you have created a collection with n items in it, determine how long it takes, on average, to find an item in the collection. Again you will need to generate a set of "probe" data which you search for in the collection. (Searching for the same item multiple times may cause problems and give you misleading answers - why?)
  
  Timing
  Timing an individual function call has some traps: read these notes for some guidelines.
2. Determine the average time to search the collection for a randomly generated integer - again by finding the time to search for n randomly generated integers. Use the random number generator, rand(), to generate random integers.
3. Modify the program so that it prints out the insertion and searching times for a range of values of n. Suitable values will produce run times between about 1 and 20 seconds. About 10 values should enable you to determine the characteristics of the time vs n curve.

Warning: Note carefully that the test program, tc.c is a test program - designed to demonstrate that the collection code is correct. You will need to re-organise it quite a bit to perform the timing analyses efficiently.

Report

Prepare a brief report which summarises your results. (The report should be plain ASCII text - not the native form of any word processor.)

This report should start by forming a hypothesis about your expected results. This should be followed by the actual results. The conclusion should provide a convincing argument as to why these results confirm your original hypotheses. It should also highlight and attempt to explain any discrepancies.

You are expected to measure the addition and searching times. Your report should also discuss the difference (if any) in results observed when a sequence of integers, 1, 2, .. is used for the test data compared to a randomly chosen list of integers.

If you design your program efficiently, you will be able to get it to generate a table of data for you which you can paste directly into the report.

Submission Instructions will be available soon!

Data Structures and Algorithms

Workshop 1 - 1998

Familiarisation

Timing

Report