INSTRUCTIONS BEFORE DEPLOYING THE PROJECT
******************************************

To deploy the project on your machine, follow the instructions below.

1: Platform
-----------

We conducted our experiments on a Pentium-IV 2x2.13GHz machine with 3GB of main memory and 160GB of disk space
under Windows XP. If you want to run the project in a Linux environment, you may need to adjust a few settings
in order to deploy the libraries.

We implemented the project in Java using the Eclipse IDE, version 3.3.1.1, but it should work in almost any
Java IDE.

2: Deployment of libraries
--------------------------

To measure the memory cost of the algorithm we used a library that is included in the project folder under
the file name "sizeofag.jar". Copy this JAR file into your working directory and add it to the Eclipse
libraries using the option "Add External JARs". To measure the processing cost we used the
"System.nanoTime()" method provided by the Java API.
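
As a minimal sketch of how "System.nanoTime()" can be used to measure processing cost, the example below times
an arbitrary piece of work. The class name ProcessingCostTimer, the method timeNanos and the dummy workload are
hypothetical and are not taken from the project code.

```java
class ProcessingCostTimer {

    /** Runs the task once and returns the elapsed time in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Dummy workload standing in for one run of the algorithm (assumption).
        long elapsed = timeNanos(new Runnable() {
            public void run() {
                long sum = 0;
                for (int i = 0; i < 1000000; i++) {
                    sum += i;
                }
            }
        });
        System.out.println("Processing cost: " + elapsed + " ns");
    }
}
```

Note that "System.nanoTime()" is only meaningful as a difference between two readings, so the start value is
always subtracted from the end value.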

3: MultiHashMap
----------------

The standard Java hash table does not support storing multiple tuples under the same key value, whereas in our
scenario we need to store multiple tuples against each foreign key. We therefore used MultiHashMap, provided by
Apache Commons Collections, which handles this situation. The Commons Collections library is also included
in the project in the folder "collections-generic-4.01". Unzip the folder, copy the "collections-generic-4.01"
JAR files into your working directory, and add the library to your Java IDE as described in point 2.
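
To illustrate the one-key-to-many-tuples behaviour described above, here is a minimal sketch that uses only
java.util classes so that it runs without the Commons Collections JAR. It is a stand-in for the idea, not the
MultiHashMap implementation; the class name SimpleMultiMap and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SimpleMultiMap<K, V> {
    // Each key maps to the list of all values stored under it.
    private final Map<K, List<V>> buckets = new HashMap<K, List<V>>();

    /** Adds a value under the key without replacing earlier values. */
    void put(K key, V value) {
        List<V> values = buckets.get(key);
        if (values == null) {
            values = new ArrayList<V>();
            buckets.put(key, values);
        }
        values.add(value);
    }

    /** Returns a read-only view of all values for the key (empty if none). */
    List<V> get(K key) {
        List<V> values = buckets.get(key);
        if (values == null) {
            return Collections.<V>emptyList();
        }
        return Collections.unmodifiableList(values);
    }

    /** Removes one occurrence of the value under the key; returns true if found. */
    boolean remove(K key, V value) {
        List<V> values = buckets.get(key);
        if (values == null) {
            return false;
        }
        boolean removed = values.remove(value);
        if (values.isEmpty()) {
            buckets.remove(key); // drop empty buckets so they do not accumulate
        }
        return removed;
    }
}
```

With such a structure, two stream tuples that share the same foreign key are both kept, and a lookup on that
key returns both of them.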

4. Data specification
---------------------
The data settings for the HYBRIDJOIN algorithm are currently configured for a 50MB memory budget; you need to
adjust these values for every new memory budget. Memory is distributed among the components of HYBRIDJOIN according
to the formulas given in the paper.

The components are specified as follows:

a. Relation R
-------------
The disk-based relation R is the master data, which we stored in a database. Before starting the experiments you need to write
a script that generates this master data and stores it in the database with a primary key on the join attribute. In our
experiments we also assume that R is sorted according to access frequency.

b. DISK-BUFFER
---------------
The disk buffer is the component that loads disk pages into memory. We tuned the size of the disk buffer using the cost model
proposed in the paper. For our current settings its size is 60KB.

c. STREAM-BUFFER
----------------- 
The stream buffer is used to hold the fast input stream if required. The stream buffer is very small (0.05MB is
sufficient for all of our experiments).

d. HASH_TABLE
----------------
The hash table is used to store the stream tuples. As mentioned above, the standard Java hash table does not support storing
multiple tuples under the same key value, so we used MultiHashMap, provided by Apache Commons Collections. The fudge factor
used in our implementation is 8.

e. Queue
---------------------
The queue stores the addresses (pointers) of the tuples in the hash table in order to keep track of expired tuples.
The queue is implemented as a doubly linked list; each node contains the attribute value and the addresses of its
immediate neighbour nodes.
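
The node layout described above can be sketched as follows. This is a minimal illustration under our own
assumptions, not the project's code: the names ExpiryQueue, Node, enqueue, remove and dequeueOldest are
hypothetical, and the attribute value is assumed to be an int.

```java
class ExpiryQueue {

    /** One queue node: the attribute value plus links to both neighbours. */
    static final class Node {
        final int key; // join attribute value (assumed to be an int)
        Node prev;     // neighbour towards the head (older)
        Node next;     // neighbour towards the tail (newer)

        Node(int key) {
            this.key = key;
        }
    }

    private Node head; // oldest entry
    private Node tail; // newest entry
    private int size;

    /** Appends a new node at the tail and returns it so it can be unlinked later. */
    Node enqueue(int key) {
        Node node = new Node(key);
        if (tail == null) {
            head = node;
        } else {
            node.prev = tail;
            tail.next = node;
        }
        tail = node;
        size++;
        return node;
    }

    /** Unlinks a node that is currently in the queue, in constant time. */
    void remove(Node node) {
        if (node.prev != null) { node.prev.next = node.next; } else { head = node.next; }
        if (node.next != null) { node.next.prev = node.prev; } else { tail = node.prev; }
        node.prev = null;
        node.next = null;
        size--;
    }

    /** Removes and returns the oldest node, or null if the queue is empty. */
    Node dequeueOldest() {
        Node oldest = head;
        if (oldest != null) {
            remove(oldest);
        }
        return oldest;
    }

    int size() {
        return size;
    }
}
```

Because every node knows both of its neighbours, a node in the middle of the queue can be unlinked without
traversing the list. The caller must only pass nodes to remove that are still in the queue.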

5. Output
---------

For every execution, the output, in terms of processing cost, is written to a text file whose name you specify in the
program.
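
As a rough sketch of how a measured processing cost could be appended to such a text file, consider the
example below. The class name CostLogger, the method appendCost, the file name "processing_cost.txt" and the
one-value-per-line layout are all assumptions made for illustration; the project may format its output
differently.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

class CostLogger {

    /** Appends one measurement (in nanoseconds) as a new line, creating the file if needed. */
    static void appendCost(String fileName, long elapsedNanos) throws IOException {
        // The second FileWriter argument (true) opens the file in append mode.
        PrintWriter out = new PrintWriter(new FileWriter(fileName, true));
        try {
            out.println(elapsedNanos);
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        // ... one execution of the algorithm would run here ...
        long elapsed = System.nanoTime() - start;
        appendCost("processing_cost.txt", elapsed); // hypothetical file name
    }
}
```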




 