Presbyterian College > Academic Web Server > Jon Bell > CSC 232 > Labs > #6
In this lab, you will study the difference in performance between linear search and binary search, by using a software "timer" to measure how much time each method requires on a real set of data.
Download the following program and data files:
The four *.txt files contain the number of words given in the file names, sorted in ascending "alphabetical" order. This makes them usable for both linear and binary search without any further sorting or modification.
Compile the program and run it, and give it one of the *.txt files as input. Right now it should simply ask you for the file to read, then read the words from that file into a vector, and display the number of words in the vector.
The program also has functions for performing linear search and binary search, the same ones we saw in lecture, but it doesn't use either of them right now.
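For reference, the two search methods can be sketched as follows. This is only an illustrative version written with the standard library; the function names, parameter types, and the choice to return -1 for "not found" are assumptions, and the functions supplied in wordsearch.cpp may differ in detail.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Linear search: examine each element in turn until the target is found.
// Returns the index of target, or -1 if it is not in the vector.
// (Names and return convention are assumptions, not the lab's actual code.)
int linearSearch(const std::vector<std::string>& words, const std::string& target) {
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (words[i] == target) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Binary search: repeatedly halve the range [low, high] that could hold target.
// Requires words to be sorted in ascending order.
// Returns the index of target, or -1 if it is not in the vector.
int binarySearch(const std::vector<std::string>& words, const std::string& target) {
    int low = 0;
    int high = static_cast<int>(words.size()) - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (words[mid] == target) {
            return mid;
        }
        if (words[mid] < target) {
            low = mid + 1;   // target can only be in the upper half
        } else {
            high = mid - 1;  // target can only be in the lower half
        }
    }
    return -1;
}
```

Because both functions in this sketch take the same arguments and return the same kind of result, switching from one to the other means changing only the function name at the point of the call.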
On our system, a single search using either linear search or binary search happens so quickly that it's hard to measure the time it takes. Besides, the time for a single search depends on which word we're searching for, so the comparison between linear and binary search may come out differently depending on the word.
Therefore, we need to perform many searches for different words, so that we get a representative sample, and measure the total time required for all the searches. The best way to get a representative sample of words is to choose them randomly from the vector, by generating random numbers to use as vector indexes (subscripts).
If you don't have the following header file already, download it:
This gives you direct access to the random-number generator that the Dice class uses. To use it, you must also link to the library file /usr/local/lib/tapestry.a. We are going to use it to select words at random from the list that your program reads in.
Somewhere in your program, before you actually need the random numbers, declare an object of type RandGen:
RandGen myRandGen;
Now, whenever you need a random integer ranging from a to b (including both a and b), call this object's RandInt member function:
randomNumber = myRandGen.RandInt (a, b);
For example, if you want random numbers ranging from zero up to the largest index for the vector words:
randomNumber = myRandGen.RandInt (0, words.size()-1);
Remember that the largest index is words.size()-1, not words.size(), because indexes start at zero, not 1.
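If you want to try the same idea without the tapestry library, the following sketch picks an index in the same range using the standard <random> header. It is a stand-in for RandInt, not the RandGen class itself; the function name randomIndex is made up for this illustration.

```cpp
#include <cstddef>
#include <random>

// Hypothetical stand-in for myRandGen.RandInt(0, size - 1), built on <random>.
// Returns an index in the range 0 .. size - 1, both ends included.
// The caller must make sure size is greater than zero.
std::size_t randomIndex(std::mt19937& gen, std::size_t size) {
    std::uniform_int_distribution<std::size_t> pick(0, size - 1);
    return pick(gen);
}
```

Note that words.size() is an unsigned value, so words.size()-1 wraps around to a huge number if the vector is empty. Make sure the file was actually read before choosing an index.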
Add code to wordsearch.cpp so that it asks you for the number of times to search, then goes around a loop that many times. Each time around, it should pick a random index for your words vector, get that word from the vector, and search for it in the vector using linear search. Of course, the search should succeed each time; verify that this does indeed happen by inserting appropriate debugging output statements.
After you've verified that the program works OK so far, comment out the output statements inside the loop (but don't delete them completely).
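The shape of this loop might look like the sketch below. It is written against the standard library so that it compiles on its own: std::mt19937 stands in for RandGen, and a placeholder linearSearch is included only for that purpose. The function name countSuccessfulSearches and the seed parameter are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Placeholder linear search so this example is self-contained.
// Returns the index of target, or -1 if it is not found.
int linearSearch(const std::vector<std::string>& words, const std::string& target) {
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (words[i] == target) return static_cast<int>(i);
    }
    return -1;
}

// Search for numSearches randomly chosen words and return how many of the
// searches found the word they were looking for.
int countSuccessfulSearches(const std::vector<std::string>& words,
                            int numSearches, unsigned seed) {
    if (words.empty()) return 0;  // nothing to pick from
    std::mt19937 gen(seed);       // stand-in for the RandGen object
    std::uniform_int_distribution<std::size_t> pick(0, words.size() - 1);
    int found = 0;
    for (int i = 0; i < numSearches; ++i) {
        const std::string& target = words[pick(gen)];
        int position = linearSearch(words, target);
        // Debugging check: the word came from the vector, so it must be found.
        if (position >= 0 && words[position] == target) {
            ++found;
        }
        // A debugging output statement would go here, for example:
        // std::cout << target << " found at " << position << '\n';
    }
    return found;
}
```

If everything works, the count returned equals the number of searches requested.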
Download the following header file:
This header file also requires that you link to /usr/local/lib/tapestry.a, which you're already doing anyway. The basic usage is simple. First you declare an object of type CTimer:
CTimer myTimer;
To start, stop and reset the timer:
myTimer.Start();
myTimer.Stop();
myTimer.Reset();
Just like a real stopwatch, if you don't reset the timer, it resumes counting from the time at which it was last stopped. To display the total accumulated time between all "starts" and "stops", not including the pauses in between:
cout << myTimer.CumulativeTime();
Set up a timer in your program, and use it to time how long it takes to perform the searches inside your loop. To avoid including the time it takes to control the loop and generate the random numbers, start the timer just before you call the search function, and stop it just after the search function returns. That is, restart and stop the timer each time around the loop, without resetting it in between. After the loop finishes, divide the total accumulated time by the number of searches to get the time per search.
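The timing pattern described above can be sketched without the tapestry library as follows. The Stopwatch class here is a hypothetical stand-in for CTimer built on std::chrono; it copies the member function names used in this lab but is not the tapestry class. The names TimingResult and timeSearches, and the placeholder linearSearch, are also assumptions made so the example compiles on its own.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for CTimer; NOT the tapestry class.
class Stopwatch {
    using Clock = std::chrono::steady_clock;
public:
    void Start() { begin_ = Clock::now(); }
    void Stop()  { total_ += Clock::now() - begin_; }
    void Reset() { total_ = Clock::duration::zero(); }
    // Seconds accumulated over all Start/Stop pairs since the last Reset.
    double CumulativeTime() const {
        return std::chrono::duration<double>(total_).count();
    }
private:
    Clock::time_point begin_{};
    Clock::duration total_{Clock::duration::zero()};
};

// Placeholder linear search so this example is self-contained.
int linearSearch(const std::vector<std::string>& words, const std::string& target) {
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (words[i] == target) return static_cast<int>(i);
    }
    return -1;
}

struct TimingResult {
    double secondsPerSearch;  // accumulated time divided by number of searches
    int hits;                 // number of searches that found their word
};

TimingResult timeSearches(const std::vector<std::string>& words,
                          int numSearches, unsigned seed) {
    TimingResult result = {0.0, 0};
    if (words.empty() || numSearches <= 0) return result;
    std::mt19937 gen(seed);  // stand-in for the RandGen object
    std::uniform_int_distribution<std::size_t> pick(0, words.size() - 1);
    Stopwatch timer;
    for (int i = 0; i < numSearches; ++i) {
        const std::string& target = words[pick(gen)];  // not timed
        timer.Start();                                 // time only the search
        int position = linearSearch(words, target);
        timer.Stop();
        if (position >= 0) ++result.hits;  // use the result of each search
    }
    result.secondsPerSearch = timer.CumulativeTime() / numSearches;
    return result;
}
```

Because the timer is never reset inside the loop, each Start/Stop pair adds to the running total, and only the division at the end turns that total into a per-search figure.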
Run your program on each of the four *.txt files (which have 500, 1000, 2000, and 4000 words, respectively), and tell it to do 10000 searches each time. Record the time per search that it reports for each of the sizes.
Now, edit your program so it does the searches using binary search instead of linear search. You should need to change only one line, where you call the search function. Recompile the program, run it on the four *.txt files again, and record the results.
How do the two search methods compare, in terms of their performance as the amount of data increases?
Submit a file containing a copy of your program, and the numeric results you obtained. Just type in a little table of your results at the top of the file, using pico. Submit it to the drop box 232-lab6.
These pages are maintained by Jon Bell (jbell at presby.edu), who is solely responsible for their content.