BBS: Inland Empire Archive Date: 06-18-92 (06:18) Number: 808 From: RICHARD VANNOY Refer#: NONE To: ALL Recvd: NO Subj: Indexing 101 2/3 Conf: (2) Quik_Bas
INDEXING 101, Page 2 of 3 The last thing we have to do before using this data base is to sort each .NDX file so we can use a binary search to quickly find what we want. EMPLNUMBR.NDX will now look like: Employee Numb. REC ___----------- --- 23 3 (Sorted by Employee number) 32 2 46 1 EMPLNAME.NDX will now look like: BothNames REC ___------ --- McGee Bobbie 3 (Sorted by name) Que Suzie 2 Vannoy Richard 1 So far it seems like a lot of work to do all this for just three records, but remember that we probably have hundreds, maybe even thousands of records, so be aware that the problem of NOT having to search thousands of records for a name OR an employee number (and BOTH can never be in the same order in the original .DBF file) is what got us here in the first place. In order to understand the speed of an indexing system, it is necessary to know how a binary search works, so we'll digress a moment to cover it. The first requirement of a binary search is that the records we are searching MUST BE IN ORDER, either numerically or alphabetically, for the search to work. That's why we sorted the .NDX files above. Let's say we are searching the EMPLNAME.NDX file for me (Vannoy Richard), all the records are in alphabetical order by name, there are 1000 records and my name happens to be at record 687. First set LOW to 1 and HIGH to the number of records (1000). the formula for our first LOOK is going to be (HIGH + LOW)\2 or 500. LOOK at record #500. Is the name there (alphabetically) greater than or less than MY name? Well, since I'm at 687, the name at 500 has to be LESS. We now readjust and recompute LOOK as folows: IF NameAt LOOK < MyName THEN LOW = 500 ELSE HIGH = 500 END IF NewLOOK = (HIGH + LOW)/2 Notice, we have cut the search area IN HALF (thus the name BINARY) by looking at the file JUST ONCE. And by refining our LOOK record to the middle of the new search area which has just been halved, it won't take long to find the right record. The code looks like: found = 0 low = 1 high = numberOfRecords DO look = (high + low) \ 2 'Integer division GET #3, look, ENAM IF ENAM.BothNames < SearchingForName$ THEN low = look ELSEIF ENAM.BothNames > SearchingForName$ THEN high = look ELSE found = 1 EXIT DO END IF LOOP UNTIL low = high IF found THEN 'Process the record ELSE PRINT "Record not found." END IF So eventually, one of two things has to happen, either the record is found (ENAM.BothNames = SearchName$) or, if the record is NOT in the data base, the HIGH and LOW numbers will run together (Thus the LOOP UNTIL low = high). Again, this may SEEM like a lot of trouble, because, say with 100 records, the best case is that the record you want is at record #50 (1 look!) but the WORST case is that it will take 7 looks, for an average of about four looks at 100 records to find someone. (Remember that in a sequential file, the worst case could be 100 looks for an average of about 50 looks!) And the beauty of binary search is that YOU CAN DOUBLE THE NUMBER OF RECORDS AND ONLY INCREASE THE AVERAGE NUMBER OF LOOKS BY ONE! <continued next message> ~~ ___ > MegaMail 2.1b #0:MilliHelen: Beauty required to launch one ship. --- Maximus 2.00 * Origin: D.J.M.BBS (1:202/307)
Books at Amazon:
Back to BASIC: The History, Corruption, and Future of the Language
Hackers: Heroes of the Computer Revolution (including Tiny BASIC)
Go to: The Story of the Math Majors, Bridge Players, Engineers, Chess Wizards, Scientists and Iconoclasts who were the Hero Programmers of the Software Revolution
The Advent of the Algorithm: The Idea that Rules the World
Moths in the Machine: The Power and Perils of Programming
Mastering Visual Basic .NET