Echo BASIC - Indexing 101 2/3

 BBS: Inland Empire Archive
Date: 06-18-92 (06:18)             Number: 808
From: RICHARD VANNOY               Refer#: NONE
  To: ALL                           Recvd: NO  
Subj: Indexing 101 2/3               Conf: (2) Quik_Bas

INDEXING 101, Page 2 of 3

The last thing we have to do before using this data base is
to sort each .NDX file so we can use a binary search to
quickly find what we want.

EMPLNUMBR.NDX will now look like:
Employee Numb.  REC
___-----------  ---
     23          3    (Sorted by Employee number)
     32          2
     46          1

EMPLNAME.NDX will now look like:
BothNames      REC
___------      ---
McGee Bobbie    3     (Sorted by name)
Que Suzie       2
Vannoy Richard  1

So far it seems like a lot of work to do all this for just
three records, but remember that we probably have hundreds,
maybe even thousands of records, so be aware that the
problem of NOT having to search thousands of records for a
name OR an employee number (and BOTH can never be in the same
order in the original .DBF file) is what got us here in the
first place.

In order to understand the speed of an indexing system, it
is necessary to know how a binary search works, so we'll
digress a moment to cover it.  The first requirement of a
binary search is that the records we are searching MUST BE
IN ORDER, either numerically or alphabetically, for the
search to work.  That's why we sorted the .NDX files above.

Let's say we are searching the EMPLNAME.NDX file for me
(Vannoy Richard), all the records are in alphabetical order
by name, there are 1000 records and my name happens to be at
record 687.

First set LOW to 1 and HIGH to the number of records (1000).
the formula for our first LOOK is going to be (HIGH + LOW)\2
or 500.  LOOK at record #500.  Is the name there
(alphabetically) greater than or less than MY name?  Well,
since I'm at 687, the name at 500 has to be LESS.  We now
readjust and recompute LOOK as folows:

IF NameAt LOOK < MyName THEN
   LOW = 500
ELSE
   HIGH = 500
END IF
NewLOOK = (HIGH + LOW)/2

Notice, we have cut the search area IN HALF (thus the name
BINARY) by looking at the file JUST ONCE.  And by refining
our LOOK record to the middle of the new search area which
has just been halved, it won't take long to find the right
record.  The code looks like:

   found = 0
   low = 1
   high = numberOfRecords
   DO
      look = (high + low) \ 2  'Integer division
      GET #3, look, ENAM
      IF ENAM.BothNames < SearchingForName$ THEN
         low = look
      ELSEIF ENAM.BothNames > SearchingForName$ THEN
         high = look
      ELSE
         found = 1
         EXIT DO
      END IF
   LOOP UNTIL low = high
   IF found THEN
      'Process the record
   ELSE
      PRINT "Record not found."
   END IF

So eventually, one of two things has to happen, either the
record is found (ENAM.BothNames = SearchName$) or, if the
record is NOT in the data base, the HIGH and LOW numbers
will run together (Thus the LOOP UNTIL low = high).

Again, this may SEEM like a lot of trouble, because, say
with 100 records, the best case is that the record you want
is at record #50 (1 look!) but the WORST case is that it
will take 7 looks, for an average of about four looks at 100
records to find someone.  (Remember that in a sequential file,
the worst case could be 100 looks for an average of about
50 looks!)

And the beauty of binary search is that YOU CAN DOUBLE THE
NUMBER OF RECORDS AND ONLY INCREASE THE AVERAGE NUMBER OF
LOOKS BY ONE!
<continued next message>

~~
___
> MegaMail 2.1b #0:MilliHelen: Beauty required to launch one ship.
--- Maximus 2.00
 * Origin: D.J.M.BBS (1:202/307)