Echo BASIC - HASHING 5/9

 BBS: Inland Empire Archive
Date: 05-09-92 (01:15)             Number: 190
From: TOM HAMMOND                  Refer#: NONE
  To: LYN BORCHERT                  Recvd: NO  
Subj: HASHING     5/9                Conf: (2) Quik_Bas

Continued from Hashing 4/9

Another problem with hashing is in deleting data.  With a hashed data
base, we determine whether or not our data is found by comparing it to
the contents of the hashed address.  Since we terminate our search when
we find the data, or an empty array element, we can't just wipe the
data.  We need to mark the data as having been deleted, so that the
search can continue until the data is found or an empty array entry is
found.  The deleted cell is available for reuse, so if the data is not
found elsewhere, reusing the deleted entries' space will save time in
the future.  Basically, a specific value that will not occur for real
data should be placed in the deleted entry.  In the unlikely event that
all the DATA elements are occupied or marked as deleted, you will be
unable to add DATA, as a blank data element does not exist.

Since the size of the array, or data file, is a critical part of the
hashing algorithym, you can't resize the file with no real thought.  A
special program needs to be written to perform that action.  One way is
to just move the data from one hashed array or file to another.  Another
way is to export the data to a sequential file, create a new file, and
then import the data.

A disk based system adds a few complications.  To access the records
randomly, it is necessary to use FIELD statements to define the records,
and GET and PUT statements to access it.  All records MUST be the same
size.  And the data file has to be created to its full size before it is
used (barring any programmer cleverness).  The number of elements and
size of the elements is so important that some people use the first
record to store information about these characteristics.  So the data
base opens the file, reads the first record, recovers the info, closes
the file, and reopens it as a random file of the appropriate size.  This
avoids having to hard code this information in the program.  Some people
at work "simplified" their programs and "saved some time" by hard coding
the array size.  When the users resized the data base, the programmers
got to rewrite the programs for them, which reduced the time savings
somewhat, and made the programmers look like jerks.

Producing sorted lists, or processing the data sequentially, is more
difficult with a hashed data structure than with others, as there is no
inherent order.  Sorting the data in place would destroy our ability to
randomly access it, so the data must be manipulated outside the hashed
DATA structure.  The big question here is whether to create a sort array
or file that contains all the data in the hashed structure, or one that
just contains the keys and record numbers.  With a small data base with
few elements, there is little reason to not copy the data.  With a large
DATA base, it is more time and space efficient to copy the key and the
record number to a new array or file and refer back to tbe main data
area to pick up your data.  Once this decision is made, we scan the main
data area, extract the live data fields, copy them to a new array or
file, sort the new array, and then process the sorted data.

If we extracted all the data, we could process the sorted list directly.
If we extracted just the record numbers, we would need to pick up the
record number from the sorted list and then access the data in the main
array.  This is the approach I took in the demo program as it is better
suited to "real-world" problems.

And there you have it, an overview of searching, with special emphasis
on hashing.

MIKE
(All done!  Can you believe it!!??)

Rob,

Here's a demo program to explain hashing.  Hope it's of some use.  Just
save the 7 pieces off the BBS, cut and paste them IN ORDER in your
favorite editor, and then run it via QuickBasic.  (Hope no BBS's chop it
up too badly.  If it get's mangled, I'll ZIP it and post it somewhere
accessible.)

Program code follows in next file (HASHING - 6 of 9)


--- WM v2.00/91-0231
 * Origin: The Modem Zone BBS (314) 893-5106 (1:289/2)