Echo BASIC - Hashing 1/9

 BBS: Inland Empire Archive
Date: 05-09-92 (01:18)             Number: 186
From: TOM HAMMOND                  Refer#: NONE
  To: LYN BORCHERT                  Recvd: NO  
Subj: Hashing     1/9                Conf: (2) Quik_Bas

LB>Is there anyone out there with enough knowledge of
LB>hashing and a kind enough
LB>heart to give us little guys a little tutorial boost of
LB>what and how hashing
LB>is used?

Lyn:

Hope the following helps... thank Mike Avery for this contribution!

The entire document is contained in ?? pieces (all of which follow
immediately, I hope).

Several lines are longer than I think SLMR will accept, these have been
WRAPPED (at about 70 characters) with an underscore "_" at the end of
the wrapped line.  Hope I got 'em all.


From    : MIKE AVERY                  Number    : 3098 of 3199
To      : ROB FLOR                    Date      : 12/29/91 1:03p
Subject : Searching/Hashing           Reference : NONE

Rob,
You started this when you quoted me and asked a question....

MA>> . . . I suggested that instead of a linear search he used a hashed
MA>>access scheme.

 RF>     What is hashed access?
 RF>     Looking forward to your monograph.

This is a bit more than you asked for.  I'll talk about searching in
general, with special emphasis on linear searches and hashed access.

The monograph will be in the next 5 or so messages.  I hope it is of
some use to you and other members of the echo.....

Searching......

Two of the classic dp problems are sorting and searching.  Hashed access
is a kind of searching.  Searching is covered for about 6 to 8 weeks in
the "Introduction To Data Structures" course at the University of Texas.
Needless to say, the discussion below is a bit shorter.  Some source
code for sequential access will be included for the algorityhmically
impaired.  Also, in another set of posts is a hashing demonstration
program called "HASH.BAS".

The source code in this set of posts is untested, and a veteran QB
hacker will say, "GEE - why didn't he.... that's only 1 command in QB!"
Maybe it's because I'm still learning QB.  Maybe it's because the way I
did it makes more sense to me.  Maybe it's because I screwed up.  In any
case, I welcome corrections AND simplifications.

The HASH.BAS program is fairly well tested, but far from a finished
product - the user interface is not too cool.  And lots of info about
hashing is displayed that we wouldn't want to see in a program that was
intended for a real purpose.  Again, comments, corrections, and
suggestions are welcome.

Being a practical kind of guy, I'll base this discussion around a name
and phone number data base.  We'll want to be able to add data to and
retrieve DATA from the data base, and from time to time, we'll want to
get a sorted list of phone numbers.  We'll always search by name, and
since we are GREAT typists, we won't worry about frills like closest
matches.  When we type "BIL", that's the dude's full name!  Not "BILL"
or "WILLIAM".

One particular problem I'll largely ignore is the duplicate key
question.  How many people named "Tom Johnson" have you met?  I have met
5 that I remember.  Many data bases have trouble dealing with duplicate
keys.  In Hispanic communities the problem is worse.  Based on what some
programmers at work who deal with a large customer data base tell me,
about 90% of all Hispanic people have one of 10 last names.  And there
is a tradition of naming the child for the saint of the day, which
limits the number of combinations further.  One co-worker looked up her
name, "Carmen Hernandez ", and got 27 SCREENS of matches.  When she
redefined the search" parameters for her city and age group, there were
still over a dozen people with her name.  She felt a bit sad - no longer
quite as special as she had felt before.  In general, I'll leave the
whole duplicate key question as an exercize for the reader in order to
keep this already long document from becoming a textbook.

All of the discussion that follows applies to either an in memory data
base or a disk based data base.  The efficiency considerations are more
important to a disk based data base, since we have to be concerned with
the physical slowness of disk drives compared to the electronic speed of
memory.  In any case, to keep this from being even longer than it
already promises to be, I'll just talk about memory resident data bases.

The simplest data base is a linear one, so it's what we usually start
with.  Somewhere we'll DIM A$(500,1) and A$(?,0) is the name and A$(?,1)
is the phone number.  We'll get a new name into a variable called
TestName$.  We'll keep a count of how many entries we've used and write
a search like this...

TestPoint% = 0
DO WHILE TestPoint% < LastEntry% AND a$(TestPoint%, 0) <> TestName$
   TestPoint% = TestPoint + 1
LOOP

IF a$(TestPoint%, 0) = TestName$ THEN
   PRINT a$(TestPoint%, 0); "'s Phone Number Is "; a$(TestPoint%, 1)
ELSE
   BEEP
   PRINT TestName$; "'s Phone Number Not On File."
   INPUT "Do you want to add it"; Answer$
   'etc - give the user a chance to add a new entry for the name not on
   '      file
END IF

Continues in next file (HASHING - 2 of 9)


--- WM v2.00/91-0231
 * Origin: The Modem Zone BBS (314) 893-5106 (1:289/2)