BBS: Inland Empire Archive Date: 05-09-92 (01:18) Number: 186 From: TOM HAMMOND Refer#: NONE To: LYN BORCHERT Recvd: NO Subj: Hashing 1/9 Conf: (2) Quik_Bas
LB>Is there anyone out there with enough knowledge of LB>hashing and a kind enough LB>heart to give us little guys a little tutorial boost of LB>what and how hashing LB>is used? Lyn: Hope the following helps... thank Mike Avery for this contribution! The entire document is contained in ?? pieces (all of which follow immediately, I hope). Several lines are longer than I think SLMR will accept, these have been WRAPPED (at about 70 characters) with an underscore "_" at the end of the wrapped line. Hope I got 'em all. From : MIKE AVERY Number : 3098 of 3199 To : ROB FLOR Date : 12/29/91 1:03p Subject : Searching/Hashing Reference : NONE Rob, You started this when you quoted me and asked a question.... MA>> . . . I suggested that instead of a linear search he used a hashed MA>>access scheme. RF> What is hashed access? RF> Looking forward to your monograph. This is a bit more than you asked for. I'll talk about searching in general, with special emphasis on linear searches and hashed access. The monograph will be in the next 5 or so messages. I hope it is of some use to you and other members of the echo..... Searching...... Two of the classic dp problems are sorting and searching. Hashed access is a kind of searching. Searching is covered for about 6 to 8 weeks in the "Introduction To Data Structures" course at the University of Texas. Needless to say, the discussion below is a bit shorter. Some source code for sequential access will be included for the algorityhmically impaired. Also, in another set of posts is a hashing demonstration program called "HASH.BAS". The source code in this set of posts is untested, and a veteran QB hacker will say, "GEE - why didn't he.... that's only 1 command in QB!" Maybe it's because I'm still learning QB. Maybe it's because the way I did it makes more sense to me. Maybe it's because I screwed up. In any case, I welcome corrections AND simplifications. The HASH.BAS program is fairly well tested, but far from a finished product - the user interface is not too cool. And lots of info about hashing is displayed that we wouldn't want to see in a program that was intended for a real purpose. Again, comments, corrections, and suggestions are welcome. Being a practical kind of guy, I'll base this discussion around a name and phone number data base. We'll want to be able to add data to and retrieve DATA from the data base, and from time to time, we'll want to get a sorted list of phone numbers. We'll always search by name, and since we are GREAT typists, we won't worry about frills like closest matches. When we type "BIL", that's the dude's full name! Not "BILL" or "WILLIAM". One particular problem I'll largely ignore is the duplicate key question. How many people named "Tom Johnson" have you met? I have met 5 that I remember. Many data bases have trouble dealing with duplicate keys. In Hispanic communities the problem is worse. Based on what some programmers at work who deal with a large customer data base tell me, about 90% of all Hispanic people have one of 10 last names. And there is a tradition of naming the child for the saint of the day, which limits the number of combinations further. One co-worker looked up her name, "Carmen Hernandez ", and got 27 SCREENS of matches. When she redefined the search" parameters for her city and age group, there were still over a dozen people with her name. She felt a bit sad - no longer quite as special as she had felt before. In general, I'll leave the whole duplicate key question as an exercize for the reader in order to keep this already long document from becoming a textbook. All of the discussion that follows applies to either an in memory data base or a disk based data base. The efficiency considerations are more important to a disk based data base, since we have to be concerned with the physical slowness of disk drives compared to the electronic speed of memory. In any case, to keep this from being even longer than it already promises to be, I'll just talk about memory resident data bases. The simplest data base is a linear one, so it's what we usually start with. Somewhere we'll DIM A$(500,1) and A$(?,0) is the name and A$(?,1) is the phone number. We'll get a new name into a variable called TestName$. We'll keep a count of how many entries we've used and write a search like this... TestPoint% = 0 DO WHILE TestPoint% < LastEntry% AND a$(TestPoint%, 0) <> TestName$ TestPoint% = TestPoint + 1 LOOP IF a$(TestPoint%, 0) = TestName$ THEN PRINT a$(TestPoint%, 0); "'s Phone Number Is "; a$(TestPoint%, 1) ELSE BEEP PRINT TestName$; "'s Phone Number Not On File." INPUT "Do you want to add it"; Answer$ 'etc - give the user a chance to add a new entry for the name not on ' file END IF Continues in next file (HASHING - 2 of 9) --- WM v2.00/91-0231 * Origin: The Modem Zone BBS (314) 893-5106 (1:289/2)
Books at Amazon:
Back to BASIC: The History, Corruption, and Future of the Language
Hackers: Heroes of the Computer Revolution (including Tiny BASIC)
Go to: The Story of the Math Majors, Bridge Players, Engineers, Chess Wizards, Scientists and Iconoclasts who were the Hero Programmers of the Software Revolution
The Advent of the Algorithm: The Idea that Rules the World
Moths in the Machine: The Power and Perils of Programming
Mastering Visual Basic .NET