BBS: Inland Empire Archive Date: 03-28-93 (21:04) Number: 249 From: QUINN TYLER JACKSON Refer#: NONE To: ALL Recvd: NO Subj: Quik_Bas Faq2.1 6/ Conf: (2) Quik_Bas
>>> Continued from previous message Entry Bucket ============== 1 (empty bucket) A 2 (no collisions) B 3 (no collisions) C 4 (no collisions) 5 (empty bucket) H 6 (arrived at after two collisions) 7 (empty bucket) Now, remember from past explanations that searches are conducted by comparing each entry to the key until an empty bucket is reached. Therefore, to find A in the table, we calculate a bucket of 65 MOD 7, or 2. We look in bucket 2, and see that our key of A is the same as the table entry A. We have therefore found our entry in one look! Now, let's look for I. That's a bit different, since it isn't in the list. How many looks are needed to tell us that it isn't? Well 73 MOD 7 is 3, and we see immediately that bucket 3 is a B, not an I. We recalculate the next bucket, and get: Offset = 4 NewPos = (3 - 4) or -1 Less than 0, so NewPos = 6 Bucket 6 is occupied by an H, and so we calculate the next bucket: Offset = 4 NewPos = (6-4) = 2 Bucket 2 is occupied by an A, and so: NewPos = (2 - 4) NewPos = -2 + 7 = 5 Finally, bucket 5 is empty. Therefore, since we've arrived at an empty bucket BEFORE arriving at I, we can say that I is not in the list. How many steps required? Four. Quite a bit of overhead on a short list of 7 entries, but consider a list of 100,000 entries! Four searches to find an item is fast! Q3.3 Okay, how about a real working example of hashing in QuickBASIC, Quinn? Theory is fine for CompSci freaks, but I'm a coffee and pizza programmer, not an egghead. A3.3 I mentioned that one perfect use of hashing is for word distribution counters. Here is one from Rich Geldreich that has been tweaked by me to account for some things that Rich did not know then about hashing table sizes. S5.0 WORDHASH.BAS [F210S05.BAS] 'WORDHASH.BAS v1.10 By Rich Geldreich 1992 ' Modified by Quinn Tyler Jackson for demonstrative purposes. ' 'Uses hashing to quickly tally up the frequency of all of the words in a 'text file. (This program assumes that words are seperated by either tab 'or space characters. Also, all words are converted to uppercase before 'the search.) ' DEFINT A-Z DECLARE SUB Show.Counts () DECLARE SUB Process.Line (A$) DECLARE SUB UpdateFreq (A$, KeyIndex) CONST TRUE = -1, FALSE = 0 DIM SHARED TableSize Main: FileName$ = COMMAND$ CLS LOCATE 1, 1 PRINT "WORDHASH.BAS By Rich Geldreich 1992" PRINT " Tweaked by Quinn Tyler Jackson 1993" OPEN FileName$ FOR INPUT AS #1 LEN = 16384 ' In Rich's original version, the TableSize was set at 7000. My version ' guesses at how large the table needs to be based on this: ' There are 5.5 characters in the average word. Therefore, divide the ' text file length by 5.5. For safety, assume that as many as ' half of those will be unique. In normal text, half the words are in the ' hundred most common list, so this plays it pretty safe! It will die ' if you take a file that is over about 50% unique words, however! This ' is for NORMAL text files, not word dictionaries, where all entries are ' unique! ' 'SPLICE IN FROM EARLIER SAMPLE 4.0 IN THIS FAQ ' VVVVVVVVVVVVV TableSize = funFirstPrime(LOF(1) * .09) REDIM SHARED WordTable$(TableSize) REDIM SHARED Counts(TableSize) DIM SHARED New.Words DO UNTIL EOF(1) LINE INPUT #1, A$ Process.Line A$ N = N + 1 LOCATE 3, 1: PRINT N; "lines processed,"; New.Words; "words found" LOOP SUB Process.Line (A$) ASEG = SSEG(A$) 'QuickBASIC 4.5 users change this to VARSEG(A$) AOFS& = SADD(A$) DEF SEG = ASEG + AOFS& \ 16 AAddress = AOFS& AND 15 Astart = AAddress AEndAddress = AAddress + LEN(A$) 'get a word GOSUB GetAWord 'update the frequency of the word until there aren't any words left DO WHILE Word$ <> "" UpdateFreq Word$, KeyIndex GOSUB GetAWord LOOP EXIT SUB GetAWord: Word$ = "" 'find a character P = PEEK(AAddress) DO WHILE (P = 32 OR P = 9) AND AAddress <> AEndAddress AAddress = AAddress + 1 P = PEEK(AAddress) LOOP 'if not at end of string then find a space IF AAddress <> AEndAddress THEN KeyIndex = 0 GOSUB UpdateKeyIndex 'remember where the character started WordStart = AAddress AAddress = AAddress + 1 P = PEEK(AAddress) GOSUB UpdateKeyIndex 'find the leading space DO UNTIL (P = 32 OR P = 9) OR AAddress = AEndAddress AAddress = AAddress + 1 P = PEEK(AAddress) GOSUB UpdateKeyIndex
Books at Amazon:
Back to BASIC: The History, Corruption, and Future of the Language
Hackers: Heroes of the Computer Revolution (including Tiny BASIC)
Go to: The Story of the Math Majors, Bridge Players, Engineers, Chess Wizards, Scientists and Iconoclasts who were the Hero Programmers of the Software Revolution
The Advent of the Algorithm: The Idea that Rules the World
Moths in the Machine: The Power and Perils of Programming
Mastering Visual Basic .NET