Quik_Bas Faq2.1 6/

 BBS: Inland Empire Archive
Date: 03-28-93 (21:04)             Number: 249
From: QUINN TYLER JACKSON          Refer#: NONE
  To: ALL                           Recvd: NO  
Subj: Quik_Bas Faq2.1       6/       Conf: (2) Quik_Bas
>>> Continued from previous message
        Entry   Bucket
        ==============
                  1 (empty bucket)
         A        2 (no collisions)
         B        3 (no collisions)
         C        4 (no collisions)
                  5 (empty bucket)
         H        6 (arrived at after two collisions)
                  7 (empty bucket)

        Now, remember from past explanations that searches are conducted
        by comparing each entry to the key until an empty bucket is reached.
        Therefore, to find A in the table, we calculate a bucket of
        65 MOD 7, or 2.  We look in bucket 2, and see that our key of A is
        the same as the table entry A.  We have therefore found our entry in
        one look!  Now, let's look for I.  That's a bit different, since
        it isn't in the list.  How many looks are needed to tell us that
        it isn't?  Well 73 MOD 7 is 3, and we see immediately that bucket
        3 is a B, not an I.  We recalculate the next bucket, and get:

                Offset = 4
                NewPos = (3 - 4) or -1
                Less than 0, so
                NewPos = 6

        Bucket 6 is occupied by an H, and so we calculate the next bucket:

                Offset = 4
                NewPos = (6-4) = 2

        Bucket 2 is occupied by an A, and so:

                NewPos = (2 - 4)
                NewPos = -2 + 7 = 5

        Finally, bucket 5 is empty.  Therefore, since we've arrived at
        an empty bucket BEFORE arriving at I, we can say that I is not
        in the list.  How many steps required?  Four.  Quite a bit of
        overhead on a short list of 7 entries, but consider a list of
        100,000 entries!  Four searches to find an item is fast!

Q3.3    Okay, how about a real working example of hashing in QuickBASIC,
        Quinn?  Theory is fine for CompSci freaks, but I'm a coffee and
        pizza programmer, not an egghead.

A3.3    I mentioned that one perfect use of hashing is for word distribution
        counters.  Here is one from Rich Geldreich that has been tweaked
        by me to account for some things that Rich did not know then
        about hashing table sizes.

S5.0    WORDHASH.BAS [F210S05.BAS]

'WORDHASH.BAS v1.10 By Rich Geldreich 1992
' Modified by Quinn Tyler Jackson for demonstrative purposes.
'
'Uses hashing to quickly tally up the frequency of all of the words in a
'text file. (This program assumes that words are seperated by either tab
'or space characters. Also, all words are converted to uppercase before
'the search.)
'

DEFINT A-Z
DECLARE SUB Show.Counts ()
DECLARE SUB Process.Line (A$)
DECLARE SUB UpdateFreq (A$, KeyIndex)
CONST TRUE = -1, FALSE = 0

DIM SHARED TableSize

Main:
 FileName$ = COMMAND$
 CLS
 LOCATE 1, 1
 PRINT "WORDHASH.BAS By Rich Geldreich 1992"
 PRINT "     Tweaked by Quinn Tyler Jackson 1993"
 OPEN FileName$ FOR INPUT AS #1 LEN = 16384

' In Rich's original version, the TableSize was set at 7000.  My version
' guesses at how large the table needs to be based on this:

' There are 5.5 characters in the average word.  Therefore, divide the
' text file length by 5.5.  For safety, assume that as many as
' half of those will be unique.  In normal text, half the words are in the
' hundred most common list, so this plays it pretty safe!  It will die
' if you take a file that is over about 50% unique words, however!  This
' is for NORMAL text files, not word dictionaries, where all entries are
' unique!
'
'SPLICE IN FROM EARLIER SAMPLE 4.0 IN THIS FAQ
'           VVVVVVVVVVVVV
TableSize = funFirstPrime(LOF(1) * .09)
REDIM SHARED WordTable$(TableSize)
REDIM SHARED Counts(TableSize)
DIM SHARED New.Words

 DO UNTIL EOF(1)
     LINE INPUT #1, A$
     Process.Line A$
     N = N + 1
     LOCATE 3, 1: PRINT N; "lines processed,"; New.Words; "words found"
 LOOP

SUB Process.Line (A$)

    ASEG = SSEG(A$) 'QuickBASIC 4.5 users change this to VARSEG(A$)
    AOFS& = SADD(A$)
    DEF SEG = ASEG + AOFS& \ 16

    AAddress = AOFS& AND 15
    Astart = AAddress
    AEndAddress = AAddress + LEN(A$)

    'get a word
    GOSUB GetAWord
    'update the frequency of the word until there aren't any words left
    DO WHILE Word$ <> ""
        UpdateFreq Word$, KeyIndex
        GOSUB GetAWord
    LOOP

    EXIT SUB

GetAWord:
    Word$ = ""

    'find a character
    P = PEEK(AAddress)
    DO WHILE (P = 32 OR P = 9) AND AAddress <> AEndAddress
        AAddress = AAddress + 1
        P = PEEK(AAddress)
    LOOP

    'if not at end of string then find a space
    IF AAddress <> AEndAddress THEN
        KeyIndex = 0
        GOSUB UpdateKeyIndex

        'remember where the character started
        WordStart = AAddress

        AAddress = AAddress + 1
        P = PEEK(AAddress)
        GOSUB UpdateKeyIndex
        'find the leading space
        DO UNTIL (P = 32 OR P = 9) OR AAddress = AEndAddress
            AAddress = AAddress + 1
            P = PEEK(AAddress)
            GOSUB UpdateKeyIndex
Outer Court
Echo Basic Postings

Books at Amazon:

Back to BASIC: The History, Corruption, and Future of the Language

Hackers: Heroes of the Computer Revolution (including Tiny BASIC)

Go to: The Story of the Math Majors, Bridge Players, Engineers, Chess Wizards, Scientists and Iconoclasts who were the Hero Programmers of the Software Revolution

The Advent of the Algorithm: The Idea that Rules the World

Moths in the Machine: The Power and Perils of Programming

Mastering Visual Basic .NET