
 BBS: Inland Empire Archive
Date: 03-28-93 (21:04)             Number: 248
From: QUINN TYLER JACKSON          Refer#: NONE
  To: ALL                           Recvd: NO  
Subj: Quik_Bas Faq2.1       5/       Conf: (2) Quik_Bas
>>> Continued from previous message
A3.1    Well, first let's consider where hashing is in its prime.  (You'll
        pardon that one, okay?)  It is best suited to dynamic list
        generation where items need to be added on a regular basis, but
        not deleted, since deletion is fairly difficult to implement on
        a hashed list.  The main strength of a hashing system is its
        ability to insert new items into the table in such a manner
        that they can be located again quickly, "on-the-fly."  (See T1.0
        for the average number of collisions before locating the correct
        entry.)  Since collisions increase with the ratio of full
        buckets to empty buckets, and not with the size of the actual
        table involved, hashing is more efficient than even binary
        search when lists start to become huge.  Also, because the
        binary method of searching demands a sorted list, inserting
        items at a later time becomes very cumbersome, even with such
        techniques as QuickSort, or pushing every entry after the
        insertion point up by one.  (Try that technique on a list of
        30,000 items, when you only want to add two new items that land
        near the beginning of the list, and you'll know what disk wear
        and tear is all about!)

        Typical applications of the hashing algorithm involve word
        distribution counts, dictionary table generators that involve
        dictionaries that will be added to dynamically, and things of
        that nature.

        Consider the word distribution count problem.  Each word is a
        unique key, and so is perfect for hashing.  Sequential methods
        work well only until the table has so many entries that looking
        up an entry becomes a real effort.  Remember, words already in
        the list do not need to be added twice.  Binary methods allow
        for quick searching, but every new word added to the list
        requires a sort or a cumbersome insertion.  That takes time if
        the text file is of even average length.

        Hashing, on the other hand, can increment the count of words
        already in the list, or add new words to the list, without
        the overhead of sorting, sequential searches, or push-type
        insertion.  Also, remember that entry deletion is a problem with
        hashing.  Word distribution counts NEVER require entries to be
        struck, and so are well-suited to hashing systems.
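
        To make the word count case concrete, here is a minimal
        QuickBASIC sketch of a hashed word counter.  The table size,
        the additive hash, and every name in it are illustrative
        choices of mine, and it resolves collisions by simply stepping
        to the next bucket (linear probing) rather than by the offset
        method shown in A3.2 below:

        DECLARE FUNCTION Hash% (Word$)
        DECLARE SUB CountWord (Word$)

        CONST TBLSIZE = 211               ' a prime size cuts clustering

        DIM SHARED Words$(TBLSIZE - 1)    ' the word kept in each bucket
        DIM SHARED Counts%(TBLSIZE - 1)   ' times that word has been seen

        CountWord "the": CountWord "cat": CountWord "the"
        FOR i% = 0 TO TBLSIZE - 1
            IF Words$(i%) <> "" THEN PRINT Words$(i%), Counts%(i%)
        NEXT i%
        END

        FUNCTION Hash% (Word$)
            ' Sum the ASCII values of the characters, then fold the
            ' sum into the table with MOD.
            Sum& = 0
            FOR i% = 1 TO LEN(Word$)
                Sum& = Sum& + ASC(MID$(Word$, i%, 1))
            NEXT i%
            Hash% = Sum& MOD TBLSIZE
        END FUNCTION

        SUB CountWord (Word$)
            ' Add Word$ to the table, or bump its count if it is
            ' already there.  No entry is ever deleted.  (Assumes the
            ' table never fills, or the loop would never end.)
            Bucket% = Hash%(Word$)
            DO
                IF Words$(Bucket%) = "" THEN         ' empty: new word
                    Words$(Bucket%) = Word$
                    Counts%(Bucket%) = 1
                    EXIT SUB
                ELSEIF Words$(Bucket%) = Word$ THEN  ' seen before
                    Counts%(Bucket%) = Counts%(Bucket%) + 1
                    EXIT SUB
                END IF
                Bucket% = (Bucket% + 1) MOD TBLSIZE  ' collision: step on
            LOOP
        END SUB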

        A good rule of thumb to determine which method may be best for a
        given problem is to consider the points in this table:

T2.0    List Management System Ratings

                                List  Type
                       SEQUENTIAL    BINARY    HASHED
        ==============================================
        small list          1           3         2
        medium list         3           1         2
        large list          3           2         1
        huge list           3           2         1

        Insertion           2           3         1
        Modification        3           2         1
        Deletion            1           2         3
        Browsing            2           1         3

        (Systems are ranked 1 = best, 2 = second, 3 = worst)

=======>8 TABLE 2.0 ENDS HERE 8<=========

        Using this table, we can see that the best method for short
        lists that require frequent deletions might be the sequential
        list.  The best for huge lists that require insertions and
        modifications, but not deletions (such as a nodelist index), is
        probably a hashed list.  A hashed list, however, will not do
        much for you if you regularly want to access the next item, the
        first item, or the last item in the list, such as in a list
        browsing system.  Hashed lists have no logical beginning or end,
        and for this reason, there is no such thing as a "first item" or
        "next item" in a hashed list.  Each entry is a single entity,
        retrievable only as a single entity, with no relation to any
        other entry in the hashed list.  This rules out applications
        that require browsing, as I have mentioned, but is perfect
        for symbol tables, dictionaries, and the like.

Q3.2    This is all pretty new to me.  Give me a practical review.

A3.2    Okay.  In a hashed list there is no sequence in the classic
        sense of the concept.  Items are put into buckets based upon
        the kind of calculation I have already discussed, and if a
        bucket is already in use, a new bucket is found according to
        a set system.  Therefore, two similar items in a hashed table
        may actually have a physical distance of 500 entries between
        them.

        A practical example:

        Say we have a hash table 7 buckets big, and we want to store
        three entries in it, using hashing.  For simplicity, let's just
        store the characters A, B, and C, using their ASCII values as
        keys.  Their buckets would be:

        Item   Formula    Bucket
        =========================
          A    65 MOD 7     2
          B    66 MOD 7     3
          C    67 MOD 7     4
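
        Those bucket numbers fall straight out of QuickBASIC's MOD
        operator; a quick, purely illustrative check:

        FOR i% = 1 TO 3
            Ch$ = MID$("ABC", i%, 1)
            PRINT Ch$; ASC(Ch$); "MOD 7 ="; ASC(Ch$) MOD 7
        NEXT i%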

        No collisions have occurred here, since this is a simple case.
        Now, let us add just one more item: H.  The first bucket that
        H will request is 72 MOD 7, or 2, which is being used by A.
        This is a collision.  Now, we must find an empty bucket, and so
        we apply a common method to the old bucket: we subtract an
        offset from 2.  The offset is calculated thus:

                Offset = TableSize - Bucket, or
                Offset = 7 - 2
                Offset = 5

        Okay, now, whenever a collision occurs, we recalculate a position
        using this formula:

                NewPos = OldPos - Offset
                NewPos = 2 - 5
                NewPos = -3

        In cases where NewPos is less than 0, we then add the table size
        to the interim result:

                NewPos = NewPos + TableSize, or
                NewPos = -3 + 7
                NewPos = 4

        We see that this new bucket, 4, is being used by C, and so we
        have to recalculate the bucket one more time:

                NewPos = OldPos - Offset, or
                NewPos = 4 - 5
                NewPos = -1

                NewPos < 0, so
                NewPos = NewPos + TableSize, or
                NewPos = -1 + 7
                NewPos = 6

        We see that 6 is an empty bucket, and so H finally settles
        there.
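
        Wrapped up in QuickBASIC, the whole walk looks something like
        the sketch below.  The table and procedure names are mine, and
        the sketch is only as robust as the method itself (see the
        caveat in the comments):

        DECLARE SUB Store (Item$)

        CONST TBLSIZE = 7
        DIM SHARED Table$(TBLSIZE - 1)

        ' Storing A, B, and C, then H, reproduces the walk above:
        ' H collides at bucket 2, bounces to 4, and settles at 6.
        Store "A": Store "B": Store "C": Store "H"
        FOR i% = 0 TO TBLSIZE - 1
            PRINT "Bucket"; i%; "-> "; Table$(i%)
        NEXT i%
        END

        SUB Store (Item$)
            ' Hash on the ASCII value, then rehash by subtracting a
            ' fixed offset until an empty bucket turns up.  (Caveat:
            ' a home bucket of 0 makes the offset equal TBLSIZE, so
            ' a collision there would never move; a real table must
            ' guard that case.)
            Pos% = ASC(Item$) MOD TBLSIZE   ' first bucket requested
            Offset% = TBLSIZE - Pos%        ' Offset = TableSize - Bucket
            DO UNTIL Table$(Pos%) = ""
                Pos% = Pos% - Offset%       ' NewPos = OldPos - Offset
                IF Pos% < 0 THEN Pos% = Pos% + TBLSIZE
            LOOP
            Table$(Pos%) = Item$
        END SUB

        With that, our table now looks something like this: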
>>> Continued to next message

 * OLX 2.1 TD * Post all questions in this box --> []

--- Maximus 2.01wb
 * Origin: VKUG/VPCC QuickBasic Echo - Richmond, BC (1:153/151)