BBS: Inland Empire Archive Date: 03-28-93 (21:04) Number: 248 From: QUINN TYLER JACKSON Refer#: NONE To: ALL Recvd: NO Subj: Quik_Bas Faq2.1 5/ Conf: (2) Quik_Bas
>>> Continued from previous message
A3.1 Well, first let's consider where hashing is in its prime. (You'll
pardon that one, okay?) It is best suited to dynamic list
generation where items need to be added on a regular basis, but
not deleted, since deletion is fairly difficult to implement on
a hashed list. The main strength of a hashing system is its
ability to quickly insert new items into the table in such a
manner that they can be located quickly "on-the-fly." (See T1.0
for the average number of collisions before locating the correct
entry.) Since the collisions increase with the ratio of full
buckets to empty buckets, and not with the size of the actual
table involved, hashing is more efficient than even binary
searches when lists start to become huge. Also, because the
binary method of searching demands a sorted list, insertion of
items at a later time becomes very cumbersome, even with such
techniques as the QuickSort or pushing all entries after the
insertion point up by one. (Try that technique on a list of 30,000
items, when you only want to add two new items that land near
the beginning of the list, and you'll know what disk wear and
tear is all about!)
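To see the scale of that problem, here is a minimal sketch of the
push-style insertion a sorted list demands, in QuickBASIC (all
names here are illustrative, not from any particular library):

     DECLARE SUB InsertSorted (List$(), Used%, NewItem$)

     SUB InsertSorted (List$(), Used%, NewItem$)
         ' Find where NewItem$ belongs. (A binary search would
         ' locate the slot faster, but the pushing below is the
         ' expensive part either way.)
         P% = 1
         DO WHILE P% <= Used% AND List$(P%) < NewItem$
             P% = P% + 1
         LOOP
         ' Push everything from that slot on up by one. For an
         ' item landing near the front of a 30,000-entry list,
         ' that is nearly 30,000 string moves. (The array must
         ' already have room for one more entry.)
         FOR I% = Used% TO P% STEP -1
             List$(I% + 1) = List$(I%)
         NEXT
         List$(P%) = NewItem$
         Used% = Used% + 1
     END SUB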
Typical applications of the hashing algorithm involve word
distribution counts, dictionary table generators that involve
dictionaries that will be added to dynamically, and things of
that nature.
Consider the word distribution count problem. Each word is a
unique key, and so is perfect for hashing. Sequential methods
only work well until the table has so many entries that
looking one up becomes a real effort. Remember,
words already in the list do not need to be added twice. Binary
methods allow for quick searching, but each case of a new word
being added to the list requires a sort or cumbersome insertion.
This takes time, even for a text file of only average length.
Hashing, on the other hand, can increment the count of words
already in the list, or add new words to the list, without
the overhead of sorting, sequential searches, or push-type
insertion. Also, remember that entry deletion is a problem with
hashing. Word distribution counts NEVER require entries to be
struck, and so are well-suited to hashing systems.
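As a rough sketch of that idea in QuickBASIC (the table size, the
choice of key, and all the names here are illustrative
assumptions, not fixed parts of the technique):

     DECLARE SUB CountWord (W$)

     CONST TABLESIZE = 211                 ' a prime spreads keys well

     DIM SHARED Word$(0 TO TABLESIZE - 1)  ' "" marks an empty bucket
     DIM SHARED Count%(0 TO TABLESIZE - 1)

     SUB CountWord (W$)
         ' One simple key: the sum of the word's ASCII values
         Key& = 0
         FOR I% = 1 TO LEN(W$)
             Key& = Key& + ASC(MID$(W$, I%, 1))
         NEXT
         Bucket% = Key& MOD TABLESIZE
         Offset% = TABLESIZE - Bucket%     ' rehash step (see A3.2)
         DO WHILE Word$(Bucket%) <> "" AND Word$(Bucket%) <> W$
             Bucket% = Bucket% - Offset%   ' collision: try another
             IF Bucket% < 0 THEN Bucket% = Bucket% + TABLESIZE
         LOOP
         Word$(Bucket%) = W$               ' new word, or a repeat
         Count%(Bucket%) = Count%(Bucket%) + 1
     END SUB

Call CountWord once for each word pulled from the text. A repeated
word just has its count bumped in place, with no sort, no
sequential scan, and no push-insertion. (A production routine
would also guard against a full table, as the sketch under A3.2
does.)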
A good rule of thumb to determine which method may be best for a
given problem is to consider the points in this table:
T2.0 List Management System Ratings

   List Type       SEQUENTIAL   BINARY   HASHED
   ==============================================
   small list           1          3        2
   medium list          3          1        2
   large list           3          2        1
   huge list            3          2        1
   Insertion            2          3        1
   Modification         3          2        1
   Deletion             1          2        3
   Browsing             2          1        3
   (Systems are ranked first, second, or third:
    1 = best suited, 3 = least suited)
=======>8 TABLE 2.0 ENDS HERE 8<=========
Using this table, we can see that the best method for short
lists that require frequent deletions might be the sequential
list. The best for huge lists that require insertions and
modifications, but not deletions (such as a nodelist index), is
probably a hashed list. A hashed list, however, will not do
much for you if you regularly want to access the next, first,
or last item in the list, such as in a list browsing
system. Hashed lists have no logical beginning or end, and
for this reason, there is no such thing as a "first item" or
"next item" in a hashed list. Each entry is a single entity,
retrievable only as a single entity, with no relation to any
other entry in the hashed list. This excludes applications
that require browsing, as I have mentioned, but is perfect
for symbol tables, dictionaries, and the like.
Q3.2 This is all pretty new to me. Give me a practical review.
A3.2 Okay. In a hashed list there is no sequence in the
classic sense of the word. Items are put into buckets
based upon the type of calculation I have already discussed, and
if the bucket is already in use, a new bucket is found according
to a set system. Therefore, two similar items in a hashed table
may actually have a physical distance of 500 entries between them.
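Sketched in QuickBASIC, an insert routine built on the offset
scheme walked through below might look like this (the names, the
demo values, and the full-table guard are all illustrative):

     DECLARE FUNCTION HashInsert% (Item$, Key%)

     CONST TABLESIZE = 7

     DIM SHARED Table$(0 TO TABLESIZE - 1) ' "" marks an empty bucket

     ' Demo matching the worked example that follows:
     PRINT HashInsert%("A", 65)            ' prints 2
     PRINT HashInsert%("B", 66)            ' prints 3
     PRINT HashInsert%("C", 67)            ' prints 4
     PRINT HashInsert%("H", 72)            ' prints 6, 2 collisions

     FUNCTION HashInsert% (Item$, Key%)
         Bucket% = Key% MOD TABLESIZE      ' first bucket requested
         Offset% = TABLESIZE - Bucket%     ' rehash step, fixed once
         Probes% = 0
         DO WHILE Table$(Bucket%) <> ""
             IF Table$(Bucket%) = Item$ THEN   ' already in the table
                 HashInsert% = Bucket%
                 EXIT FUNCTION
             END IF
             Bucket% = Bucket% - Offset%       ' collision: step back
             IF Bucket% < 0 THEN Bucket% = Bucket% + TABLESIZE
             Probes% = Probes% + 1
             IF Probes% = TABLESIZE THEN       ' full table, or the
                 HashInsert% = -1              ' degenerate Bucket = 0
                 EXIT FUNCTION                 ' case (Offset = size)
             END IF
         LOOP
         Table$(Bucket%) = Item$
         HashInsert% = Bucket%
     END FUNCTION

Because the table size is prime, the fixed offset visits every
bucket before it repeats, so the probe counter only trips when the
table really is full (or when the offset equals the table size).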
A practical example:
We have a hash table 7 buckets big, and we want to store three
entries in it, using hashing. For simplicity, let's just store the
characters A, B, and C, using their ASCII values as keys. Their
buckets would be:
   Item     Formula     Bucket
   ===========================
     A      65 MOD 7      2
     B      66 MOD 7      3
     C      67 MOD 7      4
No collisions have occurred here, since this is a simple case.
Now, let us add just one more item: H. The first bucket that
H will request is 72 MOD 7, or 2, which is being used by A.
This is a collision. Now we must find an empty bucket, and so
we apply a common method to the old bucket: we subtract an
offset from 2. The offset is calculated thus:
Offset = TableSize - Bucket, or
Offset = 7 - 2
Offset = 5
Okay, now, whenever a collision occurs, we recalculate a position
using this formula:
NewPos = OldPos - Offset
NewPos = 2 - 5
NewPos = -3
In cases where NewPos is less than 0, we then add the table size
to the interim result:
NewPos = NewPos + TableSize, or
NewPos = -3 + 7
NewPos = 4
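(In QuickBASIC itself, by the way, these two steps collapse into
one line. Since the offset never exceeds the table size, a single
MOD takes care of the wraparound:

     NewPos = (OldPos - Offset + TableSize) MOD TableSize

With OldPos = 2 and Offset = 5, that gives (2 - 5 + 7) MOD 7 = 4.)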
We see that this new bucket, 4, is being used by C, and so we
have to recalculate the bucket one more time:
NewPos = OldPos - Offset, or
NewPos = 4 - 5
NewPos = -1
NewPos < 0, so
NewPos = NewPos + TableSize, or
NewPos = -1 + 7
NewPos = 6
We see that 6 is an empty bucket, and therefore, our table now
looks something like this:
>>> Continued to next message
* OLX 2.1 TD * Post all questions in this box --> []
--- Maximus 2.01wb
* Origin: VKUG/VPCC QuickBasic Echo - Richmond, BC (1:153/151)
