BBS: Inland Empire Archive Date: 05-09-92 (01:14) Number: 188 From: TOM HAMMOND Refer#: NONE To: LYN BORCHERT Recvd: NO Subj: HASHING 3/9 Conf: (2) Quik_Bas
Continued from Hashing 2/9 Balanced? Perfect? What's that all about? What if the data is not added in an order that leads to the tree above? What if the data is added alphabetically, except for Hank? The resulting tree looks like this... Aaron / \ Bill / \ George / \ Leonard / \ Margaret / \ MIKE / \ Sam / \ Willard / \ This is a worst case tree. The search time is proportional to (Number of entries/2) * processing overhead, like a linear search, except that the processing overhead is a good bit higher. When we add "Hank", he would be entered into the "Less than Leonard" slot. A tree is balanced when you look at it and it looks balanced. Loosely speaking, it is "perfect" when the tree extends down no more levels than those needed to represent the tree. In our test data base case, with 8 test entries, we see that we need 4 levels to represent the tree. The number that represents this is, I think, INT(LOG(BASE 2)N)+1. The log, base 2, of 8, should be 3. Adding 1 to this gives 4. Please note that in both the trees shown here, and in all trees, there is only a single spot where a new entry should be placed. One sticky spot in implementing a tree is deleting elements. It is necessary to adjust the tree so that all the elements are correctly placed after the deletion. There are some simple algorithyms for this, but the book I like for this is at work. From memory though.... If the item you want to delete is at the end of the tree (has no children), there is no problem, just delete it. Examples of this in the first tree I drew were Aaron, George, Margaret, and Sam. In the second tree, only Willard qualifies. If the item you want to delete has only 1 child, there is no problem, just adjust the pointer of the parent to the item you wish to delete to point to the only child of the item you wish to delete. In the case of the first tree, only Willard qualifies. And to delete Willard, you would need to make Mike's greater than pointer point to Sam. In the second tree I drew, ANY item can be deleted this way. The last case is where the item to be delete has more than 1 child. There is a simple algorithym that allows this to be done with something like 2 changes. But like I said, the book's at work. All of the deletion code can become more difficult if you are trying to maintain a balanced tree. The methods for searching the tree to produce an in-order, or sorted, list are pretty straight forward. In psudeo-code.... Look for the lowest entry Repeat display the current entry search for the next higher entry until all nodes have been visited Hopefully someone who has used binary trees more than I have will be kind enough to post some real source code.... And on to hashing....Hashing has as it's fundamental principle that the DATA should be transformed into an index into the array. So, I take a name, transform it, come up with a number, and if all goes well, that number is the index of the array element where the name and phone number are. The placement of data in the array seems as random as the input data. Most people immediately think that they need as many array elements as there are possible keys to make this work. With social security numbers, that suggests that 999,999,999 (or so) entries are needed. This probably wouldn't fit into your computer's RAM this year, so hashing suddenly seems like a pipe dream. However, it is understood that the transform need not produce a unique address, and that data "collisions" will occur where several (or many) keys will result in the same index. This is taken care of by one of several methods we'll discuss later. You can size the array in a fashion that makes sense to you. In practice, with a well chosen transformation, the search time is proportional to the "fullness" of the array. That is, we see that we usually find or data in 1 or 2 searches, regardless of how much data we have to search, until the data base passes a critical threshold (usually around 75 to 80% full), at which point things fall apart rapidly. This approach is very attractive when using external storage, as it minimizes disk delays. So you can start with a data base that contains the names and phone numbers of your 20 closest friends and then expand to a data base that contains the names and phone numbers of everyone on Earth, and in theory the search time will be the same - based solely on how full the DATA base is. (Of course, you might have to resize your data base from time to time, and you might need a larger hard disk to allow the data to fit.) Continues in next file (HASHING - 4 of 9) --- WM v2.00/91-0231 * Origin: The Modem Zone BBS (314) 893-5106 (1:289/2)
Books at Amazon:
Back to BASIC: The History, Corruption, and Future of the Language
Hackers: Heroes of the Computer Revolution (including Tiny BASIC)
Go to: The Story of the Math Majors, Bridge Players, Engineers, Chess Wizards, Scientists and Iconoclasts who were the Hero Programmers of the Software Revolution
The Advent of the Algorithm: The Idea that Rules the World
Moths in the Machine: The Power and Perils of Programming
Mastering Visual Basic .NET