BBS: Inland Empire Archive Date: 05-09-92 (01:15) Number: 190 From: TOM HAMMOND Refer#: NONE To: LYN BORCHERT Recvd: NO Subj: HASHING 5/9 Conf: (2) Quik_Bas
Continued from Hashing 4/9 Another problem with hashing is in deleting data. With a hashed data base, we determine whether or not our data is found by comparing it to the contents of the hashed address. Since we terminate our search when we find the data, or an empty array element, we can't just wipe the data. We need to mark the data as having been deleted, so that the search can continue until the data is found or an empty array entry is found. The deleted cell is available for reuse, so if the data is not found elsewhere, reusing the deleted entries' space will save time in the future. Basically, a specific value that will not occur for real data should be placed in the deleted entry. In the unlikely event that all the DATA elements are occupied or marked as deleted, you will be unable to add DATA, as a blank data element does not exist. Since the size of the array, or data file, is a critical part of the hashing algorithym, you can't resize the file with no real thought. A special program needs to be written to perform that action. One way is to just move the data from one hashed array or file to another. Another way is to export the data to a sequential file, create a new file, and then import the data. A disk based system adds a few complications. To access the records randomly, it is necessary to use FIELD statements to define the records, and GET and PUT statements to access it. All records MUST be the same size. And the data file has to be created to its full size before it is used (barring any programmer cleverness). The number of elements and size of the elements is so important that some people use the first record to store information about these characteristics. So the data base opens the file, reads the first record, recovers the info, closes the file, and reopens it as a random file of the appropriate size. This avoids having to hard code this information in the program. Some people at work "simplified" their programs and "saved some time" by hard coding the array size. When the users resized the data base, the programmers got to rewrite the programs for them, which reduced the time savings somewhat, and made the programmers look like jerks. Producing sorted lists, or processing the data sequentially, is more difficult with a hashed data structure than with others, as there is no inherent order. Sorting the data in place would destroy our ability to randomly access it, so the data must be manipulated outside the hashed DATA structure. The big question here is whether to create a sort array or file that contains all the data in the hashed structure, or one that just contains the keys and record numbers. With a small data base with few elements, there is little reason to not copy the data. With a large DATA base, it is more time and space efficient to copy the key and the record number to a new array or file and refer back to tbe main data area to pick up your data. Once this decision is made, we scan the main data area, extract the live data fields, copy them to a new array or file, sort the new array, and then process the sorted data. If we extracted all the data, we could process the sorted list directly. If we extracted just the record numbers, we would need to pick up the record number from the sorted list and then access the data in the main array. This is the approach I took in the demo program as it is better suited to "real-world" problems. And there you have it, an overview of searching, with special emphasis on hashing. MIKE (All done! Can you believe it!!??) Rob, Here's a demo program to explain hashing. Hope it's of some use. Just save the 7 pieces off the BBS, cut and paste them IN ORDER in your favorite editor, and then run it via QuickBasic. (Hope no BBS's chop it up too badly. If it get's mangled, I'll ZIP it and post it somewhere accessible.) Program code follows in next file (HASHING - 6 of 9) --- WM v2.00/91-0231 * Origin: The Modem Zone BBS (314) 893-5106 (1:289/2)
Books at Amazon:
Back to BASIC: The History, Corruption, and Future of the Language
Hackers: Heroes of the Computer Revolution (including Tiny BASIC)
Go to: The Story of the Math Majors, Bridge Players, Engineers, Chess Wizards, Scientists and Iconoclasts who were the Hero Programmers of the Software Revolution
The Advent of the Algorithm: The Idea that Rules the World
Moths in the Machine: The Power and Perils of Programming
Mastering Visual Basic .NET