BBS: Inland Empire Archive
Date: 09-17-82 (10:59)             Number: 239
From: RICH GELDREICH               Refer#: NONE
To:   MARK BARBA                   Recvd: NO
Subj: Data Compression             Conf: (2) Quik_Bas
MB> Hi, I would like to know if anyone out there could give me an example of
MB> how to use data compression. I never figured out what goes on during
MB> data compression, and could never figure out how PKZip/ARJ and LZH could
MB> compress a text file from 100k to 20k... if anyone has the time to
MB> explain the ways of compressing data, I would appreciate it.

Aaaa... My favorite subject (I've made many data compression programs in QuickBASIC, and I've posted some of the simpler ones). Really, I'll come straight out and say it: the theory behind data compression is simple, but the algorithms (usually) aren't.

There are two main types of data compression techniques. The first, and oldest, is statistical modeling. Put simply, this type of compression works by assigning very small codes to the characters which are used most in a file, and large codes to the characters which aren't used so often. In other words, the characters with the highest priority get the smallest codes, so compression results (there's a small sketch of this idea at the end of this message). There are several algorithms which can do this: Shannon-Fano coding, Huffman coding, and arithmetic coding. Shannon-Fano is the oldest and simplest. Huffman coding came after Shannon-Fano and is very popular. Arithmetic coding only became popular in the last few years, because earlier hardware couldn't keep up with its computational demands.

The other main type is sliding window compression. This type of compression was born back in 1977, with a paper published by Abraham Lempel and Jacob Ziv. Basically, it replaces groups of characters with some kind of pointer into the previously seen text. Repeated phrases, which occur often in files, can be coded in only a few bits, so compression results. If a phrase isn't in the last few thousand characters or so already seen, it's simply sent normally (the second sketch at the end of this message shows the idea). Around a year later, in 1978, Lempel & Ziv released another paper describing a variant of their earlier algorithm. A few years later, in 1984, Terry Welch took this algorithm and published his own variation, which he called LZW compression. LZW became a very popular type of compression in the Unix and MS-DOS worlds: the Unix "compress" program, ARC, GIF, TIFF, PKZip's "shrinking" method, and a zillion other programs use some variation of it. But someone from the peanut gallery saw that LZ77 could do better than LZW, so LZW bit the dust when a variant of LZ77 compression, called LZSS, gained popularity.

All of the popular data compression programs today use a combination of two distinct compression algorithms. The first is that variant of LZ77, LZSS, which is basically an optimization of LZ77 in speed and efficiency (there's more to it, of course, but you get the idea). After the file is coded with LZSS, the LZSS output is then coded with a statistical modeling scheme, such as Shannon-Fano (like PKZip) or Huffman (like ARJ and LHA) coding, to achieve "state of the art" compression.

This was all off the top of my head, but hopefully you get the general idea of it.

Rich

---
 * SLMR 2.0 * Nothing is so smiple that it can't get screwed up.
--- MsgToss 2.0b
 * Origin: Computer Co-Op - Voorhees, NJ | Ted Hare (1:266/29)
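Here is a minimal QuickBASIC sketch of the statistical-modeling idea described above. It only counts how often each character appears in a made-up sample string, lists the most frequent ones (the characters a Huffman or Shannon-Fano coder would give the shortest codes), and estimates roughly how small an ideal statistical coder could make the string. The sample string, the five-entry listing, and the entropy estimate are illustrative choices, not anything from the original message.

' Sketch 1: the statistical-modeling idea.
' Count how often each byte value appears in a sample string, show the
' most common characters, and estimate the smallest possible coded size.
DEFINT A-Z
DIM freq(0 TO 255)

sample$ = "this is just a little sample of plain, ordinary english text"
FOR i = 1 TO LEN(sample$)
    c = ASC(MID$(sample$, i, 1))
    freq(c) = freq(c) + 1
NEXT i

' Estimate the entropy: roughly the best any statistical coder could do.
bits# = 0
FOR c = 0 TO 255
    IF freq(c) > 0 THEN
        p# = freq(c) / LEN(sample$)
        bits# = bits# - freq(c) * LOG(p#) / LOG(2)
    END IF
NEXT c

PRINT "Most frequent characters (these would get the shortest codes):"
FOR rank = 1 TO 5
    best = 0: bestC = 0
    FOR c = 0 TO 255
        IF freq(c) > best THEN best = freq(c): bestC = c
    NEXT c
    PRINT "  '"; CHR$(bestC); "' appears"; best; "times"
    freq(bestC) = 0                 ' so it isn't picked again
NEXT rank

PRINT "Plain size:"; LEN(sample$) * 8; "bits"
PRINT "Ideal statistical coder: about"; INT(bits# + .5); "bits"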
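And a minimal sketch of the sliding-window idea, again in QuickBASIC: a brute-force LZSS-style pass that, at each position, either copies the character through as a literal or replaces a repeated phrase with a <distance,length> pointer into the text already seen. The sample string, the 3-character minimum match, and the angle-bracket output are illustrative choices; a real LZSS coder also limits the window size, may let a match run into the text it is copying, and packs pointers into bits instead of printing them.

' Sketch 2: the sliding-window idea.
' At each position, look for the longest match in the text already seen.
' If it's long enough to be worth it, print a <distance,length> pointer;
' otherwise print the character itself as a literal.
DEFINT A-Z
CONST MINMATCH = 3          ' shorter matches cost more bits than they save

text$ = "the quick brown fox jumped over the quick brown dog"
p = 1
DO WHILE p <= LEN(text$)
    bestLen = 0: bestPos = 0
    FOR s = 1 TO p - 1      ' brute-force search of every earlier position
        mlen = 0
        DO WHILE p + mlen <= LEN(text$) AND s + mlen <= p - 1
            IF MID$(text$, s + mlen, 1) <> MID$(text$, p + mlen, 1) THEN EXIT DO
            mlen = mlen + 1
        LOOP
        IF mlen > bestLen THEN bestLen = mlen: bestPos = s
    NEXT s

    IF bestLen >= MINMATCH THEN
        ' phrase seen before: emit a pointer into the window
        PRINT "<"; LTRIM$(STR$(p - bestPos)); ","; LTRIM$(STR$(bestLen)); ">";
        p = p + bestLen
    ELSE
        ' no useful match: emit the character as a literal
        PRINT MID$(text$, p, 1);
        p = p + 1
    END IF
LOOP
PRINT

A real archiver runs a pass like this one first and then statistical-codes the literals and pointers it produces, which is the two-stage LZSS-plus-Huffman (or Shannon-Fano) pipeline described in the message.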