BBS: Inland Empire Archive
Date: 09-17-92 (10:59)             Number: 239
From: RICH GELDREICH               Refer#: NONE
  To: MARK BARBA                    Recvd: NO
Subj: Data Compression               Conf: (2) Quik_Bas
MB> Hi, I would like to know if anyone out there could give me an example of
MB>how to use Data Compression. I never figured out what goes on during the
MB>data compression and could never figure out how PkZip/ARJ and LZH could
MB>compress a text file from 100k to 20k.... if anyone would have the time to
MB>explain the ways of compressing data I would appreciate it.
Aaaa... My favorite subject (I've made many data
compression programs in QuickBASIC, and I've posted some of the
simpler ones). Really, I'll come straight out and say it: the theory
behind data compression is simple, but the algorithms (usually)
aren't.
There are two main types of data compression techniques.
The first, and oldest, type is statistical modeling. Put simply,
this type of compression works by assigning very small codes to the
characters which are used most in a file, and assigning large codes
to the characters which aren't used so often. In other words, the
characters that show up most often get the smallest codes, so
compression results. There are many algorithms available which can
do this: Shannon-Fano coding, Huffman coding, and arithmetic coding.
Shannon-Fano is the oldest, and simplest. Huffman coding came after
Shannon-Fano, and is very popular. Arithmetic coding has only caught
on in the last few years, because of its heavier hardware
requirements...
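To make that concrete, here's a quick toy sketch in QuickBASIC (off
the top of my head, not lifted from any real archiver; "TEST.TXT" is
just whatever file you feel like pointing it at). All it does is the
very first step of any statistical coder: count how often each byte
value shows up. A Shannon-Fano or Huffman coder would then build its
code table from those counts, handing the short codes to the big
counts.

DIM Freq&(0 TO 255)
DIM Ch AS STRING * 1
OPEN "TEST.TXT" FOR BINARY AS #1      ' any file you want to look at
FOR I& = 1 TO LOF(1)
    GET #1, , Ch                      ' read one byte
    Freq&(ASC(Ch)) = Freq&(ASC(Ch)) + 1
NEXT
CLOSE #1
FOR C% = 0 TO 255                     ' show the skew the coder exploits
    IF Freq&(C%) > 0 THEN PRINT C%, Freq&(C%)
NEXT

Run that on a plain English text file and you'll see the space, "e",
"t" and friends towering over everything else -- exactly the lopsided
counts that Huffman and Shannon-Fano turn into short codes.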
The other main type is sliding window compression. This type of
compression was born back in 1977, in a paper released by Lempel and
Ziv (forget the first names!). Basically, it replaces groups of
characters
with some type of pointer into the previously seen text. Repeated
phrases, which occur often in files, can be coded into only a few
bits, so compression results. If the phrase isn't in the last few
thousand characters or so seen, it's simply sent normally.
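Here's another rough sketch of the sliding window idea (again just a
toy I'm typing in here, nothing like the tuned-up search a real
archiver uses; the sample string and the MINMATCH/MAXMATCH limits are
just numbers I picked). For each position it looks back through the
text already seen for the longest run matching what comes next, and
reports either a pointer (how far back, how long) or a plain literal.

CONST MINMATCH = 3, MAXMATCH = 18
Text$ = "THE CAT SAT ON THE MAT, AND THE CAT SAT ON THE HAT"
Pos = 1
DO WHILE Pos <= LEN(Text$)
    BestLen = 0: BestDist = 0
    FOR I = 1 TO Pos - 1                      ' scan the "window"
        L = 0
        DO WHILE L < MAXMATCH AND Pos + L <= LEN(Text$)
            IF MID$(Text$, I + L, 1) <> MID$(Text$, Pos + L, 1) THEN EXIT DO
            L = L + 1
        LOOP
        IF L > BestLen THEN BestLen = L: BestDist = Pos - I
    NEXT
    IF BestLen >= MINMATCH THEN               ' worth sending a pointer
        PRINT "copy"; BestLen; "chars from"; BestDist; "back"
        Pos = Pos + BestLen
    ELSE                                      ' too short, send it as-is
        PRINT "literal "; MID$(Text$, Pos, 1)
        Pos = Pos + 1
    END IF
LOOP

The real thing keeps the window to the last few thousand characters
and uses hashing so it doesn't compare against every single position,
but the (distance, length) pointers it spits out are the same idea.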
Around a year later, in 1978, Lempel & Ziv released another
paper, describing a variant of their earlier algorithm. A few years
later, in 1984, Terry Welch took this algorithm and published his
variation, which he called LZW compression. This became a very
popular type of compression in both the Unix and MS-DOS worlds. The
Unix "compress" program, ARC, GIF,
TIFF, PKZip's shrinking, and a zillion other programs use some
variation of LZW compression...
But, someone from the peanut gallery saw that LZ77 could do
better than LZW, so LZW bit the dust when a variant of LZ77
compression, called LZSS, gained popularity.
All of the popular data compression programs today use a
combination of two distinct compression algorithms. The first is a
variant of LZ77, called LZSS. LZSS is basically an optimization of
LZ77, in speed and efficiency (there's more to it, of course, but
you get the idea). After the file is coded with LZSS, the LZSS output
is then coded with a statistical modeling scheme, such as
Shannon-Fano (like PKZip) or Huffman (like ARJ and LHA) coding, to
achieve "state of the art" compression.
This was all off the top of my head, but you should
hopefully get the general idea of it.
Rich
---
* SLMR 2.0 * Nothing is so smiple that it can't get screwed up.
--- MsgToss 2.0b
* Origin: Computer Co-Op - Voorhees, NJ | Ted Hare (1:266/29)
