LZW data compression Mark Nelson (in Dr Dobbs) Jonathan Ledlie (jonathan@eecs) October 16, 2001 The two main variants of Lempel-Ziv compression are LZ77 and LZ78. Different compression programs use different forms: Zip uses the former, Unix compress the later. Why? Patent disputes. According to an article I found on the Web: "Variations have to do with factors such as establishing how many bits to use for certain boundaries." Although the Dr. Dobbs article does not say this explicitly, perhaps this is the 12, 13, and 14 bit code sizes discussed in "The Implementation Blues" section. This seems like little to base a patent on, however. Another potential difference between the variants is the use of variable length codes, which programs like ARC use. I find it interesting that the Burrows-Wheeler seems so much more elegant and yet both apparently can compress to near optimal (near the entropy). LZ needs long inputs to achieve this, however. The Dr Dobbs article unfortunately did not include any performance information. Note: I found an excellent short overview on this topic which included an applet showing how the compression works (I also found the page with the new search engine, Vivisimo, from question #1 of the homework): http://www.CS.McGill.CA/~cs251/OldCourses/1997/topic23/