"A Block-Sorting Lossless Data Compression Algorithm" by Burrows and Wheeler (1994)
(with a second explanation of the algorithm in "Data Compression with the Burrows-Wheeler Transform" by Mark Nelson)

Jonathan Ledlie (jonathan@eecs)
October 4, 2001

As with many brilliant ideas, the Burrows-Wheeler compression algorithm seems like a fairly intuitive idea: some text is more easily compressible than other text, so see if you can transform the original text into a more compressible version, apply the simple and fast compression you already know how to do, and then reverse the process to decompress. The key part of their algorithm is storing as little additional information about the transformation as possible while still allowing the reverse operation to occur.

From my understanding of the two papers, the transformation incurs only one additional integer, which gives the row of the original string in the alphabetically sorted matrix made up of rotations of the original string. As the Dr. Dobb's article points out, this matrix is only virtual: in practice it is just an array of pointers to indices in the original input. The key observation in their transformation is that the first column of the matrix (F), which would itself yield the best compression because it is sorted, is easily obtained from the last column (L), and that L plus the index of the original string is both (1) sufficient to recover the original string and (2) itself highly compressible, because many letters tend to precede the same other letters (e.g. 't' before 'h').

Their algorithm left me wondering how well it would work on non-human-generated inputs, e.g. program executables. My guess is that as long as the input shows fairly good regularity, it would yield good compression (a property that probably helps most compression techniques).
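To make the transform concrete, here is a small Python sketch of both directions (my own illustration, not code from either paper; the function names bwt and inverse_bwt are mine). It builds the rotation matrix explicitly for clarity and omits the fast compression stage that would then be applied to L:

    # Toy Burrows-Wheeler transform and inverse. Illustrative only: the
    # paper's implementation sorts pointers into the input instead of
    # materializing all n rotated copies of the string.

    def bwt(s: str) -> tuple[str, int]:
        """Return the last column L of the sorted rotation matrix and
        the row index of the original string -- the one extra integer
        the transform has to store."""
        rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
        idx = rotations.index(s)  # row holding the original string
        last_column = "".join(rot[-1] for rot in rotations)
        return last_column, idx

    def inverse_bwt(last_column: str, idx: int) -> str:
        """Recover the original string from L and the row index. The
        first column F is just L sorted; stably sorting the indices of
        L tells us, for each row, which row it cycles to next."""
        t = sorted(range(len(last_column)), key=lambda i: last_column[i])
        out, i = [], idx
        for _ in range(len(last_column)):
            i = t[i]
            out.append(last_column[i])
        return "".join(out)

    if __name__ == "__main__":
        s = "the three trees"
        last_column, idx = bwt(s)
        print(last_column, idx)  # identical letters tend to clump in L
        assert inverse_bwt(last_column, idx) == s

The stable sort in inverse_bwt is exactly the "F is easily gotten from L" observation: sorting L yields F, and matching the k-th occurrence of each character in F with the k-th occurrence in L links every row of the matrix to its successor rotation, so the original string can be read off starting from the one stored index.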