Shannon–Fano coding

In the field of data compression, Shannon–Fano coding, named after Claude Shannon and Robert Fano, is one of two related techniques for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured).

Shannon–Fano codes are suboptimal in the sense that they do not always achieve the lowest possible expected codeword length, as Huffman coding does.^[1] However, Shannon–Fano codes have an expected codeword length within 1 bit of optimal. Fano's method usually produces encoding with shorter expected lengths than Shannon's method. However, Shannon's method is easier to analyse theoretically.

Shannon–Fano coding should not be confused with Shannon–Fano–Elias coding (also known as Elias coding), the precursor to arithmetic coding.

Fano's code: binary splitting[edit]

Outline of Fano's code[edit]

In Fano's method, the symbols are arranged in order from most probable to least probable, and then divided into two sets whose total probabilities are as close as possible to being equal. All symbols then have the first digits of their codes assigned; symbols in the first set receive "0" and symbols in the second set receive "1". As long as any sets with more than one member remain, the same process is repeated on those sets, to determine successive digits of their codes. When a set has been reduced to one symbol this means the symbol's code is complete and will not form the prefix of any other symbol's code.

The algorithm produces fairly efficient variable-length encodings; when the two smaller sets produced by a partitioning are in fact of equal probability, the one bit of information used to distinguish them is used most efficiently. Unfortunately, Shannon–Fano coding does not always produce optimal prefix codes; the set of probabilities {0.35, 0.17, 0.17, 0.16, 0.15} is an example of one that will be assigned non-optimal codes by Shannon–Fano coding.

Fano's version of Shannon–Fano coding is used in the IMPLODE compression method, which is part of the ZIP file format.^[11]

The Shannon–Fano tree[edit]

A Shannon–Fano tree is built according to a specification designed to define an effective code table. The actual algorithm is simple:

Fano, R.M. (1949). . Technical Report No. 65. Cambridge (Mass.), USA: Research Laboratory of Electronics at MIT.

"The transmission of information"

Shannon, C.E. (July 1948). . Bell System Technical Journal. 27: 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x.

Shannon–Fano coding

Fano's code: binary splitting[edit]

Outline of Fano's code[edit]

The Shannon–Fano tree[edit]

"The transmission of information"

"A Mathematical Theory of Communication [reprint with corrections]"