[Glean] Entropy Coding and Tunstall Coding
Entropy Coding and Tunstall Coding
Entropy Coding
An entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium.
Two of the most common entropy encoding techniques are Huffman coding and arithmetic coding.
Tunstall Coding1
- Select the symbol with largest probability in the table. Call it
S
. - Remove
S
and add theN
substringsSx
wherex
goes over all theN
symbols. This step increase the table size byN−1
symbols (some of them may be substrings). Thus, after iterationk
, the table size will beN + k(N − 1)
elements. - If
N + k(N − 1) ≤ 2n
, perform another iteration.
An important property of the Tunstall codes is their reliability. If one bit becomes corrupt, only one code will get bad. Normally, variable-size codes do not feature any reliability.
-
Salomon D. Data compression: the complete reference[M]. Springer Science & Business Media, 2004. ↩