The idea of image compression algorithms is similar to the one behind the Burrows-Wheeler-transform. The text to be compressed is transformed to a format which is suitable for application of the techniques presented in the previous sections, such as Huffman coding or arithmetic coding. There are several procedures based on the type of image (for instance, black/white, greyscale or colour image) or compression (lossless or lossy). We shall present the basic steps—representation of data, discrete cosine transform, quantisation, coding—of lossy image compression procedures like the standard JPEG.
A greyscale image is represented as a two-dimensional array I, where each entry I(x, y) represents the intensity (or brightness) at position (x, y) of the image. Each entry is either a signed or an unsigned 8-bit integer, i. e., I(x, y) ∈ {0, ..., 255} or I(x, y) ∈ {−128, ..., 127}.
A position in a colour image is usually represented by three greyscale values R(x, y), G(x, y), and B(x, y) per position, corresponding to the intensity of the primary colours red, green and blue.
In order to compress the image, the three arrays (or channels) R, G, B are first converted to the luminance/chrominance space by the YUV-transform (performed entry-wise)

Y = 0.299 R + 0.587 G + 0.114 B,   U = B − Y,   V = R − Y

(possibly with additional scaling factors for U and V). Y is the luminance or intensity channel, where the coefficients weighting the colours have been found empirically and represent the best possible approximation of the intensity as perceived by the human eye. The chrominance channels U and V contain the colour information on blue and red as the differences from Y. The information on green is for the most part contained in the luminance Y.
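As an illustration, the entry-wise transform can be sketched in Python. This is a minimal sketch: the function name is ours, and the chrominance channels are taken as the plain differences from Y, while other scalings are also in use.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to luminance/chrominance.

    The weights 0.299, 0.587, 0.114 are the empirical luminance
    coefficients; U and V are taken here as the plain differences
    from Y (other standards apply additional scaling factors).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = b - y  # colour information on blue
    v = r - y  # colour information on red
    return y, u, v
```

For a pure grey pixel (r = g = b) the chrominance vanishes, since the three luminance weights sum to 1.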
A first compression for colour images is commonly already obtained after application of the YUV-transform by removing irrelevant information. Since the human eye is less sensitive to rapid colour changes than to changes in intensity, the resolution of the two chrominance channels U and V is reduced by a factor of 2 in both the vertical and the horizontal direction, which after sub-sampling results in arrays of 1/4 of the original size.
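The sub-sampling step can be sketched as follows. The helper name is ours, and averaging disjoint 2×2 blocks is only one common way to reduce the resolution by a factor of 2 in each direction.

```python
def subsample(channel):
    """Halve the resolution of a chrominance channel in both directions
    by averaging disjoint 2x2 blocks; the result has 1/4 as many entries."""
    h, w = len(channel), len(channel[0])
    return [[(channel[i][j] + channel[i][j + 1] +
              channel[i + 1][j] + channel[i + 1][j + 1]) / 4
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]
```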
The arrays are then subdivided into 8×8 blocks, on which the actual (lossy) data compression procedure is successively applied.
Let us consider the following example based on a real image, on which the steps of compression will be illustrated. Assume that the 8×8 block of 8-bit unsigned integers below is obtained as a part of an image.
Each block B, say, is transformed into a new block F(B). There are several possible transforms; usually the discrete cosine transform is applied, which here obeys the formula

F(u, v) = 1/4 c_u c_v Σ_{x=0}^{7} Σ_{y=0}^{7} f(x, y) cos((2x+1)uπ/16) cos((2y+1)vπ/16),

where f(x, y) denotes the entries of the (shifted) block, c_0 = 1/√2, and c_w = 1 for w > 0.
The cosine transform is applied after shifting the unsigned integers to signed integers by subtraction of 128.
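A direct (unoptimised) implementation of the transform, including the normalisation constants, might look as follows in Python; the function name is of our choosing.

```python
import math

def dct2(block):
    """8x8 forward discrete cosine transform as used in JPEG.

    `block` holds signed integers (the 8-bit values after subtracting
    128); the result contains the real-valued coefficients F(u, v),
    with c_0 = 1/sqrt(2) and c_w = 1 for w > 0 as normalisation.
    """
    def c(w):
        return 1 / math.sqrt(2) if w == 0 else 1.0

    return [[0.25 * c(u) * c(v) * sum(
                 block[x][y]
                 * math.cos((2 * x + 1) * u * math.pi / 16)
                 * math.cos((2 * y + 1) * v * math.pi / 16)
                 for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]
```

For a constant block all coefficients except F(0, 0) vanish, reflecting that F(0, 0) measures the overall intensity of the block.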
The coefficients need not be calculated according to the formula above. They can also be obtained via a related Fourier transform (see Exercises), so that a Fast Fourier Transform algorithm may be applied. The JPEG2000 standard, for instance, uses a wavelet transform in place of the discrete cosine transform.
The discrete cosine transform can be inverted via

f(x, y) = 1/4 Σ_{u=0}^{7} Σ_{v=0}^{7} c_u c_v F(u, v) cos((2x+1)uπ/16) cos((2y+1)vπ/16),

where c_0 = 1/√2 and c_w = 1 for w > 0 are normalisation constants.
In our example the transformed block is computed accordingly, with the entries rounded.
The discrete cosine transform is closely related to the discrete Fourier transform and similarly maps signals to frequencies. Removing higher frequencies results in a less sharp image, an effect that is tolerated; hence higher frequencies are stored with less accuracy.
Of special importance is the entry F(0, 0), which can be interpreted as a measure for the intensity of the whole block.
The discrete cosine transform maps integers to real numbers, which in each case have to be rounded to be representable. Of course, this rounding already results in a loss of information. However, the transformed block will now be much easier to manipulate. A quantisation takes place, which maps the entries of F to integers by division by the corresponding entry in a luminance quantisation matrix Q. In our example we use
The quantisation matrix has to be carefully chosen in order to leave the image at the highest possible quality. Quantisation is the lossy part of the compression procedure. The idea is to remove information which is not “visually significant”. Of course, at this point there is a tradeoff between the compression rate and the quality of the decoded image. For this reason, the quantisation table is not fixed by the JPEG standard but must be specified (and hence be encoded).
This quantisation transforms the block F to a new block T with T(u, v) being the closest integer to F(u, v)/Q(u, v). This block will finally be encoded. Observe that in the transformed block F, besides the entry F(0, 0), all other entries are relatively small numbers, which has the effect that T mainly consists of 0s.
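Quantisation and its reversal on the decoder side can be sketched as follows (hypothetical helper names; note that ties in the rounding may be resolved in different ways).

```python
def quantise(F, Q):
    """Map each DCT coefficient to the closest integer to F(u,v)/Q(u,v).
    (Python's round resolves .5 ties towards the even integer.)"""
    return [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]

def dequantise(T, Q):
    """Decoder side: multiply back to obtain an approximation of F."""
    return [[T[u][v] * Q[u][v] for v in range(8)] for u in range(8)]
```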
The coefficient T(0, 0) deserves special consideration. It is called the DC term (direct current), while the other entries are denoted AC coefficients (alternating current).
The matrix T will finally be encoded by a Huffman code. We shall only sketch the procedure. First the DC term is encoded as the difference to the DC term of the previously encoded block. For instance, if the previous DC term was 12 and the current one is 15, then the DC term will be encoded as the difference 3.
After that the AC coefficients are encoded according to the zig-zag order T(0, 1), T(1, 0), T(2, 0), T(1, 1), T(0, 2), T(0, 3), T(1, 2), etc. In our example, this yields a sequence of a few small entries followed by 55 zeros. This zig-zag order exploits the fact that there are long runs of successive zeros. These runs are represented even more efficiently by application of run-length coding, i. e., we encode the number of zeros before the next nonzero element in the sequence, followed by this element.
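The zig-zag traversal and the run-length pairs can be sketched as follows. The helper names are ours; the sort key encodes that the anti-diagonals u + v are traversed in alternating directions.

```python
def zigzag_order(n=8):
    """Positions (u, v) of an n x n block in zig-zag order, i.e.
    (0,1), (1,0), (2,0), (1,1), (0,2), ...; the DC term (0,0) is dropped."""
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return order[1:]

def run_length_pairs(T):
    """Encode the AC coefficients of T as pairs (number of zeros
    before the next nonzero element, that element)."""
    pairs, run = [], 0
    for (u, v) in zigzag_order():
        if T[u][v] != 0:
            pairs.append((run, T[u][v]))
            run = 0
        else:
            run += 1
    return pairs  # in JPEG, an end-of-block symbol follows
```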
Integers are written in such a way that small numbers have shorter representations. This is achieved by splitting their representation into size (the number of bits to be reserved) and amplitude (the actual value). So, 0 has size 0; 1 and −1 have size 1; 2, 3, −2, and −3 have size 2, etc.
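The size of an integer, in this sense, is just the length of the binary representation of its absolute value; a possible Python sketch (function name ours):

```python
def size(x):
    """Number of bits reserved for the amplitude of x: 0 has size 0,
    1 and -1 have size 1, +/-2 and +/-3 have size 2, and so on."""
    return abs(x).bit_length()
```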
In our example this yields the size and amplitude of the DC difference, followed by the (run, size) pairs with their amplitudes for the nonzero AC coefficients, and a final end-of-block symbol indicating that only zeros follow from now on. A pair (1, 2) with amplitude −2, for instance, means that there is 1 zero followed by an element of size 2 and amplitude −2.
These pairs are then assigned codewords from a Huffman code. There are different Huffman codes for the pairs (run, size) and for the amplitudes. These Huffman codes have to be specified and hence be included into the code of the image.
In the following pseudocode for the encoding of a single 8×8-block we shall denote the different Huffman codes by encode-1, encode-2, encode-3.
Encode-Block(T)
 1  encode-1(T(0, 0) − DC-prev)
 2  DC-prev ← T(0, 0)
 3  run ← 0
 4  FOR each AC coefficient c of T in zig-zag order
 5    DO IF c = 0
 6         THEN run ← run + 1
 7         ELSE encode-2(run, size(c))
 8              encode-3(amplitude(c))
 9              run ← 0
10  encode-2(EOB)
At the decoding end the matrix T will be reconstructed. Finally, by multiplication of each entry T(u, v) by the corresponding entry Q(u, v) from the quantisation matrix we obtain an approximation to the block F.
To this approximation the inverse cosine transform is applied and the shift by 128 is reversed. This yields an approximation to the original 8×8 block of the image; in our example the decoded block closely matches the one we started with.
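The inverse transform used on the decoder side might be implemented as follows (a direct, unoptimised sketch with a name of our choosing):

```python
import math

def idct2(F):
    """8x8 inverse discrete cosine transform: maps the (approximated)
    coefficients F(u, v) back to a block of values, still shifted
    by -128; c_0 = 1/sqrt(2) and c_w = 1 for w > 0."""
    def c(w):
        return 1 / math.sqrt(2) if w == 0 else 1.0

    return [[0.25 * sum(
                 c(u) * c(v) * F[u][v]
                 * math.cos((2 * x + 1) * u * math.pi / 16)
                 * math.cos((2 * y + 1) * v * math.pi / 16)
                 for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]
```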
3.5-1 Find size and amplitude for the representation of the integers , , and .
3.5-2 Write the entries of the following matrix in zig-zag order.
How would this matrix be encoded if the difference of the DC term to the previous one was ?
3.5-3 In our example after quantising the sequence , , , , , , has to be encoded. Assume the Huffman codebooks would yield to encode the difference from the preceding block's DC, , , and for the amplitudes , , and , respectively, and , , , and for the pairs , , , and , respectively. What would be the bitstream to be encoded for the block in our example? How many bits would hence be necessary to compress this block?
3.5-4 What would be matrices , and , if we had used
for quantising after the cosine transform in the block of our example?
3.5-5 What would be the zig-zag-code in this case (assuming again that the DC term would have difference from the previous DC term)?
3.5-6 For any sequence define a new sequence by
This sequence can be expanded to a Fourier-series via
Show how the coefficients of the discrete cosine transform
arise from this Fourier series.
Dynamic and adaptive Huffman coding is based on the following property. A binary code tree has the sibling property if each node has a sibling and if the nodes can be listed in order of nonincreasing probabilities with each node being adjacent to its sibling. Show that a binary prefix code is a Huffman code if and only if the corresponding code tree has the sibling property.
Generalisations of Kraft's inequality
In the proof of Kraft's inequality it is essential to order the lengths. Show that the construction of a prefix code for given lengths is not always possible if we are not allowed to order the lengths. This scenario of unordered lengths occurs with the Shannon-Fano-Elias code and in the theory of alphabetic codes, which are related to special search problems. Show that in this case a prefix code with the given lengths exists if and only if
If we additionally require the prefix codes to be also suffix-free, i. e., no codeword is the end of another one, it is an open problem to show that Kraft's inequality holds with the 1 on the right-hand side replaced by 3/4, i. e., that a fix-free code with lengths l_1, ..., l_n exists whenever Σ_{i=1}^{n} 2^{−l_i} ≤ 3/4.
Redundancy of Krichevsky-Trofimov-estimator
Show that using the Krichevsky-Trofimov-estimator, when parameter of a discrete memoryless source is unknown, the individual redundancy of sequence is at most for all sequences and all .
Alternatives to move-to-front-codes
Find further procedures which, like move-to-front coding, prepare the text for compression after application of the Burrows-Wheeler-transform.
The frequency table of the letters in English texts is taken from [272]. The Huffman coding algorithm was introduced by Huffman in [119]. A pseudocode can be found in [54], where the Huffman coding algorithm is presented as a special greedy algorithm. There are also adaptive or dynamic variants of Huffman coding, which adapt the Huffman code if it is no longer optimal for the current frequency table, for the case that the probability distribution of the source is not known in advance. The “3/4-conjecture” on Kraft's inequality for fix-free codes is due to Ahlswede, Balkenhol, and Khachatrian [4].
Arithmetic coding was introduced by Rissanen [212] and Pasco [199]. For a discussion of implementation questions see [158], [277]. In the section on modelling we follow the presentation of Willems, Shtarkov and Tjalkens in [275]. The exact calculations can be found in their original paper [274], which received the Best Paper Award of the IEEE Information Theory Society in 1996. The Krichevsky-Trofimov estimator was introduced in [153].
We presented the two original algorithms LZ77 and LZ78 [281], [282] due to Lempel and Ziv. Many variants, modifications and extensions have been developed since then, concerning the handling of the dictionary, the pointers, the behaviour after the dictionary is complete, etc. For a description see, for instance, [25] or [26]. Most of the prominent tools for data compression are variations of Ziv-Lempel coding. For example, “zip” and “gzip” are based on LZ77, and a variant of LZ78 is used by the program “compress”.
The Burrows-Wheeler transform was introduced in the technical report [36]. It became popular in the sequel, especially because of the Unix compression tool “bzip” based on the Burrows-Wheeler transform, which outperformed most dictionary-based tools on several benchmark files. It also avoids arithmetic coding, for which patent rights have to be taken into consideration. Further investigations on the Burrows-Wheeler transform have been carried out, for instance, in [21], [70], [155].
We only sketched the basics behind lossy image compression, especially the preparation of the data for the application of techniques such as Huffman coding. For a detailed discussion we refer to [256], where the new JPEG2000 standard is also described. Our example is taken from [268].
JPEG (short for Joint Photographic Experts Group) is very flexible. For instance, it also supports lossless data compression. The choices made in the section on image compression are not unique. There are models involving more basic colours and further transforms besides the YUV-transform (for which even different scaling factors for the chrominance channels are in use; the formula presented here is from [256]). The cosine transform may be replaced by another operation like a wavelet transform. Further, there is freedom in the choice of the quantisation matrix, responsible for the quality of the compressed image, and of the Huffman code. On the other hand, this has the effect that these parameters have to be explicitly specified and hence are part of the coded image.
The ideas behind procedures for video and sound compression are rather similar to those for image compression. In principle, they follow the same steps. The amount of data in these cases, however, is much bigger. Again information is lost by removing irrelevant information not perceivable by the human eye or ear (for instance via psychoacoustic models) and by quantising, where the quality should not be reduced significantly. More refined quantising methods are applied in these cases.
Most information on data compression algorithms can be found in the literature on information theory, for instance [55], [104], since the analysis of the achievable compression rates requires knowledge of source coding theory. Recently several books on data compression have appeared, for instance [26], [106], [191], [224], [225], to which we refer for further reading. The benchmark files of the Calgary Corpus and the Canterbury Corpus are available at [37] or [38].
The book of I. Csiszár and J. Körner [58] analyses different aspects of information theory, including the problems of data compression.