Algoritmo lempel ziv
337
A Universal Algorithm for Sequential Data Compression
JACOB ZIV, FELLOW, IEEE, AND ABRAHAM LEMPEL, MEMBER, IEEE
then show that the efficiency of our universal code with no a priori knowledge of the source approaches those bounds. The proposed compression algorithm is an adaptation of a simple copying procedure discussed recently [10] in a study on the complexity of finite sequences. Basically, we employ the concept of encoding future segments of the source-output via maximum-length copying from a buffer I. INTRODUCTION containing the recent past output. The transmitted N MANY situations arising in digital com- codeword consists of the buffer address and the length of munications and data processing, the encountered the copied segment. With a predetermined initial load of strings of data display various structural regularities or are the buffer and the information contained in the codewords, otherwise subject to certain constraints, thereby allowing the source data can readily be reconstructed at the defor storage and time-saving techniques of data compres- coding end of the process. sion. Given a discrete data source, the problem of data The main drawback of the proposed algorithm is its compression is first to identify the limitations of the source, susceptibility to error propagation in the event of a channel and second to devise a coding scheme which, subject to error. certain performance criteria, will best compress the given source. Once the relevant source parameters have been identified, the problem reduces to one of minimum-redundancy coding. This phase of the problem has received extensive treatment in the literature [l]-[7]. When no a priori knowledge of the source characteristics is available, and if statistical tests are either impossible or unreliable, the problem of data compression becomes considerably more complicated. In order to overcome these difficulties one must