Four Compression Algorithms - 金锄头文库



Data Compression Algorithms of LARC and LHarc

Haruhiko Okumura*

*The author is the sysop of the Science SIG of PC-VAN. His address is: 12-2-404 Green Heights, 580 Nagasawa, Yokosuka 239 Japan

1. Introduction

In the spring of 1988, I wrote a very simple data compression program named LZSS in C language, and uploaded it to the Science SIG (forum) of PC-VAN, Japan's biggest personal computer network.

That program was based on Storer and Szymanski's slightly modified version of one of Lempel and Ziv's algorithms. Despite its simplicity, for most files its compression outperformed the archivers then widely used.

Kazuhiko Miki rewrote my LZSS in Turbo Pascal and assembly language, and soon made it evolve into a complete archiver, which he named LARC.

The first versions of LZSS and LARC were rather slow. So I rewrote my LZSS using a binary tree, and so did Miki. Although LARC's encoding was slower than the fastest archiver available, its decoding was quite fast, and its algorithm was so simple that even self-extracting files (compressed files plus decoder) it created were usually smaller than non-self-extracting files from other archivers.

Soon many hobby programmers joined the archiver project at the forum. Very many suggestions were made, and LARC was revised again and again. By the summer of 1988, LARC's speed and compression had improved so much that LARC-compressed programs were beginning to be uploaded in many forums of PC-VAN and other networks.

In that summer I wrote another program, LZARI, which combined the LZSS algorithm with adaptive arithmetic compression. Although it was slower than LZSS, its compression performance was amazing.

Miki, the author of LARC, uploaded LZARI to NIFTY-Serve, another big information network in Japan. In NIFTY-Serve, Haruyasu Yoshizaki replaced LZARI's adaptive arithmetic coding with a version of adaptive Huffman coding to increase speed. Based on this algorithm, which he called LZHUF, he developed yet another archiver, LHarc.

In what follows, I will review several of these algorithms and supply simplified code in C language.

2. Simple coding methods

Replacing several (usually 8 or 4) space characters by one tab character is a very primitive method for data compression. Another simple method is run-length coding, which encodes the message "AAABBBBAACCCC" into "3A4B2A4C", for example.

3. LZSS coding

This scheme was initiated by Ziv and Lempel [1]. A slightly modified version is described by Storer and Szymanski [2]. An implementation using a binary tree is proposed by Bell [3]. The algorithm is quite simple: Keep a ring buffer, which initially contains space characters only. Read several letters from the file to the buffer. Then search the buffer for the longest string that matches the letters just read, and send its length and position in the buffer.

If the buffer size is 4096 bytes, the position can be encoded in 12 bits. If we represent the match length in four bits, the <position, length> pair is two bytes long. If the longest match is no more than two characters, then we send just one character without encoding, and restart the process with the next letter. We must send one extra bit each time to tell the decoder whether we are sending a <position, length> pair or an unencoded character.

The accompanying file LZSS.C is a version of this algorithm. This implementation uses multiple binary trees to speed up the search for the longest match. All the programs in this article are written in draft-proposed ANSI C. I tested them with Turbo C 2.0.

4. LZW coding

This scheme was devised by Ziv and Lempel [4], and modified by Welch [5].

The LZW coding has been adopted by most of the existing archivers, such as ARC and PKZIP. The algorithm can be made relatively fast, and is suitable for hardware implementation as well.

The algorithm can be outlined as follows: Prepare a table that can contain several thousand items. Initially register in its 0th through 255th positions the usual 256 characters. Read several letters from the file to be encoded, and search the table for the longest match. Suppose the longest match is given by the string "ABC". Send the position of "ABC" in the table. Read the next character from the file. If it is "D", then register a new string "ABCD" in the table, and restart the process with the letter "D". If the table becomes full, discard the oldest item or, preferably, the least used.

A Pascal program for this algorithm is given in Storer's book [6].

5. Huffman coding

Classical Huffman coding was invented by Huffman [7]. A fairly readable account is given in Sedgewick [8].

Suppose the text to be encoded is "ABABACA", with four A's, two B's, and a C. We represent this situation as follows:

    4    2    1
    |    |    |
    A    B    C

Combine the least frequent two characters into one, resulting in the new frequency 2 + 1 = 3:

    4      3
    |    /   \
    A    B    C

Repeat the above step until all the characters combine into a single tree:

        7
      /   \
     /     3
    /    /   \
    A    B    C

Start at the top (root) of this
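As a supplement to section 2, the run-length example "AAABBBBAACCCC" -> "3A4B2A4C" can be reproduced with a few lines of C. This is only a minimal sketch and not part of the original article: the function name rle_encode and its buffer-size convention are made up for illustration, and it assumes the input contains no digit characters (otherwise counts and data could not be told apart in the output).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Run-length encode the NUL-terminated string `in` into `out` as
   count-then-character pairs, e.g. "AAAB" becomes "3A1B".
   Returns the encoded length, or -1 if `out` (capacity `cap` bytes,
   including the terminating NUL) is too small.
   Assumption: `in` contains no digit characters. */
int rle_encode(const char *in, char *out, size_t cap)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        char c = *in;
        size_t run = 0;

        /* Count how many times c repeats from the current position. */
        while (*in == c) {
            run++;
            in++;
        }
        /* Append "<run><c>" and check that it fitted completely. */
        int n = snprintf(out + used, cap - used, "%zu%c", run, c);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    if (used >= cap)
        return -1;      /* no room even for the terminating NUL */
    out[used] = '\0';
    return (int)used;
}
```

Note that a run of length one still costs two characters here, so text without repeats grows; practical run-length coders usually add an escape mechanism for that case.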

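The LZSS scheme of section 3 can be illustrated with a much smaller program than LZSS.C. The sketch below is not the author's code and makes several simplifying assumptions: it searches the already-seen input by brute force instead of using a ring buffer with binary trees, it records a backward distance rather than an absolute buffer position, and it stores tokens in a hypothetical Token struct instead of packing flag bits, 12-bit positions, and 4-bit lengths into bytes. The constants WINDOW, MAX_MATCH, and THRESHOLD follow the sizes given in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WINDOW    4096  /* history size; a distance minus 1 fits in 12 bits */
#define MAX_MATCH 18    /* a 4-bit field can hold (length - 3) for 3..18 */
#define THRESHOLD 2     /* matches this short are sent as plain characters */

/* One output token; is_pair plays the role of the extra flag bit. */
typedef struct {
    int is_pair;            /* 1: (distance, length) pair, 0: literal */
    unsigned char literal;  /* valid when is_pair == 0 */
    unsigned distance;      /* how far back the match starts, 1..WINDOW */
    unsigned length;        /* THRESHOLD + 1 .. MAX_MATCH */
} Token;

/* Encode in[0..n-1]; `out` must have room for n tokens. Returns token count. */
size_t lzss_encode(const unsigned char *in, size_t n, Token *out)
{
    size_t pos = 0, count = 0;

    while (pos < n) {
        size_t best_len = 0, best_dist = 0;
        size_t start = pos > WINDOW ? pos - WINDOW : 0;

        /* Brute-force search of the window for the longest match. */
        for (size_t cand = start; cand < pos; cand++) {
            size_t len = 0;
            while (len < MAX_MATCH && pos + len < n &&
                   in[cand + len] == in[pos + len])
                len++;
            if (len > best_len) {
                best_len = len;
                best_dist = pos - cand;
            }
        }
        if (best_len > THRESHOLD) {
            out[count++] = (Token){1, 0, (unsigned)best_dist, (unsigned)best_len};
            pos += best_len;
        } else {
            out[count++] = (Token){0, in[pos], 0, 0};
            pos++;
        }
    }
    return count;
}

/* Decode `count` tokens into `out`; returns the number of bytes written. */
size_t lzss_decode(const Token *tok, size_t count, unsigned char *out)
{
    size_t pos = 0;

    for (size_t i = 0; i < count; i++) {
        if (tok[i].is_pair) {
            /* Copy byte by byte so a match may overlap its own output. */
            for (unsigned k = 0; k < tok[i].length; k++, pos++)
                out[pos] = out[pos - tok[i].distance];
        } else {
            out[pos++] = tok[i].literal;
        }
    }
    return pos;
}
```

For the 12-byte input "abcabcabcabc" this produces three literals followed by one pair that reaches back 3 bytes and copies 9, which shows why the decoder is so simple and fast compared with the encoder's search.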

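The LZW outline in section 4 can be sketched in C as follows. This is an illustrative encoder only, written under stated assumptions rather than taken from the article or from Storer's book: the table holds 4096 entries (a hypothetical choice so that codes fit in 12 bits), lookups use a plain linear search, codes are written to an int array instead of a packed bit stream, and when the table is full the sketch simply stops adding entries instead of discarding the oldest or least used item as the text recommends.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 4096   /* assumed size: every code fits in 12 bits */

/* Codes 0..255 stand for single characters. Every later entry is stored
   as (code of its prefix string, final character). */
static int prefix_of[TABLE_SIZE];
static unsigned char last_of[TABLE_SIZE];

/* Linear search for the string "prefix followed by c".
   Returns its code, or -1 if it is not registered yet. */
static int find(int next_free, int prefix, unsigned char c)
{
    for (int i = 256; i < next_free; i++)
        if (prefix_of[i] == prefix && last_of[i] == c)
            return i;
    return -1;
}

/* Encode in[0..n-1]; `codes` must have room for n ints. Returns code count. */
size_t lzw_encode(const unsigned char *in, size_t n, int *codes)
{
    size_t count = 0;
    int next_free = 256;    /* first unused table position */

    if (n == 0)
        return 0;
    int cur = in[0];        /* code of the longest match found so far */
    for (size_t i = 1; i < n; i++) {
        int found = find(next_free, cur, in[i]);
        if (found >= 0) {
            cur = found;            /* the match can be extended */
        } else {
            codes[count++] = cur;   /* send the position of the match */
            if (next_free < TABLE_SIZE) {
                /* register "match + next character" as a new string */
                prefix_of[next_free] = cur;
                last_of[next_free] = in[i];
                next_free++;
            }
            cur = in[i];            /* restart with the next character */
        }
    }
    codes[count++] = cur;
    return count;
}
```

Encoding "ABABABA" yields the four codes 65, 66, 256, 258: the two single letters, then "AB" (registered at position 256), then "ABA" (registered at position 258). The decoder can rebuild the same table from the codes alone, so the table itself never has to be transmitted.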

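The tree-building step of section 5 (repeatedly combining the two least frequent items) can be sketched in C as below. This is a minimal illustration under assumptions, not the article's program: the name huffman_lengths is invented, the two lightest nodes are found by a simple linear scan, and the function reports only the depth of each leaf, relying on the standard fact that a symbol's code length equals its depth in the tree. Assigning the actual bit patterns is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NSYM  256
#define NNODE (2 * NSYM - 1)   /* leaves plus at most NSYM - 1 internal nodes */

/* Compute Huffman code lengths for the bytes of in[0..n-1].
   length[s] receives the number of bits for symbol s, or 0 if s is absent.
   Edge case: if only one distinct symbol occurs, its length is also 0. */
void huffman_lengths(const unsigned char *in, size_t n, int length[NSYM])
{
    unsigned long weight[NNODE];
    int parent[NNODE];
    int alive[NNODE];   /* 1 while a node is a root waiting to be combined */
    int nodes = NSYM, roots = 0;

    for (int s = 0; s < NSYM; s++) {
        weight[s] = 0;
        parent[s] = -1;
        alive[s] = 0;
        length[s] = 0;
    }
    for (size_t i = 0; i < n; i++)
        weight[in[i]]++;            /* count character frequencies */
    for (int s = 0; s < NSYM; s++)
        if (weight[s] > 0) {
            alive[s] = 1;
            roots++;
        }

    while (roots > 1) {
        int a = -1, b = -1;         /* a: lightest root, b: second lightest */
        for (int i = 0; i < nodes; i++) {
            if (!alive[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) {
                b = a;
                a = i;
            } else if (b < 0 || weight[i] < weight[b]) {
                b = i;
            }
        }
        /* Combine the two least frequent roots into a new node. */
        weight[nodes] = weight[a] + weight[b];
        parent[nodes] = -1;
        alive[nodes] = 1;
        parent[a] = parent[b] = nodes;
        alive[a] = alive[b] = 0;
        nodes++;
        roots--;
    }

    /* The code length of a symbol is the depth of its leaf. */
    for (int s = 0; s < NSYM; s++) {
        if (weight[s] == 0)
            continue;
        for (int p = parent[s]; p >= 0; p = parent[p])
            length[s]++;
    }
}
```

For "ABABACA" the scan first combines C (1) and B (2) into a node of weight 3, then combines that node with A (4) into the root of weight 7, matching the diagrams in the text. The resulting lengths are 1 bit for A and 2 bits each for B and C, for a total of 4*1 + 2*2 + 1*2 = 10 bits.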