LZ77 factorization algorithms
[back to the main page]This webpage is devoted to algorithms computing LempelZiv factorization (also known as LempelZiv or LZ77 parsing). We present C++ implementations of different algorithms varying in speed and space consumption.
Name  Space^{a}  Complexity^{b}  Comments  Paper  Code 
KKP3  13n  O(n)  The fastest algorithm if the text is not highly repetitive.  [1][2]  download 
KKP2  9n  O(n)  The most spaceefficient O(n)time algorithm for integer alphabet.  
KKP1s  5n  O(n)  A version of KKP2 that streams the suffix array from the disk.  
ISA9  9n  O(n log σ)  One of the fastest algorithms at this memory usage.  [1][3]  download 
ISA6s  6n  O(n log σ)  Slower, but more spaceefficient version of ISA9.  
ISA6r  6n  O(n log σ)  A version of ISA6s specialized for highly repetitive inputs.  
LZscan  n+O(n/d)  O(nd × log(n/d))  Parameter d controls the time/space tradeoff.  [4]  download 
^{a }Summarizes the practical space consumption assuming 32bit integers and byte alphabet.
^{b }Assuming the input text is of length n and consists of symbol drawn from an alphabet of size σ.
News
 20140116: Fixed a bug in ISA6s algorithm reported here.
See also
References

Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi.
Lazy LempelZiv Factorization Algorithms.
ACM Journal of Experimental Algorithmics (JEA), Volume 21(1), 2016, Article 2.4.
[ACM] 
Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi.
Linear Time LempelZiv Factorization: Simple, Fast, Small.
In Proc. 24th Symposium on Combinatorial Pattern Matching (CPM 2013), Springer, 2013, pp. 189200.
[Springer] 
Dominik Kempa, Simon J. Puglisi.
LempelZiv Factorization: Simple, Fast, Practical.
In Proc. 2013 Algorithm Engineering and Experimentation (ALENEX 2013), SIAM, 2013, pp. 103112.
[SIAM] 
Juha Kärkkäinen, Dominik Kempa, Simon J. Puglisi.
Lightweight LempelZiv Parsing.
In Proc. 12th Symposium on Experimental Algorithms (SEA 2013), Springer, 2013, pp. 139150.
[Springer]