Get-Together for String Algorithms Researchers in Helsinki

Event type: 
Seminar
Event time: 
08.12.2011 - 14:15 - 16:00
Place: 
Exactum DK118
Description: 

The program consists of a guest lecture by Simon Gog from Ulm University and a talk by Dominik Kempa:

Simon Gog: Compressed Suffix Trees in Practice

Abstract: Compressed Suffix Trees (CSTs) are space-efficient full-text index data structures which provide more functionality than their uncompressed counterparts while at the same time use only a fraction of space in practice. However a robust, efficient, and easy-to-use implementation of CSTs is a real challenge. In this talk we show that such an implementation cannot be done by only combining the best known solutions for the three logical parts of a CST: a compressed suffix array (CSA), a compressed representation of the longest common prefix array (LCP), and a structure for the tree topology and navigation (NAV). The components of an optimal combination should reduce redundant information and optimize the speed of the operations by exploiting fast operations of each component. We therefore present a new space-efficient LCP structure which was designed in that spirit and provides new relevant time-space trade-offs. An experimental study shows that our new implementation outperforms other CST implementations but also is competitive with uncompressed suffix trees in various scenarios. I will update the experiments by using also compression boosted CSAs.

Dominik Kempa: Slashing the Time for BWT Inversion

Abstract: Inverting the Burrows-Wheeler transform (BWT) is a bottleneck in BWT-based decompressors. The state-of-the-art inversion algorithm runs in linear time but is slow in practice due to CPU-cache misses. For more than a decade these cache misses have been thought to be inherent to BWT inversion. We show how to reduce the number and cost of cache misses to obtain a speed up by a factor of 2-4. Additionally, we present an algorithm that, for repetitive data, achieves an asymptotic reduction in cache misses in theory and is the fastest algorithm in practice for such data.

02.12.2011 - 17:28 Leena Salmela
02.12.2011 - 17:25 Leena Salmela