##
Approximate Matching of Run-length Compressed Strings

####
Veli Mäkinen, Gonzalo Navarro and Esko Ukkonen

We focus on the problem of approximate matching of strings that have been
compressed using run-length encoding. Previous studies have concentrated on the
problem of computing the longest common subsequence (LCS) between two strings
of length *m* and *n* compressed to *m'* and *n'* runs. We extend
an existing algorithm for the LCS to the Levenshtein distance achieving
*O(m'n+n'm)* complexity. This approach gives also an algorithm for
approximate searching of a pattern of *m* letters (*m'* runs) in a text
of *n* letters (*n'* runs) in *O(mm'n')* time, both for LCS and
Levenshtein models. Then we propose improvements for a greedy algorithm for
the LCS, and conjecture that the improved algorithm has *O(m'n')* expected
case complexity. Experimental results are provided to support the conjecture.