rNA - a fast and accurate short DNA reads numerical aligner
The mandatory first step for a wide range of biological analyses is the alignment of short DNA sequences into a reference genome. We first discuss a method, extending Rabin and Karp's algorithm, to find all occurrences of a pattern P with up to k mismatches inside a text T, with average complexity O(|P| + |T|)--assuming an uniform distribution over T. We then show how to employ this idea in conjunction with an index over T, allowing to search P with mismatches in time proportional to its length. Our novel tool--rNA (randomized Numerical Aligner)--is in general faster and more accurate than other widely used tools like SOAP2, BWA, and BOWTIE.
Joint work with C. Del Fabbro, A. Policriti, F. Vezzi
Alexandru Tomescu, graduate of the University of Bucharest (Romania), recently obtained his PhD in Computer Science from the University of Udine (Italy); he is currently visiting the TU Berlin (Germany). Alexandru's PhD thesis focuses on graphs and sets with the goal of a transfer of structural and algorithmic notions and results across the two areas. He also collaborated with the bioinformatics group at the Institute of Applied Genomics in Udine on designing a method to rapidly and accurately map (short) DNA sequences into reference genomes.