University homepage Suomenkielinen versio puuttuu Inte på svenska In english
University of Helsinki Department of Computer Science
 

Department of Computer Science

Recent trends in stringology - study group

News

  • Some new papers added 14th September to complete the list of possible subjects for the subsequent seminar to be held in the autumn semester.
  • The name in parantheses in the end of each paper listed below refera to the person who introduced the paper. Papers not associated with any name were not covered.
  • The last meeting of the study group was held Thursday 10th June.

Working policy

  • Two persons give introductory talks on two different topics on each Monday.
  • 1-2 papers are selected based on the introductions to be read by the whole group between Monday-Thursday.
  • On Thursday the group discusses the papers in more detail.

Schedule

  • Starting from thursday 13.5 all meetings in room B227b (end of library)
  • monday 3.5 14-16: organization
  • monday 10.5 14-16: 2 introductory talks by Kjell and Veli
  • thursday 13.5 10-13: in detail discussion
  • monday 17.5 14-16: 2 introductory talks by Juha and Ari
  • wednesday 19.5 10-13: in detail discussion
  • monday 24.5 14-16: introductory talks by Stefan
  • thursday 27.5 10-13: in detail discussion
  • monday 31.5 14-16: 2 introductory talks by Aristides and Juho
  • thursday 3.6 10-13: in detail discussion
  • monday 7.6 14-16: introductory talk by Kimmo
  • thursday 10.6 10-13: in detail discussion and introductory talk by Tomi

Participants

  • Esko Ukkonen
  • Juha Kärkkäinen
  • Stefan Burkhardt
  • Shunsuke Inenaga
  • Veli Mäkinen (organizer)
  • Margus Lukk
  • Aristides Gionis
  • Taneli Mielikäinen
  • Kjell Lemström
  • Tomi Pasanen
  • Ari Rantanen
  • Kimmo Palin
  • Panayiotis Tsaparas
  • Yoan Pinzon (visiting from Kings College, London)
  • Juho Rousu (visiting 31.5 from Royal Holloway, London)

Topics

  • Pattern discovery
  • Inverse problems in stringology
  • Compressed text indexes & Burrows-Wheeler transformation
  • Metric space indexing
  • Music information retrieval
  • Detection of virus DNA in Human Genome
  • 3D matching under rotation invariance with applications
  • Gapped q-grams
  • Unexpected words discovery (based on Apostolico's work)
  • Support-vector machines, string kernels
  • New trends in information retrieval
  • External memory string structures
  • Other?

Papers

Pattern Discovery

  • G. S. Brodal, R. B. Lyngsø, C. N. S. Pedersen, J. Stoye: Finding Maximal Pairs with Bounded Gap.
    In CPM 1999.
    [SpringerLink] (Veli)
  • Pelfrene, Abdeddaiam, Alexandre: Extracting Approximate Patterns.
    In CPM 2003.
    [SpringerLink] (Kjell)
  • Crochemore et al.: Longest Repeats with a Block of Dont Cares.
    In LATIN 2004.
    [SpringerLink.] (Veli)
  • M. Takeda, S. Inenaga, H. Bannai, A. Shinohara, and S. Arikawa: Discovering Most Classificatory Patterns for Very Expressive Pattern Classes.
    In Proc. DS'03, LNCS 2843, pp. 486-493.
    [SpringerLink]
  • S. Inenaga, H. Bannai, A. Shinohara, M. Takeda, S. Arikawa: Discovering Best Variable-Length-Don't-Care Patterns.
    In Proc. DS'02, LNCS 2534, pp. 86-97.
    [SpringerLink]
  • J. K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang: Distinguishing string selection problems.
    Information and Computation 185(1):41-55, 2003.
    [.pdf]
  • N. Pisanti, M. Crochemore, R. Grossi, and M.-F. Sagot: A Basis of Tiling Motifs for Generating Repeated Patterns and Its Complexity for Higher Quorum.
    In Proc. MFCS'03, LNCS 2747, pp. 622-631.
    [SpringerLink] (Kjell)
  • P. Nicodeme, Salvy, Flajolet: Motif Statistics.
    Technical report 2004.
    [.ps ]
  • Pavel A. Pevzner and Sing-Hoi Sze. Combinatorial Approaches to Finding Subtle Signals in DNA Sequences. In Proc. ISMB'00, AAAI 2000, pp. 269-278.
    [.pdf] (Juha)
  • Jeremy Buhler and Martin Tompa. Finding Motifs Using Random Projections.
    Journal of Computational Biology 9 (2), 2002, pp. 225-242.
    [.ps (preliminary version?)] [ACM Portal (shorter conference version)] (Juha)
  • Uri Keich and Pavel A. Pevzner. Finding motifs in the twilight zone.
    Bioinformatics 18 (10), 2002, pp. 1374-1381.
    [Bioinformatics] (Ari)
  • Uri Keich and Pavel A. Pevzner. Subtle motifs: defining the limits of motif finding algorithms.
    Bioinformatics 18 (10), 2002, pp. 1382-1390.
    [Bioinformatics] (Ari)
  • Yuh-Jyh Hu. Finding subtle motifs with variable gaps in unaligned DNA sequences.
    Computer Methods and Programs in Biomedicine. Volume 70, Issue 1 , January 2003, Pages 11-20.
    [ScienceDirect]
  • Motif Discovery Assestment

Inverse problems

  • H. Bannai, S. Inenaga, A. Shinohara, and M. Takeda: Inferring Strings from Graphs and Arrays.
    In Proc. MFCS'03, LNCS 2747, pp. 208-217.
    [SpringerLink]
  • J.-P. Duval, T. Lecroq and A. Lefebvre: Border Array on Bounded Alphabet.
    In Proc. PSC'02,
    [.ps]

Compressed text indexes & Burrows-Wheeler

  • Giovanni Manzini: An Analysis of the Burrows-Wheeler Transform.
    JACM 2001 .
    [.ps]
  • R. Giancarlo and M. Sciortino: Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms.
    In Proc. CPM 2003.
    [SpringerLink]
  • P. Ferragina and G. Manzini: On Compressing and Indexing Data.
    Technical report on FOCS 2000 & SODA 2001 papers.
    [link]
  • Wing-Kai Hon, Kunihiko Sadakane, Wing-Kin Sung. Breaking a Time-and-Space Barrier in Constructing Full-Text Indices.
    In Proc. FOCS'03, IEEE 2003, pp. 251-260.
    [IEEE Xplore] [.ps]
  • Wing-Kai Hon, Tak-Wah Lam, Kunihiko Sadakane and Wing-Kin Sung. Constructing Compressed Suffix Arrays with Large Alphabets
    In Proc. ISAAC'03, LNCS 2906, Springer 2003, pp. 240-249.
    [SpringerLink]
  • Tak-Wah Lam, Kunihiko Sadakane, Wing-Kin Sung and Siu-Ming Yiu. A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays
    In Proc. COCOON'02, LNCS 2387, Springer 2002, pp. 401-410.
    [SpringerLink]
  • Roberto Grossi, Ankur Gupta and Jeffrey Scott Vitter. High-order entropy-compressed text indexes.
    In Proc. SODA'03, ACM-SIAM 2003, pp. 841-850.
    [ACM Portal]
  • Kunihiko Sadakane. New text indexing functionalities of the compressed suffix arrays.
    Journal of Algorithms 48 (2), 2003, pp. 294-313.
    [Science Direct]
  • Kunihiko Sadakane. Compressed Suffix Trees with Full Functionality.
    Manuscript, 2003.
    [.ps]
  • Wing-Kai Hon, Tak-Wah Lam, Kunihiko Sadakane, Wing-Kin Sung and Siu-Ming Yiu. Compressed index for dynamic text.
    In DCC'04, 2004, pp. 102-111.
    [IEEE Xplore]
  • Kendall Willets. Full-Text Searching & the Burrows-Wheeler Transform.
    Dr. Dobb's Journal, December 2003.
    [Willets' compressed index page]
  • Jeong Seop Sim, Dong Kyue Kim, Heejin Park and Kunsoo Park. Linear-time search in suffix arrays
    InProc. AWOCA'03, pp. 139-146.
    [.ppt]
  • Veli Mäkinen and Gonzalo Navarro. New search algorithms and time/space tradeoffs for succinct suffix arrays
    Technical report C-2004-20, Univ. Helsinki, Dept. CS, April 2004.
    [.ps]
  • Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, and Gonzalo Navarro. An Alphabet-Friendly FM-Index.
    To appear in Proc. SPIRE'04, October 5-8, Padova, Italy, 2004.
    [.ps]

Analysis of longest common subsequences

  • The following paper solved an old conjecture on the expected length of longest common subsequences. See the references in that paper for the history of this problem.
  • Marcos Kiwi, Martin Loebl, and Jiri Matousek.Expected Length of the Longest Common Subsequence for Large Alphabets.
    In Proc. LATIN, LNCS 2976, pp. 302-311, 2004.
    [SpringerLink]

Metric space indexes

  • Edgar Chávez and Gonzalo Navarro. A Metric Index for Approximate String Matching.
    In Proc. LATIN'02, LNCS 2286, Springer 2002, pp. 181-195.
    [SpringerLink] [.ps] (Aristides)

External memory string structures

  • Juha Kärkkäinen and S. Srinivasa Rao. Full-Text Indexes in External Memory.
    Chapter 7 in U. Meyer, P. Sanders, J. Sibeyn (eds.), Algorithms for Memory Hierarchies (Advanced Lectures). LNCS 2625, Springer 2003, pp. 149-170.
    [SpringerLink (chapter)] [SpringerLink (volume)] [.pdf (local)]
  • Paolo Ferragina and Roberto Grossi. The string B-tree: a new data structure for string search in external memory and its applications
    JACM 46 (2), 1999, pp. 236-280.
    [ACM Portal] (Kimmo)

String Kernels

  • Vishwanathan and A. Smola. Fast Kernels for String and Tree Matching.
    Advances in Neural Information Processing Systems 15, 2003, pp. 569-576. (Juho)
  • C. Leslie and R. Kuang. Fast Kernels for Inexact String Matching.
    In Proc. 16th Conference on Computational Learning Theory and 7th Kernel Workshop, COLT/Kernel'2003. Lecture Notes in Computer Science 2777 (2003), pp. 114 - 128.
    [SpringerLink] (Juho)
  • Juho Rousu, John Shawe-Taylor: Efficient Computation of Gap-Weighted String Kernels on Large Alphabets.
    In PASCAL workshop, January 2004.
    [.ps] (Juho)
  • Lodhi, Saunders, Shawe-Taylor, Cristianini, Watkins: Text Classification using String Kernels.
    Journal of Machine Learning Research 2(2002):419-444. (Juho)

Information Retrieval

  • Holger Bast: Dimension Reduction: A Powerful Principle for Automatically Finding Concepts in Unstructured Data
    Survey manuscript, 2004.
    [.pdf]
  • Thomas Hoffman: Unsupervised Learning by Probabilistic Latent Semantic Analysis.
    Machine Learning, 42, 177–196, 2001.
    [.pdf ]

Music Information Retrieval

Miscellaneous

  • Adam Kalai. Efficient Pattern-Matching with Don't Cares.
    In Proc. SODA'02, ACM-SIAM 2002, pp. 655-656.
    [ACM Portal] (Stefan)
  • Richard Cole, Lee-Ad Gottlieb, Moshe Lewenstein: Dictionary matching and indexing with errors and don't cares.
    To appear in STOC 2004.
    [.ps] (Stefan)

Veli Mäkinen
Last updated 14.9.2004