Results

Ph.D. Degrees obtained within the Research Unit 1996 -

Antoine Doucet 2005 Advanced Document Description, a Sequential Approach
Taneli Mielikäinen 2005 Summarization Techniques for Pattern Collections in Data Mining
Hellis Tamm 2004 On minimality and size reduction of one-tape and multitape finite automata
Teemu Kivioja 2004 Computational Tools for a Novel Transcriptional Profiling Method
Matti Kääriäinen 2004 Learning Small Trees and Graphs that Generalize
Janne Ravantti 2004 Computational Methods for Reconstructing Macromolecular Complexes from Cryo-Electron Microscopy Images
Petteri Sevon 2004 Algorithms for Association-Based Gene Mapping
Kari Vasko 2004 Computational methods and models for paleoecology
Mikko Koivisto 2004 Sum-Product Algorithms for the Analysis of Genetic Risks
Veli Mäkinen 2003 Parameterized Approximate String Matching and Local-Similarity-Based Point-Pattern Matching
Jaak Vilo 2002 Pattern Discovery from Biosequences
Vesa Ollikainen 2002 Simulation Techniques for Disease Gene Localization in Isolated Populations
Kimmo Fredriksson 2001 Rotation Invariant Template Matching
Marko Salmenkivi 2001 Computational Methods for Intensity Models
Juho Rousu 2001 Efficient Range Partitioning in Classification Learning
Kjell Lemström 2000 String Matching Techniques for Music Retrieval
Barbara Heikkinen 2000 Generalization of Document Structures and Document Assembly
Pirjo Moen 2000 Attribute, Event Sequence, and Event Type Similarity Notions for Data Mining
Mika Klemettinen 1999 A Knowledge Discovery Methodology for Telecommunication Network Alarm Databases
Juha Kärkkäinen 1999 Repetition-based Text Indexes
Erkki Sutinen 1998 Approximate Pattern Matching with the q-gram Family
G. Lindén 1997 Structured Document Transformations
Matti Nykänen 1997 Querying String Databases in Modal Logic
Tapio Elomaa 1996 Tools and techniques for decision tree learning
Helena Ahonen 1996 Generating Grammars for Structured Documents Using Grammatical Inference Methods
H.T.T. Toivonen 1996 Discovery of Frequent Patterns in Large Data Collections

Publications 1996 - 2005


2006

I. Autio, J. Lindgren: Online Learning of Discriminative Patterns from Unlimited Sequences of Candidates. Proc. 18th International Conference on Pattern Recognition, ICPR, 2006.

P. Bas, J. Hurri: Vulnerability of DM watermarking of non-iid host signals to attacks utilising the statistics of independent components. Accepted to IEE Proceedings - Information Security, 2006.

E. Bingham, A. Gionis, N. Haiminen, H. Hiisilä, H. Mannila, E. Terzi: Segmentation and dimensionality reduction. 2006 SIAM Conference on Data Mining, April 20-22, 2006, Bethesda, Maryland, USA, pp. 372-383.

M. E. Califf, M. A. Greenwood, M. Stevenson, R. Yangarber (eds.): Proceedings of the COLING/ACL 2006 Workshop on Information Extraction Beyond The Document, July 2006. Sydney, Australia. Url: http://nlp.shef.ac.uk/result/iebd06/

A. Doucet, H. Ahonen-Myka: "Fast extraction of discontiguous sequences in text: a new approach based on maximal frequent sequences". Accepted in IS-LTC'06, Information Society, Language Technology Conference, Ljubljana, Slovenia.

A. Doucet and H. Ahonen-Myka: "Probability and Expected Document Frequency of Discontinued Word Sequences, an efficient method for their exact computation". TAL journal, special issue on "Scaling of Natural Language Processing: Complexity, Algorithms and Architectures", 46 (2): 25 pages, 2005 (to appear in 2006).

T. Elomaa, J. Kujala, J. Rousu: Practical Approximation of Optimal Multivariate Discretization. Proc. 16th International Symposium on Methodologies for Intelligent Systems (ISMIS-2006), to appear

P. Ferragina, G. Manzini, V. Mäkinen, G. Navarro: Compressed Representations of Sequences and Full-Text Indexes. To appear in ACM Transactions on Algorithms, 2006.

M. Fortelius, A. Gionis, J. Jernvall, H. Mannila: Spectral Ordering and Biochronology of European Fossil Mammals. Paleobiology 32(2), 2006

K. Fredriksson, V. Mäkinen, G. Navarro: Flexible Music Retrieval in Sublinear Time. Accepted to International Journal of Foundations of Computer Science (IJFCS), July 2006.

A. Gionis, H. Mannila, T. Mielikäinen, P. Tsaparas: Assessing Data Mining Results via Swap Randomization, 12th International Conference on Knowledge Discovery and Data Mining (KDD) 2006. KDD Best Paper Runner-up.

A. Gionis, H. Mannila, K. Puolamäki, A. Ukkonen: Algorithms for Discovering Bucket Orders from Data, 12th International Conference on Knowledge Discovery and Data Mining (KDD) 2006.

S. Grabowski, G. Navarro, R. Przywarski, A. Salinger, V. Mäkinen: A Simple Alphabet-Independent FM-Index. Accepted to International Journal of Foundations of Computer Science (IJFCS), July 2006.

M. Greig, H. Haanpää, P. Kaski: On the coexistence of conference matrices and near resolvable 2-(2k+1,k,k-1) designs. Journal of Combinatorial Theory, Series A 113 (2006), 703-711.

R. Gwadera, A. Gionis, H. Mannila: Optimal Segmentation using Tree Models. 2006 IEEE International Conference on Data Mining, to appear.

H. Haanpää, M. Järvisalo, P. Kaski, I. Niemelä: Hard satisfiable clause sets for benchmarking equivalence reasoning techniques. Journal on Satisfiability, Boolean Modeling and Computation 2 (2006), 27-46.

O. Hallikas, K. Palin, N. Sinjushina, R. Rautiainen, J. Partanen, E. Ukkonen, J. Taipale: Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell 124 (January 13, 2006), 47-59.

H. Heikinheimo, H. Mannila, J. Seppänen: Finding Trees from Unordered 0-1 Data. To appear in 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) 2006.

M. Heinonen, A. Rantanen, T. Mielikäinen, E. Pitkänen, J. Kokkonen, J. Rousu: Ab Initio Prediction of Molecular Fragments from Tandem Mass Spectrometry Data. In German Conference on Bioinformatics 2006 (GCB'06). 2006.

P. Hintsanen, P. Sevon, P. Onkamo, L. Eronen, H. Toivonen: An empirical comparison of case-control and trio-based study designs in high-throughput association mapping, Journal of Medical Genetics 2006; 43: 617-624.

P. O. Hoyer, S. Shimizu, A. Hyvärinen, Y. Kano, A.J. Kerminen: New permutation algorithms for causal discovery using ICA, Proc. International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2006), Charleston, SC, USA, 2006. In press.

P. O. Hoyer, S. Shimizu, A. J. Kerminen: Estimation of linear, non-gaussian causal models in the presence of confounding latent variables. In press.

J. Hurri: Learning Cue-Invariant Visual Responses. Advances in Neural Information Processing Systems (NIPS), pages 539-546, 2006.

A. Hyvärinen: Consistency of pseudolikelihood estimation of fully visible Boltzmann machines. Neural Computation 18(10): 2283-2292 (2006).

A. Hyvärinen, U. Köster: FastISA: A fast fixed-point algorithm for independent subspace analysis. In Proc. European Symposium on Artificial Neural Networks, Bruges, Belgium, 2006.

A. Hyvärinen, J. Perkiö: Learning to segment any random vector. In Proc. IEEE Int. Joint Conf. on Neural Networks, Vancouver, Canada, 2006.

A. Hyvärinen, S. Shimizu: A quasi-stochastic gradient algorithm for variance-dependent component analysis. In Proc. Int. Conf. on Artificial Neural Networks, Athens, Greece, 2006. In press.

S. Hyvönen, H. Junninen, L. Laakso, M. Dal Maso, T. Grönholm, B. Bonn, P. Keronen, P. Aalto, V. Hiltunen, T. Pohja, S. Launiainen, P. Tunved, HC Hanssen, P. Hari, H. Mannila, M. Kulmala. Data mining approaches to explaining aerosol formation. In: Voinov, A., Jakeman, A., Rizzoli, A. (eds). Proceedings of the iEMSs Third Biennial Meeting: "Summit on Environmental Modelling and Software". International Environmental Modelling and Software Society, Burlington, USA, July 2006. CD ROM. Internet: http://www.iemss.org/iemss2006/sessions/all.html

S. Hyvönen, A. Leino, M. Salmenkivi: Multivariate Analysis of Finnish Dialect Data. To appear in Literary and Linguistic Computing.

A. Kaban, E. Bingham: ICA-based Binary Feature Construction. Independent Component Analysis and Blind Signal Separation: 6th International Conference, ICA 2006, Charleston, SC, USA, March 5-8, 2006. Proceedings, Justinian Rosca, Deniz Erdogmus, Jose C. Principe, Simon Haykin (eds.). pp. 140 - 148.

P. Kaski, P. R. J. Östergård: Classification Algorithms for Codes and Designs. Springer-Verlag, Berlin Heidelberg, 2006.

P. Kaski, P. R. J. Östergård: There exists no symmetric configuration with 33 points and line size 6. Australasian Journal of Combinatorics, to appear.

P. Kaski, P. R. J. Östergård, O. Pottonen: The Steiner quadruple systems of order 16. Journal of Combinatorial Theory, Series A, 113 (2006) 1764-1770.

P. Kaski, P. R. J. Östergård, S. Topalova, R. Zlatarski: Steiner triple systems of order 19 and 21 with subsystems of order 7. Discrete Mathematics, to appear.

J. Kivinen, M. K. Warmuth, B. Hassibi: The p-norm generalization of the LMS algorithm for adaptive filtering. IEEE Transactions on Signal Processing 54(5):1782-1793, May 2006.

M. Koivisto: An O(2^n) algorithm for graph coloring and other partitioning problems via inclusion-exclusion. Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS-2006), to appear.

M. Koivisto: Advances in exact Bayesian structure discovery in Bayesian networks. In: R. Dechter and T. Richardson (eds.), Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI-2006), pp. 241--248, AUAI Press, 2006.

M. Koivisto: Parent assignment is hard for the MDL, AIC, and NML costs. In: G. Lugosi, H.U. Simon (eds.), The 19th Annual Conference on Learning Theory (COLT-2006), LNAI 4005, pp. 289--303, Springer, 2006.

M. Koivisto: Optimal 2-constraint satisfaction via sum-product algorithms. Information Processing Letters, 98(1): 22-24, 2006.

J. Kollin, M. Koivisto: Bayesian learning with mixtures of trees. In: J. Fürnkranz, T. Scheffer, M. Spiliopoulou (eds.), Proceedings of the 17th European Conference on Machine Learning (ECML-2006), to appear.

M. Korpela, J. Hollmén: Extending an algorithm for clustering gene expression time series. In J. Rousu, S. Kaski, E. Ukkonen (eds.) Proceedings of the Workshop on Probabilistic Modeling and Machine Learning in Structural and Systems Biology, University of Helsinki, Department of Computer Science, Series of Publications B, Report B-2006-4, pages 120-124, June 2006.

K. Kulovesi, J. Muhonen, I. Lappalainen, P. T. Riikonen, M. Vihinen, H. Toivonen, T. A. Pasanen: Visualisation of Associations Between Nucleotides in SNP Neighbourhoods, IDAMAP 2006.

I. Kurki, A. Hyvärinen, P.L. Laurinen: Collinear context (and learning) change the profile of the perceptual filter. Vision Research, 46(13):2009-2014, 2006.

I. Kurki, J. Saarinen: Detection of irregular spatial structures. Spatial Vision 19(5): 375-388 (2006).

J. Kärkkäinen: Fast BWT in Small Space by Blockwise Suffix Sorting. Accepted to Theoretical Computer Science.

J. Kärkkäinen, P. Sanders, S. Burkhardt: Linear work suffix array construction. J. ACM. In press.

M. Kääriäinen: Active Learning in the Non-realizable Case, to appear at ALT, 2006.

M. Kääriäinen: Semi-Supervised Model Selection Based on Cross-Validation, Special Session on Model Selection, IEEE International Joint Conference on Neural Networks, IJCNN'06.

M. Kääriäinen, J. Langford: Lower bounds for reductions, oral presentation at the Atomic Learning workshop at TTI-C, March 2006.

N. Landwehr, T. Mielikäinen, L. Eronen, H. Toivonen, H. Mannila: Constrained Hidden Markov Models for Population-based Haplotyping, PMSB 2006, to appear.

S. Laur, H. Lipmaa, T. Mielikäinen: Cryptographically Private Support Vector Machines. In Mark Craven and Dimitrios Gunopulos (Eds.): The Twelfth Annual SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006). ACM, 2006.

M. Lehtonen: Designing User Studies for XML Retrieval. In Proceedings of the SIGIR 2006 Workshop on XML Element Retrieval Methodology, Seattle, USA, pages 28-34. Department of Computer Science, University of Otago, New Zealand, 2006.

M. Lehtonen. Preparing Heterogeneous XML for Full-Text Search. To appear in ACM Transactions on Information Systems (TOIS) 24, 3 (October). ACM Press, 2006.

A. Leino, S. Hyvönen, M. Salmenkivi: Mitä murteita Suomessa onkaan? Murresanaston levikin kvantitatiivista analyysiä. Virittäjä 1/2006, 26-45.

K. Lemström, A. Pienimäki: Approaches for content-based retrieval of symbolically encoded polyphonic music. In 9th International Conference on Music Perception and Cognition (ICMPC9), Bologna, Italy, August 22-26, 2006. (Extended version to appear in Musicae Scientiae).

J. Lindgren, A. Hyvärinen: Emergence of conjunctive visual features by quadratic independent component analysis, NIPS, 2006. In press.

T. Mielikäinen: Frequency-Based Views to Pattern Collections. Discrete Applied Mathematics, 154(7):1113-1139. Elsevier, 2006.

T. Mielikäinen: Transaction databases, Frequent Itemsets, and Their Condensed Representations. In Francesco Bonchi and Jean-François Boulicaut (Eds.): Fourth International Workshop on Knowledge Discovery in Inductive Databases (KDID 2005). LNCS 3933:139-164. Springer, 2006.

T. Mielikäinen, P. Panov, S. Džeroski: Itemset Support Queries using Frequent Itemsets and Their Condensed Representations. In N. Lavrač and L. Todorovski (Eds.): Discovery Science (DS) - 9th International Conference on Discovery Science (DS 2006), LNAI 4265, Springer, 2006.

T. Mielikäinen, E. Terzi, P. Tsaparas: Aggregating Time Partitions. In Mark Craven and Dimitrios Gunopulos (Eds.): The Twelfth Annual SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006). ACM, 2006.

T. Mielikäinen, E. Ukkonen: The complexity of matroid-greedoid intersection and weighted greedoid maximization. Discrete Applied Mathematics 154 (2006), 684-691.

P. Miettinen, T. Mielikäinen, A. Gionis, G. Das, H. Mannila: The Discrete Basis Problem. In Johannes Fürnkranz, Tobias Scheffer, and Myra Spiliopoulou (Eds.): Knowledge Discovery in Databases: PKDD 2006 - 10th European Conference on Principles and Practice of Knowledge Discovery in Databases, LNAI, Springer, 2006. PKDD Best Paper.

J. Muhonen, H. Toivonen: Closed Non-Derivable Itemsets, PKDD 2006.

S. Myllykangas, J. Himberg, T. Böhling, B. Nagy, J. Hollmén, S. Knuutila: DNA copy number amplification profiling of human neoplasms. Oncogene. In press.

V. Mäkinen: Peak Alignment using Restricted Edit Distances. Accepted to Biomolecular Engineering, June 2006.

V. Mäkinen, G. Navarro: Rank and Select Revisited and Extended. Accepted to Theoretical Computer Science, July 2006.

V. Mäkinen, G. Navarro: Dynamic Entropy-Compressed Sequences and Full-Text Indexes. In Proc. 17th Annual Symposium on Combinatorial Pattern Matching (CPM 2006), Springer-Verlag LNCS 4009, pp. 306-317, Barcelona, Spain, July 5-7, 2006.

V. Mäkinen, G. Navarro: Position-Restricted Substring Searching. In Proc. 7th Latin American Symposium on Theoretical Informatics (LATIN 2006), Springer-Verlag LNCS 3887, pp. 703-714, Valdivia, Chile, March 20-24, 2006.

G. Navarro, V. Mäkinen: Compressed Full-Text Indexes (survey, 2nd revised version), Technical report TR/DCC-2006-6, Department of Computer Science, University of Chile, April 2006.

P. Nymark, H. Wikman, S. Ruosaari, J. Hollmén, E. Vanhala, A. Karjalainen, S. Anttila, S. Knuutila: Identification of specific gene copy number changes in asbestos-related lung cancer. Cancer Research, 66(11):5737-5743, June 2006.

P. Onkamo, H. Toivonen: A survey of data mining methods for linkage disequilibrium mapping, Human Genomics 2006, 2(5): 336-340.

K. Palin, J. Taipale, E. Ukkonen: Locating potential enhancer elements by comparative genomics using the EEL software. Nature Protocols 1 (2006), 368-374.

P. Parikka, E. Pitkänen, A. Rantanen, A. Åkerlund, E. Ukkonen: Pathway Assistant: a web portal for metabolic modelling. Network Tools and Applications in Biology (NETTAB 2006) In press.

A. Pienimäki, K. Lemström: Organising Symbolic Music Collections. Accepted to Journal of New Music Research.

R. Pulkkinen, M. Salmenkivi: Nimistöntutkimus ja data-analyysi apuna Suomen varhaisen asutuksen selvittelyssä - mahdollisuuksia ja ongelmia. CD-ROM publication of the presentations given at the seminar Polku sydänmaalle - Satakunnan varhaisen lappalaisasutuksen jäljillä, June 2006.

R. Pulkkinen, M. Salmenkivi: Mitä paikannimet kertovat Suomen karhusta? In Elisa Bruk (ed.): Karhun kannoilla. Esitelmät Porin Yliopistokeskuksen monitieteisessä Karhun kannoilla -symposiumissa marraskuussa 2005. Pori 2006. In press.

K. Puolamäki, M. Fortelius, H. Mannila: Seriation in Paleontological Data Using Markov Chain Monte Carlo Methods. PLoS Comput Biol 2(2): e6

K. Puolamäki, S. Kaski (eds.): Proceedings of the NIPS 2005 workshop on machine learning for implicit feedback and user modeling, Otaniemi, Finland, May 2006.

A. Rantanen, H. Maaheimo, E. Pitkänen, J. Rousu, E. Ukkonen: Equivalence of metabolite fragments and flow analysis of isotopomer distributions for flux estimation. Transactions on Computational Systems Biology. In press.

A. Rantanen, T. Mielikäinen, J. Rousu, H. Maaheimo, E. Ukkonen: Planning optimal measurements of isotopomer distributions for estimation of metabolic fluxes. Bioinformatics 22 (2006), 1198-1206.

A. Rasinen, J. Hollmén, H. Mannila: Analysis of Linux evolution using aligned source code segments. In N. Lavrač, L. Todorovski, K.P. Jantke (eds.), Proceedings of the Ninth International Conference on Discovery Science, volume 4265 of Lecture Notes in Artificial Intelligence, pages 209-218. Springer-Verlag, 2006. Barcelona, Spain.

J. Rousu, S. Kaski, E. Ukkonen (eds.): Probabilistic Modeling and Machine Learning in Structural and Systems Biology. Workshop proceedings. Series of publications B, report B-2006-4, Department of Computer Science, University of Helsinki, 2006

J. Rousu, C. Saunders, S. Szedmak, J. Shawe-Taylor. Efficient algorithms for max-margin structured classification. In G. Bakir, T. Hofman, B. Schölkopf, A. Smola, B. Taskar, S.V.N Vishwanathan (eds.): Predicting Structured Data, MIT Press, in press

J. Rousu, C. Saunders, S. Szedmak, J. Shawe-Taylor: Kernel-based Learning of Hierarchical Multilabel Classification Models. Journal of Machine Learning Research 7 (2006), pp. 1601 - 1626

E. Salmela, O. Taskinen, J. K. Seppänen, P. Sistonen, M. J. Daly, P. Lahermo, M.-L. Savontaus, J. Kere: Subpopulation difference scanning: a strategy for exclusion mapping of susceptibility genes. Journal of Medical Genetics, Vol. 43, pp. 590-597, 2006. Published Online First: 27 January 2006.

M. Salmenkivi: Efficient mining of correlation patterns in spatial point data. In Proc. of 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-06), pages 359-370. Berlin, September 2006 (forthcoming).

M. Salmenkivi: Finding representative sets of dialect words for geographical regions. In Proceedings of 5th International Conference on Language Resources and Evaluation (LREC 2006), pages 1980-1985, Genoa, Italy, May 2006.

M. Salmenkivi: Interestingness Measures for Spatial Co-location Patterns. To appear in: Shashi Shekhar and Hui Xiong (eds.): Encyclopedia of Geographic Information Science. Springer-Verlag, Berlin Heidelberg, July 2007.

M. Salmenkivi, R. Pulkkinen, H. Tuominen: Leksikaalisten ja syntaktisten nimenosien yleisyydestä ja levinneisyydestä peruskartan paikannimistössä. Virittäjä 110 (2), 190-228, 2006.

M. Salmenkivi, S. Hyvönen, A. Leino, H. Tuominen. Computational survey of clustering in Finnish place name elements. In: Proceedings of the 22nd International Congress of Onomastic Sciences (ICOS XXII), Pisa, Italy, August-September 2005 (forthcoming).

J. K. Seppänen: Using and extending itemsets in data mining: query approximation, dense itemsets, and tiles. Doctoral dissertation, Department of Computer Science and Engineering, Helsinki University of Technology, 2006.

J. K. Seppänen, H. Mannila: Boolean Formulas and Frequent Sets. In Jean-François Boulicaut, Luc De Raedt, Heikki Mannila, eds., Constraint-Based Mining and Inductive Databases, pp. 348-361, LNCS 3848, 2006.

P. Sevon, L. Eronen, P. Hintsanen, K. Kulovesi, H. Toivonen: Link discovery in graphs derived from biological databases, DILS 2006, LNBI 4075, 35-49, 2006.

P. Sevon, H. Toivonen, V. Ollikainen: TreeDT: Tree pattern mining for gene mapping, IEEE/ACM Transactions on Computational Biology and Bioinformatics 2006, 174-185.

S. Shimizu, P.O. Hoyer, A. Hyvärinen, A.J. Kerminen: A linear, non-gaussian acyclic model for causal discovery. Accepted to Journal of Machine Learning Research.

S. Shimizu, A. Hyvärinen, P. O. Hoyer, Y. Kano: Finding a causal ordering via independent component analysis. Computational Statistics & Data Analysis, 50/51:3278-3293, 2006.

H. Tamm, M. Nykänen, E. Ukkonen: On size reduction techniques for multitape automata. Theoretical Computer Science. In press.

N. Tatti, T. Mielikäinen, A. Gionis, H. Mannila: What is the dimension of your binary data? 2006 IEEE International Conference on Data Mining, to appear.

J. Tikka, A. Lendasse, J. Hollmén: Analysis of fast input selection: Application in time series prediction. In Proceedings of the 16th International Conference on Artificial Neural Networks (ICANN'06), Lecture Notes in Computer Science. Springer-Verlag, 2006. Athens, Greece.

P. Tsaparas, L. Marino-Ramirez, O. Bodenreider, E.V. Koonin, I.K. Jordan: Global similarity and local divergence in human and mouse gene co-expression networks. BMC Evolutionary Biology 2006, 6:70 (12 September 2006).

A. Vicente, P. O. Hoyer, A. Hyvärinen: Equivalence of some common linear feature extraction techniques for appearance-based object recognition tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence. In press.

R. Yangarber: Verification of Facts across Document Boundaries. Proceedings International Workshop on Intelligent Information Access (IIIA-2006).

2005

F. Afrati and G. Das and A. Gionis and H. Mannila and T. Mielikäinen and P. Tsaparas: Mining chains of relations. 5th International Conference on Data Mining (ICDM) 2005.

H. Ahonen-Myka and A. Doucet: Data mining meets collocations discovery. Inquiries into words, constraints and contexts: Festschrift in the honour of Kimmo Koskenniemi on his 60th birthday, pp. 194-203. 2005.

H. Ahonen-Myka: Mining all maximal frequent word sequences in a set of sentences. Proceedings of the 14th ACM International Conference on Information and Knowledge Management, CIKM 2005, October 31 - November 5, 2005, Bremen, Germany, pp. 255-256.

H. Ahonen-Myka: Text analysis by discovering frequent phrases. Modernin informaatioteknologian menetelmätutkimusta 5 (2005), pp. 96-112.

L. Aunimo: A Question Typology and Feature Set for QA. Proceedings of the Workshop for Knowledge and Reasoning for Answering Questions, held in conjunction with IJCAI-05, July 2005, Edinburgh, Great Britain.

L. Aunimo and R. Kuuskoski: Reformulations of Finnish Questions for Question Answering. Proceedings of the 15th Nordic Conference on Computational Linguistics (NoDaLiDa 2005), May 2005, Joensuu, Finland.

L. Aunimo and R. Kuuskoski: Question Answering Experiments for Finnish and French. Proceedings of the 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, September 21-23, 2005. To appear in the LNCS series in 2006.

L. Aunimo, R. Kuuskoski and J. Makkonen: Finnish as Source Language in Bilingual Question Answering. Multilingual Information Access for Text, Speech and Images: 5th Workshop of the Cross-Language Evaluation Forum, CLEF 2004, Bath, UK, September 15-17, 2004, Revised Selected Papers, C. Peters, P. D. Clough, G. J. F. Jones, J. Gonzalo, M. Kluck and B. Magnini, editors. Lecture Notes in Computer Science 3491. Springer 2005.

L. Aunimo and R. Kuuskoski: Overview of the CLEF 2005 Multilingual Question Answering Track. Proceedings of the 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, Vienna, Austria, September 21-23, 2005.

I. Autio, J. Borrás, I. Immonen, P. Jalli and E. Ukkonen: A voting margin approach for the detection of retinal microaneurysms. Proceedings of the Fifth IASTED International Conference on Visualization, Imaging, and Image Processing, September 7-9, 2005, Benidorm, Spain, pp. 511-517.

E. Bertino, A. Kamra, E. Terzi, A. Vakali: Intrusion detection in RBAC-administered databases, 21st Annual Computer Security Applications Conference (ACSAC) 2005.

S. Böcker and V. Mäkinen: Maximum Line-Pair Stabbing Problem and its Variations. In Proc. 21st European Workshop on Computational Geometry (EWCG'05), pp. 183-186, Eindhoven, Netherlands, March, 2005.

A. Borodin, J. S. Rosenthal, G. O. Roberts, P. Tsaparas: Link Analysis Ranking: Algorithms, Theory and Experiments, ACM Transactions on Internet Technologies (TOIT), Vol 5, No 1, February 2005.

C. Bounsaythip, E. Lindfors, P. V. Gopalacharyulu, J. Hollmén, and M. Oresic: Network-based representation of biological data for enabling context-based mining. In Proceedings of KRBIO'05, International Symposium of the Knowledge Representation in Bioinformatics, pages 1-6, June 2005.

C. Bounsaythip, J. Hollmén, S. Kaski and M. Oresic, editors. Proceedings of KRBIO'05, International Symposium of the Knowledge Representation in Bioinformatics. Helsinki University of Technology, Espoo, Finland, 2005.

R. Dementiev, J. Kärkkäinen, J. Mehnert and P. Sanders: Better external memory suffix array construction. In the Joint Proceedings of the 7th Workshop on Algorithm Engineering and Experiments (ALENEX 2005) and the 2nd Workshop on Analytic Algorithmics and Combinatorics (ANALCO 2005), SIAM, 2005.

D. Donato, D. S. Leonardi and P. Tsaparas: Stability and Similarity of Link Analysis Ranking Algorithms. ICALP, Lisbon, Portugal, 2005.

A. Doucet and H. Ahonen-Myka: A method to calculate probability and expected document frequency of discontinued word sequences. ELECTRA Workshop on Methodologies and Evaluation of Lexical Cohesion Techniques in Real-World Applications (Beyond Bag of Words), ACM 2005, pp. 33-40.

A. Doucet: Advanced Document Description, a Sequential Approach. University of Helsinki 2005.

L. Eronen, F. Geerts, H.T.T. Toivonen: Efficient Markovian algorithms for haplotype reconstruction. Modernin informaatioteknologian menetelmätutkimusta 6 (2005), pp. 16-39.

P. Floréen, G. Lindén, T. Niklander and K. Raatikainen: Proceedings of the workshop on context awareness for proactive systems CAPS 2005. University of Helsinki, Department of Computer Science 2005, vi, 187 pp.

P. Floréen and G. Lindén: Vet din telefon vad du gör och vad du vill? Universitas Helsingiensis 24 (2005) : 3, pp. 17-19.

M. Fontaine, S. Burkhardt and J. Kärkkäinen: BDD-Based Analysis of Gapped q-Gram Filters. International Journal of Foundations of Computer Science, 6(16), pp. 1121-1134, 2005.

K. Fredriksson, V. Mäkinen, and G. Navarro: Flexible Music Retrieval in Sublinear Time. In Proceedings of the 10th Prague Stringology Conference (PSC'05), pp. 174-188, Prague, Czech Republic, August, 2005.

A. Gionis and H. Mannila and P. Tsaparas: Clustering aggregation. The 21st International Conference on Data Engineering (ICDE 2005), April 5-8, 2005, National Center of Sciences, Tokyo, Japan, 12 pp. 2005

A. Gionis, A. Hinnenburg, S. Papadimitriou, P. Tsaparas: Dimension-induced clustering, 11th International Conference on Knowledge Discovery and Data Mining (KDD) 2005.

B. Goethals and S. Laur, H. Lipmaa and T. Mielikäinen: On private scalar product computation for privacy-preserving data mining. Information security and cryptology--ICISC 2004 pp. 104-120. 2005

B. Goethals and J. Muhonen and H.T.T. Toivonen: Mining non-derivable association rules. Proceedings of the Fifth SIAM International Conference on Data Mining pp. 239-249. 2005

R. González, S. Grabowski, V. Mäkinen and G. Navarro: Practical Implementation of Rank and Select Queries. In Poster Proceedings of 4th International Workshop on Efficient and Experimental Algorithms (WEA'05), CTI Press and Ellinika Grammata, Santorini Island, Greece, May 10-13, 2005.

P. V. Gopalacharyulu, E. Lindfors, C. Bounsaythip, T. Kivioja, L. Yetukuri, J. Hollmén, and M. Oresic: Data integration and visualization system for enabling conceptual biology. Bioinformatics, 21(Suppl.1):i177-i185, 2005.

S. Grabowski, V. Mäkinen, G. Navarro and A. Salinger: A Simple Alphabet-Independent FM-Index. In Proceedings of the 10th Prague Stringology Conference (PSC'05), pp. 230-244, Prague, Czech Republic, August, 2005.

P. Hintsanen, P. Sevon, P. Onkamo, L. Eronen and H.T.T. Toivonen: An empirical comparison of case-control and trio-based study designs in high-throughput association mapping. Journal of Medical Genetics, Published Online First: 28 October 2005. doi:10.1136/jmg.2005.036020. 2005

S. Hyvönen, H. Junninen, L. Laakso, M. Dal Maso, T. Grönholm, B. Bonn, P. Keronen, P. Aalto, V. Hiltunen, T. Pohja, S. Launiainen, P. Hari, H. Mannila and M. Kulmala: A look at aerosol formation using datamining techniques. Atmospheric chemistry and physics 5 (2005), pp. 3345-3356.

G. Iachello, M. Raento, and I. Smith: Mobile HCI 2004 Location Systems Privacy and Control Workshop, IEEE Pervasive Computing, 4(1): 90, 2005.

G. Lindén: Hyvät ja pahat tyypit. Column, Apropos 2, 2005, p. 25.

K. Laasonen: Route Prediction from Cellular Data. In Proceedings of the workshop on Context Awareness for Proactive Systems, CAPS 2005, 147-158. 2005

E. Kettunen, A.G. Nicholson, B. Nagy, J.K. Seppänen, T. Ollikainen, G. Ladas, V. Kinnula, M. Dusmet, S. Nordling, J. Hollmén, D. Kamel, P. Goldstraw, and S. Knuutila: L1CAM, INP10, P-cadherin, tPA and ITGB4 over-expression in malignant pleural mesotheliomas revealed by combined use of cDNA and tissue microarray. Carcinogenesis, 26(1):17-25, 2005.

T. Kivioja, M. Arvas, M. Saloheimo, M. Penttilä and E. Ukkonen: Optimization of cDNA-AFLP experiments using genomic sequence data. Bioinformatics 21(11): 2573-2579 (2005).

M. Koivisto and K. Sood: Computational aspects of Bayesian partition models. Proceedings : ACM cop. 2005 pp. 433-440.

J. Kollin and M. Koivisto: Bayesian Learning with Mixtures of Trees. University of Helsinki, Department of Computer Science 2005, 11 pp.

J. Kärkkäinen: Alphabets in Generic Programming. In the Proceedings of the Prague Stringology Conference (PSC '05), Czech Technical University, Prague, 2005, pp. 163-173.

M. Kääriäinen: On generalization error bounds using unlabeled data. University of Helsinki, Department of Computer Science 2005, 18 pp.

K. Lassfolk and A. Pienimäki: Studiolaitteiden kloonaus ja äänenlaadun arviointi kytkentätopologioiden valossa. Akustiikkapäivät 2005, pp. 48-53.

S. Laur, H. Lipmaa and T. Mielikäinen: Private itemset support counting. In: ICICS'05 - Proceedings of Seventh International Conference on Information and Communications Security, Beijing, China, December 10-13, 2005, Volume 3783 of Lecture Notes in Computer Science, pages 97-111. Springer, 2005.

M. Lehtonen: EXTIRP 2004: towards heterogeneity. Advances in XML information retrieval pp. 372-381. 2005

A. Leino: In search of naming patterns: a survey of Finnish lake names. Rivista italiana di onomastica; Supplemento al n.XI, 1 di RIOn (QuadRIOn) 1 (2005), pp. 355-367.

K. Lemström and V. Mäkinen: On minimizing pattern splitting in multi-track string matching. Journal of discrete algorithms 3 (2005), pp. 248-266.

K. Lemström, G. Navarro and Y. Pinzon: Bit-Parallel Algorithms for Transposition-Invariant Multi-Track String-Matching. Journal of Discrete Algorithms, 3 (2-4), 267-292, 2005.

S. Luyssaert, M. Sulkava, H. Raitio, and J. Hollmén: Are N and S deposition altering the chemical composition of Norway spruce and Scots pine needles in Finland? Environmental Pollution, 138(1):5-17, 2005.

H. Mannila, and M. Salmenkivi: Intensity Modeling of genome Data. J. Wang, D. Shasha, H.T.T. Toivonen, and M. Zaki (eds): Data Mining in Bioinformatics, Springer-Verlag, London, 2005.

T. Mielikäinen: An automata approach to pattern collections. Knowledge discovery in inductive databases 2005, pp. 130-149.

T. Mielikäinen: Implicit enumeration of patterns. Knowledge discovery in inductive databases 2005, pp. 150-172.

T. Mielikäinen: Summarization techniques for pattern collections in data mining. University of Helsinki 2005, viii, 202 pp.

T. Mielikäinen and J. Ravantti: Sinogram denoising of cryo-electron microscopy images. Computational science and its applications, ICCSA 2005 4 (2005), pp. 1251-1261.

V. Mäkinen: Peak Alignment using Restricted Edit Distances. In Proc. 6th International Symposium on Computational Biology and Genome Informatics (CBGI 2005), Salt Lake City, Utah, USA, July, 2005.

V. Mäkinen and G. Navarro: Succinct suffix arrays based on run-length encoding. Nordic journal of computing 12 (2005) : 1, pp. 44-66.

V. Mäkinen and G. Navarro: Succinct Suffix Arrays based on Run-Length Encoding. In Proc. 16th Annual Symposium on Combinatorial Pattern Matching (CPM 2005), LNCS 3537, pp. 45-56, Jeju Island, Korea, June, 2005.

V. Mäkinen, G. Navarro and E. Ukkonen: Transposition invariant string matching. Journal of algorithms 56, pp. 124-153. 2005

G. Navarro and V. Mäkinen: Compressed Full-Text Indexes (survey), Technical report TR/DCC-2005-7, Department of Computer Science, University of Chile, June 2005.

S. Papadimitriou, A. Gionis, P. Tsaparas, A. Väisänen, H. Mannila and C. Faloutsos: Parameter-free spatial data mining using MDL, 5th International Conference on Data Mining (ICDM) 2005.

A. Pienimäki: Musiikkitietokannan selailu musiikillisen avaruuden ulottuvuuksien avulla. Musiikki 35 (2005) 1-2, pp. 46-62.

E. Pitkänen, A. Rantanen, J. Rousu and E. Ukkonen: Finding feasible pathways in metabolic networks. Advances in informatics pp. 123-133. 2005

M. Raento, A. Oulasvirta, H.T.T. Toivonen and M. Mäntylä: Sosiaalista tilatietoa kontekstipuhelimella. Prosessori 1/2005, pp. 54-56.

M. Raento, A. Oulasvirta, R. Petit, H.T.T. Toivonen: ContextPhone: a prototyping platform for context-aware mobile applications. IEEE pervasive computing 4 (2005) : 2, pp. 51-59.

A. Rantanen, T. Mielikäinen, J. Rousu, and E. Ukkonen: Planning isotopomer measurements for estimation of metabolic fluxes. In: Proceedings of the German Conference on Bioinformatics, Hamburg, Germany, October 5-7, 2005, pages 177-191.

A. Rantanen, J. Rousu, E. Pitkänen, H. Maaheimo and E. Ukkonen: Flow analysis of metabolite fragments for flux estimation. Third international workshop on computational methods in systems biology : 2005 pp. 242-255.

P. Rastas, M. Koivisto, H. Mannila and E. Ukkonen: A hidden Markov technique for haplotype reconstruction. Algorithms in bioinformatics pp. 140-151. 2005

J. Rousu, A. Rantanen, R. A. Ketola, and J. T. Kokkonen: Isotopomer distribution computation from tandem mass spectrometric data with overlapping fragment spectra. Spectroscopy 19 (2005), pp. 53-67.

M. Salmenkivi and H. Mannila: Piecewise Constant Modeling of Sequential Data Using Reversible Jump Markov Chain Monte Carlo: chapter 5. Data mining in bioinformatics pp. 85-103. 2005

M. Salmenkivi and H. Mannila: Using Markov chain Monte Carlo and dynamic programming for event sequence data. Knowledge and information systems 7 (2005) : 3, pp. 267-288.

J. K. Seppänen and H. Mannila: Boolean formulas and frequent sets. In Jean-Francois Boulicaut, Luc de Raedt, H. Mannila (eds): Constraint-based mining and inductive databases, Springer-Verlag LNCS Volume 3848, ISBN: 3-540-31331-1, Springer 2005, p. 348-361.

J. K. Seppänen: Upper Bound for the Approximation Ratio of a Class of Hypercube Segmentation Algorithms. Information Processing Letters, Vol. 93, no. 3, pp. 139-141, 2005.

P. Sevon, L. Eronen, P. Hintsanen, K. Kulovesi and H.T.T. Toivonen: Link discovery in graphs derived from biological databases. 3rd International Workshop on Data Integration in the Life Sciences 2006 (DILS'06), Hinxton, UK, accepted for publication. 2005

P. Sevon, H.T.T. Toivonen and P. Onkamo: Gene Mapping by Pattern Discovery. In J. Wang et al (eds), Data Mining in Bioinformatics. Springer, 105-126. 2005

M. Sulkava, P. Rautio, and J. Hollmén: Combining measurement quality into monitoring trends in foliar nutrient concentrations. In W. Duch et al, editor, Proceedings of the International Conference on Artificial Neural Networks (ICANN'05), volume 3697 of Lecture Notes in Computer Science, pages 761-767. Springer-Verlag, 2005.

H. Tamm, M. Nykänen and E. Ukkonen: Size reduction of multitape automata. Implementation and application of automata pp. 329-330. 2005

J. Tikka, J. Hollmén, and A. Lendasse: Input Selection for Long-Term Prediction of Time Series. In Joan Cabestany, Alberto Prieto, and Francisco Sandoval, editors, Proceedings of the 8th International Work-Conference on Artificial Neural Networks (IWANN 2005), volume 3512 of Lecture Notes in Computer Science, pages 1002-1009. Springer-Verlag, June 2005. Vilanova i la Geltrú, Barcelona, Spain.

H.T.T. Toivonen, S. Hyvönen and P. Sevon: Combining phenotypic and genotypic data to discover multiple disease genes. Proceedings of KRBIO'05, International Symposium on Knowledge Representation in Bioinformatics, Helsinki University of Technology, Espoo, Finland, June 15-17, 2005 pp. 7-14.

H.T.T. Toivonen, P. Onkamo, P. Hintsanen, E. Terzi and P. Sevon: Data mining for gene mapping. Next generation of data-mining applications, Wiley pp. 263-293. 2005

A. Ukkonen, M. Fortelius and H. Mannila: Finding partial orders from unordered 0-1 data. KDD-2005 pp. 285-293. 2005

J. T. L. Wang, M. J. Zaki, H.T.T. Toivonen and D. Shasha: Introduction to data mining in bioinformatics: chapter 1. Data mining in bioinformatics pp. 3-8. 2005

J. Wang, M. Zaki, H.T.T. Toivonen, and D. Shasha (Eds): Data mining in bioinformatics. Springer 2005.

R. Yangarber, L. Jokipii, A. Rauramo and S. Huttunen: Information Extraction from Epidemiological Reports. In Proceedings of the Human Language Technology Conference/ Conference on Empirical Methods in Natural Language Processing: HLT/EMNLP-2005, demonstration; (2005) Vancouver, Canada. 2005.

R. Yangarber, L. Jokipii, A. Rauramo and S. Huttunen: Mining the Semantics of Text via Counter-Training. Workshop on Text Mining and Applications TEMA-2005, at the 12th Portuguese Conference on Artificial Intelligence EPIA-2005, Covilhã, Portugal. LNAI Vol 3808, Springer 2005.

R. Yangarber and L. Jokipii: Redundancy-based Correction of Automatically Extracted Facts. In Proceedings of the Human Language Technology Conference/ Conference on Empirical Methods in Natural Language Processing: HLT/EMNLP-2005, (2005) Vancouver, Canada. 2005.

2004

F. Afrati, A. Gionis and H. Mannila: Approximating a collection of frequent sets. KDD-2004: Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining (KDD), August 22-25, 2004, Seattle, WA, 8 pp.

L. Aunimo, J. Makkonen and R. Kuuskoski: Cross-language question answering for Finnish. Web intelligence, 2004, pp. 35-50.

I. Autio and J. T. Lindgren: Attention-Driven Parts-Based Object Detection. ECAI 2004, pp. 917-921.

S. Burkhardt, K. Fredriksson, T. Ojamies, J. Ravantti and E. Ukkonen: Local approximate 3D matching of proteins in viral cryo-EM density maps. 2nd International symposium on 3D data processing, visualization, and transmission, Thessaloniki Greece, 6-9 September, 2004. pp. 979-986.

A. Bykowski, J. K. Seppänen, and J. Hollmén. Model independent bounding of the supports of Boolean formulae in binary data. In Rosa Meo, Pier Luca Lanzi, and Mika Klemettinen, editors, Database Support for Data Mining Applications - Discovering Knowledge with Inductive Queries, volume 2682 of Lecture Notes in Artificial Intelligence, pages 234-249. Springer-Verlag, 2004.

A. Doucet: Utilisation de sequences frequentes maximales en recherche d'information. JADT 2004: le poids des mots, actes des 7es journees internationales d'analyse statistique des donnees textuelles: Proceedings of the 7th International Conference on the Statistical Analysis of Textual Data (JADT 2004), Louvain-la-Neuve, Belgium, March 10-12, 2004, pp. 334-345.

A. Doucet and H. Ahonen-Myka: Non-contiguous word sequences for information retrieval. Second ACL Workshop on Multiword Expressions: Integrating Processing; Proceedings of the Workshop, pp. 88-95, 2004.

A. Doucet, L. Aunimo, M. Lehtonen and R. Petit. Accurate Retrieval of XML Document Fragments using EXTIRP. INEX 2003 : Proceedings of the Second Annual Workshop of the Initiative for the Evaluation of XML retrieval (INEX), Schloss Dagstuhl, Germany, December 15-17, 2003, ERCIM Workshop Proceedings, 2004. pp. 73-80.

T. Elomaa, and J. Rousu: Efficient multisplitting revisited: Optima-preserving elimination of partition candidates. Data Mining and Knowledge Discovery 8, 2 (March 2004), 97-126.

L. Eronen, F. Geerts and H.T.T. Toivonen: A Markov chain approach to reconstruction of long haplotypes. Pacific Symposium on Biocomputing (PSB 2004), 104-115, Hawaii, USA, January 2004. World Scientific, 2004.

P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro: An Alphabet-Friendly FM-index. In Proc. 11th Symposium on String Processing and Information Retrieval (SPIRE 2004), Springer-Verlag LNCS 3246, pp. 150-160, Padova, Italy, October 5-8, 2004.

K. Fredriksson, V. Mäkinen, and G. Navarro: Rotation and Lighting Invariant Template Matching. In Proc. Latin American Theoretical Informatics (LATIN 2004), Springer-Verlag LNCS 2976, pp. 39-48, Buenos Aires, Argentina, April 5-9, 2004.

F. Geerts, B. Goethals and T. Mielikäinen: Tiling databases. Discovery science pp. 278-289, 2004.

F. Geerts, H. Mannila and E. Terzi: Relational link-based ranking. 30th VLDB: Proceedings of the Thirtieth International Conference on Very Large Data Bases, Toronto, Canada, August 31 - September 3, 2004. Morgan Kaufmann, 2004, pp. 552-563.

A. Gionis, H. Mannila and J. K. Seppänen: Geometric and combinatorial tiles in 0-1 data. Knowledge discovery in databases: PKDD 2004, 12 pp.

A. Gionis, H. Mannila and E. Terzi: Clustered segmentations. 3rd Workshop on Mining Temporal and Sequential Data (TDM), August 22, 2004, Seattle, WA, 11 pp.

S. Grabowski, V. Mäkinen, and G. Navarro: First Huffman, then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index. In Proc. 11th Symposium on String Processing and Information Retrieval (SPIRE 2004), Springer-Verlag LNCS 3246, pp. 210-211, Padova, Italy, October 5-8, 2004.

N. Haiminen and A. Gionis. Unimodal segmentation of sequences. IEEE International conference on data mining pp. 106-113, 2004.

H. Hiisilä and E. Bingham: Dependencies Between Transcription Factor Binding Sites: Comparison Between ICA, NMF, PLSA and Frequent Sets. 4th IEEE International Conference on Data Mining, Brighton, UK, November 1-4, 2004, pp. 114-121.

P. O. Hoyer: Non-negative matrix factorization with sparseness constraints. Journal of machine learning research 5 (2004), pp. 1457-1469.

W. Hämäläinen, H.T.T. Toivonen and V. Porosin: Mining relaxed graph properties in internet. IADIS International Conference : IADIS Press, 2004. pp. 152-159, 2004.

G. Iachello, M. Raento, and I. E. Smith: MobileHCI 2004: workshop on location systems privacy and control: held at MobileHCI 04, 13 September 2004, University of Strathclyde in Glasgow, Scotland. [University of Strathclyde], 2004, [40 pp.].

A. Kaban, E. Bingham, T. Hirsimäki: Learning to Read Between the Lines: The Aspect Bernoulli Model. 4th SIAM International Conference on Data Mining, Lake Buena Vista, Florida, USA, April 22-24, 2004. pp. 462-466, 2004.

E. Kettunen, S. Anttila, J. K. Seppänen, A. Karjalainen, H. Edgren, I. Lindström, R. Salovaara, A.-M. Nissén, J. Salo, K. Mattson, J. Hollmén, S. Knuutila, and H. Wikman. Differentially expressed genes in nonsmall cell lung cancer: expression profiling of cancer-related genes in squamous cell lung cancer. Cancer Genetics and Cytogenetics, 149(2):98-106, 2004.

T. Kivioja: Computational tools for a novel transcriptional profiling method. PhD thesis, Helsingin yliopisto, 2004, viii, 98 pp., ill.

M. Koivisto: Sum-product algorithms for the analysis of genetic risks. PhD thesis, University of Helsinki, 2004, 155 pp.

A. Korhola, H.T.T. Toivonen and K. Vasko. Paleoekologia: kadonneen aarteen metsästys. Tietoyhteys 7 (2004) : 3, pp. 26-28, 2004.

S. Koskenmies, E. Widen, P. Onkamo, P. Sevon, H. Julkunen and J. Kere: Haplotype associations define target regions for susceptibility loci in systemic lupus erythematosus. European journal of human genetics 12 (2004) : 6, pp. 489-494.

M. Kääriäinen: Learning small trees and graphs that generalize. PhD thesis, Helsingin yliopisto, 2004, v, 204 pp.

M. Kääriäinen: Relating the Rademacher and VC Bounds. Helsingin yliopisto, tietojenkäsittelytieteen laitos, 2004, 12 pp.

M. Kääriäinen, T. Malinen and T. Elomaa: Selective rademacher penalization and reduced error pruning of decision trees. Journal of machine learning research 5 (2004), pp. 1107-1126.

K. Laasonen, M. Raento and H.T.T. Toivonen: Adaptive on-device location recognition. Pervasive Computing: Second International Conference (Pervasive 2004), LNCS 3001, 287-304, Vienna, Austria, April 2004. Springer Verlag.

A. Leino: Computational overview of Finnish hydronyms. Onomastica Lettica. 2 pp. 239-268, 2004.

A. Leino: Pikes and perches go together: a data-analytical view on Finnish lake names. Papers from the 30th Finnish conference of linguistics pp. 79-84, 2004.

K. Lemström, G. Navarro and Y. Pinzon: Algorithm for Transposition Invariant LCS. 11th Intl Symp String processing and information retrieval 3246 (2004), pp. 74-75.

G. Lindén (ed.): Proceedings of the proactive computing workshop PROW 2004: 25-26 November 2004, Helsinki, Finland. Helsinki Institute for Information Technology HIIT, 2004, 129 pp.

S. Luyssaert, M. Sulkava, H. Raitio, and J. Hollmén: Evaluation of forest nutrition based on large-scale foliar surveys: are nutrition profiles the way of the future? Journal of Environmental Monitoring, 6(2):160-167, 2004.

J. Makkonen, H. Ahonen-Myka and M. Salmenkivi. Simple semantics in topic detection and tracking. Information retrieval 7 (2004), pp. 347-368, 2004.

T. Mielikäinen: Discovery of serial episodes from streams of events. SSDBM 2004, pp. 447-448.

T. Mielikäinen: Inductive databases as ranking. Data warehousing and knowledge discovery pp. 149-158, 2004.

T. Mielikäinen: Privacy problems with anonymized transaction databases. Discovery science pp. 219-229, 2004.

T. Mielikäinen: Separating structure from interestingness. Advances in knowledge discovery and data mining pp. 476-485, 2004.

T. Mielikäinen, J. Ravantti and E. Ukkonen: The computational complexity of orientation search in cryo-electron microscopy. Computational science - ICCS 2004, pp. 231-238.

T. Mielikäinen, J. Ravantti and E. Ukkonen: The computational complexity of orientation search problems in cryo-electron microscopy. Helsingin yliopisto, tietojenkäsittelytieteen laitos, 2004, 21 pp.

T. Mielikäinen and E. Ukkonen: The complexity of maximum matroid-greedoid intersection and weighted greedoid maximization. Helsingin yliopisto, tietojenkäsittelytieteen laitos, 2004, [13] leaves.

V. Mäkinen: Sub-quadratic algorithm for weighted k-mismatches problem. Helsingin yliopisto, tietojenkäsittelytieteen laitos, 2004, [3] leaves.

V. Mäkinen and G. Navarro: New search algorithms and time/space tradeoffs for succinct suffix analysis. Helsingin yliopisto, tietojenkäsittelytieteen laitos, 2004, 36 pp.

V. Mäkinen and G. Navarro: Compressed Compact Suffix Arrays. In Proc. 15th Annual Symposium on Combinatorial Pattern Matching (CPM 2004), Springer-Verlag LNCS 3109, pp. 420-433, Istanbul, Turkey, July 5-7, 2004.

V. Mäkinen, G. Navarro, and K. Sadakane: Advantages of Backward Searching - Efficient Secondary Memory and Distributed Implementation of Compressed Suffix Arrays. In Proc. 15th Annual Symposium on Algorithms and Computation (ISAAC 2004), Springer-Verlag LNCS 3341, pp. 681-692, Hong Kong, December 20-22, 2004.

M. Peyrard-Janvid, H. Anthoni, P. Onkamo, P. Lahermo, M. Zucchelli, N. Kaminen, K. Hannula-Jouppi, J. Nopola-Hemmi, A. Voutilainen, H. Lyytinen and J. Kere: Fine mapping of the 2p11 dyslexia locus and exclusion of TACR1 as a candidate gene. Human genetics 114 (2004) : 5, pp. 510-516.

A. Pienimäki and K. Lemström: Clustering symbolic music using paradigmatic and surface level analyses. ISMIR 2004: Proceedings of the 5th International Conference on Music Information Retrieval, October 10-14, 2004, Barcelona, Spain, pp. 262-265.

R. Pulkkinen, M. Salmenkivi, A. Leino and H. Mannila: What was the Finnish Hiisi?: applying computational methods to the study of folk religion. Temenos : studies in comparative religion 39-40 (2003-2004), pp. 209-233.

M. Raento: Context software: a prototype platform for contextual mobile applications. Proceedings of the proactive computing workshop PROW 2004: 25-26 November 2004, Helsinki, Finland, pp. 103-111.

M. Raento: Kill your personal data dead. MobileHCI 2004, 3 pp.

M. Raento: Mobile communication and context dataset. Proceedings of the Benchmarks and a Database for Context Recognition workshop, 2004, 5 pp.

J. Ravantti: Computational methods for reconstructing macromolecular complexes from cryo-electron microscopy images. PhD thesis, Helsingin yliopisto, 2004, ix, 97 pp., ill.

M. Salmenkivi: Evaluating attraction in spatial point patterns with an application in the field of cultural history. IEEE International conference on data mining pp. 511-514, 2004.

J. K. Seppänen and H. Mannila: Dense itemsets. KDD-2004: Proceedings of the 10th International Conference on Knowledge Discovery and Data Mining (KDD), August 22-25, 2004, Seattle, WA, 6 pp.

P. Sevon: Algorithms for association-based gene mapping. PhD thesis, Helsingin yliopisto, 2004, vi, 100 pp., [73] pp. of appendices, ill.

S. Inenaga, T. Kivioja and V. Mäkinen: Finding missing patterns. Algorithms in bioinformatics pp. 463-474, 2004.

M. Sulkava, J. Tikka, and J. Hollmén: Sparse regression for analyzing the development of foliar nutrient concentrations in coniferous trees. In Saso Dzeroski, Bernard Zenko, and Marko Debeljak, editors, Proceedings of the Fourth International Workshop on Environmental Applications of Machine Learning (EAML 2004), pages 57-58, 2004.

H. Tamm: On minimality and size reduction of one-tape and multitape finite automata. PhD thesis, Helsingin yliopisto, 2004, viii, 80 pp., ill.

H. Tamm and E. Ukkonen: Bideterministic automata and minimal representations of regular languages. Theoretical computer science 328 (2004) : 1-2, pp. 135-149.

J. Tikka and J. Hollmén: Learning linear dependency trees from multivariate time-series data. In Proceedings of the Workshop on Temporal Data Mining: Algorithms, Theory and Applications (in conjunction with The Fourth IEEE International Conference on Data Mining), Brighton, U.K., 2004.

K. Vasko: Computational methods and models for paleoecology. PhD thesis, University of Helsinki, 2004, 182 pp., ill.

K. Vasko, H.T.T. Toivonen and A. Korhola: Segmentation of paleoecological spatio-temporal count data. The Fourth International Workshop on environmental applications of machine learning, EAML 2004, Bled, Slovenia, September 27-October 1, 2004, pp. 61-62.

J. Vesanto and J. Hollmén: An automated report generation tool for the data understanding phase. In Ajith Abraham, Lakhmi Jain, and Berend J. van der Zwaag, editors, Innovations in Intelligent Systems: Design, Management and Applications, volume 140 of Studies in Fuzziness and Soft Computing, chapter 5. Springer (Physica) Verlag, 2004.

H. Wikman, J. K. Seppänen, V. K. Sarhadi, E. Kettunen, K. Salmenkivi, E. Kuosma, K. Vainio-Siukola, B. Nagy, A. Karjalainen, T. Sioris, J. Salo, J. Hollmén, S. Knuutila, and S. Anttila. Caveolins as tumor markers in lung cancer detected by combined use of cDNA and tissue microarrays. Journal of Pathology, 203:584-593, 2004.

2003

S. Agrawal, S. Chaudhuri, G. Das, and A. Gionis: Automated Ranking of Database Query Results. In Proceedings (electronic) of the First Biennial Conference on Innovative Data Systems Research (CIDR 2003), Asilomar, CA, USA, 2003.

L. Aunimo, O. Heinonen, R. Kuuskoski, J. Makkonen, R. Petit, and O. Virtanen: Question Answering System for Incomplete and Noisy Data - Methods and Measures for its Evaluation. In Proceedings of 25th European Conference on Information Retrieval Research (ECIR 2003), April 2003, Pisa, Italy, Lecture Notes in Computer Science 2633, pp. 193-206, Springer 2003.

S. Burkhardt, and J. Kärkkäinen: Better filtering with gapped q-grams. Fundamenta Informaticae 56, 1-2(2003), pp. 51-70.

S. Burkhardt, and J. Kärkkäinen: Fast lightweight suffix array construction and checking. In proceedings of the 14th Symposium on Combinatorial Pattern Matching (CPM 2003), June 2003, Morelia, Mexico. Lecture Notes in Computer Science 2676, Springer, 2003, pp. 55-69.

A. Bykowski, J. Seppänen, and J. Hollmén: Model-independent bounding of the supports of Boolean formulae in binary data. In P. Lanzi, and R. Meo (eds.): Database technologies for data mining. Springer-Verlag, 2003. To appear.

T. Calders, and B. Goethals: Minimal k-free Representations of Frequent Sets. In Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'03). Lecture Notes in Artificial Intelligence, Volume 2838, Springer-Verlag, pp. 71-82. September 22-26, 2003, Cavtat, Croatia.

M. Datar, T. Feder, A. Gionis, R. Motwani, and R. Panigrahy: A combinatorial algorithm for MAX CSP. Information Processing Letters 85(6): 307-315 (2003).

A. Doucet, L. Aunimo, M. Lehtonen, and R. Petit: Accurate retrieval of XML document fragments using EXTIRP. To appear in the Proceedings of the Second Annual Workshop of the Initiative for the Evaluation of XML Retrieval (INEX 2003), December 15-17, 2003, Schloss Dagstuhl, Germany, 2003.

A. Doucet, and H. Ahonen-Myka: Naïve clustering of a large XML document collection. In Proceedings of the First Annual Workshop of the Initiative for the Evaluation of XML retrieval (INEX), Schloss Dagstuhl, Germany, 9-11 December 2002, pp. 81-87. European Research Consortium for Informatics and Mathematics (ERCIM) Workshop Proceedings, 2003.

T. Elomaa, and J. Rousu: Necessary and Sufficient Preprocessing in Numerical Range Discretization. Knowledge and Information Systems 5, 2 (2003), pp. 162-182.

T. Elomaa, and J. Rousu: On Decision Boundaries of Naive Bayes in Continuous Domains. Principles of Data mining and Knowledge Discovery, PKDD-2003, Lecture Notes in Computer Science 2838 (2003), 144-155.

L. Eronen, F. Geerts, and H.T.T. Toivonen: A Markov chain approach to reconstruction of long haplotypes. Pacific Symposium on Biocomputing (PSB2004), 104-115, Hawaii, USA, January 2004. World Scientific.

M. N. Garofalakis, A. Gionis, R. Rastogi, S. Seshadri, and K. Shim: XTRACT: Learning Document Type Descriptors from XML Document Collections. Data Mining and Knowledge Discovery 7(1): 23-56 (2003).

F. Geerts, B. Goethals, and T. Mielikäinen: What you store is what you get (extended abstract). In Jean-Francois Boulicaut and Saso Dzeroski (eds.): Proceedings of the 2nd International Workshop on Knowledge Discovery in Inductive Databases, pages 60-69. 2003.

F. Geerts: Expressing the box cone radius in the relational calculus with real polynomial constraints. Discrete and Computational Geometry 30, 4(2003), pp. 607-622.

F. Geerts, and B. Kuijpers: Deciding termination of query-evaluation in transitive closure logics for constraint databases. In Proceedings of the 9th International Conference on Database Theory (ICDT 2003), January 2003, Siena, Italy, pp. 190-206.

A. Gionis, T. Kujala, H. Mannila: Fragments of orders. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), Washington, DC, USA, August 2003, pp. 129-136.

A. Gionis, and H. Mannila: Finding recurrent sources in sequences. In the 7th Annual International Conference on Research in Computational Molecular Biology - RECOMB 2003. In: W. Miller, M. Vingron, S. Istrail, P. Pevzner, and M. Waterman (eds.); pp 123-130.

B. Goethals, and M.J. Zaki: Advances in Frequent Itemset Mining Implementations, FIMI03. In Proceedings of the FIMI'03 Workshop on Frequent Itemset Mining Implementations. November 19, 2003, Melbourne, Florida, USA.

G. Grahne, R. Hakli, M. Nykänen, H. Tamm, and E. Ukkonen: Design and implementation of a string database query language. Information Systems 28 (2003), pp. 347-369.

D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R.S. Sharma: Discovering all most specific sentences. ACM Transactions on Database Systems 28 (2): 140-174, 2003.

J. Heino and H.T.T. Toivonen: Automated Detection of Epidemics from the Usage Logs of a Physicians' Reference Database. 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2003), 180-191, Cavtat-Dubrovnik, Croatia, September 2003. Springer.

J. Hollmén, J. Seppänen, and H. Mannila: Mixture models and frequent sets: combining global and local methods for 0-1 data. In D. Barbara, and C. Kamath (eds.): Proceedings of the Third SIAM International Conference on Data Mining, pages 289-293. Society for Industrial and Applied Mathematics, 2003.

M. Koivisto, M. Perola, T. Varilo, W. Hennah, J. Ekelund, M. Lukk, L. Peltonen, E. Ukkonen, and H. Mannila: An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries. In Pacific Symposium on Biocomputing 2003 (PSB'03), R.B. Altman, A.K. Dukner, L. Hunter, T.A. Jung, and T.E. Klein, eds., World Scientific 2002, pp. 502-513.

J. Kärkkäinen, G. Navarro, and E. Ukkonen: Approximate String Matching over Ziv-Lempel Compressed Text. Journal of Discrete Algorithms 1, 3/4(2003), pp. 313-338.

J. Kärkkäinen, and P. Sanders: Simple linear work suffix array construction. In Proceedings of the 30th International Colloquium on Automata, Languages and Programming (ICALP 2003), June-July 2003, Eindhoven, The Netherlands. Lecture Notes in Computer Science 2719, Springer, 2003, pp. 943-955.

J. Kärkkäinen, and S. S. Rao: Full-Text Indexes in External Memory. Chapter 7 in U. Meyer, P. Sanders, J. Sibeyn (eds.), Algorithms for Memory Hierarchies. Lecture Notes in Computer Science 2625, Springer 2003, pp. 149-170.

M. Kääriäinen, R. Nock, and T. Elomaa: Reduced Error Pruning of Branching Programs Cannot Be Approximated to within a Logarithmic Factor, Information Processing Letters 87, 2 (2003), pp. 73-78.

M. Kääriäinen, and T. Elomaa: Rademacher penalization over decision tree prunings. In N. Lavrac, D. Gamberger, H. Blockeel and L. Todorovski (eds.), Machine Learning: ECML 2003, Proc. 14th European Conf. (pp. 193-204). LNAI 2837. Springer, 2003.

A. Leino, H. Mannila, and R-L Pitkänen: Rule discovery and probabilistic modeling for onomastic data. In Knowledge discovery in databases: the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD) 2003, pp. 291-302. Springer. ISBN 3-540-20085-1.

K. Lemström, V. Mäkinen, A. Pienimäki, M. Turkia, and E. Ukkonen: The C-BRAHMS project. In Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR 2003), October 2003, Baltimore, Maryland, USA, pp. 237-238.

K. Lemström, and G. Navarro: Flexible and efficient bit-parallel techniques for transposition invariant approximate matching in music retrieval. In Proceedings of 10th International Symposium on String Processing and Information Retrieval (SPIRE'2003) LNCS 2857, October 2003, Manaus, Brazil, pp. 224-237.

K. Lemström, and J. Tarhio: Transposition Invariant Pattern Matching for Multi-Track Strings. Nordic Journal of Computing, 10, 3(2003), pp. 185-205.

K. Lemström, and L. Hella: Approximate Pattern Matching and Transitive Closure Logics. Theoretical Computer Science, 299 1-3(2003), pp. 387-412.

K. Lemström, and V. Mäkinen: On Finding Minimum Splitting of Pattern in Multi-Track String Matching. In Proceedings of 14th Annual Symposium on Combinatorial Pattern Matching (CPM 2003), Springer-Verlag LNCS 2676, pp. 237-253, Morelia, Mexico, June, 2003.

J.E. Litton, J. Muilu, A. Bjorklund, A. Leinonen, and N.L. Pedersen: Data modeling and data communication in GenomEUtwin. Twin Res. 2003 October 6(5): 383-90.

J. Makkonen, H. Ahonen-Myka: Utilizing Temporal Expressions in Topic Detection and Tracking. In Proceedings of 7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL03), August 2003, Trondheim, Norway, pp. 393-404.

J. Makkonen, H. Ahonen-Myka, and M. Salmenkivi: Topic Detection and Tracking with Spatio-temporal Evidence. In Proceedings of the 25th European Conference on Information Retrieval Research (ECIR 2003), April 2003, Pisa, Italy, Lecture Notes in Computer Science 2633, pp. 251-256, Springer 2003.

J. Makkonen: Investigations on Event Evolution in TDT. In Proceedings of HLT-NAACL 2003 Student Workshop, May 2003, Edmonton, Canada, pp. 43-48.

H. Mannila, and M. Salmenkivi: Intensity Modeling of genome Data. To appear in: J. Wang, D. Shasha, H.T.T. Toivonen, and M. Zaki (eds.): Data Mining in Bioinformatics, Springer-Verlag, London, 2003.

T. Mielikäinen: Chaining patterns. In Gunter Grieser, Yuzuru Tanaka, and Akihiro Yamamoto (eds.): Discovery Science - 6th International Conference, DS 2003, Sapporo, Japan, October 17-19, 2003, Proceedings, Volume 2843 of Lecture Notes in Artificial Intelligence, pp. 233-244. Springer, 2003.

T. Mielikäinen: Change profiles. In Xindong Wu and Alex Tuzhilin (eds.): Proceedings of the 2003 IEEE International Conference on Data Mining (ICDM 2003), November 19-22, 2003, Melbourne, Florida, pp. 219-226, USA. IEEE Computer Society, 2003.

T. Mielikäinen: Finding all occurring sets of interest. In Jean-Francois Boulicaut and Saso Dzeroski (eds.): Proceedings of the 2nd International Workshop on Knowledge Discovery in Inductive Databases, pp. 97-106. 2003.

T. Mielikäinen: Frequency-based views to pattern collections. In Peter L. Hammer (eds.): Proceedings of the IFIP/SIAM Workshop on Discrete Mathematics and Data Mining, SIAM International Conference on Data Mining (2003), May 1-3, 2003, San Francisco, CA, USA. SIAM, 2003.

T. Mielikäinen: Intersecting data to closed sets with constraints. In Bart Goethals and Mohammed J. Zaki (eds.): Proceedings of the FIMI'03 Workshop on Frequent Itemset Mining Implementations, Melbourne, Florida, USA, November 19, 2003. Volume 90 of CEUR Workshop Proceedings, ISSN 1613-0073, online CEUR-WS.org/Vol-90/, 2003.

T. Mielikäinen: On inverse frequent set mining. In Wenliang Du and Chris Clifton (eds.): Proceedings of the 2nd Workshop on Privacy Preserving Data Mining (PPDM), pp. 18-23. IEEE Computer Society, 2003.

T. Mielikäinen, and H. Mannila: The pattern ordering problem. In Nada Lavrac, Dragan Gamberger, Ljupco Todorovski, and Hendrik Blockeel (eds.): Knowledge Discovery in Databases: PKDD 2003 - 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, September 22-26, 2003, Proceedings, Volume 2838 of Lecture Notes in Artificial Intelligence, pp. 327-338. Springer, 2003.

V. Mäkinen: Compact Suffix Array - A Space-Efficient Full-Text Index. Fundamenta Informaticae, Special Issue - Computing Patterns in Strings, 56(1-2): 191-210, 2003.

V. Mäkinen, G. Navarro, and E. Ukkonen: Algorithms for Transposition Invariant String Matching. In Proceedings of 20th International Symposium on Theoretical Aspects of Computer Science (STACS 2003), Springer-Verlag LNCS 2607, pp. 191-202, Berlin, February, 2003.

V. Mäkinen, G. Navarro, and E. Ukkonen: Approximate Matching of Run-length Compressed Strings. Algorithmica 35(4): 347-369, 2003.

V. Mäkinen, G. Navarro, and E. Ukkonen: Matching Numeric Strings under Noise. In Proceedings of Prague Stringology Conference (PSC'03), Czech Technical University, Prague, September, 2003, pp. 99-110.

D. Pavlov, H. Mannila, and P. Smyth: Beyond independence: probabilistic methods for query approximation on binary transaction data. IEEE Transactions on Knowledge and Data Engineering 15 (6): 1409-1421, 2003.

J. Rousu, A. Rantanen, H. Maaheimo, E. Pitkanen, K. Saarela, and E. Ukkonen: A method for estimating metabolic fluxes from incomplete isotopomer information. International Workshop on Computational methods in Systems Biology, Lecture Notes in Computer Science 2602 (2003), pp. 88-103.

J. Rousu, L. Flander, M. Suutarinen, K. Autio, A. Rantanen, and P. Kontkanen: Novel Computational Tools in Bakery Process Data Analysis: a Comparative Study. Journal of Food Engineering 57, 1 (2003), pp. 45-56.

Th. Schlitt, K. Palin, J. Rung, S. Diekmann, M. Lappe, E. Ukkonen, and A. Brazma: From gene networks to gene function. Genome Research 13 (2003), pp. 2568-2576.

J. Seppänen, E. Bingham, and H. Mannila: A simple algorithm for topic identification in 0-1 data. In 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'03), Dubrovnik, Croatia, September 2003.

P. Sevon, H. Toivonen, and P. Onkamo: Gene Mapping by Pattern Discovery. To appear in J. Wang et al (eds.), Data Mining in Bioinformatics. Springer.

M. Sulkava, and J. Hollmén: Finding profiles of forest nutrition by clustering of the self-organizing map. In Proceedings of the Workshop on Self-organizing Maps (WSOM'03), pages 243-248, Hibikino, Kitakyushu, Japan, September 2003.

H. Tamm, and E. Ukkonen: Bideterministic Automata and Minimal Representations of Regular Languages. In Proceedings of 8th International Conference on Implementation and Application of Automata (CIAA 2003), July 16-18, 2003, Santa Barbara, CA, USA, pp. 61-71.

T.A. Thanaraj, F. Clark, and J. Muilu: Conservation of human alternative splice events in mouse. Nucleic Acids Res. 2003 May 15; 31(10): 2544-52.

T.A. Thanaraj, S. Stamm, F. Clark, J.J. Riethoven, V. Le Texier, and J. Muilu: ASD: the Alternative Splicing Database. Nucleic Acids Res. 2004 Jan 1;32(1): D64-9.

H.T.T. Toivonen, A. Srinivasan, R.D. King, S. Kramer, and C. Helma: Statistical evaluation of the predictive toxicology challenge 2000-2001. Bioinformatics 19 (10): 1183 - 1193, 2003.

H.T.T. Toivonen, P. Onkamo, P. Hintsanen, E. Terzi, and P. Sevon. Data mining for gene mapping. To appear in J. Zurada and M. Kantardzic (eds.), New Generation of Data Mining Applications. IEEE Press.

E. Ukkonen, K. Lemström, and V. Mäkinen: Geometric Algorithms for Transposition Invariant Content-Based Music Retrieval. In Proceedings of 4th International Conference on Music Information Retrieval (ISMIR 2003), pp. 193-199, Baltimore, Maryland, USA, October, 2003.

E. Ukkonen, K. Lemström, and V. Mäkinen: Sweepline the Music! In Computer Science in Perspective (LNCS 2598), R. Klein, H.-W. Six, L. Wegner (eds.), 2003, pp. 330-342.

A. Vakali, E. Terzi, E. Bertino, and A.K. Elmagarmid: Hierarchical data placement for navigational multimedia applications. Data Knowledge Eng. 44(1): 49-80 (2003).

J. Vesanto, M. Sulkava, and J. Hollmén: On the decomposition of the self-organizing map distortion measure. In Proceedings of the Workshop on Self-Organizing Maps (WSOM'03), pages 11-16, Hibikino, Kitakyushu, Japan, September 2003.

I. Autio, and T. Elomaa: Flexible view recognition for indoor navigation based on Gabor filters and support vector machines. Pattern Recognition vol. 36, issue 12 (2003), pp. 2769-2779.

J. Vesanto, and J. Hollmén: An automated report generation tool for the data understanding phase. In A. Abraham, and L. Jain (eds.): Innovations in Intelligent Systems: Design, Management and Applications, Studies in Fuzziness and Soft Computing, chapter 5. Springer (Physica) Verlag, 2003.

M. Zaki, J.T.L. Wang, and H.T.T. Toivonen: BIOKDD 2002: Recent Advances in Data Mining for Bioinformatics. SIGKDD Explorations 4(2): 112-114, 2003.

2002

H. Ahonen-Myka: Discovery of frequent word sequences in text. The ESF Exploratory Workshop on Pattern Detection and Discovery in Data Mining, Imperial College, London, 16-19 September, 2002. Lecture Notes in Artificial Intelligence 2447, Springer, 2002.

E. Bingham, H. Mannila, and J. K. Seppänen: Topics in 0-1 data. In D. Hand, D. Keim, and R. Ng, eds, Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002), Edmonton, July 2002.

A. Bykowski, J. K. Seppänen, and J. Hollmén: Model-independent bounding of the supports of Boolean formulae in binary data. In M. Klemettinen, R. Meo, F. Giannotti and L. De Raedt, editors, Knowledge Discovery in Inductive Databases (KDID'02), First International Workshop, University of Helsinki Department of Computer Science Series of Publications B, Report B-2002-7, pages 20-31, 2002.

L. De Raedt, M. Jaeger, S. D. Lee, and H. Mannila: A Theory of Inductive Query Answering. In Proceedings of ICDM'02. To appear.

A. Doucet: Améliorer les descripteurs de documents semi-structurés en utilisant les informations contextuelles. INFORSID 2002, Nantes, France, June 4-7, 2002, p. 401-402. ISBN: 2-906855-18-9.

A. Doucet: Extracting More Relevant Document Descriptors using Hierarchical Information. To appear in Proceedings of XML Finland 2002, October 21-22, 2002, Helsinki.

I. Díaz, and J. Hollmén: Residuals generation and visualization for understanding novel process conditions. In Proceedings of the IEEE 2002 International Joint Conference on Neural Networks (IJCNN'02), volume 3, pages 2070-2075. IEEE Press, 2002.

T. Elomaa: Partition-refining algorithms for learning finite state automata. In M.-S. Hacid, Z. W. Ras, D. A. Zighed, and Y. Kodratoff (eds.), Foundations of Intelligent Systems, Proc. Thirteenth International Symposium, ISMIS'02 (Lyon, France). Lecture Notes in Artificial Intelligence 2366. Springer-Verlag, Berlin Heidelberg, 2002, pp. 232-243.

T. Elomaa, and M. Kääriäinen: Progressive Rademacher sampling. Proc. Eighteenth National Conference on Artificial Intelligence, AAAI-2002 (Edmonton, Canada). AAAI Press, Menlo Park, CA and MIT Press, Cambridge, MA, 2002, pp. 140-145.

T. Elomaa, and M. Kääriäinen: The difficulty of reduced error pruning of leveled branching programs. Proc. Seventh International Symposium on Artificial Intelligence and Mathematics, AMAI 2002 (Fort Lauderdale, FL). In press.

T. Elomaa, and J. Lindgren: Experiments with projection learning. Discovery Science, Proc. Fifth International Conference, DS '02 (Lübeck, Germany). Lecture Notes in Artificial Intelligence 2534. Springer-Verlag, Berlin Heidelberg, 2002. To appear.

T. Elomaa, and J. Rousu: Efficient multisplitting revisited: Optima-preserving elimination of partition candidates. Data Mining and Knowledge Discovery. To appear.

T. Elomaa, and J. Rousu: Fast minimum error discretization. In C. Sammut and A. Hoffmann (eds.), Proc. Nineteenth International Conference on Machine Learning, ICML'02 (Sydney, Australia). Morgan Kaufmann, San Francisco, CA, 2002, pp. 131-138.

T. Elomaa, and J. Rousu: Linear-time preprocessing in optimal numerical range partitioning. Journal of Intelligent Information Systems 18, 1: 55-70, 2002.

T. Elomaa, and J. Rousu: Necessary and Sufficient Preprocessing in Numerical Range Discretization. Knowledge and Information Systems. To appear.

M. Fluch, G. Lindén, and A. Popescu: A journalist's tool for writing and retrieving news stories. To appear in Proceedings of XML Finland 2002, October 21-22, 2002, Helsinki.

K. Fredriksson: Faster string matching with super-alphabets. In Proceedings of SPIRE'2002, Lecture Notes in Computer Science 2476, pages 44-57, Springer Verlag, Berlin, 2002.

K. Fredriksson, G. Navarro, and E. Ukkonen: Optimal Exact and Fast Approximate Two Dimensional Pattern Matching Allowing Rotations. In Proceedings of CPM'2002, Lecture Notes in Computer Science 2373, pages 235-248, Springer-Verlag, Berlin, 2002.

G. Grahne, R. Hakli, M. Nykänen, H. Tamm, and E. Ukkonen: Design and implementation of a string database query language. Information Systems. In press.

J. Han, R.B. Altman, V. Kumar, H. Mannila, and D. Pregibon: Emerging Scientific Applications in Data Mining. Communications of the ACM 45, 8: 54-58, August 2002.

C. Iliopoulos, K. Lemström, M. Niyad, and Y. Pinzon: Evolution of Musical Motifs in Polyphonic Passages. In Proc: AISB'2002; Symposium on AI and Creativity in Arts and Science, pp. 67-75, London, United Kingdom, April 2-5, 2002.

T. Kivioja, M. Arvas, K. Kataja, M. Penttilä, H. Söderlund, and E. Ukkonen: Assigning probes into a small number of pools separable by electrophoresis. Bioinformatics 18 Suppl. 1 (ISMB 2002 special issue): 199-206, 2002.

A. Korhola, K. Vasko, H.T.T. Toivonen, and H. Olander: Holocene temperature changes in northern Fennoscandia reconstructed from chironomids using Bayesian modeling. Quaternary Science Reviews 21(16-17): 1841 - 1860, 2002.

J. Kärkkäinen, and S. Burkhardt: One-gapped q-gram filters of Levenshtein distance. Proc. CPM 2002, LNCS 2373, pp. 225-234, Springer-Verlag 2002.

M. Lehtonen, R. Petit, O. Heinonen, and G. Lindén: A Dynamic User Interface for Document Assembly. To appear in the proceedings of the ACM Document Engineering (DocEng) '02, November 8-9, 2002, McLean, Virginia, USA.

K. Lemström: Content-Based Retrieval of Symbolic Music. To appear in Proc: FSKD'02; 1st International Conference on Fuzzy Systems and Knowledge Discovery, Singapore, November 18-22, 2002.

K. Lemström: Polyfonisen musiikin haku sisällön perusteella (Content-based retrieval of polyphonic music). In: Tietojenkäsittelytiede, (17), 48-65, 2002.

K. Lemström, and L. Hella: Approximate Pattern Matching and Transitive Closure Logics. Theoretical Computer Science. In press.

C.K. Leung, R. Ng, and H. Mannila: OSSM: A Segmentation Approach to Optimize Frequency Counting. ICDE 2002.

J. Makkonen, H. Ahonen-Myka, and M. Salmenkivi: Applying Semantic Classes in Event Detection and Tracking. Accepted for The International Conference on Natural Language Processing (ICON-2002), December 18-21, 2002, Mumbai, India.

H. Mannila: Global and local methods in data mining: basic techniques and open problems. In ICALP 2002, 29th International Colloquium on Automata, Languages, and Programming, Malaga, Spain, July 2002; Springer-Verlag.

H. Mannila, A. Patrikainen, J. K. Seppänen, and J. Kere: Long-range control of expression in yeast. Bioinformatics 18: 482-483, 2002.

D. Meredith, K. Lemström, and G. Wiggins: Algorithms for Discovering Repeated Patterns in Multidimensional Representations of Polyphonic Music. Journal of New Music Research. In press.

V. Mäkinen, and E. Ukkonen: Local Similarity Based Point-Pattern Matching. In Proc. 13th Annual Symposium on Combinatorial Pattern Matching (CPM 2002), Springer-Verlag LNCS Vol. 2373, pp. 115-132, Fukuoka, Japan, July 2002.

T. Niini, K. Vettenranta, J. Hollmén, M. L. Larramendy, Y. Aalto, H. Wikman, B. Nagy, J. K. Seppänen, A. Ferrer Salvador, H. Mannila, U. M. Saarinen-Pihkala, and S. Knuutila: Expression of myeloid-specific genes in childhood acute lymphoblastic leukemia -- a cDNA array study. Leukemia 16 (11): 2213-2221, 2002.

R.B. O'Hara, E. Arjas, H.T.T. Toivonen, and I. Hanski: Bayesian analysis of meta-population data. Ecology 83 (9): 2408-2415, 2002.

P. Onkamo, V. Ollikainen, P. Sevon, H.T.T. Toivonen, H. Mannila, and J. Kere: Association analysis for quantitative traits by data mining: QHPM. The Annals of Human Genetics. In press.

K. Palin, E. Ukkonen, A. Brazma, and J. Vilo: Correlating gene promoters and expression in gene disruption experiments. Bioinformatics 18, Supplement 2 (ECCB 2002 Proceedings): 172-180, 2002.

D. Pavlov, H. Mannila, and P. Smyth: Beyond independence: probabilistic methods for query approximation on binary transaction data. IEEE Transactions on Knowledge and Data Engineering. To appear.

A. Pienimäki: Indexing Music Databases Using Automatic Extraction of Frequent Phrases. In Third International conference on Music Information Retrieval (ISMIR 2002), Paris, France, October 13-17, 2002, pp. 25-30.

J. Rousu, L. Flander, J. Suutarinen, K. Autio, A. Rantanen, and P. Kontkanen: Novel Computational Tools in Bakery Process Data Analysis: a Comparative Study. Journal of Food Engineering. In press.

J. Rousu, A. Rantanen, R. Ketola, J. Kokkonen, and V. Tarkiainen: Computing Positional Isotopomer Distributions from Tandem Mass Spectrometric Data. Metabolic Engineering. To appear.

S. Ruosaari, and J. Hollmén: Image analysis for classifying faulty spots from microarray images. In Proceedings of the 5th International Conference on Discovery Science, Lecture Notes in Artificial Intelligence. Springer, 2002.

M. Salmenkivi, J. Kere, and H. Mannila: Genome Segmentation using Piecewise Constant Intensity Models and Reversible Jump MCMC. Bioinformatics 18, Supplement 2 (ECCB 2002 Proceedings): 211-218, 2002.

M. Salmenkivi, J. Makkonen, and H. Ahonen-Myka: Topic detection and tracking based on extracting words with meaning of the same type. Accepted for Suomen Tekoälypäivät (STeP'02), Finnish AI Conference, December 16-17, 2002, Oulu.

P. Salo, and S. Huhmarniemi: Syntactic Linking: A Computational Implementation of a minimalistic Grammar. Computational Linguistics. To appear.

H.T.T. Toivonen, A. Srinivasan, R.D. King, S. Kramer, and C. Helma: Statistical evaluation of the predictive Toxicology Challenge 2000-2001. Bioinformatics. Accepted for publication.

E. Ukkonen: Finding founder sequences from a set of recombinants. In: Algorithms in Bioinformatics (WABI-2002), Lect. Notes in Computer Science 2452, pp. 277-286, Springer-Verlag 2002.

K. Vasko, and H.T.T. Toivonen: Estimating the number of segments in time series data using permutation tests. To appear in Proceedings of IEEE International Conference on Data Mining 2002 (ICDM'02).

J. Vesanto, and J. Hollmén: An automated report generation tool for the data understanding phase. In A. Abraham and M. Koeppen, editors, Hybrid Information Systems, pages 611-626. Physica-Verlag (Springer), Heidelberg, 2002. Proceedings of the First International Workshop on Hybrid Intelligent Systems (HIS'01).

J. Vesanto, and J. Hollmén: Recent Advances in Intelligent Paradigms, chapter An Automated Report Generation Tool for the Data Understanding Phase. Studies in Fuzziness and Soft Computing. Physica (Springer) Verlag, 2002.

G. Wiggins, K. Lemström, and D. Meredith: SIA(M)ESE: An Algorithm for Transposition Invariant, Polyphonic Content-Based Music Retrieval. In Proc: ISMIR'02; Third International Conference on Music Information Retrieval, pp. 283-284, Paris, France, October 13-17, 2002.

H. Wikman, E. Kettunen, J. K. Seppänen, A. Karjalainen, J. Hollmén, S. Anttila, and S. Knuutila: Identification of differentially expressed genes in pulmonary adenocarcinoma by using a cDNA array. Oncogene 21 (37): 5804-5813, 2002.

N. Woolley, P. Holopainen, V. Ollikainen, K. Mustalahti, M. Mäki, J. Kere, and J. Partanen: A new locus for coeliac disease mapped to chromosome 15 in a population isolate. Human Genetics, 111: 40-45, 2002.

M.J. Zaki, J.T.L. Wang, and H.T.T. Toivonen: BIOKDD01: Workshop on Data Mining in Bioinformatics. SIGKDD Explorations 3 (2): 71 - 73, January 2002.

2001

R. Aarts and J. Rousu. An integrated approach to bioprocess recipe design. Integrated Computer-Aided Engineering 8, 4 (2001), 363-373.

I. Autio, T. Elomaa, and T. Kurppa. Support vector learning of landmarks for a mobile robot. In H.R. Arabnia (ed.), Proc. 2001 International Conference on Artificial Intelligence, IC-AI'2001 (pp. 151-157). CSREA Press, 2001.

I. Autio, T. Elomaa, and T. Kurppa. Robot landmark learning with SVMs. In H. H. Lund, B. Mayoh and J. Perram (eds.), Proc. 7th SCAI (pp. 157-158). IOS Press, 2001.

E. Bingham and H. Mannila. Random projection in dimensionality reduction: applications to image and text data. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2001), F. Provost and R. Srikant (eds.), 245-250.

I. Cadez, P. Smyth and H. Mannila. Probabilistic Modeling of Transaction Data with Applications to Profiling, Visualization, and Prediction. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2001), F. Provost and R. Srikant (eds.), 37-46.

L. Dehaspe and H.T.T. Toivonen. Discovery of relational association rules. In N. Lavrac and S. Dzeroski, editors, Relational Data Mining, 189 - 212. Springer-Verlag, 2001.

T. Elomaa and M. Kääriäinen. An analysis of reduced error pruning. Journal of Artificial Intelligence Research 15 (Sept. 2001) 163-187.

T. Elomaa and M. Kääriäinen. On the practice of branching program boosting. In L. De Raedt and P. Flach (eds.), Machine Learning: ECML 2001, Proc. Twelfth European Conference (pp. 133-144). Lecture Notes in Artificial Intelligence 2167. Springer, 2001.

T. Elomaa and J. Rousu. Linear-time preprocessing in optimal numerical range partitioning. Journal of Intelligent Information Systems 18, 1 (Jan. 2002) 55-70.

T. Elomaa and J. Rousu. On the computational complexity of optimal multisplitting. Fundamenta Informaticae 47, 1-2 (Aug./Sept. 2001) 35-52.

T. Elomaa and J. Rousu. Preprocessing opportunities in optimal numerical range partitioning. In Proc. 1st IEEE International Conference on Data Mining, ICDM '01 (to appear). IEEE Computer Society Press, 2001.

K. Fredriksson. Rotation Invariant Template Matching. PhD Thesis, Report A-2001-3, Department of Computer Science, University of Helsinki, 2001.

K. Fredriksson, G. Navarro and E. Ukkonen. Faster than FFT: rotation invariant combinatorial template matching. To appear in the book Recent Research Developments in Pattern Recognition.

K. Fredriksson and E. Ukkonen: Faster template matching without FFT. In Proceedings of ICIP'2001, IEEE CS Press, 678-681.

D. Hand, H. Mannila and P. Smyth. Principles of Data Mining. MIT Press 2001. ISBN 1-57735-027-8.

J. Himberg, K. Korpiaho, H. Mannila, J. Tikanmäki, and H.T.T. Toivonen. Time series segmentation for context recognition in mobile devices. In The 2001 IEEE International Conference on Data Mining (ICDM'01), San Jose, California, November-December 2001. To appear.

Paula Kauppi, Kerstin Lindblad-Toh, Petteri Sevon, Hannu T. T. Toivonen, John D. Rioux, Anu Villapakkam, Lauri A. Laitinen, Thomas J. Hudson, Juha Kere, and Tarja Laitinen. A second-generation association study of the 5q31 cytokine gene cluster and the interleukin-4 receptor in asthma. Genomics 77 (1-2): 35 - 42, September 2001.

M. Koivisto and H. Mannila. Offspring risk and sibling risk for multilocus traits. Human Heredity 2001, 5 (4), 209-216.

A. Korhola, K. Vasko, H.T.T. Toivonen, and H. Olander. Holocene temperature changes in northern Fennoscandia reconstructed from chironomids using Bayesian modeling. Quaternary Science Reviews, 2001 (to appear).

T. Laitinen, V. Ollikainen, C. Lazaro, P. Kauppi, R de Cid, J.M. Anto, X. Estivill, H. Lokki, H. Mannila, L.A. Laitinen and J. Kere. Association study of the chromosomal region containing the FCER2 gene suggests regulatory role for serum immunoglobulin E levels. American Journal on Respiratory and Critical Care Medicine 161, 700-706, 2000.

K. Lemström, G. A. Wiggins and D. Meredith. A Three-Layer Approach for Music Retrieval in Large Databases. In Proc. ISMIR 2001 2nd Annual International Symposium on Music Information Retrieval, pp. 13-14, Bloomington, Indiana, October 15 - 17, 2001.

J. Makkonen. News-feed categorization. To appear in Proc. FDPW01.

J. Makkonen and J. Piitulainen. Expanding document vectors in text categorization. In Proceedings of IR2001, Oulu, Finland, 2001, 53-60.

H. Mannila, A. Patrikainen, J. Seppänen, and J. Kere: Long-range control of expression in yeast. Bioinformatics, to appear.

H. Mannila and D. Rusakov. Decomposing event sequences into independent components. First SIAM Conference on Data Mining, 2001.

H. Mannila and M. Salmenkivi. Finding simple intensity descriptions from event sequence data. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2001), F. Provost and R. Srikant (eds.), 341-346.

H. Mannila and J. Seppänen. Recognizing similar situations from event sequences. First SIAM Conference on Data Mining, 2001.

D. Meredith, G. A. Wiggins and K. Lemström. Pattern Induction and Matching in Polyphonic Music and Other Multi-Dimensional Datasets. In Proc. the 5th World Multi-Conference on Systemics, Cybernetics and Informatics (SCI2001), Volume X, pp. 61 - 66, Orlando, Florida, July 22 - 25, 2001.

T. Mielikäinen and E. Ukkonen. The Complexity of Maximum Matroid-Greedoid Intersection. In R. Freivalds (Ed.), Fundamentals of Computation Theory, volume 2138 of Lecture Notes in Computer Science, pages 535-539, Berlin-Heidelberg, 2001. Springer Verlag.

V. Mäkinen. Trade off Between Compression and Search Times in Compact Suffix Array. In Proceedings of the 3rd Workshop on Algorithm Engineering and Experiments (ALENEX 01). January 5-6, 2001, Washington, DC. In press.

V. Mäkinen. Using Edit Distance in Point-Pattern Matching. In Proc. 8th Workshop on String Processing and Information Retrieval (SPIRE 2001), Laguna De San Rafael, Chile, November 12-15, 2001.

V. Mäkinen, G. Navarro and E. Ukkonen: Approximate Matching of Run-length Compressed Strings. In Proc. 12th Annual Symposium on Combinatorial Pattern Matching (CPM 2001), Springer-Verlag LNCS VOL. 2089, pp. 31-49, Jerusalem, July 2001.

J. Mäntyjärvi, J. Himberg, P. Korpipää, and H. Mannila: Extracting the context of a mobile device user. IEEE Conference on Data Mining, to appear.

J. Pöllänen, J. Kronlöf and J. Rousu. A Neural Network Tool for Brewery Fermentations. Automation'2001 seminar days, SAS Julkaisusarja 24 (2001), 246-251.

J. Rousu. Efficient Range Partitioning in Classification Learning. PhD Thesis, Report A-2001-1, Department of Computer Science, University of Helsinki, 2001.

P. Sevon, V. Ollikainen, P. Onkamo, H.T.T. Toivonen, H. Mannila, and J. Kere. Mining associations between genetic markers, phenotypes and covariates. Genetic Epidemiology, 21, 2001. To appear.

M. Salmenkivi. Computational Methods for Intensity Models. PhD Thesis, Report A-2001-2, Department of Computer Science, University of Helsinki, 2001.

P. Sevon, H.T.T. Toivonen, and V. Ollikainen. TreeDT: Gene mapping by tree disequilibrium test. In The Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001), 365 - 370, San Francisco, California, August 2001. ACM. (Extended version: Report C-2001-32, University of Helsinki, Department of Computer Science, June 2001.)

I. Shmulevich, O. Yli-Harja, E. Coyle, D.-J. Povel and K. Lemström. Perceptual Issues in Music Pattern Recognition - Complexity of Rhythm and Key Finding. Computers and the Humanities, 35 (1), 23-35, 2001.

H.T.T. Toivonen, H. Mannila, A. Korhola, and H. Olander. Applying Bayesian statistics to organism-based environmental reconstruction. Ecological Applications 11, 2, 618-630, 2001.

M. J. Zaki, H.T.T. Toivonen, and J. Wang, editors. BIOKDD01 Workshop on Data Mining in Bioinformatics. Rensselaer Polytechnic Institute, July 2001. RPI Technical Report 01-8.

2000

H. Ahonen-Myka, B. Heikkinen, O. Heinonen, and M. Klemettinen. Printing Structured Text without Stylesheets. In XML Scandinavia 2000, May 2-4, Gothenburg, Sweden, 2000.

G. Das and H. Mannila. Context-based similarity methods for categorical attributes. In Principles of Data Mining and Knowledge Discovery, 4th European Conference (PKDD 2000) D.A. Zighed et al. (eds.), p. 201-211.

T. Elomaa and J. Rousu. Generalizing Boundary Points. 17th Natl. Conf on Artificial Intelligence, AAAI-2000, AAAI Press, 2000, pp. 570-576.

T. Elomaa and J. Rousu. On the Complexity of Optimal Multisplitting. 12th International Symposium on Methodologies for Intelligent Systems, ISMIS-2000. Lecture Notes in Artificial Intelligence 1932 (2000), pp. 552-561.

T. Elomaa. Advances in classifier learning algorithms (Invited talk). In H. R. Arabnia (ed.), Proc. 2000 International Conference on Artificial Intelligence, IC-AI'2000 (Las Vegas, NV). CSREA Press, 2000, pp. 103-109.

T. Elomaa and J. Rousu. On the splitting properties of common attribute evaluation functions. In H. Hyötyniemi (ed.), STeP 2000 - Millenium of Artificial Intelligence, Proc. Ninth Finnish Artificial Intelligence Conference, Vol. 3, `AI of Tomorrow': Symposium on Theory (Espoo, Finland). The Finnish Artificial Intelligence Society, Helsinki, 2000, pp. 69-76.

T. Elomaa and J. Rousu. Applications of evaluation function convexity in data mining algorithms. Arpakannus 1/2000 4-12.

T. Elomaa and J. Rousu. Uses of convexity in numerical domain partitioning. NeuroCOLT II Technical Report NC2-TR-2000-074. Department of Computer Science, Royal Holloway, University of London, May 2000. 11 pp.

T. Elomaa and J. Rousu. On the splitting properties of common attribute evaluation functions. Report C-2000-1. Department of Computer Science, University of Helsinki, Jan. 2000. 19 pp.

K. Fredriksson, G. Navarro and E. Ukkonen: An index for two dimensional string matching allowing rotations. In: J. van Leeuwen et al. (eds.) Theoretical Computer Science (IFIP TCS 2000), Lecture Notes in Computer Science 1872, pp. 59-75, Springer 2000.

K. Fredriksson and E. Ukkonen: Combinatorial methods for approximate pattern matching under rotations and translations in 3D arrays. In: Proc. 7th International Symposium on String Processing and Information Retrieval (SPIRE 2000), September 27-29, A Coruña, Spain, IEEE Computer Society 2000, pp. 96-104.

K. Fredriksson. Rotation invariant histogram filters for similarity and distance measures between digital images. Seventh International Symposium on String Processing and Information Retrieval, SPIRE 2000, September 27-29, A Coruña, Spain, IEEE Computer Society 2000, pp. 105-115.

R. Hakli, M. Nykänen, and H. Tamm. Adding string processing capabilities to data management systems. Seventh International Symposium on String Processing and Information Retrieval, SPIRE 2000, September 27-29, A Coruña, Spain, IEEE Computer Society 2000, pp. 122-131.

M. Huttunen, E. Ukkonen and B. Vehviläinen: Using trainable computing networks in the control of a physical system. Preprints of the Second AMS Conference on Artificial Intelligence, pp. 60-64, American Meteorological Society 2000.

P. Kauppi, T. Laitinen, V. Ollikainen, H. Mannila, L.A. Laitinen, and J. Kere: The ILR9 region contribution in asthma is supported by genetic association in an isolated population. European Journal of Human Genetics 8, 788-792 (2000).

T. Kivioja, J. Ravantti, A. Verkhovsky, E. Ukkonen and D. Bamford. Local average intensity-based method for identifying spherical particles in electron micrographs. J. Structural Biology 131 (2000), 126-134.

Atte Korhola, H. Mannila, Hannu TT Toivonen, and Kari Vasko. Reconstructing past climate from organism-based fossil assemblages by applying Bayesian statistics. CSC News, pages 20-22, October 2000.

J. Kärkkäinen, G. Navarro, and E. Ukkonen. Approximate string matching over Ziv-Lempel compressed text. In Proc. 11th Annual Symposium on Combinatorial Pattern Matching, volume 1848 of Lecture Notes in Computer Science, pages 195-209. Springer, 2000.

T. Laitinen, V. Ollikainen, C. Lazaro, P. Kauppi, R de Cid, J.M. Anto, X. Estivill, H. Lokki, H. Mannila, L.A. Laitinen and J. Kere: Association study of the chromosomal region containing the FCER2 gene suggests it has a regulatory role in atopic disorders. American Journal on Respiratory and Critical Care Medicine 161, 700-706, 2000.

K. Lemström. String Matching Techniques for Music Retrieval, PhD thesis, A-2000-4, University of Helsinki, Department of Computer Science, November, 2000.

K. Lemström and J. Tarhio. Searching Monophonic Patterns within Polyphonic Sources. In Proc. Content-Based Multimedia Information Access (RIAO'2000), pp. 1261-1279 (vol 2), Paris, France, April 12-14, 2000.

K. Lemström and L. Hella. Approximate Pattern Matching is Expressible in Transitive Closure Logic. In: Proc. 15th Annual Symposium on Logic in Computer Science (LICS'2000), pp. 157-167, Santa Barbara, USA, June 26-29, 2000.

K. Lemström and P. Fränti. N-Candidate Methods for Location Invariant Dithering of Color Images. Image and Vision Computing, 18 (6-7), 493-500, 2000.

K. Lemström and E. Ukkonen. Including Interval Encoding into Edit Distance Based Music Comparison and Retrieval. In Proc. AISB'2000 Symposium on Creative & Cultural Aspects and Applications of AI & Cognitive Science, pp. 53-60, Birmingham, United Kingdom, April 17-20, 2000.

K. Lemström and S. Perttu. SEMEX - An Efficient Music Retrieval Prototype. In First International Symposium on Music Information Retrieval (ISMIR'2000), Plymouth, Massachusetts, October 23-25, 2000.

Kjell Lemström: In Search of a Lost Melody. Computer Assisted Music: Identification and Retrieval. Finnish Music Quarterly, (3-4), 40-45, 2000.

H. Mannila and C. Meek. Global partial orders from sequential data. In Sixth Annual Conference on Knowledge Discovery and Data Mining (KDD-2000), p. 161-168.

H. Mannila and P. Smyth: Approximate query answering using frequent sets and maximum entropy. International Conference on Data Engineering, p. 309 (2000).

H. Mannila: Theoretical frameworks for data mining. SIGKDD Explorations 1, 2 (January 2000), 30-32.

Veli Mäkinen: Compact Suffix Array. In Proc. 11th Annual Symposium on Combinatorial Pattern Matching (CPM 2000), Springer-Verlag LNCS Vol. 1848, pp. 305-319, Montréal, June 2000.

P. Sevon. Using closed itemsets in association rule mining with taxonomies. Data Mining and Knowledge Discovery: Theory, Tools, and Technology II, 24-25 April 2000, Orlando, Florida, pp. 155-162.

Petteri Sevon, Vesa Ollikainen, Päivi Onkamo, Hannu TT Toivonen, H. Mannila, and Juha Kere. Mining the associations between genetic marker data and a phenotype including covariates. In Genetic Analysis Workshop 12 (GAW12), San Antonio, TX, October 2000.

H. Toivonen, P. Onkamo, K. Vasko, V. Ollikainen, P. Sevon, H. Mannila, M. Herr, and J. Kere. Data mining applied to linkage disequilibrium mapping. American Journal of Human Genetics 67(1): 133 - 145, July 2000.

Hannu TT Toivonen, Päivi Onkamo, Kari Vasko, Vesa Ollikainen, Petteri Sevon, H. Mannila, and Juha Kere. Gene mapping by haplotype pattern mining. In IEEE International Symposium on Bio-Informatics & Biomedical Engineering (BIBE 2000), pages 99-108, November 2000.

E. Ukkonen: Toward complete genome data mining in computational biology. In: Proc. 7th Scandinavian Workshop on Algorithm Theory (SWAT 2000), Lecture Notes in Computer Science 1851, pp. 20-21, Springer 2000.

K. Vasko, H. Toivonen, and A. Korhola. A Bayesian multinomial Gaussian response model for organism-based environmental reconstruction. Journal of Paleolimnology, 24:243-250, 2000.

J. Vilo, A. Brazma, I. Jonassen, A. Robinson and E. Ukkonen. Mining for putative regulatory elements in the yeast genome using gene expression data. In Proc. Eighth International Conference on Intelligent Systems for Molecular Biology (ISMB-2000), pp. 384-394, AAAI Press 2000.

1999

H. Ahonen. Knowledge Discovery in Documents by Extracting Frequent Word Sequences. An invited article for the special issue of Library Trends on knowledge discovery in bibliographical databases, eds. J. Qin and M.J. Norton, 48(1), Summer 1999, 160-181.

H. Ahonen. Finding All Frequent Maximal Sequences in Text. Proceedings of the 16th International Conference on Machine Learning ICML-99 Workshop on Machine Learning in Text Data Analysis, eds. D. Mladenic and M. Grobelnik, pages 11-17, J. Stefan Institute, Ljubljana 1999.

H. Ahonen-Myka, B. Heikkinen, O. Heinonen, and Mika Klemettinen. New tools for a knowledge worker. In Proceedings XML Finland '99, September 23-24, 1999, Helsinki, Finland. p. 25-32, SGML Users' Group Finland, 1999.

H. Ahonen-Myka, O. Heinonen, M. Klemettinen, and A. I. Verkamo. Finding co-occurring text phrases by combining sequence and frequent set discovery. IJCAI-99 Workshop: Text Mining: Foundations, Techniques and Applications, pp. 1-9.

Y. Aumann, R. Feldman, O. Liphstat and H. Mannila: Borders: An Efficient Algorithm for Association Generation in Dynamic Databases. Journal of Intelligent Information Systems 12(1), 61-73 (1999).

J.-F. Boulicaut, M. Klemettinen and H. Mannila: Modeling KDD Processes within the Inductive Database Framework. Data Warehousing and Knowledge Discovery (DaWaK 1999), M.K. Mohania and A. M. Tjoa (eds), p. 293-302.

G. Grahne, M. Nykänen and E. Ukkonen. Reasoning about strings in databases. Journal of Computer and System Sciences 59 (1999), 116-162.

R. Hakli, M. Nykänen, H. Tamm and E. Ukkonen: Implementing a declarative string query language with string restructuring. Proc. Practical Aspects of Declarative Languages (PADL'99), Lecture Notes in Computer Science 1551, pp. 179-195, Springer 1999.

Helge G. Gyllenberg, Mats Gyllenberg, Timo Koski, Tatu Lund, H. Mannila and Christopher Meek: Singling out Ill-fit Items in a Classification. Application to the Taxonomy of Enterobacteriaceae. Archives of Control Sciences 9 (1999), 97-105.

I. Hovatta, T. Varilo, J. Suvisaari, J. D. Terwilliger, V. Ollikainen, R. Arajärvi, H. Juvonen, M.-L. Kokko-Sahin, L. Väisänen, H. Mannila, J. Lönnqvist and L. Peltonen. A genomewide screen for schizophrenia genes in an isolated Finnish subpopulation, suggesting multiple susceptibility loci. American Journal of Human Genetics 65, 1114-1124, 1999.

Y. Huhtala, J. Kärkkäinen, P. Porkka and H. Toivonen. TANE: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 42(2) 100-111, 1999.

Y. Huhtala, J. Kärkkäinen, and H. Toivonen. Mining for similarities in aligned time series using wavelets. In Proc. Conference on Data Mining and Knowledge Discovery. Theory, Tools and Technology, pages 150-160. SPIE, 1999.

T. Elomaa. The biases of decision tree pruning strategies. In D. J. Hand, J. N. Kok, and M. R. Berthold (eds.), Advances in Intelligent Data Analysis, Proc. Third International Symposium, IDA-99 (Amsterdam, The Netherlands). Lecture Notes in Computer Science 1642. Springer-Verlag, Berlin Heidelberg, 1999, pp. 63-74.

T. Elomaa and J. Rousu. General and Efficient Multisplitting of Numerical Attributes. Machine Learning 36, 3 (1999), pp. 201-244.

T. Elomaa and J. Rousu. Speeding up the search for optimal partitions. In J. Zytkow & J. Rauch (eds.), Principles and Practice of Data Mining and Knowledge Discovery in Databases, Proc. 3rd PKDD. Lecture Notes in Artificial Intelligence 1704 (1999), pp. 89-97.

K. Fredriksson and E. Ukkonen: Combinatorial methods for approximate image matching under translations and rotations. Pattern Recognition Letters 20 (1999), 1249-1258.

R. Khardon, H. Mannila and D. Roth. Reasoning with Examples: Propositional Formulae and Database Dependencies. Acta Informatica 36(4): 267-286 (1999).

M. Klemettinen, H. Mannila, and H. Toivonen: Exploration of interesting findings in TASA. Information and Software Technology 41, 9 (1999), 557-567.

M. Klemettinen, H. Mannila, and H. Toivonen. Rule discovery in telecommunication alarm data. Journal of Network and Systems Management 7, 4 (December 1999), 395-423.

M. Klemettinen, H. Mannila, and A. I. Verkamo, Association rule selection in a data mining environment. Proceedings of the Third European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'99), p. 372-377.

A. Korhola, J. Weckström, H. Seppä, H.J.B. Birks, S.M. Peglar, H. Toivonen, K. Vasko and H. Mannila: Quantitative Holocene records from sedimentary remains of aquatic organisms and pollen in northern Fennoscandia. Terra Nostra 10, 49-53, 1999.

J. Kärkkäinen and E. Ukkonen. Two- and higher-dimensional pattern matching in optimal expected time. SIAM Journal on Computing, 29(2) 571-589, 1999.

J. Kärkkäinen. Repetition-based text indexes. University of Helsinki 1999, viii, 106 pp.

Kjell Lemström, Pauli Laine and Sami Perttu: Using Relative Interval Slope in Music Information Retrieval. In Proc. 1999 International Computer Music Conference (ICMC '99), pp. 317-320. Beijing, China, October 23-29, 1999.

H. Mannila, D. Pavlov, and P. Smyth. Prediction with Local Patterns using Cross-Entropy. In Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (KDD 1999), p. 357-361.

H. Mannila and P. Moen: Similarity between Event Types in Sequences. Data Warehousing and Knowledge Discovery (DaWaK 1999), M.K. Mohania and A. M. Tjoa (eds), p. 271-280.

H. Mannila: Inductive Databases (Abstract). Inductive Logic Programming, Ninth International Workshop (ILP 1999), S. Dzeroski and P. A. Flach (eds.), p. 14.

M. Nykänen and E. Ukkonen: Finding paths with the right cost. In Proc. 16th Ann. Symposium on Theoretical Aspects of Computer Science (STACS'99), Lecture Notes in Computer Science 1563, pp. 345-355, Springer 1999.

Ravantti, Janne and Bamford, Dennis H. A data mining approach for analyzing density maps representing macromolecular structures. Journal of Structural Biology 125 (1999), pp. 216-222.

J. Rousu. Adaptive Planning of Fermentation Recipes. In L. Yliniemi (ed.), Proc. TOOLMET'99 Symp. Tool environments and development methods for intelligent systems. 15 - 16 April 1999, University of Oulu, Oulun Yliopistopaino, 1999, pp. 118-122.

J. Rousu, T. Elomaa and R. Aarts. Predicting the speed of beer fermentation in laboratory and industrial scale. In J. Mira and J. Sanchez-Andrez (eds.), Engineering Applications of Bio-Inspired Artificial Neural Networks, Proc. 5th IWANN. Lecture Notes in Computer Science 1607 (1999), pp. 893-901.

Ilya Shmulevich, Olli Yli-Harja, Edward Coyle, Dirk-Jan Povel and Kjell Lemström: Perceptual Issues in Music Pattern Recognition - Complexity of Rhythm and Key Finding. In: Proc. AISB'99 Symposium on Musical Creativity, pp. 64-69. Edinburgh, United Kingdom, April 6-9, 1999.

H.T.T. Toivonen, H. Mannila, Marko Salmenkivi, and Karri-Pekka Laakso. Specifying and simulating complex models using Bassist. Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales (Esp.) 93 (1999), pp. 375-380.

H. Toivonen, K. Vasko, H. Mannila, A. Korhola and H. Olander: Bayesian modeling in paleoenvironmental reconstruction. ACAI Workshop on Intelligent Techniques for Spatio-Temporal Data Analysis in Environmental Applications, p. 76-85, 1999.

H. Toivonen, H. Mannila, J. Seppänen, and K. Vasko. Bassist user's guide. Technical Report C-1999-36, Department of Computer Science, P.O. Box 26, FIN-00014 University of Helsinki, Finland, June 1999.

H.T.T. Toivonen. On knowledge discovery in graph-structured data. In Workshop on Knowledge Discovery from Advanced Databases, pages 26-31, Beijing, China, April 1999.

H.T.T. Toivonen. Challenges for knowledge discovery in databases. In Proceedings of the 1999 HeCSE Winter School, Report C-1999-1, page 1, Finland, January 1999. Department of Computer Science, University of Helsinki.

O. Yli-Harja, I. Shmulevich and K. Lemström. Graph-based Smoothing of Class Data with Applications in Musical Key Finding. In Proc. IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing, pp. 311-315. Antalya, Turkey, June 20-23, 1999.

1998

R. Aarts and J. Rousu. Model-based reasoning about cases (1998). In Multimodal Reasoning: Papers from the 1998 AAAI Spring Symposium, Eugene Freuder (ed.). AAAI Technical Report SS-98-04, AAAI Press, pp. 6-9.

R. Aarts and J. Rousu. Case-based planning methods in biotechnical and food processes. Proc. Int. Symp. Automatic Control of food and biological processes, Göteborg, 21 - 23 Sept. 1998, Part One. SIK (1998), pp. 215-224.

H. Ahonen, O. Heinonen, M. Klemettinen, and A. I. Verkamo. Applying Data Mining Techniques for Descriptive Phrase Extraction in Digital Document Collections. Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries (IEEE ADL '98), pages 2-11, IEEE Computer Society, 1998.

H. Ahonen, B. Heikkinen, O. Heinonen, J. Jaakkola, and M. Klemettinen. Analysis of Document Structures for Element Type Classification. In Proceedings of the International Workshop on Principles of Digital Document Processing (PODDP '98), March 29-30, St. Malo, France, pages 24-42, Lecture Notes in Computer Science 1481, Springer, 1998.

Helena Ahonen. Features of Knowledge Discovery Systems. InterChange, The Newsletter of the International SGML Users' Group, April 1998, 4(2), 15-16.

Helena Ahonen, Barbara Heikkinen, Oskari Heinonen, Jani Jaakkola, and Mika Klemettinen. Heterogeenisten dokumenttirakenteiden hallinta SGML-elementtien automaattisen luokittelun avulla (Management of heterogeneous document structures using automatic classification of SGML elements). In Proceedings SGML/XML Finland '98, October 8-9, Jyväskylä, Finland, p. 51-68, SGML Users' Group Finland, 1998.

Helena Ahonen. Tutkimuksen ja teollisuuden yhteistyö Saksassa (Cooperation between research and industry in Germany). Tietojenkäsittelytiede, the Journal of the Computer Science Society in Finland, Nr 2, p. 6, 1998.

H. Aizenstein, T. Hegedüs, L. Hellerstein, and L. Pitt. Complexity Theoretic Hardness Results for Query Learning. Computational Complexity 7 (1998) 19-53.

J.-F. Boulicaut, M. Klemettinen, and H. Mannila: Querying inductive databases: a case study on the MINE RULE operator. 2nd European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD'98), Nantes, France, September 23-26, 1998. p. 194-202.

A. Brazma, I. Jonassen, J. Vilo and E. Ukkonen. Predicting gene regulatory elements in silico on a genomic scale. Genome Research 8 (1998), 1202-1215.

A. Brazma, I. Jonassen, J. Vilo and E. Ukkonen: Pattern discovery in biosciences. In: Grammatical Inference: 4th International Colloquium (ICGI'98), Lecture Notes in Artificial Intelligence 1433, pp. 257-270, Springer 1998.

G. Das, H. Mannila and P. Ronkainen: Similarity of attributes by external probes. Fourth Annual Conference on Knowledge Discovery and Data Mining (KDD-98), AAAI Press, p. 16-22.

G. Das, D. Lin, H. Mannila, G. Renganathan, and P. Smyth. Rule discovery from time series. Fourth Annual Conference on Knowledge Discovery and Data Mining (KDD-98), AAAI Press, p. 23-29.

Luc Dehaspe and H.T.T. Toivonen. Frequent query discovery: A unifying ILP approach to association rule mining. Technical Report CW-258, Department of Computer Science, Katholieke Universiteit Leuven, Belgium, March 1998.

Luc Dehaspe, H.T.T. Toivonen, and Ross D. King. Finding frequent substructures in chemical compounds. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD'98), pages 30-36, New York, NY, August 1998. AAAI Press.

M. Eerola, H. Mannila and M. Salmenkivi: Frailty factors and time-dependent hazards in modelling ear infections. COMPSTAT'98, Conference of the International Association for Statistical Computing, R. Payne and P. Green (eds.), Springer-Verlag 1998.

T. Elomaa and J. Rousu. Postponing the evaluation of attributes with a high number of boundary points. In M. Quafafou and J. M. Zytkow (eds.), Principles of Data Mining and Knowledge Discovery, Proc. Second European Symposium, PKDD '98 (Nantes, France). Lecture Notes in Artificial Intelligence 1510. Springer-Verlag, Berlin Heidelberg, 1998, pp. 221-229.

T. Elomaa. A review of How the Mind Works. AI Magazine 19, 3 (Fall 1998) 135-137.

T. Elomaa and J. Rousu. Boundary points as an indication of attribute relevance. In P. Koikkalainen and S. Puuronen (eds.), Human and Artificial Information Processing, Proc. STeP'98 - the Eighth Finnish Artificial Intelligence Conference (Jyväskylä, Finland). The Finnish Artificial Intelligence Society, Helsinki, 1998, pp. 21-30.

T. Elomaa and J. Rousu. Postponing the evaluation of attributes with a high number of boundary points. Report C-1998-11. Department of Computer Science, University of Helsinki, May 1998. 16 pp.

T. Elomaa and J. Rousu. General and efficient multisplitting of numerical attributes. Technical Note No. I.98.06. Institute for Systems, Informatics and Safety, Joint Research Centre, European Commission, Ispra, Italy, Jan. 1998. 46 pp.

K. Fredriksson and E. Ukkonen: A rotation invariant filter for two-dimensional string matching. In: M. Farach-Colton (ed.) Proc. Combinatorial Pattern Matching (CPM'98), Lecture Notes in Computer Science 1448, pp. 118-125, Springer 1998.

G. Grahne, R. Hakli, M. Nykänen and E. Ukkonen: AQL: An alignment based language for querying string databases. Proc. Ninth Int. Conf. on Management of Data (COMAD'98), pp. 235-251, McGraw-Hill 1998.

H. Haario, P. Vuorela, M. Nyman, E. Ukkonen, H. J. Vuorela and K. Outinen. Optimization of selectivity in high-performance liquid chromatography using desirability functions and mixture designs according to PRISMA. European Journal of Pharmaceutical Sciences 6 (1998), 197-205.

B. Heikkinen, O. Heinonen, Jani Jaakkola, Pekka Kilpeläinen, Greger Lindén, Jyrki Niemi, Kimmo Paasiala. An assembly model for structured documents and the SAW assembly system. (In Finnish: Rakenteisten dokumenttien koostamismalli ja koostamisjärjestelmä SAW.) In Proc. of SGML/XML Finland 1998, eds. J. Löppönen and A. Haimi, pages 69-79, SGML Users' Group Finland, October 1998.

Heinonen, Oskari. Optimal multi-paragraph text segmentation by dynamic programming. COLING-ACL '98, 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, August 10-14, 1998, Université de Montréal, Montreal, Quebec, Canada. Proceedings of the Conference, pp. 1484-1486.

Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, and H.T.T. Toivonen. Efficient discovery of functional and approximate dependencies using partitions. In Proc. 14th International Conference on Data Engineering, pages 392-401. IEEE Computer Society Press, 1998.

P. Kilpeläinen, H. Ahonen, B. Heikkinen, O. Heinonen, G. Lindén and Jani Jaakkola. Design and Implementation of a Document Assembly Workbench. In Proceedings of the 7th International Conference on Electronic Publishing, EP'98, April 1-3, St. Malo, France, pages 476-486, Lecture Notes in Computer Science 1375, Springer, 1998.

K. Korpimies and E. Ukkonen. Term weighting in query-based document clustering (extended abstract). Proc. Advances in Databases and Information Systems (ADBIS'98), Lecture Notes in Computer Science 1475, pp. 151-153, Springer 1998.

K. Korpimies and E. Ukkonen. Searching for general documents. Proc. International Conf. on Flexible Query Answering Systems (FQAS'98). Lecture Notes in Computer Science 1495, pp. 203-214, Springer 1998.

J. Kärkkäinen and E. Sutinen. Lempel-Ziv index for q-grams. Algorithmica, 21(1) 137-154, May 1998.

K. Lemström and P. Laine. Musical Information Retrieval Using Musical Parameters. In Proc. 1998 International Computer Music Conference (ICMC '98), pp. 341-348. Ann Arbor, USA, October 1-6, 1998.

K. Lemström, J. Korte, P. Kuusi, P. Kyheröinen and P. Päiväkumpu. PICSearch - A Platform for Image Content-based Searching Algorithms. In Proc. Sixth International Conference in Central Europe on Computer Graphics and Visualisation 98 (WSCG '98), pp. 222-229. Plzen, Czech Republic, February 9-13, 1998.

Kjell Lemström: A Client-Server Extension to PICSearch System, Electronic Workshops in Computing: The Challenge of Image Retrieval (CIR '98), http://www.ewic.org.uk/ewic/workshop/view.cfm/CIR-98. Newcastle upon Tyne, United Kingdom, February 5-6, 1998.

Kjell Lemström, Atso Haapaniemi and Esko Ukkonen: Retrieving Music - To Index or not to Index. In Proc. Art Demos - Technical Demos - Poster Papers - The Sixth ACM International Multimedia Conference (MM '98) pp. 64-65 + loose sheet. Bristol, United Kingdom, September 13-16, 1998.

H. Mannila, H.T.T. Toivonen, Atte Korhola, and Heikki Olander. Learning, mining, or modeling? A case study in paleoecology. In Setsuo Arikawa and Hiroshi Motoda, editors, Discovery Science, First International Conference, Lecture Notes in Artificial Intelligence 1532, pages 12-24, Fukuoka, Japan, 1998. Springer-Verlag. Reprinted in Japanese in the Bit journal.

H.T.T. Toivonen, H. Mannila, Marko Salmenkivi, and Karri-Pekka Laakso. Bassist - a tool for MCMC simulation of statistical models. In Kaj Juslin, editor, Proceedings of the Eurosim '98 Simulation Congress, pages 590-595, Helsinki, Finland, April 1998. The Federation of European Simulation Societies.

H.T.T. Toivonen, H. Mannila, Atte Korhola, and Heikki Olander. Applying Bayesian statistics to organism-based environmental reconstruction (extended abstract). In Modelling as a tool in environmental research, Report 42, Report Series in Aerosol Science, pages 11-14. Finnish association for aerosol research, Helsinki, Finland, December 1998.

H.T.T. Toivonen, H. Mannila, Marko Salmenkivi, Jouni Seppänen, and Kari Vasko. Bassist (version 0.8). Technical Report C-1998-31, Department of Computer Science, P.O. Box 26, FIN-00014 University of Helsinki, Finland, 1998.

1997

R. Aarts and J. Rousu. Qualitative knowledge to support reasoning about cases. Lecture notes in Artificial Intelligence Vol. 1266 (1997), pp. 489-498.

H. Ahonen. Disambiguation of SGML content models. Proceedings of the Workshop on Principles of Document Processing '96, 23 September, Palo Alto, USA, 1996. pages 27-37, Lecture Notes in Computer Science 1293, Springer-Verlag, 1997.

H. Ahonen, B. Heikkinen, O. Heinonen, and M. Klemettinen. Improving the accessibility of SGML documents - A content-analytical approach. Proceedings of SGML Europe '97 Conference, 13-15 May, Barcelona, Spain, pages 321-327, Graphic Communications Association, 1997.

Helena Ahonen, H. Mannila, and Erja Nikunen. Generating grammars for SGML tagged texts lacking DTD. Mathematical and Computer Modelling, 26(1), 1-13, 1997.

Helena Ahonen, Barbara Heikkinen, Oskari Heinonen, and Pekka Kilpeläinen. Assembling Documents from Digital Libraries. Proceedings of the 8th International Conference and Workshop on Database and Expert Systems Applications (DEXA '97), Toulouse, France, September, 1997. p. 419-429, Lecture Notes in Computer Science 1308, Springer Verlag, 1997.

Helena Ahonen, Oskari Heinonen, Mika Klemettinen, A. Inkeri Verkamo. Mining in the phrasal frontier. Principles of Data Mining and Knowledge Discovery: First European Symposium, PKDD '97, Trondheim, Norway, June 24-27, 1997, Proceedings, pp. 343-350.

Helena Ahonen, Oskari Heinonen, Mika Klemettinen, A. Inkeri Verkamo. Mining in the phrasal frontier. University of Helsinki, Department of Computer Science 1997, 10 pp.

Helena Ahonen, Barbara Heikkinen, Oskari Heinonen, and Pekka Kilpeläinen. A system for assembling specialized textbooks from a pool of documents. Report C-1997-22, Department of Computer Science, University of Helsinki, 1997.

Helena Ahonen, Oskari Heinonen, Mika Klemettinen, and A. Inkeri Verkamo. Applying data mining techniques in text analysis. Report C-1997-23, Department of Computer Science, University of Helsinki, 1997.

Ahonen, Helena, Heikkinen, Barbara, Heinonen, Oskari, and Klemettinen, Mika. Discovery of reasonably sized fragments using inter-paragraph similarities. University of Helsinki, Department of Computer Science 1997, 11 pages.

B. Bollobás, G. Das, D. Gunopulos and H. Mannila. Time-Series Similarity Problems and Well-Separated Geometric Sets. In 13th Annual ACM Symposium on Computational Geometry, 1997, p. 454-456.

A. Brazma, E. Ukkonen, J. Vilo and K. Valtonen. Data mining for regulatory elements in yeast genome. In: Terry Gaasterland et al. (eds.), Proc. Fifth International Conference on Intelligent Systems for Molecular Biology (ISMB'97), pp. 65-74, AAAI Press (Menlo Park) 1997.

A. Brazma, J. Vilo and E. Ukkonen. Finding transcription factor binding site combinations in the yeast genome (extended abstract). In: Computer Science and Biology: Proc. German Conference on Bioinformatics (GCB'97), pp. 57-59. MIPS Munich Information Center for Protein Sequences 1997.

G. Das, R. Fleischer, L. Gasieniec, D. Gunopulos and J. Kärkkäinen. Episode matching. In Proc. 8th Annual Symposium on Combinatorial Pattern Matching, volume 1264 of Lecture Notes in Computer Science, pages 12-27. Springer, 1997.

G. Das, D. Gunopulos and H. Mannila. Finding similar time series. Principles of Data Mining and Knowledge Discovery (PKDD'97), Trondheim, Norway, June 1997, Jan Komorowski and Jan Zytkow (eds.), p. 88-100.

T. Eiter, G. Gottlob and H. Mannila. Disjunctive Datalog. ACM Transactions on Database Systems 22, 3, September 1997, 364-418.

T. Eiter and H. Mannila. Distance measures for point sets and their computation. Acta Informatica 34, 2 (1997), 109-133.

T. Elomaa and Juho Rousu. Well-behaved attribute evaluation functions for numerical attributes. In Z. Ras and A. Skowron (eds.), Foundations of Intelligent Systems, Proc. Tenth International Symposium, ISMIS'97 (Charlotte, NC). Lecture Notes in Artificial Intelligence 1325. Springer-Verlag, Berlin Heidelberg, 1997, pp. 147-156.

T. Elomaa and J. Rousu. Efficient multisplitting on numerical data. In J. Komorowski and J. Zytkow (eds.), Principles of Data Mining and Knowledge Discovery, Proc. First European Symposium, PKDD '97 (Trondheim, Norway). Lecture Notes in Artificial Intelligence 1263. Springer-Verlag, Berlin Heidelberg, 1997, pp. 178-188.

T. Elomaa and J. Rousu. On the well-behavedness of important attribute evaluation functions. In G. Grahne (ed.), Proc. Sixth Scandinavian Conference on Artificial Intelligence (Helsinki, Finland). Frontiers in Artificial Intelligence and Applications 40. IOS Press, Amsterdam & Ohmsha Ltd., Tokyo, 1997, pp. 95-106.

T. Elomaa and J. Rousu. On the well-behavedness of important attribute evaluation functions. NeuroCOLT Technical Report NC-TR-97-006. Department of Computer Science, Royal Holloway, University of London, Feb. 1997. 14 pp.

Genetic algorithms and generative encoding of neural networks for some benchmark classification problems. Proceedings of the Third Nordic Workshop on Genetic Algorithms and their Applications (3NWGA), 20-22 August 1997, Helsinki, Finland, pp. 123-134.

D. Gunopulos, R. Khardon, H. Mannila and H. Toivonen. Data mining, hypergraph transversals, and machine learning. Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'97), p. 209-216, 1997.

D. Gunopulos, H. Mannila and S. Saluja. Discovering all most specific sentences using randomized algorithms. Database Theory - ICDT'97, 6th International Conference, Delphi, Greece, January 1997, F. Afrati and Phokion Kolaitis, eds., 215-229.

T. Hegedüs and P. Indyk. On Learning Disjunctions of Zero-One Threshold Functions with Queries. in: Proceedings of the 8th International Workshop on Algorithmic Learning Theory (ALT'97), Springer-Verlag, LNCS 1316 (subseries LNAI), Berlin, 1997, pp. 446-460.

Huhtala, Ykä & Kärkkäinen, Juha & Porkka, Pasi & H.T.T. Toivonen. Efficient discovery of functional and approximate dependencies using partitions (extended version). University of Helsinki, Department of Computer Science 1997, 33 pp.

M. Huttunen, B. Vehviläinen and E. Ukkonen. Neural networks in the ice-correction of discharge observations. Nordic Hydrology 28 (1997), 283-296.

M. Huttunen, E. Ukkonen and B. Vehviläinen. Using trainable computing networks in the optimization of lake regulation. Proc. Fourth International Conference on Neural Information Processing and Intelligent Information Systems (ICONIP'97), pp. 975-978, Springer 1997.

J. Jaakkola, P. Kilpeläinen, and G. Lindén. TranSID: An SGML Tree Transformation Language. In the Proceedings of The Fifth Symposium on Programming Languages and Software Tools, Jyväskylä, Finland, June 7-8, 1997, ed. Jukka Paakki, pages 72-83, Technical Report C-1997-37, University of Helsinki, Department of Computer Science, June 1997.

Jaakkola, Jani & Kilpeläinen, Pekka & Lindén, Greger. TranSID: an SGML tree transformation language. University of Helsinki, Department of Computer Science 1997, 14 pp.

M. Klemettinen, H. Mannila, and H. Toivonen. A data-mining methodology and its application to semi-automatic knowledge acquisition. Proceedings of the 8th International Conference and Workshop on Database and Expert Systems Applications (DEXA'97), p. 670-677, Toulouse, France, September 1997.

Kärkkäinen, Juha & Sutinen, Erkki. Lempel-Ziv index for q-grams. Algoritmipäivä (Algorithm Day), Helsinki, 10 January 1997, 19 pp.

Jarmo Laakso, Harri Aaltonen, Antti Leppävuori, H.T.T. Toivonen, Vesa Kilpi, Harri Mononen, and Jukka Viitasalo. Lasertutkan käyttömahdollisuudet suksien testauksessa (Possible uses of laser radar in ski testing). Technical report, Research Institute for Olympic Sports, FIN-40700 Jyväskylä, Finland, February 1997.

G. Lindén. Structured Document Transformations. PhD Thesis, Report A-1997-2, Department of Computer Science, University of Helsinki, June 1997.

H. Mannila. Methods and problems in data mining. Database Theory - ICDT'97, 6th International Conference, Delphi, Greece, January 1997, F. Afrati and Phokion Kolaitis, eds., p. 41-55.

H. Mannila and P. Ronkainen. Similarity of Event Sequences. Proceedings of the Fourth International Workshop on Temporal Representation and Reasoning (TIME'97), 1997, p. 136-139.

H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discovery. Data Mining and Knowledge Discovery, 1, 3 (1997), 241-258.

H. Mannila, H. Toivonen and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1, 3 (1997), 259-289.

H. Mannila. Inductive databases and condensed representations: concepts for data mining. International Logic Programming Symposium, 1997, p. 21-30.

E. Ohlebusch and E. Ukkonen: On the equivalence problem for E-pattern languages. Theoretical Computer Science 186 (1997), 231-248.

1996

R. Aarts and J. Rousu. Towards CBR for bioprocess planning. Advances in case-based reasoning. Lecture Notes in Artificial Intelligence Vol. 1168 (1996), pp. 16-27.

H. Ahonen. Automatic generation of SGML content models. In Allen Brown, Anne Brüggemann-Klein, and An Feng, editors, Proceedings of the Sixth International Conference on Electronic Publishing, Document Manipulation and Typography '96, 24-26 September, Palo Alto, USA, pages 195-206, Wiley Publishers, 1996.

Helena Ahonen. Generation of SGML DTDs for tagged documents. Proceedings of SGML Finland 1996, 4-5 October, Espoo, Finland, pages 40-46, SGML Users' Group Finland, 1996.

Helena Ahonen, Barbara Heikkinen, Oskari Heinonen, Jani Jaakkola, Pekka Kilpeläinen, Greger Lindén, and H. Mannila. Constructing tailored SGML documents. Proceedings of SGML Finland 1996, 4-5 October, Espoo, Finland, pages 106-116, SGML Users' Group Finland, 1996.

Helena Ahonen, Barbara Heikkinen, Oskari Heinonen, Jani Jaakkola, Pekka Kilpeläinen, Greger Lindén, and H. Mannila. Intelligent assembly of structured documents. Report C-1996-40, Department of Computer Science, University of Helsinki, 1996.

Helena Ahonen. Generating grammars for structured documents using grammatical inference methods. PhD thesis, Department of Computer Science, University of Helsinki, November 1996.

R. Agrawal, H. Mannila, R. Srikant, H. Toivonen and A. I. Verkamo. Fast discovery of association rules. In Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, 1996. AAAI Press, p. 307-328.

Elja Arjas, H. Mannila, Marko Salmenkivi, Riikka Suramo, and H.T.T. Toivonen. BASS: Bayesian analyzer of event sequences. In Albert Prat, editor, Proceedings in Computational Statistics (COMPSTAT'96), pages 199-204, Barcelona, Spain, August 1996. Physica-Verlag.

A. Brazma, I. Jonassen, E. Ukkonen and J. Vilo: Discovering patterns and subfamilies in biosequences. In: David J. States et al. (eds.), Proc. Fourth International Conference on Intelligent Systems for Molecular Biology (ISMB'96), pp. 34-43, AAAI Press (Menlo Park) 1996.

A. Brazma, E. Ukkonen and J. Vilo: Discovering unbounded unions of regular pattern languages from positive examples. In: T. Asano, Y. Igarashi, H. Nagamochi, S. Miyano and S. Suri (eds.), Proc. Seventh International Symposium on Algorithms and Computation (ISAAC'96), Lecture Notes in Computer Science 1178, pp. 95-104, Springer 1996.

T. Elomaa and J. Rousu. Finding optimal multi-splits for numerical attributes in decision tree learning. In J. Alander, T. Honkela and M. Jakobsson (eds.), STeP'96 - Genes, Nets and Symbols, Proc. Seventh Finnish Artificial Intelligence Conference (Vaasa, Finland). The Finnish Artificial Intelligence Society, Vaasa, 1996, pp. 104-111.

T. Elomaa. Machine intelligence and learning (Invited Public Lecture). In J. Alander, T. Honkela and M. Jakobsson (eds.), STeP'96 - Genes, Nets and Symbols, Proc. Seventh Finnish Artificial Intelligence Conference (Vaasa, Finland). The Finnish Artificial Intelligence Society, Vaasa, 1996, pp. 90-95.

T. Elomaa and J. Rousu. General and efficient multisplitting of numerical attributes. Report C-1996-82. Department of Computer Science, University of Helsinki, Oct. 1996. 25 pp.

T. Elomaa. Tools and techniques for decision tree learning. Ph.D. thesis, Report A-1996-2. Department of Computer Science, University of Helsinki, May 1996. 116+26 pp.

T. Elomaa and J. Rousu. Finding optimal multi-splits for numerical attributes in decision tree learning. NeuroCOLT Technical Report NC-TR-96-041. Department of Computer Science, Royal Holloway, University of London, Mar. 1996. 15 pp.

T. Hegedüs and N. Megiddo. On the Geometric Separability of Boolean Functions. Discrete Applied Mathematics 66 (1996) 205-218.

Heinonen, Oskari and H. Mannila. Attribute-oriented induction and conceptual clustering. Department of Computer Science. University of Helsinki 1996, 6 pp.

M. Huttunen, E. Ukkonen and B. Vehviläinen: Neural networks as a part of watershed-model in ice reduction of discharge observations. Proc. Nordic Hydrological Conference 1996, Vol. 1, pp. 286-293. NHP-report no 40, Icelandic Hydrological Committee, Reykjavik 1996.

M. Huttunen, B. Vehviläinen and E. Ukkonen: Coding a conceptual model into a neural network in modeling ice-correction. In: C. H. Dagli, M. Akay, C. L. P. Chen, B. R. Fernandez and J. Ghosh (eds.), Proc. of the Artificial Neural Networks in Engineering (ANNIE'96) Conference. Intelligent Engineering Systems Through Artificial Neural Networks, Vol. 6, pp. 1001-1006, ASME Press, New York 1996.

K. Hätönen, M. Klemettinen, H. Mannila, P. Ronkainen, and H. Toivonen. Knowledge Discovery from Telecommunication Network Alarm Databases. 12th International Conference on Data Engineering (ICDE'96), New Orleans, Louisiana, February 1996, p. 115-122.

K. Hätönen, M. Klemettinen, H. Mannila, P. Ronkainen, and H. Toivonen. TASA: Telecommunications Alarm Sequence Analyzer, or "How to enjoy faults in your network". IEEE/IFIP 1996 Network Operations and Management Symposium (NOMS'96), Kyoto, Japan, April 1996, p. 520-529.

Kimmo Hätönen, Mika Klemettinen, H. Mannila, Pirjo Ronkainen, and H.T.T. Toivonen. Rule discovery in alarm databases. Technical Report C-1996-7, University of Helsinki, Department of Computer Science, P.O. Box 26, FIN-00014 University of Helsinki, Finland, March 1996.

T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of the ACM 39, 11 (November 1996), 58-64.

M. Jaeger, H. Mannila and E. Weydert: Data mining as selective theory extraction in probabilistic logic. SIGMOD'96 Data Mining Workshop.

P. Jokinen, J. Tarhio and E. Ukkonen. A comparison of approximate string matching algorithms. Software - Practice and Experience 26 (1996), 1439-1458.

Mika Klemettinen, H. Mannila, and H.T.T. Toivonen. Interactive exploration of discovered knowledge: A methodology for interaction, and usability studies. Technical Report C-1996-3, University of Helsinki, Department of Computer Science, P.O. Box 26, FIN-00014 University of Helsinki, Finland, February 1996.

J. Kärkkäinen and E. Ukkonen. Lempel-Ziv parsing and sublinear-size index structures for string matching. In Proc. 3rd South American Workshop on String Processing, pages 141-155. Carleton University Press, 1996.

Juha Kärkkäinen and Erkki Sutinen: Lempel-Ziv index for q-grams. In Proc. 4th Annual European Symposium on Algorithms, volume 1136 of Lecture Notes in Computer Science, pages 378-391. Springer, 1996.

J. Kärkkäinen and Esko Ukkonen. Sparse suffix trees. In Proc. 2nd Annual International Conference on Computing and Combinatorics, volume 1090 of Lecture Notes in Computer Science, pages 219-230. Springer, 1996.

K. Lemström, J. Tarhio and T. Takala: Color Dithering with n-best Algorithm. In Proc. Fourth International Conference in Central Europe on Computer Graphics and Visualisation 96 (WSCG'96), pp. 162-169. Plzen, Czech Republic, February 12-16, 1996.

G. Lindén, H. Tirri and A. I. Verkamo. ALCHEMIST: A General Purpose Transformation Generator. Software - Practice and Experience 26, 6 (June 1996), 653-676.

H. Mannila: Data mining: machine learning, statistics, and databases. Eighth International Conference on Scientific and Statistical Database Management, Stockholm, June 18-20, 1996, p. 1-8.

H. Mannila and H. Toivonen: Discovering generalized episodes using minimal occurrences. 2nd International Conference on Knowledge Discovery and Data Mining (KDD'96), Portland, Oregon, August 1996. AAAI Press, p. 146-151.

H. Mannila and H. Toivonen: Multiple uses of frequent sets and condensed representations. 2nd International Conference on Knowledge Discovery and Data Mining (KDD'96), Portland, Oregon, August 1996. AAAI Press, p. 189-194.

H. Mannila and H.T.T. Toivonen. On an algorithm for finding all interesting sentences. In Cybernetics and Systems, Volume II, The Thirteenth European Meeting on Cybernetics and Systems Research, pages 973-978, Vienna, Austria, April 1996. Austrian Society for Cybernetic Studies.

E. Ohlebusch and E. Ukkonen: On the equivalence problem for E-pattern languages (Extended Abstract). In: W. Penczek and A. Szalas (eds.), Proc. MFCS 96. Lecture Notes in Computer Science 1113, pp. 457-468, Springer 1996.

H.T.T. Toivonen. Sampling large databases for association rules. In T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, and Nandlal L. Sarda, editors, Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB'96), pages 134-145, Mumbai, India, September 1996. Morgan Kaufmann.

H.T.T. Toivonen. Discovery of Frequent Patterns in Large Data Collections. PhD thesis, University of Helsinki, Department of Computer Science, P.O. Box 26, FIN-00014 University of Helsinki, Finland, November 1996.
