University of Helsinki Department of Computer Science
 

Roman Yangarber
Publications

    Edited Collections

  1. Multi-source, Multilingual Information Extraction and Summarization
    Thierry Poibeau, Horacio Saggion, Jakub Piskorski, Roman Yangarber (Eds.)
    Theory and Applications of Natural Language Processing.
    Springer-Verlag (2012) Berlin, Heidelberg

  2. MINUCS-2009: Mining User-Generated Content for Security.
    Ulf Brefeld, Jakub Piskorski, Roman Yangarber (Eds.)
    Proceedings of the Workshop at the UCMedia-2009: ICST Conference on User-Centric Media (2009) Venice, Italy

  3. High-Level Information Extraction   
    Sebastian Blohm, Ulf Brefeld, Felix Jungermann, Roman Yangarber (Eds.)
    Proceedings of the Workshop at ECML/PKDD-2008: the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (2008) Antwerp, Belgium

  4. Multi-source, Multilingual Information Extraction and Summarization   
    Thierry Poibeau, Horacio Saggion, Roman Yangarber (Eds.)
    Proceedings of MMIES-2: the Second Workshop on Multi-Lingual, Multi-Source Information Extraction and Summarization, at COLING-2008: the 22nd International Conference on Computational Linguistics (2008) Manchester, United Kingdom

  5. Information Extraction Beyond The Document   
    Mary Elaine Califf, Mark A. Greenwood, Mark Stevenson, Roman Yangarber (Eds.)
    Proceedings of the Workshop at ACL/COLING (July 2006) Sydney, Australia

    Conference, Journal Papers, Book Chapters

  6. MDL-based Models for Transliteration Generation   (pdf)
    Javad Nouri, Lidia Pivovarova, Roman Yangarber
    SLSP 2013: International Conference on Statistical Language and Speech Processing
    Springer Verlag, Lecture Notes in Artificial Intelligence (LNAI) Volume 7978 (LNCS) Tarragona, Spain (July, 2013)

  7. Automatic detection of stable grammatical features in n-grams   (pdf)
    Mikhail Kopotev, Lidia Pivovarova, Natalia Kochetkova, Roman Yangarber
    MWE 2013: The 9th Workshop on Multiword Expressions Co-located with Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL/HLT 2013) Atlanta, GA (July 2013)

  8. Event representation across genre   (pdf)
    Lidia Pivovarova, Silja Huttunen, Roman Yangarber
    Workshop on Events: Definition, Detection, Coreference, and Representation, at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL/HLT) Atlanta, GA (2013)

  9. Information-theoretic modeling of etymological sound change   
    Hannes Wettig, Javad Nouri, Kirill Reshetnikov, Roman Yangarber
    Invited chapter in Approaches to measuring linguistic differences (Lars Borin, Anju Saxena, eds.) Trends in Linguistics Series, Volume 265. (September 2013) Mouton de Gruyter

  10. Techniques for Multilingual Security-related Event Extraction from Online News   
    Martin Atkinson, Mian Du, Jakub Piskorski, Hristo Tanev, Roman Yangarber, Vanni Zavarella.
    In Computational Linguistics—Applications (A. Przepiórkowski, M. Piasecki, K. Jassem, P. Fuglewicz, eds.) Studies in Computational Intelligence, Vol. 458 (2012) Springer Verlag

  11. Information-theoretic Methods for Analysis and Inference in Etymology   
    Hannes Wettig, Javad Nouri, Kirill Reshetnikov and Roman Yangarber
    In Proceedings of the Fifth Workshop on Information-theoretic Methods in Science and Engineering (Steven de Rooij, Wojciech Kotłowski, Jorma Rissanen, Petri Myllymäki, Teemu Roos & Kenji Yamanishi, eds.) (2012) Amsterdam, the Netherlands

  12. Using Context and Phonetic Features in Models of Etymological Sound Change   
    Hannes Wettig, Kirill Reshetnikov and Roman Yangarber.
    In EACL 2012: Workshop on Visualization of Linguistic Patterns and Uncovering Language History from Multilingual Resources (2012) Avignon, France

  13. Information Extraction: Past, Present and Future   
    Jakub Piskorski, Roman Yangarber.
    Introductory Survey Chapter in "Multi-source, Multilingual Information Extraction and Summarization", Theory and Applications of Natural Language Processing (T. Poibeau et al., eds.). Springer-Verlag (2012) Berlin, Heidelberg

  14. Predicting Relevance of Event Extraction for the End User   
    Silja Huttunen, Arto Vihavainen, Mian Du, Roman Yangarber.
    In "Multi-source, Multilingual Information Extraction and Summarization", Theory and Applications of Natural Language Processing (T. Poibeau et al., eds.). Springer-Verlag (2012) Berlin, Heidelberg

  15. MDL-based modeling of etymological sound change in the Uralic language family   
    Hannes Wettig, Suvi Hiltunen, Roman Yangarber.
    WITMSE-2011: The Fourth Workshop on Information Theoretic Methods in Science and Engineering (2011) Helsinki, Finland

  16. Building support tools for Russian-language information extraction   
    Mian Du, Peter von Etter, Mikhail Kopotev, Mikhail Novikov, Natalia Tarbeeva, Roman Yangarber.
    BSNLP-2011: Balto-Slavonic Natural Language Processing (2011) Plzeň, Czech Republic. Springer-Verlag, Lecture Notes in Computer Science, 2011, Volume 6836, Text, Speech and Dialogue.

  17. Multilingual real-time event extraction for border security intelligence gathering   
    Martin Atkinson, Jakub Piskorski, Erik van der Goot, Roman Yangarber
    Counterterrorism and Open Source Intelligence. Springer Lecture Notes in Social Networks, Vol. 2. (Uffe Kock Wiil, editor). (2011) pp. 355-390

  18. MDL-based models for aligning etymological data   
    Hannes Wettig, Suvi Hiltunen, Roman Yangarber.
    RANLP-2011: Conference on Recent Advances in Natural Language Processing (2011) Hissar, Bulgaria

  19. Relevance prediction in information extraction using discourse and lexical features
    Silja Huttunen, Arto Vihavainen, Peter von Etter, Roman Yangarber.
    Nodalida-2011: Nordic Conference on Computational Linguistics (2011) Riga, Latvia

  20. Probabilistic models for alignment of etymological data   
    Hannes Wettig, Roman Yangarber.
    Nodalida-2011: Nordic Conference on Computational Linguistics (2011) Riga, Latvia

  21. Hidden Markov models for induction of morphological structure of natural language   
    Hannes Wettig, Suvi Hiltunen, Roman Yangarber.
    WITMSE-2010: Workshop on Information Theoretic Methods in Science and Engineering (2010) Tampere, Finland

  22. Assessment of utility in Web mining for the domain of Public Health    (pdf)
    Peter von Etter, Silja Huttunen, Arto Vihavainen, Matti Vuorinen, Roman Yangarber.
    In Proceedings of LOUHI-2010: the Second Louhi Workshop on Text and Data Mining of Health Documents, at the NAACL/HLT Conference, (2010) Los Angeles, California

  23. MedISys—Medical Information System   
    Jens P. Linge, Ralf Steinberger, Flavio Fuart, Stefano Bucci, Jenya Belyaeva, Monica Gemo, Delilah Al-Khudhairy, Roman Yangarber, Erik van der Goot.
    In Advanced ICTs for Disaster Management and Threat Detection: Collaborative and Distributed Frameworks. Eleana Asimakopoulou, Nik Bessis (eds.), (2010) IGI Global Press, pp. 131-142.

  24. Real-time text mining in multilingual news for the Creation of a Pre-frontier Intelligence Picture    (pdf)
    Jakub Piskorski, Martin Atkinson, Jenya Belyaeva, Vanni Zavarella, Silja Huttunen, Roman Yangarber.
    In Proceedings of the 16th Conference on Knowledge Discovery and Data Mining (KDD-2010); ACM SIGKDD Workshop on Intelligence and Security Informatics. (2010) Washington, DC

  25. Filtering news for epidemic surveillance: towards processing more languages with fewer resources   
    Gael Lejeune, Antoine Doucet, Roman Yangarber, Nadine Lucas.
    CLIA: Fourth International Workshop On Cross Lingual Information Access, at COLING 2010 (2010) Beijing, China

  26. Utility evaluation of tools for collaborative development and maintenance of ontologies   
    Alex Norta, Roman Yangarber, Lauri Carlson.
    VORTE-2010: Joint 5th International Workshop on Vocabularies, Ontologies and Rules for The Enterprise / International Workshop on Metamodels, Ontologies and Semantic Technologies (MOST) at EDOC-2010: the Fourteenth IEEE International Conference On Enterprise Computing (2010) Vitória, ES, Brazil

  27. News mining for border security intelligence    (pdf)
    Jakub Piskorski, Martin Atkinson, Jenya Belyaeva, Vanni Zavarella, Silja Huttunen, Roman Yangarber.
    In IEEE ISI-2010: Intelligence and Security Informatics (2010) Vancouver, BC, Canada

  28. The landscape of international event-based biosurveillance    (link)
    D Hartley, N Nelson, R Walters, R Arthur, R Yangarber, L Madoff, J Linge, A Mawudeku, N Collier, J Brownstein, G Thinus, N Lightfoot.
    In Emerging Health Threats Journal, 3:e3 (2010)

  29. Automated event extraction in the domain of Border Security    (pdf)
    Martin Atkinson, Jakub Piskorski, Hristo Tanev, Erik van der Goot, Roman Yangarber, Vanni Zavarella.
    In Proceedings of MINUCS-2009: Workshop on Mining User-Generated Content for Security, at the UCMedia-2009: ICST Conference on User-Centric Media (2009) Venice, Italy

  30. Automatic epidemiological surveillance from on-line news in MedISys and PULS    (pdf)
    Roman Yangarber, Peter von Etter, Ralf Steinberger.
    In Proceedings of IMED-2009: International Meeting on Emerging Diseases and Surveillance (2009) Vienna, Austria

  31. Internet surveillance systems for early alerting of health threats    (link pdf)
    Jens P. Linge, Ralf Steinberger, Thomas P. Weber, Roman Yangarber, Erik van der Goot, Delilah H. Al-Khudhairy, Nikolaos I. Stilianakis
    In Eurosurveillance Journal, 14(13) (2009) Stockholm, Sweden

  32. Text mining from the Web for Medical Intelligence    (pdf)
    Ralf Steinberger, Flavio Fuart, Erik van der Goot, Clive Best, Peter von Etter, Roman Yangarber.
    In: Mining Massive Data Sets for Security, D. Perrotta, J. Piskorski, F. Soulié-Fogelman & R. Steinberger (eds.): IOS Press. (2008) Amsterdam, The Netherlands

  33. Content Collection and Analysis in the Domain of Epidemiology    (pdf)
    Roman Yangarber, Peter von Etter, Ralf Steinberger.
    In Proceedings of DrMED-2008: International Workshop on Describing Medical Web Resources, at MIE-2008: the 21st International Congress of the European Federation for Medical Informatics (2008) Göteborg, Sweden

  34. A Database of the Uralic Language Family for Etymological Research    (pdf)
    Roman Yangarber, Marko Salmenkivi, Marjaana Välisalo.
    Technical Report C-2008-38. University of Helsinki, Department of Computer Science, Series of Publications C (2008)

  35. Combining information retrieval and information extraction for medical intelligence    (pdf)
    Roman Yangarber, Ralf Steinberger, Clive Best, Peter von Etter, Flavio Fuart, David Horby.
    Mining Massive Data Sets for Security, NATO Advanced Study Institute (2007) Gazzada, Italy

  36. Combining Information about Epidemic Threats from Multiple Sources    (pdf)
    Roman Yangarber, Clive Best, Peter von Etter, Flavio Fuart, David Horby, Ralf Steinberger.
    In Proceedings Multi-source, Multilingual Information Extraction and Summarization at RANLP-2007. (2007) Borovets, Bulgaria

  37. Verification of Facts across Document Boundaries    (pdf)
    Roman Yangarber.
    In Proceedings IIIA-2006: International Workshop on Intelligent Information Access (2006) Helsinki, Finland

  38. Mining the Semantics of Text via Counter-Training    (link)
    Roman Yangarber.
    In Proceedings of the 12th Portuguese Conference on Artificial Intelligence, EPIA-2005, Thematic area: Text Mining and Applications TEMA-2005
    Springer LNCS Vol. 3808, pp. 647-657 (2005) Covilhã, Portugal

  39. Redundancy-based Correction of Automatically Extracted Facts    (pdf)
    Roman Yangarber, Lauri Jokipii.
    In Proceedings Human Language Technology Conference/ Conference on Empirical Methods in Natural Language Processing: HLT/EMNLP-2005, (2005) Vancouver, Canada

  40. Information Extraction from Epidemiological Reports    (pdf)
    Roman Yangarber, Lauri Jokipii, Antti Rauramo, Silja Huttunen.
    In Proceedings Human Language Technology Conference/ Conference on Empirical Methods in Natural Language Processing: HLT/EMNLP-2005, demonstration; (2005) Vancouver, Canada

  41. Use of Deep Syntax Parsing in Cross-Language Information Extraction   
    Konstantin Bogatyrev, Roman Yangarber.
    In Proceedings Workshop on Intelligent Linguistic Technologies, International Conference on Machine Learning; Models, Technologies and Applications MLMTA-2005, pp. 18-24 (2005) Las Vegas, NV

  42. User-Oriented Evaluation in Information Extraction   
    Roman Yangarber.
    In Proceedings Workshop on User-Oriented Evaluation of Knowledge Discovery Systems, 4th International Conference on Language Resources and Evaluation (LREC 2004) Lisbon, Portugal

  43. Information Extraction for Enhanced Access to Disease Outbreak Reports    (link)
    Ralph Grishman, Silja Huttunen, Roman Yangarber.
    In Journal of Biomedical Informatics, 35 (4) pp. 236-246, C. Friedman, ed. (2003)

  44. Acquisition of Domain Knowledge    (link)
    Roman Yangarber.
    Invited chapter in Information Extraction in the Web Era (M.T. Pazienza, ed.), Lecture Notes in Computer Science, Vol. 2700 Springer-Verlag Heidelberg, pp. 1-28 (2003) Rome, Italy

  45. Bootstrapped Learning of Semantic Classes from Positive and Negative Examples    (pdf)
    Winston Lin, Roman Yangarber, Ralph Grishman.
    In Proceedings of the 20th International Conference on Machine Learning: ICML 2003 Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining (2003) Washington, D.C.

  46. Counter-Training in Discovery of Semantic Patterns    (pdf)
    Roman Yangarber.
    In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics: ACL-2003 (2003) Sapporo, Japan

  47. Unsupervised Learning of Generalized Names    (ps.gz, pdf)
    Roman Yangarber, Winston Lin, Ralph Grishman.
    In Proceedings of the 19th International Conference on Computational Linguistics: COLING-2002 (2002) Taipei, Taiwan

  48. Complexity of Event Structure in IE Scenarios    (ps, pdf)
    Silja Huttunen, Roman Yangarber, Ralph Grishman.
    In Proceedings of the 19th International Conference on Computational Linguistics: COLING-2002 (2002) Taipei, Taiwan

  49. Real-Time Event Extraction for Infectious Disease Outbreaks    (pdf)
    Ralph Grishman, Silja Huttunen, Roman Yangarber.
    In Proceedings of the 3rd Annual Human Language Technology Conference HLT-2002 (2002) San Diego, CA

  50. Diversity of Scenarios in Information Extraction    (ps)
    Silja Huttunen, Roman Yangarber, Ralph Grishman.
    In Proceedings of the 3rd International Conference on Language Resources and Evaluation LREC-2002 (2002) Las Palmas de Gran Canaria, Spain

  51. Scenario Customization for Information Extraction    (ps)
    Roman Yangarber.
    PhD Thesis. (2001) New York University, Courant Institute of Mathematical Sciences.

  52. Automatic Acquisition of Domain Knowledge for Information Extraction    (ps.gz)
    Roman Yangarber, Ralph Grishman, Pasi Tapanainen, Silja Huttunen.
    In Proceedings of the 18th International Conference on Computational Linguistics: COLING-2000 (2000) Saarbrücken, Germany

  53. Machine Learning of Extraction Patterns from Un-annotated Corpora    (pdf)
    Roman Yangarber, Ralph Grishman.
    In Proceedings of the 14th European Conference on Artificial Intelligence: ECAI-2000 Workshop on Machine Learning for Information Extraction (2000) Berlin, Germany

  54. Extraction Pattern Discovery through Corpus Analysis    (doc)
    Roman Yangarber, Ralph Grishman.
    In Proceedings of the 2nd International Conference on Language Resources and Evaluation: LREC-2000 Workshop: Information Extraction meets Corpus Linguistics (2000) Athens, Greece

  55. Unsupervised Discovery of Scenario-Level Patterns for Information Extraction    (ps.gz)
    Roman Yangarber, Ralph Grishman, Pasi Tapanainen, Silja Huttunen.
    In Proceedings of Conference on Applied Natural Language Processing ANLP-NAACL 2000 pp. 282-289, (2000) Seattle, WA

  56. Issues in Corpus-Trained Information Extraction    (doc)
    Ralph Grishman, Roman Yangarber.
    In Proceedings of International Symposium: Toward the Realization of Spontaneous Speech Engineering, pp. 107-112, (2000) Tokyo, Japan

  57. Transforming Examples into Patterns for Information Extraction    (ps.gz)
    Roman Yangarber, Ralph Grishman.
    In Proceedings of TIPSTER Text Program Phase III, Morgan Kaufmann (1998) Baltimore, MD

  58. Japanese IE System and Customization Tool    (ps.gz)
    Chikashi Nobata, Satoshi Sekine, Roman Yangarber.
    In Proceedings of TIPSTER Text Program Phase III, Morgan Kaufmann (1998) Baltimore, MD

  59. Deriving Transfer Rules from Dominance-Preserving Alignments    (ps.gz)
    Adam Meyers, Roman Yangarber, Ralph Grishman, Catherine Macleod, Antonio Moreno-Sandoval.
    In Proceedings of COLING-ACL-98 (1998) Montreal, Canada

  60. Using NOMLEX to Produce Nominalization Patterns for Information Extraction    (ps.gz)
    Adam Meyers, Catherine Macleod, Roman Yangarber, Ralph Grishman, Leslie Barrett, Ruth Reeves.
    In Proceedings of COLING-ACL-98 Workshop on Computational Treatment of Nominals, (1998) Montreal, Canada

  61. NYU: Description of the Proteus/PET System as Used for MUC-7 ST    (ps.gz)
    Roman Yangarber, Ralph Grishman.
    In Proceedings of the 7th Message Understanding Conference: MUC-7 (1998) Washington, DC

  62. Customization of Information Extraction Systems    (ps.gz)
    Roman Yangarber, Ralph Grishman.
    In Proceedings of International Workshop on Lexically-Driven Information Extraction, invited talk, pp. 1-11, (1997) Frascati, Italy

  63. Alignment of Shared Forests for Bilingual Corpora    (ps.gz)
    Adam Meyers, Roman Yangarber, Ralph Grishman.
    In Proceedings of the 16th International Conference on Computational Linguistics: COLING-96 pp. 460-465 (1996) Copenhagen, Denmark

  64. ThinkSheet: A Tool for Tailoring Complex Documents   
    Peter Piatko, Roman Yangarber, Daoi Lin, Dennis Shasha.
    ACM SIGMOD '96, demonstration (1996) Montreal, Canada

Invited Presentations

Information-theoretic modeling of etymological sound change
Invited speaker at Workshop on comparing approaches to measuring linguistic differences (2011) Gothenburg, Sweden

Discovering complex networks of events and relations in News Surveillance    (video)
Keynote speaker at the 4th International Symposium on Open Source Intelligence and Web Mining (OSINT-WM) in conjunction with the European Conference on Intelligence and Security Informatics (European ISI 2011) (2011) Athens, Greece.

Probabilistic models for aligning Uralic etymological data
Invited speaker at Biological Evolution and the Diversification of Languages (BEDLAN) Seminar: Evolutionary Perspectives Of Language Change (2011) Seili, Finland.

Discovering complex events and relations in text: Frontex real-time news event extraction framework
Invited speaker at Tutorial for Member States: Frontex news event extraction framework and Frontex Media Monitor (2011) Frontex EC Agency, Warsaw, Poland.

Finding Facts from Text—Information Extraction Technology    (slides.pdf)
Invited speaker at the European Commission's Directorate General Joint Research Centre (EC-JRC) (2006) Ispra, Italy.

Acquisition of Domain Knowledge
Invited speaker at SCIE-2002: 3rd Summer Convention on Information Extraction (2002) University of Rome Tor Vergata, Italy.