Publications


On Google Scholar

Edited Collections

  1. Multi-source, Multilingual Information Extraction and Summarization
    Thierry Poibeau, Horacio Saggion, Jakub Piskorski, Roman Yangarber (Eds.)
    Theory and Applications of Natural Language Processing. Springer-Verlag (2012) Berlin, Heidelberg
  2. MINUCS-2009: Mining User-Generated Content for Security.
    Ulf Brefeld, Jakub Piskorski, Roman Yangarber (Eds.)

    Proceedings of the Workshop at the UCMedia-2009: ICST Conference on User-Centric Media (2009) Venice, Italy

  3. High-Level Information Extraction   
    Sebastian Blohm, Ulf Brefeld, Felix Jungermann, Roman Yangarber (Eds.)

    Proceedings of the Workshop at ECML/PKDD-2008: the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (2008) Antwerp, Belgium

  4. Multi-source, Multilingual Information Extraction and Summarization
    Thierry Poibeau, Horacio Saggion, Roman Yangarber (Eds.)

    Proceedings of MMIES-2: the Second Workshop on Multi-Lingual, Multi-Source Information Extraction and Summarization, at COLING-2008: the 22nd International Conference on Computational Linguistics (2008) Manchester, United Kingdom

  5. Information Extraction Beyond The Document   
    Mary Elaine Califf, Mark A. Greenwood, Mark Stevenson, Roman Yangarber (Eds.)

    Proceedings of the Workshop at ACL/COLING (July 2006) Sydney, Australia


Conference, Journal Papers, Book Chapters

  6. Information-theoretic modeling of etymological sound change
    Hannes Wettig, Kirill Reshetnikov and Roman Yangarber

    Invited chapter in Approaches to measuring linguistic differences (Lars Borin, Anju Saxena, eds.) Trends in Linguistics Series, Volume 265. (2013, to appear) Mouton de Gruyter

  7. Techniques for Multilingual Security-related Event Extraction from Online News   
    Martin Atkinson, Mian Du, Jakub Piskorski, Hristo Tanev, Roman Yangarber, Vanni Zavarella.

    In Computational Linguistics—Applications (A. Przepiórkowski, M. Piasecki, K. Jassem, P. Fuglewicz, eds.) Studies in Computational Intelligence, Vol. 458 (2012) Springer Verlag

  8. Information-theoretic Methods for Analysis and Inference in Etymology
    Hannes Wettig, Javad Nouri, Kirill Reshetnikov and Roman Yangarber

    In Proceedings of the Fifth Workshop on Information-theoretic Methods in Science and Engineering (Steven de Rooij, Wojciech Kotłowski, Jorma Rissanen, Petri Myllymäki, Teemu Roos & Kenji Yamanishi, eds.) (2012) Amsterdam, the Netherlands

  9. Using Context and Phonetic Features in Models of Etymological Sound Change
    Hannes Wettig, Kirill Reshetnikov and Roman Yangarber.

    In EACL 2012: Workshop on Visualization of Linguistic Patterns and Uncovering Language History from Multilingual Resources (2012) Avignon, France

  10. Information Extraction: Past, Present and Future
    Jakub Piskorski, Roman Yangarber.

    Introductory Survey Chapter in "Multi-source, Multilingual Information Extraction and Summarization", Theory and Applications of Natural Language Processing (T. Poibeau et al., eds.). Springer-Verlag (2012) Berlin, Heidelberg

  11. Predicting Relevance of Event Extraction for the End User
    Silja Huttunen, Arto Vihavainen, Mian Du, Roman Yangarber.

    In "Multi-source, Multilingual Information Extraction and Summarization", Theory and Applications of Natural Language Processing (T. Poibeau et al., eds.). Springer-Verlag (2012) Berlin, Heidelberg

  12. MDL-based modeling of etymological sound change in the Uralic language family
    Hannes Wettig, Suvi Hiltunen, Roman Yangarber.

    WITMSE-2011: The Fourth Workshop on Information Theoretic Methods in Science and Engineering (2011) Helsinki, Finland

  13. Building support tools for Russian-language information extraction
    Mian Du, Peter von Etter, Mikhail Kopotev, Mikhail Novikov, Natalia Tarbeeva, Roman Yangarber.

    BSNLP-2011: Balto-Slavonic Natural Language Processing (2011) Plzeň, Czech Republic. Springer-Verlag, Lecture Notes in Computer Science, 2011, Volume 6836, Text, Speech and Dialogue.

  14. Multilingual real-time event extraction for border security intelligence gathering
    Martin Atkinson, Jakub Piskorski, Erik van der Goot, Roman Yangarber

    Counterterrorism and Open Source Intelligence. Springer Lecture Notes in Social Networks, Vol. 2. (Uffe Kock Wiil, editor). (2011) pp. 355-390

  15. MDL-based models for aligning etymological data
    Hannes Wettig, Suvi Hiltunen, Roman Yangarber.

    RANLP-2011: Conference on Recent Advances in Natural Language Processing (2011) Hissar, Bulgaria

  16. Relevance prediction in information extraction using discourse and lexical features
    Silja Huttunen, Arto Vihavainen, Peter von Etter, Roman Yangarber.

    Nodalida-2011: Nordic Conference on Computational Linguistics (2011) Riga, Latvia

  17. Probabilistic models for alignment of etymological data
    Hannes Wettig, Roman Yangarber.

    Nodalida-2011: Nordic Conference on Computational Linguistics (2011) Riga, Latvia

  18. Hidden Markov models for induction of morphological structure of natural language
    Hannes Wettig, Suvi Hiltunen, Roman Yangarber.

    WITMSE-2010: Workshop on Information Theoretic Methods in Science and Engineering (2010) Tampere, Finland

  19. Assessment of utility in Web mining for the domain of Public Health (pdf)
    Peter von Etter, Silja Huttunen, Arto Vihavainen, Matti Vuorinen, Roman Yangarber.

    In Proceedings of LOUHI-2010: the Second Louhi Workshop on Text and Data Mining of Health Documents, at the NAACL/HLT Conference, (2010) Los Angeles, California

  20. MedISys—Medical Information System
    Jens P. Linge, Ralf Steinberger, Flavio Fuart, Stefano Bucci, Jenya Belyaeva, Monica Gemo, Delilah Al-Khudhairy, Roman Yangarber, Erik van der Goot.

    In Advanced ICTs for Disaster Management and Threat Detection: Collaborative and Distributed Frameworks. Eleana Asimakopoulou, Nik Bessis (eds.), (2010) IGI Global Press, pp. 131-142.

  21. Real-time text mining in multilingual news for the Creation of a Pre-frontier Intelligence Picture (pdf)
    Jakub Piskorski, Martin Atkinson, Jenya Belyaeva, Vanni Zavarella, Silja Huttunen, Roman Yangarber.

    In Proceedings of the 16th Conference on Knowledge Discovery and Data Mining (KDD-2010); ACM SIGKDD Workshop on Intelligence and Security Informatics. (2010) Washington, DC

  22. Filtering news for epidemic surveillance: towards processing more languages with fewer resources
    Gael Lejeune, Antoine Doucet, Roman Yangarber, Nadine Lucas.

    CLIA: Fourth International Workshop On Cross Lingual Information Access, at COLING 2010 (2010) Beijing, China

  23. Utility evaluation of tools for collaborative development and maintenance of ontologies
    Alex Norta, Roman Yangarber, Lauri Carlson.

    VORTE-2010: Joint 5th International Workshop on Vocabularies, Ontologies and Rules for The Enterprise / International Workshop on Metamodels, Ontologies and Semantic Technologies (MOST) at EDOC-2010: the Fourteenth IEEE International Conference On Enterprise Computing (2010) Vitória, ES, Brazil

  24. News mining for border security intelligence (pdf)
    Jakub Piskorski, Martin Atkinson, Jenya Belyaeva, Vanni Zavarella, Silja Huttunen, Roman Yangarber.

    In IEEE ISI-2010: Intelligence and Security Informatics (2010) Vancouver, BC, Canada

  25. The landscape of international event-based biosurveillance (link)
    D Hartley, N Nelson, R Walters, R Arthur, R Yangarber, L Madoff, J Linge, A Mawudeku, N Collier, J Brownstein, G Thinus, N Lightfoot.

    In Emerging Health Threats Journal, 3:e3 (2010)

  26. Automated event extraction in the domain of Border Security (pdf)
    Martin Atkinson, Jakub Piskorski, Hristo Tanev, Erik van der Goot, Roman Yangarber, Vanni Zavarella.

    In Proceedings of MINUCS-2009: Workshop on Mining User-Generated Content for Security, at the UCMedia-2009: ICST Conference on User-Centric Media (2009) Venice, Italy

  27. Automatic epidemiological surveillance from on-line news in MedISys and PULS (pdf)
    Roman Yangarber, Peter von Etter, Ralf Steinberger.

    In Proceedings of IMED-2009: International Meeting on Emerging Diseases and Surveillance (2009) Vienna, Austria

  28. Internet surveillance systems for early alerting of health threats (link pdf)
    Jens P. Linge, Ralf Steinberger, Thomas P. Weber, Roman Yangarber, Erik van der Goot, Delilah H. Al-Khudhairy, Nikolaos I. Stilianakis

    In Eurosurveillance Journal, 14(13) (2009) Stockholm, Sweden

  29. Text mining from the Web for Medical Intelligence (pdf)
    Ralf Steinberger, Flavio Fuart, Erik van der Goot, Clive Best, Peter von Etter, Roman Yangarber.

    In: Mining Massive Data Sets for Security, D. Perrotta, J. Piskorski, F. Soulié-Fogelman & R. Steinberger (eds.): IOS Press. (2008) Amsterdam, The Netherlands

  30. Content Collection and Analysis in the Domain of Epidemiology (pdf)
    Roman Yangarber, Peter von Etter, Ralf Steinberger.

    In Proceedings of DrMED-2008: International Workshop on Describing Medical Web Resources, at MIE-2008: the 21st International Congress of the European Federation for Medical Informatics (2008) Göteborg, Sweden

  31. A Database of the Uralic Language Family for Etymological Research (pdf)
    Roman Yangarber, Marko Salmenkivi, Marjaana Välisalo.

    Technical Report C-2008-38. University of Helsinki, Department of Computer Science, Series of Publications C (2008)

  32. Combining information retrieval and information extraction for medical intelligence (pdf)
    Roman Yangarber, Ralf Steinberger, Clive Best, Peter von Etter, Flavio Fuart, David Horby.

    Mining Massive Data Sets for Security, NATO Advanced Study Institute (2007) Gazzada, Italy

  33. Combining Information about Epidemic Threats from Multiple Sources (pdf)
    Roman Yangarber, Clive Best, Peter von Etter, Flavio Fuart, David Horby, Ralf Steinberger.

    In Proceedings Multi-source, Multilingual Information Extraction and Summarization at RANLP-2007. (2007) Borovets, Bulgaria

  34. Verification of Facts across Document Boundaries (pdf)
    Roman Yangarber.

    In Proceedings IIIA-2006: International Workshop on Intelligent Information Access (2006) Helsinki, Finland

  35. Mining the Semantics of Text via Counter-Training (link)
    Roman Yangarber.

    In Proceedings of the 12th Portuguese Conference on Artificial Intelligence, EPIA-2005, Thematic area: Text Mining and Applications TEMA-2005
    Springer LNCS Vol. 3808, pp. 647-657 (2005) Covilhã, Portugal

  36. Redundancy-based Correction of Automatically Extracted Facts (pdf)
    Roman Yangarber, Lauri Jokipii.

    In Proceedings Human Language Technology Conference/ Conference on Empirical Methods in Natural Language Processing: HLT/EMNLP-2005, (2005) Vancouver, Canada

  37. Information Extraction from Epidemiological Reports    (pdf)
    Roman Yangarber, Lauri Jokipii, Antti Rauramo, Silja Huttunen.

    In Proceedings Human Language Technology Conference/ Conference on Empirical Methods in Natural Language Processing: HLT/EMNLP-2005, demonstration; (2005) Vancouver, Canada

  38. Use of Deep Syntax Parsing in Cross-Language Information Extraction
    Konstantin Bogatyrev, Roman Yangarber.

    In Proceedings Workshop on Intelligent Linguistic Technologies, International Conference on Machine Learning; Models, Technologies and Applications MLMTA-2005, pp. 18-24 (2005) Las Vegas, NV

  39. User-Oriented Evaluation in Information Extraction
    Roman Yangarber.

    In Proceedings Workshop on User-Oriented Evaluation of Knowledge Discovery Systems, 4th International Conference on Language Resources and Evaluation (LREC 2004) Lisbon, Portugal

  40. Information Extraction for Enhanced Access to Disease Outbreak Reports (link)
    Ralph Grishman, Silja Huttunen, Roman Yangarber.

    In Journal of Biomedical Informatics, 35 (4) pp. 236-246, C. Friedman, ed. (2003)

  41. Acquisition of Domain Knowledge (link)
    Roman Yangarber.

    Invited chapter in Information Extraction in the Web Era (M.T. Pazienza, ed.), Lecture Notes in Computer Science, Vol. 2700 Springer-Verlag Heidelberg, pp. 1-28 (2003) Rome, Italy

  42. Bootstrapped Learning of Semantic Classes from Positive and Negative Examples (pdf)
    Winston Lin, Roman Yangarber, Ralph Grishman.

    In Proceedings of the 20th International Conference on Machine Learning: ICML 2003 Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining (2003) Washington, D.C.

  43. Counter-Training in Discovery of Semantic Patterns (pdf)
    Roman Yangarber.

    In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics: ACL-2003 (2003) Sapporo, Japan

  44. Unsupervised Learning of Generalized Names (ps.gz, pdf)
    Roman Yangarber, Winston Lin, Ralph Grishman.

    In Proceedings of the 19th International Conference on Computational Linguistics: COLING-2002 (2002) Taipei, Taiwan

  45. Complexity of Event Structure in IE Scenarios (ps, pdf)
    Silja Huttunen, Roman Yangarber, Ralph Grishman.

    In Proceedings of the 19th International Conference on Computational Linguistics: COLING-2002 (2002) Taipei, Taiwan

  46. Real-Time Event Extraction for Infectious Disease Outbreaks (pdf)
    Ralph Grishman, Silja Huttunen, Roman Yangarber.

    In Proceedings of the 3rd Annual Human Language Technology Conference HLT-2002 (2002) San Diego, CA

  47. Diversity of Scenarios in Information Extraction (ps)
    Silja Huttunen, Roman Yangarber, Ralph Grishman.

    In Proceedings of the 3rd International Conference on Language Resources and Evaluation LREC-2002 (2002) Las Palmas de Gran Canaria, Spain

  48. Scenario Customization for Information Extraction (ps)
    Roman Yangarber.

    PhD Thesis. (2001) New York University, Courant Institute of Mathematical Sciences.

  49. Automatic Acquisition of Domain Knowledge for Information Extraction (ps.gz)
    Roman Yangarber, Ralph Grishman, Pasi Tapanainen, Silja Huttunen.

    In Proceedings of the 18th International Conference on Computational Linguistics: COLING-2000 (2000) Saarbrücken, Germany

  50. Machine Learning of Extraction Patterns from Un-annotated Corpora (pdf)
    Roman Yangarber, Ralph Grishman.

    In Proceedings of the 14th European Conference on Artificial Intelligence: ECAI-2000 Workshop on Machine Learning for Information Extraction (2000) Berlin, Germany

  51. Extraction Pattern Discovery through Corpus Analysis (doc)
    Roman Yangarber, Ralph Grishman.

    In Proceedings of the 2nd International Conference on Language Resources and Evaluation: LREC-2000 Workshop: Information Extraction meets Corpus Linguistics (2000) Athens, Greece

  52. Unsupervised Discovery of Scenario-Level Patterns for Information Extraction (ps.gz)
    Roman Yangarber, Ralph Grishman, Pasi Tapanainen, Silja Huttunen.

    In Proceedings of Conference on Applied Natural Language Processing ANLP-NAACL 2000 pp. 282-289, (2000) Seattle, WA

  53. Issues in Corpus-Trained Information Extraction (doc)
    Ralph Grishman, Roman Yangarber.

    In Proceedings of International Symposium: Toward the Realization of Spontaneous Speech Engineering, pp. 107-112, (2000) Tokyo, Japan

  54. Transforming Examples into Patterns for Information Extraction (ps.gz)
    Roman Yangarber, Ralph Grishman.

    In Proceedings of TIPSTER Text Program Phase III, Morgan Kaufmann (1998) Baltimore, MD

  55. Japanese IE System and Customization Tool (ps.gz)
    Chikashi Nobata, Satoshi Sekine, Roman Yangarber.

    In Proceedings of TIPSTER Text Program Phase III, Morgan Kaufmann (1998) Baltimore, MD

  56. Deriving Transfer Rules from Dominance-Preserving Alignments (ps.gz)
    Adam Meyers, Roman Yangarber, Ralph Grishman, Catherine Macleod, Antonio Moreno-Sandoval.

    In Proceedings of COLING-ACL-98 (1998) Montreal, Canada

  57. Using NOMLEX to Produce Nominalization Patterns for Information Extraction (ps.gz)
    Adam Meyers, Catherine Macleod, Roman Yangarber, Ralph Grishman, Leslie Barrett, Ruth Reeves.

    In Proceedings of COLING-ACL-98 Workshop on Computational Treatment of Nominals, (1998) Montreal, Canada

  58. NYU: Description of the Proteus/PET System as Used for MUC-7 ST (ps.gz)
    Roman Yangarber, Ralph Grishman.

    In Proceedings of the 7th Message Understanding Conference: MUC-7 (1998) Washington, DC

  59. Customization of Information Extraction Systems (ps.gz)
    Roman Yangarber, Ralph Grishman.

    In Proceedings of International Workshop on Lexically-Driven Information Extraction, invited talk, pp. 1-11, (1997) Frascati, Italy

  60. Alignment of Shared Forests for Bilingual Corpora (ps.gz)
    Adam Meyers, Roman Yangarber, Ralph Grishman.

    In Proceedings of the 16th International Conference on Computational Linguistics: COLING-96 pp. 460-465 (1996) Copenhagen, Denmark

  61. ThinkSheet: A Tool for Tailoring Complex Documents
    Peter Piatko, Roman Yangarber, Daoi Lin, Dennis Shasha.

    ACM SIGMOD '96, demonstration (1996) Montreal, Canada