Information extraction from text, Week 1



The solutions should be ready for inspection by Thursday 13.2.2003 (midnight).


  1. The goal of this exercise is to learn (or brush up on) some basic linguistic terminology that we need in the course. As material, we use the resources of the web site Guide to Grammar and Writing. [Some terminology in Finnish]

    Study the page Identifying Basic Sentence Parts, and test your knowledge with A Very Basic Quiz. As your "answer" to this exercise you should do the following:

    You may want to check these pages, as well:



  2. Study the following document fragments.

     
    Police sources have reported that
    unidentified individuals planted a bomb in front of a Mormon Church in
    Talcahuano District. The bomb, which exploded and caused property
    damage worth 50,000 pesos, was placed at a chapel of the Church of
    Jesus Christ of Latter-Day Saints located at No 3856 Gomez Carreno
    Street.
    
    Prosecutor Juan Carbone Herrera requested the 25 years imprisonment
    for General Rolando Cabezas Alarcon of the Republican Guard for
    ordering the shooting of 124 of the San Pedro prison inmates.
    
    Last night in San Clemente District, 9 km north of Pisco, a
    group of terrorists dynamited machinery belonging to Albolones
    Peruanos, Inc.
    

    Try to figure out what might happen in the lexical analysis and name recognition phases when these texts are processed. For instance:

    1. Give examples of information that is available in the lexical analysis of these sentences. You can assume that some language analyser or special dictionaries are available. You don't have to analyse all the text, just give some examples of the output of the lexical analysis phase using the sample text fragments.

      You can find examples and descriptions of what language analysers can do on the web pages of the following analysers:

    2. Give examples of names and other special forms in the sample fragments. Try to formulate informal rules for finding the names and special forms, using the information you gathered in (1). You can also try to formulate the rules using regular expressions.
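
To give an idea of the kind of answer meant in (1) and (2), here is a minimal Python sketch, not a model solution. It assumes a tiny hand-made lexicon in place of a real language analyser, and the names LEXICON, RULES, lexical_analysis and find_names are made up for this illustration. The rules are deliberately naive: they rely only on capitalization and a few trigger words (a title before a person name, "District" after a place name, ", Inc." after a company name) and are tuned to the second and third sample fragments.

```python
import re

# The second and third sample fragments, joined into one string.
FRAGMENTS = (
    "Prosecutor Juan Carbone Herrera requested the 25 years imprisonment "
    "for General Rolando Cabezas Alarcon of the Republican Guard for "
    "ordering the shooting of 124 of the San Pedro prison inmates. "
    "Last night in San Clemente District, 9 km north of Pisco, a "
    "group of terrorists dynamited machinery belonging to Albolones "
    "Peruanos, Inc."
)

# Hypothetical mini-lexicon standing in for a real analyser or dictionary.
LEXICON = {
    "requested": "verb, past tense of 'request'",
    "terrorists": "noun, plural of 'terrorist'",
    "Prosecutor": "title",
    "General": "title",
    "km": "unit of length",
}


def lexical_analysis(text):
    """Split text into tokens and attach a few simple lexical features."""
    tokens = re.findall(r"[A-Za-z]+|\d+|[^\w\s]", text)
    return [
        {
            "token": tok,
            "capitalized": tok[0].isupper(),
            "number": tok.isdigit(),
            "lexicon": LEXICON.get(tok),  # None if the word is unknown
        }
        for tok in tokens
    ]


CAP = r"[A-Z][a-z]+"  # one capitalized word

# Informal rules written as regular expressions (assumed, not exhaustive).
RULES = {
    # a title followed by one or more capitalized words
    "person": rf"\b(?:Prosecutor|General)((?: {CAP})+)",
    # capitalized words followed by the trigger word "District"
    "location": rf"\b((?:{CAP} )+District)\b",
    # capitalized words followed by ", Inc."
    "company": rf"\b((?:{CAP} )*{CAP}), Inc\.",
    # a number followed by the unit "km"
    "distance": r"\b(\d+ km)\b",
}


def find_names(text):
    """Apply every rule to the text and collect the matched strings."""
    return {
        label: [m.group(1).strip() for m in re.finditer(pattern, text)]
        for label, pattern in RULES.items()
    }


if __name__ == "__main__":
    for entry in lexical_analysis(FRAGMENTS)[:6]:
        print(entry)
    print(find_names(FRAGMENTS))
```

Note what the rules miss: "Republican Guard", "San Pedro" and "Pisco" are capitalized but have no trigger word next to them, so a real name recognizer would need further rules or name lists for such cases.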



    Helena Ahonen-Myka
    Last modified: Wed Feb 5 13:00:05 EET 2003