582410 Processing of large document collections, Exercise 4



  1. Both the Boguraev-Kennedy method and the MEAD method use cross-sentence dependencies to decide which fragments of text are important. Sketch a hybrid method that combines these methods. For instance, consider how the Boguraev-Kennedy method could be modified to work in the multiple-document summarization case, and/or how the ideas of the MEAD method could be used in the single-document summarization of the Boguraev-Kennedy method. You can replace features with your own favorites and also make other modifications/simplifications to the methods, if you like.
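One possible reading of the hybrid idea is sketched below. It is not either method as published. It assumes plain word frequencies as a crude stand-in for Boguraev-Kennedy referent salience, with no coreference resolution and no grammatical-role weighting. It pools these frequencies over all documents of a cluster, in the spirit of the MEAD centroid, adds a MEAD-like positional score, and picks sentences greedily with a redundancy penalty that plays the role of MEAD's cross-sentence redundancy check. All names (`tokens`, `salience_table`, `overlap`, `summarize`) and all weights are hypothetical.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; a real system would use a language analyser instead.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "was", "is", "at", "for", "by", "that", "which"}

def tokens(sentence):
    """Lowercased content words (a crude stand-in for noun phrases / discourse referents)."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def salience_table(documents):
    """Salience of each term pooled over a cluster: frequency * document count, scaled to [0, 1]."""
    tf, df = Counter(), Counter()
    for doc in documents:
        terms = [t for sentence in doc for t in tokens(sentence)]
        tf.update(terms)
        df.update(set(terms))
    raw = {t: tf[t] * df[t] for t in tf}
    top = max(raw.values(), default=1)
    return {t: v / top for t, v in raw.items()}

def overlap(a, b):
    """Jaccard overlap of the term sets of two sentences."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def summarize(documents, k=2, w_sal=1.0, w_pos=1.0, w_red=1.0):
    """Pick k sentences from a list of documents (each document is a list of sentences)."""
    sal = salience_table(documents)
    candidates = []
    for doc in documents:
        n = len(doc)
        for i, sentence in enumerate(doc):
            terms = tokens(sentence)
            salience = sum(sal[t] for t in terms) / len(terms) if terms else 0.0
            position = (n - i) / n  # earlier sentences in a document score higher
            candidates.append((w_sal * salience + w_pos * position, sentence))
    chosen = []
    while candidates and len(chosen) < k:
        # Penalize candidates that repeat the content of sentences already chosen.
        best = max(candidates,
                   key=lambda c: c[0] - w_red * max((overlap(c[1], s) for s in chosen), default=0.0))
        candidates.remove(best)
        chosen.append(best[1])
    return chosen
```

Calling `summarize([doc])` with a single document gives the single-document case: every term then has document count 1, so the pooled salience reduces to ordinary term frequency within that document.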


  2. In exercises 3 and 4, we try to figure out what might happen in the lexical analysis and name recognition phases of an information extraction process. Study the following document fragments.

     
    Police sources have reported that
    unidentified individuals planted a bomb in front of a Mormon Church in
    Talcahuano District. The bomb, which exploded and caused property
    damage worth 50,000 pesos, was placed at a chapel of the Church of
    Jesus Christ of Latter-Day Saints located at No 3856 Gomez Carreno
    Street.
    
    Prosecutor Juan Carbone Herrera requested the 25 years imprisonment
    for General Rolando Cabezas Alarcon of the Republican Guard for
    ordering the shooting of 124 of the San Pedro prison inmates.
    
    Last night in San Clemente District, 9 km north of Pisco, a
    group of terrorists dynamited machinery belonging to Albolones
    Peruanos, Inc.
    
  3. Give examples of information that is available from the lexical analysis of these sentences. You can assume that a language analyser or special dictionaries are available. You do not have to analyse the whole text; just give some examples of the output of the lexical analysis phase, using the sample fragments.

    For examples and descriptions of what language analysers can do, see the web pages of the following language analysers:
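To make the expected kind of output concrete, here is a minimal sketch of a lexical analysis step applied to the first sentence of the first fragment. The mini-dictionary `LEXICON`, the tag names (`N`, `V`, `PREP`, ...) and the function `lexical_analysis` are hypothetical stand-ins for a real analyser; the point is the information attached to each token: base form, part of speech, inflectional and semantic features, capitalization, and normalized numbers.

```python
import re

# Hypothetical mini-dictionary standing in for a real language analyser:
# lowercased surface form -> (base form, part of speech, features)
LEXICON = {
    "police": ("police", "N", ("plural",)),
    "sources": ("source", "N", ("plural",)),
    "have": ("have", "V", ("present",)),
    "reported": ("report", "V", ("past participle",)),
    "that": ("that", "CONJ", ()),
    "unidentified": ("unidentified", "A", ()),
    "individuals": ("individual", "N", ("plural", "human")),
    "planted": ("plant", "V", ("past",)),
    "a": ("a", "DET", ("indefinite",)),
    "bomb": ("bomb", "N", ("singular", "weapon")),
    "in": ("in", "PREP", ()),
    "front": ("front", "N", ("singular",)),
    "of": ("of", "PREP", ()),
    "church": ("church", "N", ("singular", "building")),
    "district": ("district", "N", ("singular", "location")),
}

# Tokens: numbers with thousands separators, (hyphenated) words, single punctuation marks
TOKEN = re.compile(r"\d+(?:,\d{3})*|[A-Za-z]+(?:-[A-Za-z]+)*|[^\sA-Za-z\d]")

def lexical_analysis(text):
    """Return (token, base form, part of speech, features) for every token in text."""
    result = []
    for tok in TOKEN.findall(text):
        base, pos, feats = LEXICON.get(tok.lower(), (tok, None, ()))
        if pos is None:  # not in the dictionary
            if tok[0].isdigit():
                base, pos = tok.replace(",", ""), "NUM"  # normalized number
            elif tok[0].isalpha():
                pos = "UNKNOWN"
            else:
                pos = "PUNCT"
        if tok[0].isupper():
            feats = feats + ("capitalized",)
        result.append((tok, base, pos, feats))
    return result

SENTENCE = ("Police sources have reported that unidentified individuals planted "
            "a bomb in front of a Mormon Church in Talcahuano District.")
for row in lexical_analysis(SENTENCE):
    print(row)
```

Unknown capitalized words such as Mormon and Talcahuano are exactly the candidates that the name recognition phase has to deal with, while a known common noun that happens to be capitalized (Church, District) keeps its dictionary reading.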

  4. Give examples of names and other special forms in the sample fragments. Try to formulate informal rules for finding the names and special forms, using the knowledge you found above in (3). You can also try to express the rules as regular expressions.
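Informal rules such as "a title word like Prosecutor or General followed by two or three capitalized words is a person name" or "capitalized words followed by District form a location" can be tried out directly as regular expressions. The sketch below uses Python's `re` module; the rule labels, the patterns and the function `find_names` are hypothetical and tuned to the three sample fragments only.

```python
import re

CAP = r"[A-Z][a-z]+"  # one capitalized word

# Hypothetical rules: (label, regular expression). A capture group, if present,
# marks the part to report (e.g. the person name without its title).
RULES = [
    ("MONEY", r"\d{1,3}(?:,\d{3})*\s+pesos"),
    ("ADDRESS", r"\bNo\s+\d+(?:\s+" + CAP + r")+\s+Street"),
    ("DISTANCE", r"\d+\s+km\s+(?:north|south|east|west)\s+of\s+" + CAP),
    ("PERSON", r"(?:Prosecutor|General)\s+(" + CAP + r"(?:\s+" + CAP + r"){1,2})"),
    ("LOCATION", CAP + r"(?:\s+" + CAP + r")*\s+District"),
    ("COMPANY", CAP + r"(?:\s+" + CAP + r")*,\s+Inc\."),
]

def find_names(text):
    """Return (label, matched string) pairs in the order they occur in the text."""
    text = " ".join(text.split())  # normalize line breaks and indentation
    found = []
    for label, pattern in RULES:
        for m in re.finditer(pattern, text):
            value = m.group(1) if m.re.groups else m.group(0)
            found.append((m.start(), label, value))
    return [(label, value) for _, label, value in sorted(found)]

SAMPLE = """Police sources have reported that unidentified individuals planted a
bomb in front of a Mormon Church in Talcahuano District. The bomb, which
exploded and caused property damage worth 50,000 pesos, was placed at a
chapel of the Church of Jesus Christ of Latter-Day Saints located at
No 3856 Gomez Carreno Street.
Prosecutor Juan Carbone Herrera requested the 25 years imprisonment for
General Rolando Cabezas Alarcon of the Republican Guard for ordering the
shooting of 124 of the San Pedro prison inmates.
Last night in San Clemente District, 9 km north of Pisco, a group of
terrorists dynamited machinery belonging to Albolones Peruanos, Inc."""

for label, value in find_names(SAMPLE):
    print(label, value)
```

Rules like these are brittle: the organization name "Church of Jesus Christ of Latter-Day Saints" is missed because it contains lowercase "of", and a name without a title word is not found as a person at all. Combining the patterns with the lexical information (for example, that a capitalized word is unknown to the dictionary) is one way to make them more robust.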



Helena Ahonen-Myka
Last modified: Tue Apr 4 18:32:11 EEST 2006