Information extraction from text, Week 4



The solutions should be ready for inspection by Thursday 6.3.2003 (midnight).

Remember: if you are ever in doubt about what to do, you can always ask Lili or send a message to our newsgroup!


  1. In this exercise, we study the paper by Riloff and Jones, "Learning Dictionaries for Information Extraction by Multi-level Bootstrapping". Try to find answers to the following questions:

  2. Let's study the multi-level bootstrapping method in more detail. Assume we have the following document collection, which has been processed by a syntactic analyser (only the parts relevant to this task are shown). Abbreviations: n = noun, np = noun phrase, av = active verb, p = preposition:


    np(Mason) av(waits) p(with) np(n(dozens) p(of) np(other n(tourists))) 
    p(in) np(a long line).
    
    np(Sixteen charter planes) av(landed) p(in) np(a single n(day)) 
    p(at) np(the sea n(resort) p(of) np(Hurghada)).
    
    Last year np(Egypt) av(attracted) many tourists who av(came) 
    p(to) np(the Middle East).
    
    He av(runs) np(a papyrus n(shop)) p(in) the old n(city) p(of) np(Cairo).
    
    np(Stone Town) is the urban n(center) p(of) np(Zanzibar).
    
    Few cars av(came) p(to) np(the south n(coast) p(of) np(Zanzibar)).
    
    The package includes a half-day tour to the n(city) p(of) np(Hurghada).
    
    The shop is located right at the city n(center) p(of) np(Cairo).
    
    Labor av(united) p(with) np(immigrants) on reform issues.
    
    np(n(City) p(of) np(Nairobi)) unveils a new user-friendly bike map.
    
    The government of the region av(asked) the security adviser 
    at the U.S. n(Embassy) p(in) np(Nairobi) about the warning.
    
    The attackers blew up the U.S. n(Embassy) p(in) np(Dar es Salaam).
    
    The n(city) p(of) np(Zanzibar) av(consists) p(of) np(Stone Town and
    Ngambo).
    
    Previous n(visitors) p(to) np(Mount Kumgang) had to go by ferry.
    
    His n(visit) p(to) np(Cairo) was delayed.
    
    In 1964 Tanganyika av(united) p(with) np(Zanzibar) to form Tanzania. 
    
    

    Assume further that we want to use only the following two AutoSlog heuristic rules:

    noun prep <noun-phrase>
    active-verb prep <noun-phrase>
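    The Python sketch below shows one way the two rules could be applied mechanically to the annotated sentences. It rests on assumptions that the exercise does not state: the bracket notation is parsed into a tree; a rule fires only when the noun (or active verb), the preposition and the noun phrase are adjacent constituents at the same bracketing level, so np(a single n(day)) p(at) ... yields no "day at <np>" pattern; patterns are lower-cased; and whole noun phrases are stored as extractions. The helper names parse, flat and candidate_patterns are hypothetical.

```python
import re
from collections import defaultdict

# The annotated sentences of the exercise, one string per sentence.
SENTENCES = [
    "np(Mason) av(waits) p(with) np(n(dozens) p(of) np(other n(tourists))) p(in) np(a long line).",
    "np(Sixteen charter planes) av(landed) p(in) np(a single n(day)) p(at) np(the sea n(resort) p(of) np(Hurghada)).",
    "Last year np(Egypt) av(attracted) many tourists who av(came) p(to) np(the Middle East).",
    "He av(runs) np(a papyrus n(shop)) p(in) the old n(city) p(of) np(Cairo).",
    "np(Stone Town) is the urban n(center) p(of) np(Zanzibar).",
    "Few cars av(came) p(to) np(the south n(coast) p(of) np(Zanzibar)).",
    "The package includes a half-day tour to the n(city) p(of) np(Hurghada).",
    "The shop is located right at the city n(center) p(of) np(Cairo).",
    "Labor av(united) p(with) np(immigrants) on reform issues.",
    "np(n(City) p(of) np(Nairobi)) unveils a new user-friendly bike map.",
    "The government of the region av(asked) the security adviser at the U.S. n(Embassy) p(in) np(Nairobi) about the warning.",
    "The attackers blew up the U.S. n(Embassy) p(in) np(Dar es Salaam).",
    "The n(city) p(of) np(Zanzibar) av(consists) p(of) np(Stone Town and Ngambo).",
    "Previous n(visitors) p(to) np(Mount Kumgang) had to go by ferry.",
    "His n(visit) p(to) np(Cairo) was delayed.",
    "In 1964 Tanganyika av(united) p(with) np(Zanzibar) to form Tanzania.",
]

OPEN = re.compile(r"(np|av|n|p)\(")
# The two AutoSlog heuristics: noun prep <np> and active-verb prep <np>.
RULES = {("n", "p", "np"), ("av", "p", "np")}


def parse(text, i=0):
    """Parse the bracket notation into a list of nodes: plain strings or (tag, children).
    Returns the nodes and the index just past the ')' that closed this level."""
    nodes, buf = [], []
    while i < len(text):
        m = OPEN.match(text, i)
        if m and (i == 0 or not text[i - 1].isalnum()):  # a tag must start a word
            if buf:
                nodes.append("".join(buf))
                buf = []
            children, i = parse(text, m.end())
            nodes.append((m.group(1), children))
        elif text[i] == ")":
            break
        else:
            buf.append(text[i])
            i += 1
    if buf:
        nodes.append("".join(buf))
    return nodes, i + 1


def flat(nodes):
    """Surface text of a node list, with whitespace normalised."""
    text = "".join(node if isinstance(node, str) else flat(node[1]) for node in nodes)
    return " ".join(text.split())


def candidate_patterns(nodes, out):
    """Apply RULES to adjacent siblings at every bracketing level."""
    seq = []
    for node in nodes:
        if isinstance(node, str):
            if node.strip():
                seq.append(None)  # a plain word breaks adjacency
        else:
            seq.append(node)
            candidate_patterns(node[1], out)  # rules also apply inside phrases
    for a, b, c in zip(seq, seq[1:], seq[2:]):
        if a and b and c and (a[0], b[0], c[0]) in RULES:
            pattern = f"{flat(a[1])} {flat(b[1])} <np>".lower()
            out[pattern].add(flat(c[1]))  # store the whole extracted noun phrase
    return out


table = defaultdict(set)
for sentence in SENTENCES:
    nodes, _ = parse(sentence)
    candidate_patterns(nodes, table)
for pattern, nps in sorted(table.items()):
    print(f"{pattern:18} {sorted(nps)}")
```

    Reading the brackets as attachment is a design choice of this sketch; if a head noun at the end of an NP were allowed to combine with a following preposition, extra patterns such as "day at <np>" would appear.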
    

    If the seed words are Cairo and Zanzibar, which other words would be added to the semantic lexicon, and why? It is enough to study only the first part of the method (mutual bootstrapping).

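    As the data set is very small, you can use a simpler score, e.g. score(pattern) = R * F, where F is the number of distinct lexicon entries among the pattern's extractions, N is the total number of distinct noun phrases the pattern extracts, and R = F / N. (The paper itself uses R * log2(F).)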
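    A minimal Python sketch of the mutual bootstrapping loop with the simplified score. As described by Riloff and Jones, each iteration picks the highest-scoring pattern not yet selected and adds all of its extractions to the lexicon. Several details are assumptions of this sketch rather than part of the exercise: the pattern table was written out by hand from the sentences above, reading a rule as firing only for adjacent constituents at the same bracketing level; extractions are whole noun phrases compared as exact strings; ties are broken by the number of lexicon hits and then alphabetically; and the loop stops after a fixed number of iterations. PATTERNS, score and mutual_bootstrap are hypothetical names.

```python
from fractions import Fraction

# Hypothetical pattern table, derived by hand from the annotated sentences,
# assuming a rule fires only for adjacent constituents at the same bracket level.
# Each pattern maps to the set of distinct noun phrases it extracts.
PATTERNS = {
    "waits with <np>": {"dozens of other tourists"},
    "dozens of <np>": {"other tourists"},
    "landed in <np>": {"a single day"},
    "resort of <np>": {"Hurghada"},
    "came to <np>": {"the Middle East", "the south coast of Zanzibar"},
    "city of <np>": {"Cairo", "Hurghada", "Nairobi", "Zanzibar"},
    "center of <np>": {"Cairo", "Zanzibar"},
    "coast of <np>": {"Zanzibar"},
    "united with <np>": {"immigrants", "Zanzibar"},
    "embassy in <np>": {"Nairobi", "Dar es Salaam"},
    "consists of <np>": {"Stone Town and Ngambo"},
    "visitors to <np>": {"Mount Kumgang"},
    "visit to <np>": {"Cairo"},
}


def score(extractions, lexicon):
    """Simplified score R * F (the paper uses R * log2(F)).

    F = distinct extractions already in the lexicon,
    N = all distinct extractions, R = F / N.
    """
    if not extractions:
        return Fraction(0)
    f = len(extractions & lexicon)
    return Fraction(f, len(extractions)) * f


def mutual_bootstrap(patterns, seeds, iterations):
    """Return (selected patterns in order, final lexicon)."""
    lexicon, selected = set(seeds), []
    for _ in range(iterations):  # stopping rule: fixed iteration count (assumed)
        candidates = [p for p in patterns if p not in selected]
        if not candidates:
            break
        # Highest score wins; ties go to more lexicon hits, then alphabetical order (assumed).
        best = min(candidates, key=lambda p: (-score(patterns[p], lexicon),
                                              -len(patterns[p] & lexicon), p))
        selected.append(best)
        lexicon |= patterns[best]  # all extractions of the best pattern join the lexicon
    return selected, lexicon


seeds = {"Cairo", "Zanzibar"}
selected, lexicon = mutual_bootstrap(PATTERNS, seeds, iterations=2)
print(selected)
print(sorted(lexicon - seeds))
```

    Exact fractions keep tied scores exactly equal, so the tie-breaking rule, not floating-point rounding, decides between them. With the paper's R * log2(F), every pattern with F = 1 would score zero, which can change the ranking considerably on such a small collection.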



Helena Ahonen-Myka
Last modified: Thu Feb 27 20:21:45 EET 2003