The solutions should be ready for inspection by Thursday 6.3.2002 (midnight).
Remember: if you are ever in doubt about what you should do, you can ask Lili or send a message to our newsgroup!
In this exercise, we study the paper Riloff, Jones: Learning Dictionaries for Information Extraction by Multi-level Bootstrapping. Try to find answers to the following questions:
What are the prerequisites of the method: what kind of input does it need, and how should this input be processed/represented?
Describe very briefly the basic idea of the algorithm.
What is the output?
What does the human user have to do?
Let's study the multi-level bootstrapping method in more detail. Assume we have the following document collection, which has been analysed by a syntactic analyser (only the parts relevant to this task are shown). Abbreviations: n = noun, np = noun phrase, av = active verb, p = preposition:
np(Mason) av(waits) p(with) np(n(dozens) p(of) np(other n(tourists))) p(in) np(a long line). np(Sixteen charter planes) av(landed) p(in) np(a single n(day)) p(at) np(the sea n(resort) p(of) np(Hurghada)). Last year np(Egypt) av(attracted) many tourists who av(came) p(to) np(the Middle East). He av(runs) np(a papyrus n(shop)) p(in) the old n(city) p(of) np(Cairo). np(Stone Town) is the urban n(center) p(of) np(Zanzibar). Few cars av(came) p(to) np(the south n(coast) p(of) np(Zanzibar)). The package includes a half-day tour to the n(city) p(of) np(Hurghada). The shop is located right at the city n(center) p(of) np(Cairo). Labor av(united) p(with) np(immigrants) on reform issues. np(n(City) p(of) np(Nairobi)) unveils a new user-friendly bike map. The government of the region av(asked) the security adviser at the U.S. n(Embassy) p(in) np(Nairobi) about the warning. The attackers blew up the U.S. n(Embassy) p(in) np(Dar es Salaam). The n(city) p(of) np(Zanzibar) av(consists) p(of) np(Stone Town and Ngambo). Previous n(visitors) p(to) np(Mount Kumgang) had to go by ferry. His n(visit) p(to) np(Cairo) was delayed. In 1964 Tanganyika av(united) p(with) np(Zanzibar) to form Tanzania.
Assume further that we want to use only the following two AutoSlog heuristic rules:
noun prep <noun-phrase>
active-verb prep <noun-phrase>
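To make the two heuristics concrete, here is a minimal sketch of how they could be applied mechanically. The clause representation (a list of (tag, word) pairs using the abbreviations above) is a simplification invented for this sketch, not the paper's actual AutoSlog input format:

```python
# Hypothetical flat clause representation: a list of (tag, word) pairs,
# with tags n (noun), np (noun phrase), av (active verb), p (preposition).

def extract_patterns(clause):
    """Apply the two heuristics 'noun prep <noun-phrase>' and
    'active-verb prep <noun-phrase>'; return (pattern, extracted NP) pairs."""
    results = []
    for i in range(len(clause) - 2):
        (t1, w1), (t2, w2), (t3, w3) = clause[i:i + 3]
        # Both heuristics share the shape: trigger word, preposition, NP.
        if t1 in ("n", "av") and t2 == "p" and t3 == "np":
            results.append((f"{w1} {w2} <np>", w3))
    return results

clause = [("n", "city"), ("p", "of"), ("np", "Cairo")]
print(extract_patterns(clause))  # [('city of <np>', 'Cairo')]
```

Note that nested NPs in the data (e.g. np(n(City) p(of) np(Nairobi))) would first have to be flattened into this representation; handling that is part of the exercise.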
If the set of seed words is {Cairo, Zanzibar}, which other words would be added to the semantic lexicon? Why? It is enough to study only the first part of the method ('Mutual Bootstrapping').
As the data set is very small, you can use a simpler score, e.g. score(pattern) = R * F, where F is the number of distinct lexicon entries among the NPs a pattern extracts, N is the total number of distinct NPs it extracts, and R = F / N.
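One round of mutual bootstrapping with this simplified score can be sketched as follows. The pattern-to-extractions table below is a small toy subset made up for illustration, not the full answer to the exercise:

```python
# Hedged sketch of one mutual-bootstrapping round using the
# simplified score(pattern) = R * F, with F = |extractions ∩ lexicon|,
# N = |extractions|, and R = F / N.

def best_pattern(extractions, lexicon):
    """Pick the extraction pattern with the highest R * F score."""
    def score(nps):
        f = len(nps & lexicon)
        return (f / len(nps)) * f if nps else 0.0
    return max(extractions, key=lambda p: score(extractions[p]))

# Toy data (hypothetical, only loosely based on the collection above).
extractions = {
    "city of <np>": {"Cairo", "Zanzibar", "Hurghada", "Nairobi"},
    "united with <np>": {"Zanzibar", "immigrants"},
}
lexicon = {"Cairo", "Zanzibar"}

p = best_pattern(extractions, lexicon)   # 'city of <np>': R=0.5, F=2, score=1.0
lexicon |= extractions[p]                # all its extractions join the lexicon
```

In the full algorithm this select-and-extend step is repeated, re-scoring the patterns against the growing lexicon each round; work through the actual data above by hand in the same way.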