In this exercise, we study the algorithm of AutoSlog-TS (Riloff, E.: "Automatically Generating Extraction Patterns from Untagged Text"). Assume our text collection contains the following documents:
text 1 (relevant):
    s: A group of terrorists  v: attacked  do: a post  pp: in Nuevo Progreso.
text 2 (relevant):
    s: The National offices  v: were attacked  time: today.
    s: Unidentified individuals  v: detonated  do: a bomb.
    s: The bomb  v: destroyed  do: a car.
text 3 (not relevant):
    s: The Armed Forces units  v: killed  do: one rebel.
    s: They  v: destroyed  do: an underground hideout.
text 4 (relevant):
    s: Unidentified individuals  v: attacked  do: a high tension tower.
    s: They  v: destroyed  do: it.
text 5 (not relevant):
    s: The coca growers  v: protest  do: the destruction of their fields.
    s: The strike  v: is supported  pp: by the Shining Path.
Explain the process of AutoSlog-TS using these documents and give the ranking for the extraction patterns that are generated.
Abbreviations: s=subject, v=verb, do=direct object, pp=preposition phrase
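The ranking step can be sketched in code. AutoSlog-TS instantiates its syntactic heuristics in every clause, counts how often each resulting pattern occurs in relevant texts versus in all texts, and ranks patterns by relevance rate times log frequency (the RlogF score from the paper). The pattern inventory and occurrence counts below are one possible reading of the five texts above, not the official solution:

```python
from math import log2

# (relevant_freq, total_freq) for each candidate pattern -- counts read off
# the five texts above; this inventory is an assumption, not the model answer.
counts = {
    "<subj> attacked":      (2, 2),
    "attacked <dobj>":      (2, 2),
    "attacked in <np>":     (1, 1),
    "<subj> was attacked":  (1, 1),
    "<subj> detonated":     (1, 1),
    "detonated <dobj>":     (1, 1),
    "<subj> destroyed":     (2, 3),
    "destroyed <dobj>":     (2, 3),
    "<subj> killed":        (0, 1),
    "killed <dobj>":        (0, 1),
    "protest <dobj>":       (0, 1),
    "<subj> is supported":  (0, 1),
}

def rlogf(rel, total):
    """AutoSlog-TS score: relevance rate * log2(frequency in relevant texts)."""
    if rel == 0:
        return -1.0  # rank patterns that never occur in relevant texts last
    return (rel / total) * log2(rel)

ranking = sorted(counts.items(), key=lambda kv: rlogf(*kv[1]), reverse=True)
for pattern, (rel, total) in ranking:
    print(f"{rlogf(rel, total):6.3f}  {pattern}")
```

Note that with RlogF, the "attacked" patterns (relevance rate 1, frequency 2) outrank the "destroyed" patterns (relevance rate 2/3), and patterns occurring only once score 0 because log2(1) = 0.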
In this exercise, we study the paper by Riloff and Jones: "Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping".
Assume we have the following document collection that has been analysed by a syntactic analyser (only parts relevant to this task are shown). Abbreviations: n = noun, np = noun phrase, av = active verb, p = preposition:
np(Mason) av(waits) p(with) np(n(dozens) p(of) np(other n(tourists))) p(in) np(a long line).
np(Sixteen charter planes) av(landed) p(in) np(a single n(day)) p(at) np(the sea n(resort) p(of) np(Hurghada)).
Last year np(Egypt) av(attracted) many tourists who av(came) p(to) np(the Middle East).
He av(runs) np(a papyrus n(shop)) p(in) the old n(city) p(of) np(Cairo).
np(Stone Town) is the urban n(center) p(of) np(Zanzibar).
Few cars av(came) p(to) np(the south n(coast) p(of) np(Zanzibar)).
The package includes a half-day tour to the n(city) p(of) np(Hurghada).
The shop is located right at the city n(center) p(of) np(Cairo).
Labor av(united) p(with) np(immigrants) on reform issues.
np(n(City) p(of) np(Nairobi)) unveils a new user-friendly bike map.
The government of the region av(asked) the security adviser at the U.S. n(Embassy) p(in) np(Nairobi) about the warning.
The attackers blew up the U.S. n(Embassy) p(in) np(Dar es Salaam).
The n(city) p(of) np(Zanzibar) av(consists) p(of) np(Stone Town and Ngambo).
Previous n(visitors) p(to) np(Mount Kumgang) had to go by ferry.
His n(visit) p(to) np(Cairo) was delayed.
In 1964 Tanganyika av(united) p(with) np(Zanzibar) to form Tanzania.
Assume further that we want to use only the following two AutoSlog heuristic rules:
    noun prep <noun-phrase>
    active-verb prep <noun-phrase>
If the set of seed words is Cairo and Zanzibar, which other words would be added to the semantic lexicon? Why? It is enough to study only the first part of the method ("mutual bootstrapping").
As the data set is very small, you can use a simpler score, e.g. score(pattern) = R * F, where F is the number of the pattern's extractions that are already in the semantic lexicon and R = F / N, with N the total number of its extractions.
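Mutual bootstrapping can be sketched as a loop: score every extraction pattern against the current lexicon, pick the best-scoring pattern, add all of its extractions to the lexicon, and repeat. Below is a minimal Python sketch using the simplified score R * F. The pattern-to-extractions table is one possible reading of the parsed corpus above with the two heuristic rules; treat both the table and the loop details as assumptions, not the official solution:

```python
def score(extractions, lexicon):
    """Simplified pattern score R * F from the exercise text."""
    f = len(extractions & lexicon)        # F: extractions already in the lexicon
    return (f / len(extractions)) * f     # R = F / N

# pattern -> set of noun phrases it extracts, read off the parsed corpus above
# with the rules "noun prep <np>" and "active-verb prep <np>" (an assumption).
patterns = {
    "city of <np>":     {"Cairo", "Hurghada", "Nairobi", "Zanzibar"},
    "center of <np>":   {"Cairo", "Zanzibar"},
    "coast of <np>":    {"Zanzibar"},
    "resort of <np>":   {"Hurghada"},
    "embassy in <np>":  {"Nairobi", "Dar es Salaam"},
    "visit to <np>":    {"Cairo"},
    "visitors to <np>": {"Mount Kumgang"},
    "united with <np>": {"immigrants", "Zanzibar"},
    "came to <np>":     {"the Middle East", "the south coast of Zanzibar"},
    "waits with <np>":  {"dozens of other tourists"},
    "landed in <np>":   {"a single day"},
    "consists of <np>": {"Stone Town and Ngambo"},
}

lexicon = {"Cairo", "Zanzibar"}           # seed words
used = set()
while True:
    # candidates: patterns not yet chosen that match at least one lexicon entry
    candidates = [p for p in patterns if p not in used and patterns[p] & lexicon]
    if not candidates:
        break
    best = max(candidates, key=lambda p: score(patterns[p], lexicon))
    used.add(best)
    lexicon |= patterns[best]             # mutual bootstrapping step

print(sorted(lexicon - {"Cairo", "Zanzibar"}))
```

Tracing the loop: "center of <np>" wins first (R = 1, F = 2) but adds nothing new; "city of <np>" then brings in Hurghada and Nairobi; lower-precision patterns such as "embassy in <np>" and "united with <np>" later drag in Dar es Salaam and immigrants. This illustrates the semantic drift that the paper's second level, meta-bootstrapping, is designed to limit by keeping only the few best new words per iteration.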
Study the ProMED-PULS Epidemiological Fact Base at http://doremi.cs.helsinki.fi/puls/ and try its Web interface.
Search for cases of "avian influenza" (bird flu) and try to find examples where the extraction system has made mistakes.