Homework 5: Natural language interface to weather forecasts

Background

Online five-day or seven-day weather forecasts are commonly presented in a tabular format consisting of symbols, numbers, and simple text.

For examples, see the BBC's or the Weather Channel's extended forecasts for Helsinki.

This format may not be optimal, for instance for users with a vision impairment who rely on screen readers, which can handle information only in textual, not pictorial, format. Furthermore, some other users may find symbolic information hard to interpret and understand.

Task

This week your task is to design and implement a program that converts symbolic weather forecasts to prose, i.e., to fluent text that follows the grammatical rules of the language and conveys the information presented in the forecast correctly and clearly. We advise you to use English, but if you insist, you may take on the challenge of outputting the forecast in Finnish.

More precisely, your program should

  1. convert the tabular weather information into some intermediate representation that defines the primitives and constraints, numerical ranges, and time scales used in the weather forecasts. This knowledge representation can be a frame, a script, or something comparable to the PDDL language. Bear in mind that this representation should be general enough to be useful for other purposes as well, for instance for automated reasoning.
  2. use the above intermediate representation to generate natural language output of the weather forecast. To do this, you may want to include some general notions of time and of change over time in the intermediate representation itself, e.g., instead of outputting each day's temperature separately, your program should be able to abstract and tell the reader that "the days are getting colder toward the end of the week."
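
As a rough illustration of the two steps above, the following sketch stores each day as a frame-like record and derives a week-level temperature trend before producing a sentence. The names DayFrame, temperature_trend, and describe_trend, the slot names, and the 2-degree threshold are all assumptions made for this example; they are not prescribed by the assignment, and your own representation may look quite different.

```python
from dataclasses import dataclass


@dataclass
class DayFrame:
    """Frame-like record for one forecast day (slot names are illustrative)."""
    day: str
    outlook: str
    day_temp_c: int
    night_temp_c: int


def temperature_trend(frames, threshold=2):
    """Compare the mean daytime temperature of the first and last parts of the period.

    The threshold (in degrees Celsius) is an arbitrary choice for this sketch.
    """
    temps = [frame.day_temp_c for frame in frames]
    half = len(temps) // 2
    if half == 0:
        return "steady"
    early = sum(temps[:half]) / half
    late = sum(temps[-half:]) / half
    if late - early >= threshold:
        return "warmer"
    if early - late >= threshold:
        return "colder"
    return "steady"


def describe_trend(frames):
    """Turn the abstract trend into one English sentence."""
    trend = temperature_trend(frames)
    if trend == "steady":
        return "Daytime temperatures stay roughly the same through the period."
    return f"The days are getting {trend} toward the end of the period."
```

Keeping the trend computation separate from the sentence template means the same abstraction could also be queried by a reasoning component, not only by the text generator.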

You are also asked to write a report in which you describe your approach, specifically the intermediate knowledge representation and the natural language generation process.

Technical details

For now, we use only the BBC weather forecast as the test case. The weather forecast information is read from the website with the command lynx -dump "URL". To help you with the task, we give you a script, wetable.py, that extracts the weather information table from the lynx output.

An example

Here is an example that fetches the webpage in text format from http://www.bbc.co.uk/weather/5day.shtml?world=0034 and extracts the 5-day weather forecast as a tab-delimited table.

In the table there is one row per day, and the columns are Day, Sunrise, Sunset, Overall Forecast, Day Temperature (C), Night Temperature (C), Wind Direction, Wind Speed (mph), Visibility, Air Pressure (mbar), Humidity (%), and UV index.

lynx -dump http://www.bbc.co.uk/weather/5day.shtml?world=0034 | python wetable.py
Saturday   07:45  16:21  light snow    0   -4   East South Easterly Wind     7   poor       1007   94  1
Sunday     07:48  16:19  sunny        -1  -12   North North Easterly Wind    7   moderate   1021   85  1
Monday     07:50  16:16  light rain    3    3   South Easterly Wind          5   moderate   1028   86  1
Tuesday    07:53  16:14  light rain    3    0   South South Westerly Wind   25   moderate   1008   90  1
Wednesday  07:55  16:11  light snow    0   -2   East South Easterly Wind     5   moderate   1010   93  1
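
One possible way to turn a single row of this table into a named record is sketched below. It assumes that the fields are separated by tab characters, as stated above, and that the columns arrive in the listed order. The key names in COLUMNS and the function parse_row are made up for this example and are not part of wetable.py.

```python
# Key names are illustrative; the order follows the column list given above.
COLUMNS = [
    "day", "sunrise", "sunset", "forecast", "day_temp_c", "night_temp_c",
    "wind_direction", "wind_speed_mph", "visibility", "pressure_mbar",
    "humidity_pct", "uv_index",
]
NUMERIC = {
    "day_temp_c", "night_temp_c", "wind_speed_mph",
    "pressure_mbar", "humidity_pct", "uv_index",
}


def parse_row(line):
    """Split one tab-delimited forecast row into a dict keyed by column name."""
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    for key in NUMERIC:
        row[key] = int(row[key])  # convert numeric columns from text
    return row
```

Converting the numeric columns early makes later comparisons, such as detecting a temperature drop between two days, straightforward.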

Your program should read a table like the one above from stdin and write the natural language weather report to stdout. It should also produce a file KR.txt that contains your knowledge representation for producing the report.
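
The required input and output behaviour could be organised roughly as in the skeleton below. This is only a sketch: the function run, the parenthesised notation written to KR.txt, and the one-sentence-per-day report are placeholders chosen for the example, and a real solution is expected to use a richer representation and more fluent text.

```python
import sys


def run(in_stream, out_stream, kr_path="KR.txt"):
    """Read tab-delimited forecast rows, write a toy KR file, and print a report."""
    rows = []
    for line in in_stream:
        if line.strip():
            rows.append([field.strip() for field in line.rstrip("\n").split("\t")])

    # Placeholder knowledge representation: one frame-like line per day.
    with open(kr_path, "w") as kr_file:
        for fields in rows:
            day, outlook, day_temp = fields[0], fields[3], fields[4]
            kr_file.write(f"(day {day} (outlook {outlook!r}) (day-temp-c {day_temp}))\n")

    # Placeholder report: one sentence per day, without any abstraction over days.
    for fields in rows:
        day, outlook, day_temp = fields[0], fields[3], fields[4]
        out_stream.write(
            f"{day}: {outlook}, with a daytime temperature of {day_temp} degrees Celsius.\n"
        )
```

Calling run(sys.stdin, sys.stdout) at the end of the script would let it sit at the end of the lynx and wetable.py pipeline shown above. Passing the streams as arguments also makes the function easy to test without a network connection.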

Submission

Submit your report (preferably as a PDF file) and the program with its source code as a gzipped tar file (or ZIP file) in Moodle. Name the gzipped tar file "yourusername"-hw5.tar.gz, where "yourusername" is your username in the CS system.


This homework is due on Friday, November 16th, 2007, at 23:59.

Results