Mian Du defends his PhD thesis on Natural Language Processing Systems for Business Intelligence on November 29th, 2017

M.Sc. Mian Du will defend his doctoral thesis Natural Language Processing Systems for Business Intelligence on Wednesday the 29th of November 2017 at 12 noon in the University of Helsinki Exactum Building, Auditorium A111 (Gustaf Hällströmin katu 2b). His opponent is Senior Lecturer Stevenson (University of Sheffield, United Kingdom), and the custos is Professor Sasu Tarkoma (University of Helsinki). The defence will be held in English.

Natural Language Processing Systems for Business Intelligence

The ongoing information explosion has a particular impact on business areas that involve corporate strategy and business decision-making. Business intelligence tools aim to help users understand market trends, which is critical for their day-to-day operations. For example, a typical business intelligence task is to obtain accurate and relevant information about competitors’ activity in the same industry sector. This thesis presents research on a natural language processing system that aims to address the problem of information overload in the business domain. It uses document filtering, information extraction, and supervised and semi-supervised learning. Input to the system includes news documents from on-line news websites and company press pages.

We first demonstrate that a combination of NLP techniques and frequent sequential pattern mining can be used to find patterns in unstructured natural-language text, i.e., news articles. The patterns relate to a specific domain of news. Evaluation results show that scenario-based summarization can filter out irrelevant documents and also extract important sentences from relevant documents as summaries for pre-defined scenarios in a specific domain. For document-level filtering, this method achieves very high precision while maintaining quite high recall in our study.

Next, we present experiments with supervised learning for labelling business-news documents with multiple industry sectors. The main contribution is that combining a named-entity-based rote classifier with the balanced classifiers yields better results than either classifier alone. This method also improves on the best score previously reported, while using the same amount of training data for the rote classifier, and considerably less for the statistical classifiers.
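
As a rough illustration of the idea of pairing a rote classifier with a statistical one, the following Python sketch memorises which industry sectors co-occurred with each named entity in training data and then merges its output with the scores of a separate statistical classifier. Everything here is an assumption made for illustration: the class and function names, the example sector labels, the 0.5 threshold, and in particular the choice to combine the two label sets by taking their union. The announcement does not state how the thesis combines the classifiers.

```python
from collections import defaultdict


class RoteClassifier:
    """Hypothetical rote classifier: remembers sectors seen with each entity."""

    def __init__(self):
        # Maps a named entity (e.g. a company name) to the set of sectors
        # it appeared with in the training documents.
        self._sectors_by_entity = defaultdict(set)

    def train(self, examples):
        # examples: iterable of (entities, sectors) pairs, one per document.
        for entities, sectors in examples:
            for entity in entities:
                self._sectors_by_entity[entity].update(sectors)

    def predict(self, entities):
        # Return every sector memorised for any entity found in the document.
        labels = set()
        for entity in entities:
            labels |= self._sectors_by_entity.get(entity, set())
        return labels


def combine(rote_labels, statistical_scores, threshold=0.5):
    """Assumed combination rule: union of rote labels and the sectors whose
    statistical score reaches the threshold."""
    statistical_labels = {
        sector for sector, score in statistical_scores.items()
        if score >= threshold
    }
    return set(rote_labels) | statistical_labels
```

A document mentioning a known company would receive that company's memorised sectors from `predict`, and `combine` would add any sector the statistical classifier scores at or above the threshold.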

We then explore the interplay between company news, social media visibility, and stock prices. Information extracted from on-line news by means of deep linguistic analysis is used to construct queries to various social media platforms. The main results presented in the thesis demonstrate interesting correlations between mentions of a company in the news and views of its Wikipedia page.
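
To make the notion of such a correlation concrete, the sketch below computes a Pearson correlation coefficient between two equally long series, for instance daily counts of news mentions and daily Wikipedia page views for one company. The choice of Pearson correlation and the function name are assumptions for illustration only; the announcement does not say which correlation measure the thesis uses.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

Called as `pearson(mentions_per_day, views_per_day)`, where both arguments are hypothetical lists of daily counts, it returns a value between -1 and 1; values near 1 would indicate that days with more news mentions tend to be days with more page views.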

Based on the above research topics, the thesis also presents the design and architecture of a complete decision-support system. The system is an example of using the above research results to extract, analyze and organize information from plain-text news.

Availability of the dissertation

An electronic version of the doctoral dissertation is available on the e-thesis site of the University of Helsinki at http://urn.fi/URN:ISBN:978-951-51-3901-6.

Printed copies will be available on request from Mian Du: mian.du@cs.helsinki.fi.

 

29.11.2017 - 16:52 Pirjo Moen
23.11.2017 - 15:59 Pirjo Moen