From Structure-based to Semantic-based: Towards Effective XML Keyword Search

23.05.2016 - 14:15 - 16:00
Prof. Tok Wang Ling, National University of Singapore
B222, Exactum
Keyword search in XML has gained popularity because it gives users a friendly and easy way to query XML data. Existing XML keyword search approaches on XML trees, such as Lowest Common Ancestor (LCA), SLCA, MLCA, VLCA, and ELCA, are all LCA-based and rely on the hierarchical structure of the XML document. This causes serious problems in processing XML keyword queries, such as meaningless answers, duplicated answers, incomplete answers, missing answers, and schema-dependent answers. We analyze these problems of existing keyword search methods and show that their main cause is the unawareness of the Object-Relationship-Attribute (ORA) semantics in XML. With knowledge of the ORA-semantics in the XML document, we are able to detect duplicated objects and relationships and resolve the first three problems of the LCA-based search approaches. We present a novel concept, called Common Relative (CR), and an algorithm based on CR semantics that finds answers beyond LCA, i.e., the missing answers. The algorithm is also independent of the schema design chosen for the same data content. Lastly, we extend the keyword query language to include keywords that match metadata, i.e., the tag names in the XML document, as well as group-by and aggregate functions such as count, max, min, and sum. To process such extended keyword queries correctly, we must use the ORA-semantics in the XML document to detect duplicated objects and relationships. Without ORA-semantics, keyword queries with aggregate functions are computed wrongly and return incorrect answers.
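To make "LCA-based" concrete, here is a minimal brute-force sketch of SLCA (Smallest Lowest Common Ancestor) computation. It is not the speaker's CR algorithm. It assumes that nodes are identified by Dewey labels (tuples of child positions from the root), which is a common labelling scheme but an assumption here. The function names `lca`, `is_proper_ancestor`, and `slca` are hypothetical, and the toy data is invented for illustration. The SLCAs are the minimal nodes among the LCAs of every combination of one match per keyword.

```python
from itertools import product

# A Dewey label, e.g. (1, 2, 1) = first child of the second child of the root.
Dewey = tuple[int, ...]


def lca(a: Dewey, b: Dewey) -> Dewey:
    """LCA of two Dewey labels is their longest common prefix."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def is_proper_ancestor(a: Dewey, b: Dewey) -> bool:
    """True if a is a strict prefix of b, i.e. a is above b in the tree."""
    return len(a) < len(b) and b[:len(a)] == a


def slca(match_lists: list[list[Dewey]]) -> list[Dewey]:
    """Brute-force SLCA: take the LCA of every combination of one match per
    keyword, then keep only those LCAs with no other LCA strictly below them."""
    lcas = set()
    for combo in product(*match_lists):
        node = combo[0]
        for label in combo[1:]:
            node = lca(node, label)
        lcas.add(node)
    return sorted(v for v in lcas
                  if not any(is_proper_ancestor(v, w) for w in lcas))


# Hypothetical data: two departments (1,1) and (1,2), each containing courses.
xml_matches = [(1, 1, 1, 1), (1, 2, 1, 1)]   # nodes containing "XML"
ling_matches = [(1, 1, 2, 2), (1, 2, 1, 2)]  # nodes containing "Ling"
print(slca([xml_matches, ling_matches]))     # [(1, 1), (1, 2, 1)]
```

The result includes the department node (1, 1), although there the "XML" match and the "Ling" match sit under two different course objects. This is one way a purely structural answer can be questionable when the object semantics of the data is ignored.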
Short biography: Prof. Tok Wang Ling is a professor in the Computer Science Department at the National University of Singapore. He was Head of the IT Division, Deputy Head of the Department of Information Systems and Computer Science, and Vice Dean of the School of Computing of the University. Before joining the University as a lecturer in 1979, he was a member of the scientific staff at Bell Northern Research, Ottawa, Canada. His current research interests include Database Modeling, Entity-Relationship Approach, Object-Oriented Data Model, Normalization Theory, Semi-Structured Data Model, XML Twig Pattern Query Processing, and XML and Relational Database Keyword Query Processing. He serves or has served on the steering committees of 5 international conferences, including ER, DASFAA, and BigComp, and was the steering committee chair of both ER and DASFAA. He served as Conference Co-chair of 11 international conferences, including ER 2004, DASFAA 2005, SIGMOD 2007, VLDB 2010, and BigComp 2015, and as Program Committee Co-chair of 6 international conferences, including DASFAA 1995 and ER 1998, 2003, and 2011. He received the ACM Recognition of Service Award in 2007, the DASFAA Outstanding Contributions Award in 2010, and the Peter P. Chen Award in 2011. He is an ER Fellow.
Host: Associate Professor Jiaheng Lu