Finding Robust Itemsets Under Subsampling

Event type:

Guest lecture

Event time:

02.05.2012 - 14:15 - 15:00

Lecturer:

Nikolaj Tatti

Place:

Exactum C222

Description:

Abstract: Mining frequent patterns is plagued by the problem of pattern explosion, which makes pattern reduction techniques a key challenge in pattern mining. In this talk we propose a theoretical framework for pattern reduction. We do this by measuring the robustness of a property of an itemset, such as closedness or non-derivability. The robustness of a property is the probability that the property holds on random subsets of the original data. We study four properties: closed, free, non-derivable, and totally shattered itemsets, and demonstrate how the robustness can be computed analytically without actually sampling the data. Our concept of robustness has many advantages. Unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model, and the patterns reported are simply a subset of all patterns with this property, as opposed to approximate patterns for which the property does not really hold. If the underlying property is monotonic, then the measure is also monotonic, allowing us to mine robust itemsets efficiently. We further derive a parameter-free technique for ranking itemsets that can be used for top-k approaches.
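
To make the definition of robustness concrete, the following Python sketch estimates the robustness of closedness by brute-force simulation. It is only an illustration of the definition, not the method of the talk, which computes robustness analytically without sampling. The subsampling model (each transaction kept independently with probability alpha), the treatment of zero-support itemsets as not closed, and all function names are assumptions made for this example.

```python
import random


def is_closed(itemset, transactions):
    """Return True if `itemset` is closed in `transactions`.

    An itemset is closed when no proper superset has the same support,
    i.e. when the intersection of all transactions containing it equals
    the itemset itself.
    """
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        # Assumption for this sketch: an itemset that occurs in no
        # transaction is not counted as closed.
        return False
    return frozenset.intersection(*covering) == itemset


def estimate_robustness(itemset, transactions, alpha,
                        prop=is_closed, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that `prop` holds for
    `itemset` on a random subsample of `transactions`.

    Assumed subsampling model: each transaction is kept independently
    with probability `alpha`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [t for t in transactions if rng.random() < alpha]
        if prop(itemset, sample):
            hits += 1
    return hits / trials


# Toy data set: four transactions over the items a, b, c.
data = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]

if __name__ == "__main__":
    for items in ("a", "c"):
        r = estimate_robustness(frozenset(items), data, alpha=0.5)
        print(items, r)
```

In this toy data, {a} is closed in a subsample exactly when both {a, b} and {a, c} are kept, so under the assumed model its robustness at alpha = 0.5 is 0.25, and the estimate should land close to that value. The itemset {c} is never closed, because every transaction containing c also contains a, so its robustness is 0 for any alpha.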

Bio: Nikolaj Tatti is a postdoctoral researcher in the Advanced Database Research and Modelling group at the University of Antwerp, Belgium. He received his PhD in 2009 from the Department of Information and Computer Science, Helsinki University of Technology, Finland. A major part of Nikolaj's work focuses on efficiently discovering statistically significant, non-redundant patterns in binary and sequential data. Other topics include using patterns as a surrogate for the original data in tasks such as computing queries or determining the distance between two data sets. Nikolaj Tatti's research interests are algorithms, statistics, statistical analysis of algorithms, algorithms involved in computing statistics, and mathematics in general.

Last updated: 23.04.2012 - 10:47 Hannu Toivonen
Post date: 23.04.2012 - 10:47 Hannu Toivonen

Permanent link: https://www.cs.helsinki.fi/en/node/71737
