Guest lecture: Big data according to T.S. Eliot or How fine-grained provenance will save the world

Event type: Guest lecture
Event time: 11.09.2014 - 12:15 - 13:00
Lecturer: Julia Stoyanovich

Assistant Professor Julia Stoyanovich (Drexel University, USA) will give a guest lecture titled "Big data according to T.S. Eliot or How fine-grained provenance will save the world" on Thursday, September 11 at 12:15 in Exactum B119. Welcome!


In this talk I will present a novel provenance framework that marries database-style and workflow-style provenance. Current workflow management systems assume that each module is a black box, i.e., that each output depends on all inputs, and so capture only coarse-grained data dependencies. In contrast, our framework uses Pig Latin to expose module functionality, capturing internal state and fine-grained data dependencies. A critical ingredient in our solution is a novel kind of provenance graph that yields a compact representation of fine-grained provenance and enables several novel graph transformation operations in support of zoom-in / zoom-out and what-if workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark, demonstrating that tracking and querying fine-grained provenance is feasible.


Julia Stoyanovich is an Assistant Professor of Computer Science at the College of Computing and Informatics at Drexel University (Philadelphia, USA). Prior to joining Drexel, she was a postdoctoral researcher and an NSF/CRA Computing Innovations Fellow at the University of Pennsylvania. Julia received her MS and PhD degrees in Computer Science from Columbia University (New York, USA) in 2003 and 2009, respectively, and her BS in Computer Science and in Mathematics and Statistics from the University of Massachusetts Amherst, USA in 1998. After graduating from college, Julia spent five years in the start-up industry as a software developer, data architect, and database administrator. This experience has motivated her to work with real datasets whenever possible, and to deliver the results of her research to the communities of target users, as part of open-source systems or as stand-alone prototypes.

Julia's research is in the area of data and knowledge management. Her focus is on developing novel information discovery approaches, with the goal of helping the user identify relevant information, and ultimately transform that information into knowledge. She has recently worked with a wide variety of real datasets, from shopping, dating and collaborative tagging applications, to full-genome association studies and gene expression microarrays, to data-intensive workflows and scientific articles. Additional information about her research is available at


08.09.2014 - 12:04 Jukka Paakki
08.09.2014 - 12:00 Jukka Paakki