Bringing PC-ACE to perfection

Customer

Fabio Cunial (HIIT), Roberto Franzosi (Emory Univ. USA)

Summary

PC-ACE (Program for Computer-Assisted Coding of Events) is research software developed to carry out a particular type of textual analysis used in sociology: Quantitative Narrative Analysis (QNA; see Franzosi 2010, 2012; Franzosi, De Fazio and Vicari, 2012). While several commercial programs are available to social scientists for textual analysis (e.g., Atlas.ti, NVivo, MAXQDA), none gives users the analytical power and control that PC-ACE does (see Franzosi, Doyle, McClelland, Putnam Rankin, and Vicari, 2012). This project aims to bring the current version of PC-ACE to perfection, so that it can be widely distributed to social scientists.

Objectives

A few annoying bugs and a fundamental lack of easy-to-use data handling tools have so far stood in the way of a wider diffusion of PC-ACE. This project aims to fix all outstanding bugs and to give users more flexible tools for checking and managing their data.

  1. Setup module - The module for setting up the story grammar allows PC-ACE users to handle only one type of document (e.g., a newspaper article) and to cross-reference documents of that type with the objects set up in the story grammar for data collection. Users who want to collect data from different types of sources (e.g., newspaper articles, archival documents, and books) cannot do so in the current design.
  2. Data entry module - Many users code text in Excel, using a rectangular matrix for what is really a relational problem. They would welcome the opportunity to work in PC-ACE and take advantage of its built-in functions for data checking, querying, visualization, and analysis. PC-ACE already has many functions for importing Excel data, but it lacks a generalized import function that would let users unfamiliar with the program have PC-ACE build the grammar automatically as it imports the data. Many users find learning the language of a story grammar a daunting task. We need a generalized routine for importing an Excel spreadsheet that automatically sets up the story grammar corresponding to the Excel data. This would require parameterizing several Excel import routines already available in PC-ACE. We also need to extend the online data validity checks in the data entry forms. Data entry in PC-ACE is typically done using the same words found in the original documents: verbs such as “kill”, “wound”, or “kick” are coded verbatim under the object “verb phrase”. In a typical project, this results in hundreds of different verbs, and for the purpose of data analysis these disaggregated items need to be aggregated. New routines are needed that let users code aggregate items while automatically bringing up the distribution of previously entered values.
  3. Query module - The GUI-based Query Manager (QM), first introduced in PC-ACE in 2009, lets users run SQL queries based on the objects they set up in the grammar, without any knowledge of the underlying table structure (see Figure). Unfortunately, both the QM and the underlying query language it relies on to translate queries on user-defined grammar objects into SQL over the tables have limitations that make these tools less than ideal. For instance, the QM only allows AND conditions for specific sub-branches of a tree; we need to introduce OR conditions. We also need to extend the GUI to cover user tables, not just PC-ACE tables: at present, querying user tables together with PC-ACE tables requires the user to write an SQL query by hand, another daunting task for the average social scientist, even though the SQL/EQL form offers many tools for the automatic creation of SQL queries (see Figure). Finally, the EQL language behind the Query Manager has some bugs that lead to faulty query results: these need to be found and fixed.
  4. Data cleaning module - Users need more flexible data manipulation tools during data cleaning. This requires generalized routines for copying, cutting, pasting, moving, and deleting the objects returned by a query. Several routines of this kind already exist, but because they were developed at different times and for different purposes, they follow different programming standards. All these routines need to be parameterized and their redundancies eliminated. Furthermore, a generalized routine is needed to check document/object cross-references and to fix faulty ones when errors are introduced (for instance, by PC-ACE crashes).
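As an illustration of the aggregation routine requested for the data entry module, the following Python sketch looks up how a verb has been aggregated before and returns the distribution of those earlier aggregate codes, most frequent first. It is a sketch under stated assumptions, not PC-ACE code: previously coded pairs are assumed to be available as a plain list, and the function name `aggregate_distribution`, the aggregate labels, and the fallback to the overall distribution for unseen verbs are all hypothetical choices.

```python
from collections import Counter


def aggregate_distribution(verb, history):
    """Return (aggregate_code, count) pairs previously used for `verb`.

    `history` is a hypothetical list of (verb, aggregate_code) pairs already
    coded. Matching is case-insensitive. If `verb` has never been coded, fall
    back to the distribution of aggregate codes over all verbs (a policy
    assumed for this sketch only).
    """
    key = verb.strip().lower()
    counts = Counter(code for v, code in history if v.strip().lower() == key)
    if not counts:
        counts = Counter(code for _, code in history)
    # most_common() sorts by count, highest first; ties keep first-seen order.
    return counts.most_common()


if __name__ == "__main__":
    history = [
        ("kill", "violence"), ("wound", "violence"), ("kick", "violence"),
        ("Kill", "violence"), ("march", "protest"), ("kill", "homicide"),
    ]
    print(aggregate_distribution("kill", history))    # a verb coded before
    print(aggregate_distribution("strike", history))  # unseen verb: overall distribution
```

A data entry form could present this ranked list next to the aggregate field, so that the most common earlier choice is one click away while less frequent codes stay visible.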
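To make the missing OR support in the Query Manager more concrete, here is a minimal Python sketch of translating a nested AND/OR condition tree into a parameterized SQL WHERE clause. It is not EQL and does not reflect how the QM works internally: the tuple-based tree format, the function `to_sql`, and the flat `events` table are hypothetical, and SQLite (through Python's standard `sqlite3` module) is used only for illustration.

```python
import sqlite3

# Hypothetical condition-tree format (not EQL):
#   leaf:  (column, operator, value)
#   group: ("AND" | "OR", [child, child, ...])
ALLOWED_OPS = {"=", "<>", "<", ">", "<=", ">=", "LIKE"}


def to_sql(node):
    """Translate a condition tree into a (where_fragment, params) pair."""
    if len(node) == 2 and node[0] in ("AND", "OR"):
        fragments, params = [], []
        for child in node[1]:
            fragment, child_params = to_sql(child)
            fragments.append(fragment)
            params.extend(child_params)
        # Parenthesize every group so AND/OR precedence is always explicit.
        return "(" + f" {node[0]} ".join(fragments) + ")", params
    column, op, value = node
    # Column names cannot be bound as SQL parameters, so validate them instead.
    if not column.isidentifier() or op not in ALLOWED_OPS:
        raise ValueError(f"unsupported condition: {node!r}")
    return f"{column} {op} ?", [value]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (actor TEXT, verb TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("police", "kill"), ("police", "arrest"), ("workers", "strike"), ("crowd", "kill")],
    )
    # actor = police AND (verb = kill OR verb = wound)
    tree = ("AND", [("actor", "=", "police"),
                    ("OR", [("verb", "=", "kill"), ("verb", "=", "wound")])])
    where, params = to_sql(tree)
    print(where, params)
    print(conn.execute(f"SELECT actor, verb FROM events WHERE {where}", params).fetchall())
```

Because groups nest recursively, an OR can appear inside any sub-branch of the tree, which is the capability the QM currently lacks; values are passed as bound parameters rather than spliced into the SQL text.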
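The generalized cross-reference check called for under the data cleaning module can be expressed as an anti-join: find objects that point to a document that no longer exists. The Python sketch below assumes a hypothetical two-table schema (`documents`, `objects`) in SQLite, which is not PC-ACE's actual table structure. Rather than deleting suspect rows, `detach_dangling` sets the broken references to NULL so a user can review them; this is one possible repair policy among several.

```python
import sqlite3


def find_dangling(conn):
    """Return ids of objects whose doc_id points to no existing document."""
    rows = conn.execute(
        "SELECT o.obj_id FROM objects AS o "
        "LEFT JOIN documents AS d ON o.doc_id = d.doc_id "
        "WHERE o.doc_id IS NOT NULL AND d.doc_id IS NULL "
        "ORDER BY o.obj_id"
    ).fetchall()
    return [r[0] for r in rows]


def detach_dangling(conn):
    """Set broken references to NULL for manual review; return affected ids."""
    ids = find_dangling(conn)
    conn.executemany("UPDATE objects SET doc_id = NULL WHERE obj_id = ?",
                     [(i,) for i in ids])
    conn.commit()
    return ids


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, doc_id INTEGER)")
    conn.executemany("INSERT INTO documents VALUES (?)", [(1,), (2,)])
    # Object 12 references document 3, which does not exist (e.g., after a crash).
    conn.executemany("INSERT INTO objects VALUES (?, ?)",
                     [(10, 1), (11, 2), (12, 3), (13, None)])
    print(find_dangling(conn))
    print(detach_dangling(conn))
    print(find_dangling(conn))
```

Objects that already have no document (NULL `doc_id`) are not reported, so running the check twice after a repair finds nothing new.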

Advantages for the student

  1. Impact: write good code that other people will use and that will become part of a larger project. This can go on the student’s CV.
  2. Receive a frameable “Certificate of Participation” signed by the Chair of the Sociology department at Emory University and by Dr. Franzosi. Emory University is one of America’s top 20 universities.
  3. Talented students will have the possibility of continuing to collaborate on the development of PC-ACE beyond the work required by this project.
  4. Not just software development: have fun hacking on existing code, experimenting with it and improving it.

Tentative work organization

Weekly Skype meetings with Dr Franzosi will be the main means of organizing tasks. Dr Franzosi knows the program extremely well. Fabio Cunial has also contributed to the development of PC-ACE since 2006, so students working on the project can, if necessary, rely locally on his expertise and advice. Most tasks in this project are highly modular: for example, the work on the query module and on the EQL language behind it is completely decoupled from the work on the other modules. However, some key changes to PC-ACE require all students to work together, because of nontrivial long-range dependencies in the legacy code.

Additional info

See the document “ACE proposal”.

Intellectual property rights

The project is carried out under the department’s general license agreement.