Carat Top 1000 Users Long-Term App Usage Dataset

This dataset is released to the public for research purposes only. Commercial use is strictly prohibited. Attempting to discover the identities of participating users is forbidden. Works that use the dataset must mention its name, "Carat Top 1000 Users Long-Term App Usage Dataset", and cite the original article (also available online:

A. J. Oliner, A. P. Iyer, I. Stoica, E. Lagerspetz, and S. Tarkoma.
Carat: Collaborative Energy Diagnosis for Mobile Devices.
In Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems, (10 pages), 2013, ACM.

By downloading and using the dataset, you agree to the above conditions. To download the dataset, use the link at the bottom of this page. The zip archive containing the dataset is password protected; the password is


How the Dataset was Collected

The dataset consists of data collected from volunteers who installed the Carat energy awareness application ( ). The app was developed in a collaboration between the University of Helsinki and the University of California at Berkeley. The app and all of its servers and data analytics services have been managed by the University of Helsinki since the end of 2013. The app collected application usage and battery level information every time the battery level changed by 1%, as allowed by the mobile operating system. The app focused on energy improvement and did not sample data at regular intervals; it sent data to our servers only when the user opened the app. As a result, the data is sparse, and large time periods may be missing for some users.

As the app focused on energy use, users who installed it likely did so to solve energy issues on their device, were interested in contributing to research, and/or were keen followers of technology news. This may bias the user population, which has to be considered when using the dataset. However, the dataset as a whole is very diverse and contains users from many countries, perhaps mitigating the bias. See also:
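Because samples were recorded only on battery level changes and uploaded only when the app was opened, it can be useful to locate long gaps in a user's time series before analysis. The following is a minimal sketch, not part of the dataset tooling. It assumes the sample timestamps have already been extracted as Unix seconds; the function name find_gaps and the one-day default threshold are hypothetical choices.

```python
from typing import List, Tuple


def find_gaps(
    timestamps: List[float], max_gap_seconds: float = 86400.0
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs of consecutive samples that are
    further apart than max_gap_seconds.

    Assumption: timestamps are Unix seconds; adapt this to the
    actual time field of the dataset.
    """
    ordered = sorted(timestamps)
    gaps = []
    # Compare each sample with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_gap_seconds:
            gaps.append((earlier, later))
    return gaps
```

A user whose series yields many long gaps may need to be excluded from, or treated separately in, analyses that assume continuous observation.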

What the Dataset Contains

The dataset contains four facets of data:

  1. Time series of app usage data (only the top 10000 apps based on occurrence count in our data are included)
  2. User registrations with the OS and device model history of users
  3. App category information crawled from Google Play
  4. The list of the top 10000 apps that are available in the dataset

The first contains the time series of the top 1000 users, ranked by the total duration of Carat use since the beginning of 2014. There are 18,146,042 time series records, spanning 4.65 years for the longest-duration users and over 2 years even for the 1000th. The records come from over 100 countries and 315 time zones. For each user, the dataset includes a time series of data samples with the following fields in JSON format:

Where app usage is a list of JSON objects with the following attributes:

The second contains information like the following in JSON format:

These fields are updated as changes in the OS version are detected, so some users may have a history of OS versions associated with the same device over time. However, some OS versions wipe app data, which may prevent Carat from following OS versions over time.
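The per-device OS history described above can be reduced to the points at which the version actually changed. The sketch below is illustrative only: the field names "timestamp" and "os" are placeholders (the real field names are those in the registration file), and os_version_history is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def os_version_history(registrations: List[Dict]) -> List[Tuple[float, str]]:
    """Return (timestamp, os_version) for each detected OS change.

    Assumption: each record is a dict with placeholder keys
    "timestamp" and "os"; rename them to match the real data.
    """
    history: List[Tuple[float, str]] = []
    # Walk the records in time order and keep only version changes.
    for record in sorted(registrations, key=lambda r: r["timestamp"]):
        if not history or history[-1][1] != record["os"]:
            history.append((record["timestamp"], record["os"]))
    return history
```

Since app data may be wiped by some OS versions, a history produced this way should be read as a lower bound on the changes a device went through.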

The third contains CSV data like the following:


These are the primary app categories as displayed by Google Play on the app's info page accessible at For example, has the category TOOLS.

The final file contains CSV data like the following:


Where numberOfsamples is the number of occurrences of the app in the dataset.
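A file of this kind can be loaded into a lookup table of sample counts per app. This is a minimal sketch under stated assumptions: it assumes the file is comma-separated with a header row, and the app-name column "app" is a placeholder, since only numberOfsamples is named above. The helper load_sample_counts is hypothetical.

```python
import csv
import io
from typing import Dict


def load_sample_counts(csv_text: str, app_column: str = "app") -> Dict[str, int]:
    """Map each app to its numberOfsamples value.

    Assumptions: the CSV has a header row, and app_column names the
    column holding the app identifier (placeholder default "app").
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[app_column]: int(row["numberOfsamples"]) for row in reader}
```

For example, sorted(counts, key=counts.get, reverse=True)[:10] would then list the ten most frequently sampled apps.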

Questions regarding the dataset and its use can be addressed to .

The dataset can be downloaded here: