Public Data Sets

This document is crowdsourced. Contributions are welcome! Please feel free to suggest new data resources that you discover.

Data.Gov

data.gov is the official US government open data site. Here you will find links to a wide variety of data sets in categories from agriculture to weather.

Kaggle Datasets

Kaggle Datasets

Google Dataset Search

Google Dataset Search lets you find datasets wherever they are hosted, whether it’s a publisher’s site, a digital library, or an author’s web page. It indexes over 25 million datasets.

UCI Machine Learning Repository

UCI Machine Learning Repository: The Machine Learning Repository at UCI provides an up-to-date resource for open-source datasets.

CMU Libraries Dataset for Machine Learning

CMU Libraries: A collection of high-quality datasets assembled by Huajin Wang at CMU. It also links to other data repositories.

GroupLens

The University of Minnesota's GroupLens Project publishes large data sets of movies and ratings from its experimental MovieLens website.

Many social networking sites publish streams of data for research purposes. Formats and available information vary widely. Some sites require you to register as a researcher, and possibly pay a fee, to access the richest data.

Student-Contributed Data Sources Maintained by Professor Reiley

This editable Google Doc collects contributions from past students.

Amazon’s Public Open Data Set

This registry exists to help people discover and share datasets that are available via AWS resources.
https://registry.opendata.aws/

Google BigQuery Public Datasets

https://cloud.google.com/bigquery/public-data/