Public Data Sets
This document is crowdsourced. Contributions are welcome! Please feel free to suggest new data resources that you discover.
Data.Gov
data.gov is the official US government open data site. Here you will find links to a wide variety of data sets in categories from agriculture to weather.
Kaggle Datasets
Google Dataset Search
Google Dataset Search lets you find datasets wherever they are hosted, whether on a publisher’s site, in a digital library, or on an author’s web page. It’s a phenomenal dataset finder that indexes over 25 million datasets.
UCI Machine Learning Repository
The Machine Learning Repository at UCI provides an up-to-date collection of open-source datasets.
CMU Libraries Dataset for Machine Learning
CMU Libraries offers high-quality datasets collected by Huajin Wang at CMU. The page also links to other data repositories.
GroupLens
The University of Minnesota's GroupLens Project makes available large data sets on movies and ratings from the experimental MovieLens web site.
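As a minimal sketch of working with ratings data of this kind, the example below computes the mean rating per movie using only the Python standard library. It assumes a `ratings.csv` layout with the columns `userId`, `movieId`, `rating`, and `timestamp`, as used in the smaller MovieLens downloads; check the README shipped with the version you download, since other releases use different separators. The sample rows and the function name `mean_rating_per_movie` are hypothetical and for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the assumed ratings.csv layout
# (userId, movieId, rating, timestamp). This is NOT real MovieLens data.
SAMPLE_RATINGS = """userId,movieId,rating,timestamp
1,10,4.0,964982703
1,20,3.0,964981247
2,10,5.0,964982224
3,20,2.0,964983815
"""


def mean_rating_per_movie(csv_file):
    """Return {movieId: mean rating} from a ratings file-like object."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # DictReader uses the header row to name each column.
    for row in csv.DictReader(csv_file):
        movie_id = int(row["movieId"])
        totals[movie_id] += float(row["rating"])
        counts[movie_id] += 1
    return {m: totals[m] / counts[m] for m in totals}


if __name__ == "__main__":
    means = mean_rating_per_movie(io.StringIO(SAMPLE_RATINGS))
    for movie_id, mean in sorted(means.items()):
        print(movie_id, round(mean, 2))
```

To run this against a downloaded file instead of the inline sample, pass an open file handle, for example `open("ratings.csv", newline="")`, in place of the `io.StringIO` object.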
Many of the social networking sites publish streams of data for research purposes. Formats and available information vary widely. Some of them require you to register as a researcher, and potentially pay a fee, in order to gain access to the richest data.
Student-Contributed Data Sources Maintained by Professor Reiley
This Google doc is an editable document with contributions from past students.
Registry of Open Data on AWS
This registry exists to help people discover and share datasets that are available via AWS resources.
https://registry.opendata.aws/