Open Source Datasets
Gathering the right data is one of the most important tasks for data engineers. If you are looking for data for a self-directed project, or for additional data to enrich a project you already have, the resources below are good places to find interesting datasets.
- Google: Dataset Search
- Kaggle Datasets
- GitHub: Awesome Public Datasets
- Dataquest: 18 places to find data sets for data science projects
- KDnuggets: Datasets for Data Mining and Data Science
- UCI Machine Learning Repository
- Reddit: r/datasets/
This Kaggle dataset is a good example of someone who found disparate datasets and combined them to provide an even more valuable dataset for others to analyze.
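As a rough illustration of what combining disparate datasets can look like, here is a minimal sketch that joins two small tables on a shared key using only the Python standard library. The sample data, the column names (`station_id`, `city`, `avg_temp_c`), and the helper functions `read_rows` and `inner_join` are all made up for this example; they are not taken from the Kaggle dataset mentioned above.

```python
import csv
import io

# Hypothetical sample data standing in for two separately published datasets.
# All values are invented for illustration only.
STATIONS_CSV = """station_id,city
S1,Springfield
S2,Riverton
S3,Lakeside
"""

READINGS_CSV = """station_id,avg_temp_c
S1,12.5
S2,9.0
S4,15.2
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, key):
    """Combine rows from both datasets that share the same value for `key`."""
    right_by_key = {row[key]: row for row in right}
    combined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the columns of both rows into one record.
            combined.append({**row, **match})
    return combined


merged = inner_join(read_rows(STATIONS_CSV), read_rows(READINGS_CSV), "station_id")
for row in merged:
    print(row)
```

Only stations that appear in both sources survive the join, so `S3` and `S4` are dropped here. With real datasets you would more likely read from files and use a library such as pandas, and most of the effort usually goes into finding or constructing a reliable shared key.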