The Data Process
At every step of the data process – framing and scoping the problem; gathering or generating the data; cleaning, organizing, and combining the data; and inferring some piece of knowledge from the groomed data – care and diligence are required to arrive at the best solution or answer.
In a previous post, we explored Max Shron’s system for framing and scoping the problem. And many future posts will undoubtedly dig deeply into the last two steps. At this point in our journey, though, it is time to focus our attention on the second step, specifically the gathering of data. Below is a (far from comprehensive) list of sources of data sets in a variety of categories.
Data.gov
For US-centric data sets, data.gov is a great place to start. It aggregates data sets from 230 organizations, including the USDA, state governments, the US Census Bureau, and many other data-collecting bureaus. At the time of this writing, data.gov claims to have 156,584 data sets.
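If you would rather search the catalog from a script than from the browser, something like the following might work. It is only a sketch, and it rests on an assumption that is not stated above: that the data.gov catalog runs on CKAN and exposes CKAN's `package_search` action at `catalog.data.gov`. The helper names and the sample response are made up for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumption: data.gov's catalog exposes the CKAN action API at this URL.
CKAN_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5):
    """Return a CKAN package_search URL for a free-text query."""
    return CKAN_SEARCH_URL + "?" + urlencode({"q": query, "rows": rows})


def summarize(response):
    """Pull the total match count and the data set titles out of a parsed response."""
    result = response["result"]
    return result["count"], [pkg["title"] for pkg in result["results"]]


# Hand-written stand-in for a parsed JSON response (not real catalog data).
sample_response = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"title": "Example data set A"},
            {"title": "Example data set B"},
        ],
    },
}

if __name__ == "__main__":
    print(build_search_url("census", rows=2))
    print(summarize(sample_response))
```

To use it for real, you would fetch the URL with your HTTP client of choice, parse the JSON body, and hand the result to `summarize`.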
UNdata
If you need government data on a global scale, try UNdata. It is a portal to the United Nations’ statistical resources on topics including “agriculture, crime, education, energy, industry, labour, national accounts, population and tourism”. It includes data from sources such as the World Bank, the World Health Organization, UNICEF, and several others.
UN Comtrade Database
Another UN collection of data is the UN Comtrade Database, which contains “more than 3.1 billion trade records starting from 1962”. It also includes historical trade data from 1900 to 1960, though that data is less complete. The more recent data is well covered: the 2013 data has a coverage rate of over 90%.
Also, there are some really beautiful trade data visualizations in their UN Comtrade Labs section. They are definitely worth checking out!
Open Data by Socrata
While we’re on the subject of government data sets, we should also look at Open Data by Socrata. Socrata offers a service for non-technical users to upload data related to government, housing and zoning, healthcare, energy and environment, and education. For consuming the data, they have a great search interface, including some built-in advanced features like a group-by option.
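The same kind of group-by can also be expressed in code. The sketch below assumes the data set is reachable through Socrata's Open Data API (SODA) and its SoQL query parameters (`$select`, `$group`, `$order`, `$limit`). The domain, data set ID, and column name are hypothetical placeholders, and the function only builds the URL without sending a request.

```python
from urllib.parse import urlencode


def build_soda_url(domain, dataset_id, group_by, limit=10):
    """Build a SODA query URL that counts rows per value of `group_by`."""
    params = {
        # Count the rows in each group and name the count column "total".
        "$select": f"{group_by}, count(*) AS total",
        "$group": group_by,
        # Largest groups first.
        "$order": "total DESC",
        "$limit": limit,
    }
    return f"https://{domain}/resource/{dataset_id}.json?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical domain, data set ID, and column name, for illustration only.
    print(build_soda_url("data.example.gov", "abcd-1234", "zoning_type", limit=5))
```

Fetching the resulting URL should return one JSON object per group, though the exact columns depend on the data set you point it at.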
Data Market
Data Market is a data aggregator that pulls data from the UN, the World Bank, the Economist Intelligence Unit, the US Geological Survey, the Office for National Statistics, Eurostat, and others. It’s a great resource for automotive, education, energy, finance, economic, food, and healthcare data.
NOAA National Climatic Data Center
Several sites offer weather data, like wunderground and weatherbase, but none of them offers a good way to get broad ranges of data. A good alternative for large sets of historical data is the NOAA National Climatic Data Center. It’s a little clunky, since you have to submit your email address and wait a few minutes to receive the data set, but it’s still one of the better options I’ve found for weather data.