Data and Resources for Empirical Research

thesis supervision
Published: October 9, 2024

Here I collect resources for TU Delft master's students setting out to do empirical research. It’s a start; I will add more resources and tips over time.

Data

EUI Library Data Portal. The European University Institute library team maintains a list of valuable data sets and describes the contents and access conditions.

OData API of Statistics Netherlands. Learn how to access publicly available tabular data sets from the CBS using R and Python packages.
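As a rough illustration of what such access looks like, here is a minimal Python sketch that uses only the standard library. It assumes the OData v3 endpoint layout `https://opendata.cbs.nl/ODataApi/odata/<table id>/TypedDataSet` and a JSON response whose rows sit under the key `value`. The table identifier `12345ENG` and the field names in the sample payload are hypothetical placeholders, not a real CBS table. The sketch ignores paging, so for large tables a dedicated package is likely the better choice.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the CBS OData v3 service (check the CBS documentation).
BASE_URL = "https://opendata.cbs.nl/ODataApi/odata"


def typed_dataset_url(table_id):
    """Build the URL of the typed data set for one table identifier."""
    return f"{BASE_URL}/{quote(table_id)}/TypedDataSet"


def rows_from_payload(payload):
    """Extract the list of row dictionaries from an OData JSON response."""
    return json.loads(payload)["value"]


def fetch_rows(table_id):
    """Download one table (first page only; paging is not handled here)."""
    with urlopen(typed_dataset_url(table_id)) as response:
        return rows_from_payload(response.read().decode("utf-8"))


# Offline demonstration with a made-up payload (hypothetical field names).
sample_payload = '{"value": [{"ID": 0, "Periods": "2020JJ00", "Population_1": 100}]}'
rows = rows_from_payload(sample_payload)
url = typed_dataset_url("12345ENG")
print(url)
print(rows[0]["Population_1"])
```

With network access and a real table identifier, `fetch_rows("<table id>")` would return the rows as a list of dictionaries that can be passed on to a data frame library.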

Telling Stories with Data. The online appendix of Rohan Alexander’s book includes a long list of data sources.

Data in Brief. An open-access journal describing data sets for use by anyone, including master's students. Search for a keyword of interest; something interesting may pop up.

Data Is Plural. A newsletter presenting data sets. Browse the archive; you may find a nugget.

BACI (product-level bilateral trade data). The United Nations’ Comtrade database is the primary source for detailed product-level data on international trade flows. The raw Comtrade data, however, need cleaning and processing before they can be used. Competent researchers have done this work for you: for product-level trade data, use either the BACI database or the trade flows underlying the Atlas of Economic Complexity.

MISSY setup routines for the EU-LFS. The Microdata Information System (MISSY) provides Stata do-files for importing and processing the European Union Labour Force Survey (EU-LFS) and the European Union Statistics on Income and Living Conditions (EU-SILC) survey. Master’s students generally do not receive permission to work with the complete samples (the scientific use files), but they can work with the public use files.

Classification Systems

UN Statistics Division. The UN Statistics Division maintains several classification systems for traded products and economic activities.

Classification of industries: ISIC, NACE, NAICS.

Classification of products: BEC, SITC, HS.

Classification of occupations: ISCO, SOC.

Lecture note on crosswalking. Merging data sets that come in different classification schemes (e.g. merging a set of products and a set of industries) is known as crosswalking. Brendan Price explains the dark art and unglamorous science of crosswalking.

Correspondence tables help you do the crosswalking. The UN Statistics Division publishes correspondence tables, and the World Integrated Trade Solution does, too.
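To make the idea of crosswalking concrete, here is a minimal Python sketch that reallocates values from a product classification to an industry classification using a weighted correspondence table. All codes, shares, and values are hypothetical. Real correspondence tables usually list only which codes map to which, so the allocation shares are an assumption you would have to justify yourself, for example equal splits or splits based on observed trade or employment.

```python
from collections import defaultdict

# Hypothetical correspondence table: (product code, industry code, share).
# The shares of each product code sum to 1, so totals are preserved.
CORRESPONDENCE = [
    ("P01", "A", 1.0),  # one-to-one mapping
    ("P02", "A", 0.5),  # one-to-many mapping, split equally (an assumption)
    ("P02", "B", 0.5),
    ("P03", "B", 1.0),
]


def crosswalk(values, correspondence):
    """Reallocate values by source code to target codes.

    Returns the values by target code and the list of source codes
    that have no entry in the correspondence table.
    """
    shares = defaultdict(list)
    for source, target, share in correspondence:
        shares[source].append((target, share))

    by_target = defaultdict(float)
    unmatched = []
    for code, value in values.items():
        if code not in shares:
            unmatched.append(code)  # report instead of dropping silently
            continue
        for target, share in shares[code]:
            by_target[target] += value * share
    return dict(by_target), unmatched


# Hypothetical export values by product code; "P99" has no match.
exports_by_product = {"P01": 100.0, "P02": 40.0, "P03": 10.0, "P99": 5.0}
by_industry, unmatched = crosswalk(exports_by_product, CORRESPONDENCE)
print(by_industry)  # {'A': 120.0, 'B': 30.0}
print(unmatched)    # ['P99']
```

Reporting the unmatched codes is a deliberate choice: codes that silently drop out of a merge are a common source of error when classifications change between revisions.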

Misc

Nick Huntington-Klein’s Econometrics Resources. NHK, author of the online textbook “The Effect”, suggests useful R packages and data sources.

Survey weights. To understand what survey weights are, and why and how to use them, read the section “Using weights or inflation factors” (p. 44ff) in Angus Deaton’s “The Analysis of Household Surveys”.
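As a small illustration of why weights matter, here is a Python sketch that compares an unweighted and a weighted mean. The incomes and weights are made up. The sketch assumes that each weight states how many population households a sampled household represents (the inverse of its selection probability). It is not taken from Deaton's book.

```python
def weighted_mean(values, weights):
    """Mean of values, with each observation counted in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical sample of three households: income and survey weight.
# The high-income household was oversampled, so it has a small weight.
incomes = [1000.0, 2000.0, 6000.0]
weights = [300.0, 150.0, 50.0]

unweighted = sum(incomes) / len(incomes)
weighted = weighted_mean(incomes, weights)
print(unweighted)  # 3000.0
print(weighted)    # 1800.0
```

In this made-up sample the unweighted mean overstates average income, because the oversampled high-income household counts as much as the others.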