Data and Resources for Empirical Work

Here, I collect helpful resources for TU Delft master’s students setting out to do empirical research. It’s a start - more resources and tips will be added over time.

Data

EUI Library Data Portal. The European University Institute library team maintains a list of valuable data sets and describes the contents and access conditions.

OData API of Statistics Netherlands. Learn how to access publicly available tabular data sets from the CBS using R and Python packages.
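
To give a feel for what such a package does under the hood, here is a minimal sketch that queries the OData service with the Python standard library only. The base URL, the `TypedDataSet` path, and the `value` key in the JSON response are assumptions about the CBS OData v3 service and should be checked against the CBS documentation. `TABLE_ID` is a placeholder, not a real table identifier.

```python
import json
import urllib.request

# Assumed base URL of the CBS OData v3 service; verify against the CBS docs.
BASE_URL = "https://opendata.cbs.nl/ODataApi/odata"


def build_url(table_id, top=None):
    """Build the request URL for the typed data of one table."""
    url = f"{BASE_URL}/{table_id}/TypedDataSet?$format=json"
    if top is not None:
        # $top limits the number of rows returned (standard OData option).
        url += f"&$top={int(top)}"
    return url


def fetch_rows(table_id, top=10):
    """Download up to `top` rows and return them as a list of dicts.

    Assumes the response is JSON with the rows stored under the key "value".
    """
    with urllib.request.urlopen(build_url(table_id, top)) as response:
        payload = json.load(response)
    return payload["value"]


# Usage (needs internet access and a real table identifier):
# rows = fetch_rows("TABLE_ID", top=5)
# print(rows[0])
```

In practice, a dedicated R or Python package is more convenient, since it also handles paging and metadata.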

Data in Brief. An open-access journal describing data sets for use by anyone, including master’s students. Search for a keyword of interest; maybe something interesting pops up.

Data Is Plural. Browse the archive of this weekly newsletter; maybe you will find a nugget.

Access to different kinds of Statistics Netherlands’ microdata. Understand the difference between public use files and scientific use files and the channels via which CBS provides them.

BACI (product-level bilateral trade data). The United Nations’ Comtrade is the primary source for detailed product-level data on international trade flows. The raw Comtrade data, however, need some cleaning and processing before they can be used. Competent researchers have done this work for you. For product-level trade data, use either the BACI database or the trade flows underlying the Atlas of Economic Complexity.

MISSY setup routines for the EU-LFS. The Microdata Information System, MISSY, makes Stata do-files available for importing and processing the European Union Labour Force Survey (EU-LFS) and the EU Statistics on Income and Living Conditions (EU-SILC) survey. Master’s students generally do not receive permission to work with the complete samples (the scientific use files), but they can work with the public use files.

Classification systems

UN Statistics Division. The UN Statistics Division maintains several classification systems for traded products and economic activities.

Classification of industries: ISIC, NACE, NAICS.

Classification of products: BEC, SITC, HS.

Classification of occupations: ISCO, SOC.

Lecture note on crosswalking. Merging data sets that come in different classification schemes (e.g. merging a set of products and a set of industries) is known as crosswalking. Brendan Price explains the dark art and unglamorous science of crosswalking.

Correspondence tables can help you do the crosswalking. The UN Statistics Division publishes correspondence tables, and the World Integrated Trade Solution does, too.
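
As an illustration of what a crosswalk does with a correspondence table, here is a minimal sketch in plain Python. All codes and values are made up. The equal split of a product's value across the industries it maps to is only one possible assumption, not a recommended rule; real applications often use weights (for example, trade or employment shares) instead.

```python
from collections import defaultdict

# Hypothetical trade values by product code (made-up codes and numbers).
trade_by_product = {"P1": 100.0, "P2": 60.0, "P3": 40.0}

# Hypothetical correspondence table: (product code, industry code) pairs.
# P2 maps to two industries, so its value has to be split somehow.
correspondence = [("P1", "A"), ("P2", "A"), ("P2", "B"), ("P3", "B")]


def crosswalk(values, pairs):
    """Re-aggregate values from source codes to target codes.

    A source code that maps to several target codes has its value split
    equally among them (an assumption made for this sketch). Source codes
    without any match are returned separately so they do not vanish silently.
    """
    targets = defaultdict(list)
    for source, target in pairs:
        targets[source].append(target)

    result = defaultdict(float)
    unmatched = []
    for source, value in values.items():
        if source not in targets:
            unmatched.append(source)
            continue
        share = value / len(targets[source])
        for target in targets[source]:
            result[target] += share
    return dict(result), unmatched


trade_by_industry, unmatched = crosswalk(trade_by_product, correspondence)
print(trade_by_industry)
print(unmatched)
```

Two checks are worth running after any crosswalk: the totals before and after should agree, and the list of unmatched codes should be inspected rather than ignored.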

Misc

Nick Huntington-Klein’s Econometrics Resources. NHK, author of the online textbook “The Effect”, suggests useful R packages and data sources.

Using survey weights? Read the section “Using weights or inflation factors” (p. 44ff) in Angus Deaton’s “The Analysis of Household Surveys” to understand why and how.
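
To see why weights matter, consider a minimal sketch with made-up numbers. Each weight is read as the number of population households that one sampled household stands for. The function name and the data are illustrative only and are not taken from Deaton's text.

```python
# Made-up sample: household incomes and their survey weights.
# A weight of 5 means the household represents 5 households in the population.
incomes = [10.0, 20.0, 60.0]
weights = [5.0, 3.0, 2.0]


def weighted_mean(values, weights):
    """Return the weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


unweighted = sum(incomes) / len(incomes)
weighted = weighted_mean(incomes, weights)

print(unweighted)  # 30.0
print(weighted)    # 23.0
```

Here the high-income household is overrepresented in the sample, so the unweighted mean overstates average income in the population.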


Date
May 8, 2024