Quantitative social sciences: from correlation to causality

Offer semester
2nd semester

Lecture time
Tuesday 10:30-12:20

Lecture venue

Course description

Many, if not most, social research questions concern causality, e.g. what causes good and bad outcomes in society? Only if we understand the causes can we hope to change those outcomes. Yet much, if not most, social research is observational, i.e. correlational: we can observe and measure things and ask people questions, but it is rarely easy to run experiments. As a result, we often have only correlational data with which to evaluate and test our causal research questions. Taken together, these two conditions present a problem because, as is well known, correlation does not equal causation.

Recently developed theories of causation challenge these limitations. We will use the theory of Directed Acyclic Graphs (DAGs) to understand how causality translates into correlations among variables. We will use this knowledge to help us specify statistical models that may help us evaluate our causal theories.
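As a minimal illustration of how a causal structure translates into correlations, consider the DAG X <- Z -> Y, in which a common cause Z affects both X and Y while X has no effect on Y. The sketch below is not course material: it uses simulated data, the variable names and effect sizes are invented for the example, and it is written in Python for brevity rather than in the R used in the course. It shows that X and Y are correlated overall, but that the correlation largely disappears once we look within levels of Z.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=1):
    """Simulate the DAG X <- Z -> Y (hypothetical effect sizes)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5            # binary common cause
        x = 2.0 * z + rng.gauss(0, 1)     # Z -> X
        y = 2.0 * z + rng.gauss(0, 1)     # Z -> Y; note there is no X -> Y
        rows.append((z, x, y))
    return rows


rows = simulate()

# Correlation between X and Y ignoring Z (about 0.5 in expectation here).
overall = pearson([x for _, x, _ in rows], [y for _, _, y in rows])

# Correlation between X and Y within each level of Z (about 0 in expectation).
within = {}
for level in (False, True):
    stratum = [(x, y) for z, x, y in rows if z == level]
    within[level] = pearson([x for x, _ in stratum], [y for _, y in stratum])

print("overall correlation:", round(overall, 3))
for level, r in within.items():
    print("correlation within Z =", level, ":", round(r, 3))
```

Holding Z fixed in this way is the stratified analogue of including Z as a covariate in a regression model.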

The statistical models we will use are varieties of Generalized Linear Models (GLMs), specifically Linear Regression and Logistic Regression. We will also look at simple extensions of these models that allow us to deal with so-called multilevel data, which have a nested structure, e.g. pupils nested in schools. We will use the R statistical software to estimate these models from data. We will evaluate some existing social research studies using our knowledge of DAGs and GLMs.

No prior knowledge about statistical modelling is needed for this course.

Course learning outcomes

By the end of the course, students will be able to:

  1. Develop a statistical model to answer a social science research question using observational data.
  2. Assess common issues with making causal inferences from observational data.
  3. Critique social science research that uses correlations to infer causality.
  4. Analyse a dataset using R and integrate the results into a report.
  5. Demonstrate how to interpret the results of their quantitative data analysis.


Group project presentations: 20%
Data analysis report: 80%

Required reading

Agresti, A., & Finlay, B. (2018). Statistical methods for the social sciences, Global Edition. Pearson/Prentice Hall.

Imai, K. (2017). Quantitative Social Science: An Introduction. Princeton University Press.

Luke, D. A. (2020). Multilevel Modeling (2nd ed.). SAGE Publications, Inc.

Pampel, F. C. (2021). Logistic Regression: A Primer. SAGE Publications, Inc.

Schroeder, L. D., Sjoquist, D. L., & Stephan, P. E. (2017). Understanding Regression Analysis: An Introductory Guide. SAGE Publications, Inc.

Zeitlin, W., & Auerbach, C. (2019). Basic Statistics for the Behavioral and Social Sciences Using R. Oxford University Press.

Recommended reading

DiPrete, T. A., & Forristal, J. D. (1994). Multilevel Models: Methods and Substance. Annual Review of Sociology, 20, 331–357.

Elwert, F., & Winship, C. (2014). Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable. Annual Review of Sociology, 40(1), 31–53.

McShane, B. B., Gal, D., Gelman, A., Robert, C., & Tackett, J. L. (2019). Abandon Statistical Significance. American Statistician, 73(sup1), 235–245.

Pearl, J., Glymour, M., & Jewell, N. P. (2016). Causal Inference in Statistics: A Primer. Wiley.

Rohrer, J. M. (2018). Thinking Clearly About Correlations and Causation: Graphical Causal Models for Observational Data. Advances in Methods and Practices in Psychological Science, 1(1), 27–42.

Verzani, J. (2001). SimpleR: Using R for Introductory Statistics.

Course co-ordinator and teachers