POL42050 Quantitative Text Analysis

Academic Year 2022/2023

Automated text analysis has become very popular in political science in recent years. With the massive availability of text data on the web, political scientists increasingly recognize automated text analysis (or “text as data”) as a promising approach for analyzing various kinds of social and political behavior. This module introduces students of political science to the quantitative analysis of textual data. We discuss the underlying theoretical assumptions, substantive applications of these methods, and their implementation in the R statistical programming language.

Each session combines lectures with practical, hands-on exercises that apply the methods to political texts and address the practical issues arising at each step of the research process. Most of these methods can be reduced to a three-step process: first, identifying texts and units of text for analysis; second, extracting quantitatively measured features from these texts and converting them into a quantitative feature matrix; third, analysing this matrix with statistical methods, such as dictionary construction and application, scaling models, and topic models, to draw inferences about the texts. Students will learn how to apply these steps to various types of texts. The course will also introduce advanced methods, including word embeddings, speech transcription, machine translation, and computer vision.
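The three-step process can be illustrated with a minimal sketch. The module itself works in R; the Python below is only a language-neutral illustration using the standard library. The toy corpus, the dictionary categories, and all function names are hypothetical and are not taken from the course materials.

```python
import re
from collections import Counter

# Step 1: identify texts and units of analysis.
# Hypothetical toy corpus: each document is one (invented) speech.
texts = {
    "speech_a": "We will cut taxes and support growth.",
    "speech_b": "We will protect public services and oppose cuts.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_dfm(docs):
    """Step 2: convert texts into a document-feature matrix of token counts."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    features = sorted(set().union(*counts.values()))
    names = list(docs)
    matrix = [[counts[name][feat] for feat in features] for name in names]
    return names, features, matrix


def dictionary_scores(names, features, matrix, dictionary):
    """Step 3: sum the counts of each dictionary category per document."""
    scores = {}
    for name, row in zip(names, matrix):
        scores[name] = {
            category: sum(n for feat, n in zip(features, row) if feat in words)
            for category, words in dictionary.items()
        }
    return scores


# Hypothetical dictionary mapping categories to word lists.
dictionary = {
    "economy": {"taxes", "growth"},
    "welfare": {"public", "services"},
}

names, features, matrix = build_dfm(texts)
scores = dictionary_scores(names, features, matrix, dictionary)

for name in names:
    print(name, scores[name])
```

A dictionary count is the simplest of the statistical methods named above; scaling models and topic models take the same feature matrix as input.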


Curricular information is subject to change

Learning Outcomes:

Upon successful completion of the course, students will be able to:

1. Understand fundamental issues in (quantitative) text analysis such as inter-coder agreement, reliability, validation, accuracy, and precision.

2. Convert texts into quantitative matrices of features, and then analyse those features using statistical methods.

3. Use human coding of texts to train supervised classifiers.

4. Apply these methods to their own text corpus to address a substantive research question.

5. Critically evaluate (social science) research that uses automated text analysis methods.
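As a rough illustration of the concepts named in the first learning outcome, the sketch below computes Cohen's kappa as one common measure of inter-coder agreement, together with accuracy and precision against a reference coding. The module itself uses R; this Python version, the label vectors, and the function names are hypothetical and shown only to make the quantities concrete.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders over the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected by chance, given each coder's label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if expected == 1:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


def accuracy(gold, predicted):
    """Share of items where the prediction matches the reference label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision(gold, predicted, label):
    """Share of items predicted as `label` that truly carry `label`."""
    flagged = [g for g, p in zip(gold, predicted) if p == label]
    if not flagged:
        return 0.0
    return sum(g == label for g in flagged) / len(flagged)


# Hypothetical codings of six documents by two coders.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_b = ["pos", "neg", "neg", "neg", "pos", "pos"]

kappa = cohen_kappa(coder_a, coder_b)
# Here coder_a is treated as the reference coding and coder_b as the prediction.
acc = accuracy(coder_a, coder_b)
prec = precision(coder_a, coder_b, "pos")

print(round(kappa, 3), round(acc, 3), round(prec, 3))
```

The same accuracy and precision functions apply when the second vector comes from a supervised classifier rather than a human coder.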

Indicative Module Content:

Statistical software and programming using R and RMarkdown; assumptions and workflow of quantitative text analysis approaches; tokenisation and document-feature matrix; dictionaries and sentiment analysis; describing and comparing texts; human coding and document classification; supervised and unsupervised scaling; multilingual text analysis; topic models; speech recognition; word embeddings

Student Effort Hours: 
Student Effort Type Hours


Autonomous Student Learning




Approaches to Teaching and Learning:
active/task-based learning; peer and group work; lectures; lab/studio work; enquiry & problem-based learning; case-based learning 
Requirements, Exclusions and Recommendations

Not applicable to this module.

Module Requisites and Incompatibles
Not applicable to this module.
Assessment Strategy  
Description | Timing | Open Book Exam | Component Scale | Must Pass Component | % of Final Grade
Assignment: Homework 2 | Throughout the Trimester | n/a | Graded | No |
Assignment: Research paper | Week 12 | n/a | Graded | No |
Assignment: Homework 1 | Throughout the Trimester | n/a | Graded | No |

Carry forward of passed components
Resit In | Terminal Exam
Summer | No
Please see Student Jargon Buster for more information about remediation types and timing. 
Feedback Strategy/Strategies

• Feedback individually to students, post-assessment

How will my Feedback be Delivered?

Feedback will be provided to students within 20 working days of the deadline for the assignment in accordance with university policy.