Academic Year 2025/2026

Quantitative Text Analysis (POL42050)

Subject: Politics
College: Social Sciences & Law
School: Politics & Int Relations
Level: 4 (Masters)
Credits: 10
Module Coordinator: Assoc Professor Stefan Muller
Trimester: Spring
Mode of Delivery: On Campus
Internship Module: No
How will I be graded? Letter grades

Curricular information is subject to change.

Computational text analysis has become increasingly popular in political science in recent years. With vast amounts of text data now available on the web, political scientists view quantitative text analysis (or “text as data”) as a valuable approach for studying various forms of social and political behaviour.

This module introduces political science students to the quantitative analysis of textual data. The course is structured in two complementary parts, each addressing different approaches to text analysis with distinct methodological properties.

The first part covers classical quantitative text analysis approaches that prioritise interpretability and transparency. It addresses the theoretical foundations, practical applications, and technical implementation of these text-as-data methods in the R statistical programming language, as well as strategies for validating them. Classical methods follow a three-step framework: identifying texts and units of analysis; extracting measurable features from these texts and converting them into a quantitative feature matrix; and analysing this matrix using statistical techniques. The techniques covered include dictionary-based approaches, supervised document classification, scaling models, and topic modelling. These approaches offer clear, interpretable results that can be explained and validated at each stage of the analysis, making them particularly valuable for rigorous social science research.
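The three-step framework described above can be illustrated with a minimal sketch. It is written in Python using only the standard library, so it is not the R workflow taught in the module, and the two texts and the dictionary categories are invented purely for illustration.

```python
import re
from collections import Counter

# Step 1: identify texts and units of analysis (here, one document per speech).
# These texts are invented for illustration.
texts = {
    "speech_a": "We will cut taxes and support growth.",
    "speech_b": "Taxes fund public services. We will protect services.",
}

def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Step 2: extract features and build a document-feature matrix (DFM).
# Rows are documents, columns are the sorted vocabulary, cells are counts.
counts = {doc: Counter(tokenise(text)) for doc, text in texts.items()}
features = sorted(set().union(*counts.values()))
dfm = {doc: [counts[doc][f] for f in features] for doc in texts}

# Step 3: analyse the matrix, here with a toy dictionary.
# The categories and word lists are hypothetical.
dictionary = {
    "economy": {"taxes", "growth"},
    "welfare": {"services", "public"},
}

def dictionary_scores(doc):
    """Share of a document's tokens that match each dictionary category."""
    total = sum(counts[doc].values())
    return {
        category: sum(counts[doc][word] for word in words) / total
        for category, words in dictionary.items()
    }

for doc in texts:
    print(doc, dictionary_scores(doc))
```

Each stage can be inspected directly: the token lists, the matrix, and the matched dictionary words are all visible. That visibility is the interpretability the classical approaches are valued for.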

The second part introduces more powerful, state-of-the-art techniques that leverage neural networks and large language models. Students will gain hands-on experience with word embeddings, transformer models, and generative AI approaches. Whilst these methods often deliver superior predictive performance and capture nuanced linguistic patterns, they frequently operate as “black boxes” with limited interpretability. The module incorporates the Hugging Face ecosystem in Python, a leading resource for implementing transformer models and other modern natural language processing tools. Students will learn how to fine-tune pre-trained models and leverage generative AI for text analysis, whilst understanding the important trade-offs between performance and interpretability.

Each session combines lectures with practical exercises, allowing students to apply these methods to political texts. These exercises address real-world challenges at each stage of the research process. By engaging with both classical and modern approaches, students will gain insights into the full spectrum of text analysis methods available to political science researchers and will be equipped to select appropriate techniques for their own research questions.

About this Module

Learning Outcomes:

Upon successful completion of the course, students will be able to:

1. Understand fundamental issues in quantitative text analysis, such as inter-coder agreement, reliability, validation, accuracy, and precision.

2. Master classical text-as-data approaches by converting texts into quantitative matrices of features and analysing them using statistical methods, scaling models, and topic modelling.

3. Understand the strengths and limitations of classical approaches, particularly their interpretability, and recognise contexts where they are most appropriately applied.

4. Apply and fine-tune modern neural network approaches, including transformer models, to text analysis tasks.

5. Understand the capabilities and limitations of generative AI for text analysis, including the trade-off between performance and interpretability.

6. Use human coding of texts to train and evaluate both supervised classifiers and fine-tuned transformer models.

7. Select and justify appropriate text analysis techniques (classical or modern) for their own research questions and text corpora.

8. Critically evaluate social science research that uses text analysis methods, assessing methodological choices and the appropriateness of different techniques.
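Several of the concepts named in the first learning outcome have simple definitions that a short example can make concrete. The sketch below is a standard-library Python illustration, not code from the module. It computes Cohen's kappa (one common measure of inter-coder agreement), accuracy, and precision on invented labels from two hypothetical coders.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders on the same items.

    Undefined (division by zero) if expected agreement equals 1.
    """
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Agreement expected by chance, from each coder's label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

def accuracy(gold, predicted):
    """Share of items where the predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def precision(gold, predicted, positive):
    """Share of items predicted as `positive` that are truly `positive`.

    Undefined (division by zero) if nothing is predicted as `positive`.
    """
    gold_of_predicted = [g for g, p in zip(gold, predicted) if p == positive]
    return sum(g == positive for g in gold_of_predicted) / len(gold_of_predicted)

# Invented labels for eight texts, coded by two hypothetical coders.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
coder_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

kappa = cohens_kappa(coder_a, coder_b)
# Treat coder_a as the gold standard and coder_b as a classifier's output.
acc = accuracy(coder_a, coder_b)
prec = precision(coder_a, coder_b, positive="pos")
print(kappa, acc, prec)
```

Here the coders agree on six of eight texts (0.75), but half of that agreement would be expected by chance given their label frequencies, so kappa is 0.5. Treating the first coder as the gold standard gives an accuracy and a precision of 0.75.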

Indicative Module Content:

Statistical software and programming using R, Python, and Quarto; assumptions and workflow of quantitative text analysis; tokenisation and document-feature matrices; dictionary-based and sentiment analysis approaches; text comparison and similarity metrics; word embeddings; human coding and validation; supervised document classification; scaling models (Wordscores, Wordfish, Latent Semantic Scaling); topic models; transformer-based models (BERT, DistilBERT); fine-tuning pre-trained models; large language models for text classification and analysis; trade-offs between interpretability and performance; working with text corpora and APIs.

Student Effort Hours:
Autonomous Student Learning: 226
Lectures: 24
Total: 250


Approaches to Teaching and Learning:
active/task-based learning; peer and group work; lectures; lab/studio work; enquiry & problem-based learning; case-based learning

Requirements, Exclusions and Recommendations
Learning Requirements:

NOTE: Prior familiarity with the statistical programming language R (or Python) is a prerequisite for this course due to its direct relevance to the content and assignments. Below are some reasons why prior experience with R or Python is crucial for students to follow the course and apply the methods effectively:

– Implementation of Text Analysis Methods: Text analysis is a central component of the course, and R is widely used for implementing text analysis techniques. R provides a comprehensive set of packages designed for text processing, natural language processing (NLP), and sentiment analysis. Students with prior experience in R will be able to navigate and use these tools more efficiently, and so implement the text analysis methods covered in the course effectively.

– Course Content Alignment: The course content, lectures, and materials are designed with a focus on R-based implementation. The examples, code snippets, and demonstrations provided throughout the course will be predominantly in R. Some of the advanced methods are implemented in Python, but a good understanding of R will make it much easier to write and run code in Python. Without prior familiarity, students may struggle to comprehend and replicate these examples, hindering their understanding of the core concepts and methodologies.

– Homework Assignments and Research Papers: The assignments and research papers in this course will require students to apply the text analysis methods discussed in class to real-world data. Students without prior experience with R may find it challenging to write R code to preprocess large text corpora, visualise results, and interpret the findings. Their lack of proficiency in R could impede their ability to complete assignments accurately and efficiently.


Module Requisites and Incompatibles
Not applicable to this module.
 

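Assessment Strategy

Quizzes/Short Exercises: MCQ Test 1
Timing: Week 6
Component Scale: Alternative linear conversion grade scale 40%
Must Pass Component: No
% of Final Grade: 15
In Module Component Repeat Offered: No

Assignment(Including Essay): Homework 2
Timing: Week 9
Component Scale: Other
Must Pass Component: No
% of Final Grade: 25
In Module Component Repeat Offered: No

Assignment(Including Essay): Research Paper
Timing: Week 14
Component Scale: Standard conversion grade scale 40%
Must Pass Component: No
% of Final Grade: 60
In Module Component Repeat Offered: No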

Carry forward of passed components: No

Resit In: Summer
Terminal Exam: No
Please see Student Jargon Buster for more information about remediation types and timing. 

Feedback Strategy/Strategies

• Feedback individually to students, post-assessment

How will my Feedback be Delivered?

Feedback will be provided to students within 20 working days of the deadline for the assignment in accordance with university policy.

Timetabling information is displayed only for guidance purposes, relates to the current Academic Year only and is subject to change.
Spring: Computer Aided Lab, Offering 1, Week(s) 20, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, Wed 09:00 - 10:50