POL42050 Quantitative Text Analysis

Academic Year 2023/2024

Automated text analysis has become very popular in political science in recent years. With the massive availability of text data on the web, political scientists increasingly recognize automated text analysis (or “text as data”) as a promising approach for analyzing various kinds of social and political behavior. This module introduces students of political science to the quantitative analysis of textual data. We discuss the underlying theoretical assumptions, substantive applications of these methods, and their implementations in the R statistical programming language.

Each session combines lectures with practical, hands-on exercises that apply the methods to political text and deal with practical issues at each step of the research process. Most of these methods can be reduced to a three-step process: first, identifying texts and units of texts for analysis; second, extracting quantitatively measured features from these texts and converting them to a quantitative feature matrix; third, analysing this matrix with statistical methods, such as dictionary construction and application, scaling models, and topic models, to draw inferences about the texts. Students will learn how to apply these steps to various types of texts. The course will also introduce advanced methods, including word embeddings, speech transcription, machine translation, and computer vision.
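The three-step process can be sketched in a few lines of code. The example below is only an illustration and is not taken from the course materials: the module itself works mainly in R, whereas this sketch uses plain Python with the standard library. The two toy documents, the two-category dictionary, and the helper names `tokenize` and `apply_dictionary` are all hypothetical.

```python
import re
from collections import Counter

# Step 1: identify the texts (documents) to analyse.
# These two toy documents are invented for illustration only.
texts = {
    "doc_a": "We will cut taxes and support growth.",
    "doc_b": "Higher taxes threaten jobs; the crisis will harm growth.",
}


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Step 2: convert the texts into a document-feature matrix
# (one row per document, one column per feature, cells hold counts).
counts = {doc: Counter(tokenize(text)) for doc, text in texts.items()}
features = sorted(set().union(*counts.values()))
dfm = {doc: [c[f] for f in features] for doc, c in counts.items()}

# Step 3: analyse the matrix. Here we apply a hypothetical dictionary
# that maps category names to sets of words.
dictionary = {
    "positive": {"support", "growth"},
    "negative": {"threaten", "crisis", "harm"},
}


def apply_dictionary(dfm, features, dictionary):
    """Sum the counts of all features that belong to each category."""
    scores = {}
    for doc, row in dfm.items():
        scores[doc] = {
            category: sum(n for f, n in zip(features, row) if f in words)
            for category, words in dictionary.items()
        }
    return scores


scores = apply_dictionary(dfm, features, dictionary)
for doc, category_counts in scores.items():
    print(doc, category_counts)
```

Scaling models or topic models would replace the dictionary lookup in step 3, while steps 1 and 2 would stay largely the same.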

Curricular information is subject to change

What will I learn?

Learning Outcomes:

Upon successful completion of the course, students will be able to:

1. Understand fundamental issues in (quantitative) text analysis, such as inter-coder agreement, reliability, and validation.

2. Convert texts into quantitative matrices of features, and then analyse those features using statistical methods.

3. Use human coding of texts to train supervised classifiers.

4. Apply these methods to a text corpus to address a substantive research question.

5. Critically evaluate (social science) research that uses automated text analysis methods.

Indicative Module Content:

Statistical software and programming using R and RMarkdown; assumptions and workflow of quantitative text analysis approaches; tokenisation and document-feature matrix; dictionaries and sentiment analysis; describing and comparing texts; human coding and document classification; supervised and unsupervised scaling; multilingual text analysis; topic models; speech recognition; word embeddings

How will I learn?

Student Effort Hours:

Student Effort Type	Hours
Lectures	24
Autonomous Student Learning	226
Total	250

Approaches to Teaching and Learning:
active/task-based learning; peer and group work; lectures; lab/studio work; enquiry & problem-based learning; case-based learning

Am I eligible to take this module?

Requirements, Exclusions and Recommendations

Learning Requirements:

NOTE: Prior familiarity with the statistical programming language R (or Python) is a prerequisite for this course due to its direct relevance to the content and assignments. Below are some reasons why prior experience with R or Python is crucial for students to follow the course and apply the methods effectively:

– Implementation of Text Analysis Methods: Text analysis is a central component of the course, and R is widely used for implementing text analysis techniques. R provides a comprehensive set of libraries and packages specifically designed for text processing, natural language processing (NLP), and sentiment analysis. Students with prior experience in R will be able to navigate and utilize these tools more efficiently, enabling them to implement text analysis methods covered in the course effectively.

– Course Content Alignment: The course content, lectures, and materials are designed with a focus on R-based implementation. The examples, code snippets, and demonstrations provided throughout the course will be predominantly in R. Some of the advanced methods are implemented in Python, but a good understanding of R will make it much easier to write and run code in Python. Without prior familiarity with R, students may struggle to follow and replicate these examples, which would hinder their understanding of the core concepts and methodologies.

– Homework Assignments and Research Papers: The assignments and research papers in this course will require students to apply the text analysis methods discussed in class to real-world data. Students without prior experience with R may find it challenging to write R code to preprocess large text corpora, visualise results, and interpret the findings. Their lack of proficiency in R could impede their ability to complete assignments accurately and efficiently.

Module Requisites and Incompatibles

Not applicable to this module.

How will I be assessed?

Assessment Strategy

Description	Timing	Open Book Exam	Component Scale	Must Pass Component	% of Final Grade
Assignment: Homework 1	Throughout the Trimester	n/a	Graded	No	25
Assignment: Homework 2	Throughout the Trimester	n/a	Graded	No	25
Assignment: Research paper	Week 12	n/a	Graded	No	50

Carry forward of passed components
No

What happens if I fail?

Resit In	Terminal Exam
Summer	No

Please see Student Jargon Buster for more information about remediation types and timing.

Assessment feedback

Feedback Strategy/Strategies

• Feedback individually to students, post-assessment

How will my Feedback be Delivered?

Feedback will be provided to students within 20 working days of the deadline for the assignment in accordance with university policy.

When is this module offered?

Timetabling information is displayed only for guidance purposes, relates to the current Academic Year only and is subject to change.


Spring
Computer Aided Lab	Offering 1	Week(s) - 20, 21, 23, 24, 25, 26, 29, 31, 32, 33	Mon 10:00 - 11:50

The information contained in this document is, to the best of our knowledge, true and accurate at the time of publication, and is solely for informational purposes. University College Dublin accepts no liability for any loss or damage howsoever arising as a result of use or reliance on this information.

Subject: Politics
College: Social Sciences & Law
School: Politics & Int Relations
Level: 4 (Masters)
Credits: 10.0
Trimester: Spring
Module Coordinator: Dr Stefan Muller
Mode of Delivery: Face-to-Face
Internship Module: No
How will I be graded?: Letter grades
