Learning Outcomes:
Upon successful completion of the course, students will be able to:
1. Understand fundamental issues in quantitative text analysis, such as inter-coder agreement, reliability, validation, accuracy, and precision.
2. Master classical text-as-data approaches by converting texts into quantitative matrices of features and analysing these matrices using statistical methods, scaling models, and topic modelling.
3. Understand the strengths and limitations of classical approaches, particularly their interpretability, and recognise contexts where they are most appropriately applied.
4. Apply and fine-tune modern neural network approaches, including transformer models, to text analysis tasks.
5. Understand the capabilities and limitations of generative AI for text analysis, including the trade-off between performance and interpretability.
6. Use human coding of texts to train and evaluate supervised classifiers and to fine-tune and evaluate transformer models.
7. Select and justify appropriate text analysis techniques (classical or modern) for their own research questions and text corpora.
8. Critically evaluate social science research that uses text analysis methods, assessing methodological choices and the appropriateness of different techniques.
Indicative Module Content:
Statistical software and programming using R, Python, and Quarto; assumptions and workflow of quantitative text analysis; tokenisation and document-feature matrices; dictionary-based and sentiment analysis approaches; text comparison and similarity metrics; word embeddings; human coding and validation; supervised document classification; scaling models (Wordscores, Wordfish, Latent Semantic Scaling); topic models; transformer-based models (BERT, DistilBERT); fine-tuning pre-trained models; large language models for text classification and analysis; trade-offs between interpretability and performance; working with text corpora and APIs.