Learning Outcomes:
1. Understand the general properties of data, and the different approaches to analysing data for science
2. Understand the principles & practical use of data visualization
3. Develop basic skills in using the R programming language for simple data analysis & preparation of figures
4. Learn about advanced data analysis applications from leaders in science and industry
Indicative Module Content:
Class 1. Data Analysis I:
Roles, aims, tools of data analysis
The data life cycle
Data types: variables, parameters, constants
Preparing data – data ‘wrangling’; the Tidyverse
Class 2. Data Analysis II:
Exploratory data analysis - Brief introduction to key topics:
Sampling & distributions
Statistical significance & p-values
Quantiles & Q-Q plots
Relationships between variables: correlation & regression
Overview of hypothesis testing
Statistical modelling
Class 3. Data Analysis III:
What is artificial intelligence?
Differences between traditional statistical modelling and machine learning
Supervised learning: classification, regression
Unsupervised learning: clustering, dimension reduction
Class 4. Data Visualization I:
Elements of a plot
Choosing a chart type:
Representing amounts, proportions, frequency, variation, relationships
Aesthetics of plots
The process of data visualization
Class 5. Data Visualization II:
Aims & history of data visualization
Perception & Gestalt laws, encoding with colour
Telling a story
Good & bad figures
Design for reproducibility
Class 6. Data Visualization III:
Class Exercises
Students in groups select and present examples of ‘good’ & ‘bad’ data visualization from the scientific literature or general publications, followed by class discussion
Prize for best example
Classes 7, 8 and 9 involve presentations by visiting scientists active in cutting-edge research
Practical 1. Introduction to the R programming language:
Overview of R & the RStudio IDE
Download & install R & RStudio
The RStudio layout
Command lines, scripts, projects
Objects, functions, packages
Write your first code
Import data, make a simple plot
Finding help
Hints for using R
Practical 2. Data visualization using ggplot2: Making basic plots:
The RStudio IDE
The basic plot() function
The grammar of graphics (data, aesthetics/coordinate systems, geoms)
The ggplot2 package
Make a simple plot
Customize your plot
Saving & exporting plots
Practical 3. Data visualization using ggplot2: Advanced plots:
Displaying multiple variables
Using layers in your plots
Facets – divide & conquer
Interactive visualizations with plotly
Practical 4. Data visualization using ggplot2: Working with models:
Basic analysis of a biological dataset using a general linear model (GLM)
a) OMICS dataset, or
b) Coronavirus time series