Introduction
Learning Objectives
By the end of this module, you should be able to:
- Explain the role of exploratory data analysis (EDA) in an HCI research workflow
- Compute and interpret summary statistics for grouped experimental data
- Create informative visualizations using ggplot2 to inspect distributions, compare conditions, and explore variable relationships
- Identify outliers using the IQR method and visual inspection
- Check normality and variance assumptions before proceeding to inferential statistics
- Choose appropriate chart types based on data characteristics
What Is EDA?
Exploratory Data Analysis (EDA) is the practice of examining a dataset before applying any formal statistical models or hypothesis tests. The term was coined by John Tukey (1977), who argued that analysts should first look at the data rather than jump straight to confirming or rejecting hypotheses.
In short: understand before you model.
Why EDA Matters in HCI Research
HCI user studies generate messy, human-produced data. Participants behave in unexpected ways, sensors drop readings, and experimental conditions interact with individual differences. EDA helps you catch these issues early. Specifically, EDA answers questions like:
- What does the data look like? Are there missing values, duplicates, or impossible entries (e.g., negative reaction times)?
- How are the measurements distributed? Are they roughly normal, or heavily skewed? Are there outliers?
- Do conditions differ visually? Before running a t-test or ANOVA, can you already see a pattern in the data?
- Are variables related? Does typing speed correlate with experience? Does error rate depend on age?
- Are statistical assumptions met? Many parametric tests (e.g., t-tests and ANOVA) assume approximately normal data and equal variances across groups. EDA lets you check these assumptions before you commit to a particular test.
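As a rough illustration of the first few questions, here is a minimal base R sketch. The data frame `trials` and its columns (`participant`, `condition`, `rt_ms`) are hypothetical, invented for this example rather than taken from any study in this module. The outlier rule is the common 1.5 × IQR fence, and the Shapiro-Wilk call is shown only to indicate where a normality check would go.

```r
# Hypothetical toy data: one row per trial in a made-up study.
# It deliberately contains a missing value, a negative reaction time,
# an exact duplicate row, and one very slow trial.
trials <- data.frame(
  participant = c("P1", "P2", "P3", "P4", "P5", "P6", "P6", "P7"),
  condition   = c("A",  "B",  "A",  "B",  "A",  "B",  "B",  "A"),
  rt_ms       = c(512,  498,  NA,   530,  -20,  2100, 2100, 505)
)

# What does the data look like?
n_missing    <- sum(is.na(trials$rt_ms))           # missing reaction times
n_duplicated <- sum(duplicated(trials))            # exact duplicate rows
n_negative   <- sum(trials$rt_ms < 0, na.rm = TRUE) # impossible entries

# Keep only plausible, unique rows for the remaining checks.
clean <- unique(trials[!is.na(trials$rt_ms) & trials$rt_ms >= 0, ])

# How are the measurements distributed? Flag outliers with the 1.5 * IQR rule.
q     <- unname(quantile(clean$rt_ms, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- clean$rt_ms < q[1] - fence | clean$rt_ms > q[2] + fence

# Do conditions differ? Compare a robust summary per condition.
medians <- aggregate(rt_ms ~ condition, data = clean, FUN = median)

# Are assumptions met? Shapiro-Wilk normality test (illustrative only:
# with this few observations the result says very little).
normality_p <- shapiro.test(clean$rt_ms)$p.value

print(c(missing = n_missing, duplicated = n_duplicated, negative = n_negative))
print(clean[is_outlier, ])
print(medians)
```

Flagged rows are candidates for closer inspection, not automatic removal; whether to exclude them is a judgment call that depends on the study design.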