By the end of this section, you should be able to:
In the previous module on t-tests, we compared two conditions — for example, touchscreen vs. physical keyboard. But HCI studies frequently involve three or more conditions. Suppose you are evaluating three gesture interaction techniques — swipe, pinch, and tap-and-hold — for a map navigation task. You want to know whether task completion time differs across these techniques.
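A natural first instinct is to run pairwise t-tests: swipe vs. pinch, swipe vs. tap-and-hold, and pinch vs. tap-and-hold. That gives you three tests. The problem is that each test, run at a significance level of α = 0.05, carries a 5% chance of a false positive (Type I error) when no real difference exists. Across three tests, the chance of at least one false positive is no longer 5%. Treating the tests as independent (a simplification, since the pairwise comparisons share data), it climbs to approximately: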
$$1 - (1 - 0.05)^3 \approx 0.143$$
That is a 14.3% chance of at least one false positive. With five groups you would need 10 pairwise comparisons, and the error rate balloons to about 40%. This is called the multiple comparisons problem (or family-wise error rate inflation).
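To check these figures yourself, here is a minimal Python sketch. The function name `familywise_error_rate` is made up for this example, and the calculation assumes the tests are independent, as in the formula above.

```python
from math import comb


def familywise_error_rate(num_groups: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return (number of pairwise tests, chance of at least one false positive).

    Assumes every pairwise test is run at the same alpha and that the
    tests are independent, which is only an approximation in practice.
    """
    num_tests = comb(num_groups, 2)          # k choose 2 pairwise comparisons
    fwer = 1 - (1 - alpha) ** num_tests      # 1 - P(no false positive in any test)
    return num_tests, fwer


for k in (3, 5):
    m, fwer = familywise_error_rate(k)
    print(f"{k} groups: {m} pairwise tests, family-wise error rate ~ {fwer:.3f}")
# 3 groups: 3 pairwise tests, family-wise error rate ~ 0.143
# 5 groups: 10 pairwise tests, family-wise error rate ~ 0.401
```

Because the number of pairs grows quadratically with the number of groups, the inflation becomes severe quickly.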
Analysis of Variance (ANOVA) solves this by testing all groups simultaneously in a single omnibus test. Instead of asking "do these two groups differ?", ANOVA asks: "is there any difference among these group means?" If the answer is yes, you then follow up with targeted post-hoc comparisons that control for multiple testing.
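As a rough illustration of what the omnibus test computes, the sketch below calculates the one-way ANOVA F statistic by hand: the ratio of variability between group means to variability within groups. The completion times are invented for illustration only, the function name `one_way_anova_f` is hypothetical, and the sketch assumes a between-subjects design (different participants in each group).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)
    n_total = len(all_values)

    # Variability of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical task completion times in seconds (made-up data).
swipe = [12.0, 14.0, 13.0, 15.0]
pinch = [16.0, 18.0, 17.0, 19.0]
tap_and_hold = [14.0, 15.0, 16.0, 15.0]

f_stat, df_b, df_w = one_way_anova_f([swipe, pinch, tap_and_hold])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group noise would suggest. Turning F into a p-value requires the F distribution; in practice a library routine such as `scipy.stats.f_oneway(swipe, pinch, tap_and_hold)` returns both the statistic and the p-value in one call.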