The Effect of Extremes

Student Summary

Is it better to use the mean or median to describe the center of a data set?

The mean gives equal weight to every value when finding the center, so it usually represents the typical values well when the data have a symmetric distribution. On the other hand, the mean can be greatly affected by a change to even a single value.

The median is the middle value of the ordered data set, so a change to a single value usually does not affect it much. For this reason, the median is more appropriate for data that are not symmetrically distributed.
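As a small illustration of the two paragraphs above, the sketch below changes one value in a short list and compares how far the mean and the median move. The numbers are hypothetical and are not taken from this lesson; only Python's standard `statistics` module is used.

```python
from statistics import mean, median

# Hypothetical values, made up only for this illustration.
original = [2, 3, 3, 4, 5]
# The same list with a single value changed: the 5 becomes 50.
changed = [2, 3, 3, 4, 50]

# How far each measure of center moves when that one value changes.
mean_shift = mean(changed) - mean(original)
median_shift = median(changed) - median(original)

print("mean:", mean(original), "->", mean(changed))
print("median:", median(original), "->", median(changed))
```

In this made-up case the mean moves from 3.4 to 12.4, while the median stays at 3.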

We can look at the distribution of a data set and draw conclusions about the mean and the median.

Here is a dot plot showing the time, in seconds, that a dart takes to hit a target. The data produce a symmetric distribution.

<p>Dot plot from 0.9 to 1.9 by 0.1’s. Time to hit dartboard in seconds. Beginning at 0.9, number of dots above each increment is 0, 1, 1, 2, 4, 6, 4, 2, 1, 1, 0.</p>

When a distribution is symmetric, the median and the mean are both found in the middle of the distribution. Because the median is the middle value (or the mean of the two middle values) of a data set, you can use the symmetry around the center to find it easily. Here, the median is 1.4 seconds.

For the mean, you need to know that the sum of the distances from the mean of the values greater than the mean is equal to the sum of the distances from the mean of the values less than the mean. Using the symmetry, you can see that there are four values 0.1 second above 1.4, two values 0.2 second above, one value 0.3 second above, and one value 0.4 second above. Likewise, the same numbers of values lie at the same distances below 1.4, so the mean is also 1.4 seconds.
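The balance described above can be checked with a short sketch. It assumes the dot counts read from the description of the symmetric dot plot, and the variable names are chosen only for this illustration.

```python
from statistics import mean, median

# Dot counts read from the symmetric dot plot: time in seconds -> number of dots.
counts = {1.0: 1, 1.1: 1, 1.2: 2, 1.3: 4, 1.4: 6, 1.5: 4, 1.6: 2, 1.7: 1, 1.8: 1}

# Expand the counts into a flat list with one entry per dot.
times = [t for t, n in counts.items() for _ in range(n)]

center = mean(times)
# Total distance from the mean for the values above it and for the values below it.
above = sum(t - center for t in times if t > center)
below = sum(center - t for t in times if t < center)

print("mean:", round(center, 3), "median:", round(median(times), 3))
print("distance above:", round(above, 3), "distance below:", round(below, 3))
```

Both totals round to 1.5 seconds, which is the balance that puts the mean at 1.4 seconds, the same place as the median.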

Here is a dot plot using the same data, but with two of the values changed, resulting in a skewed distribution.

<p>Dot plot from 0.2 to 1.7 by 0.1’s. Time to hit dartboard in seconds. Beginning at 0.2, number of dots above each increment is 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 2, 4, 6, 4, 2, 0.</p>

A skewed distribution is not symmetric, so you cannot use symmetry to find the median and the mean. The median is still 1.4 seconds because it is still the middle value. The mean, on the other hand, is now about 1.273 seconds. The mean is less than the median because the two low values (0.3 and 0.4 seconds) pull the mean down.
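These two values can be reproduced with a minimal sketch, assuming the dot counts read from the description of the skewed dot plot. The variable names are chosen only for this illustration.

```python
from statistics import mean, median

# Dot counts read from the skewed dot plot: time in seconds -> number of dots.
skewed_counts = {0.3: 1, 0.4: 1, 1.0: 1, 1.1: 1, 1.2: 2, 1.3: 4, 1.4: 6, 1.5: 4, 1.6: 2}

# Expand the counts into a flat list with one entry per dot.
skewed_times = [t for t, n in skewed_counts.items() for _ in range(n)]

skewed_mean = mean(skewed_times)
skewed_median = median(skewed_times)

print("mean:", round(skewed_mean, 3))
print("median:", round(skewed_median, 3))
```

The 22 times add up to 28 seconds, so the mean is 28 ÷ 22, or about 1.273 seconds, while the two middle values are both still 1.4 seconds.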

The median is usually more resistant to extreme values than the mean is. For this reason, the median is the preferred measure of center when a distribution is skewed or has extreme values. When using the median, you would also use the interquartile range (IQR) as the preferred measure of variability. In a more symmetric distribution, the mean is the preferred measure of center, and the mean absolute deviation (MAD) is the preferred measure of variability.
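To make the two measures of variability concrete, here is a rough sketch that computes the MAD for the symmetric dart data and the IQR for the skewed dart data. The helper names `expand`, `mad`, and `iqr` are hypothetical. The quartiles are assumed to be the medians of the lower and upper halves of the ordered data; other quartile conventions exist and can give slightly different results.

```python
from statistics import mean, median

def expand(counts):
    """Turn {value: number of dots} into a sorted list with one entry per dot."""
    return sorted(t for t, n in counts.items() for _ in range(n))

def mad(values):
    """Mean absolute deviation: the average distance of the values from their mean."""
    center = mean(values)
    return mean([abs(v - center) for v in values])

def iqr(values):
    """Interquartile range, taking Q1 and Q3 as the medians of the lower and
    upper halves (assumed convention; needs at least two values)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(upper) - median(lower)

# Dot counts read from the two dot plots in this summary.
symmetric = expand({1.0: 1, 1.1: 1, 1.2: 2, 1.3: 4, 1.4: 6, 1.5: 4, 1.6: 2, 1.7: 1, 1.8: 1})
skewed = expand({0.3: 1, 0.4: 1, 1.0: 1, 1.1: 1, 1.2: 2, 1.3: 4, 1.4: 6, 1.5: 4, 1.6: 2})

print("symmetric data: mean", round(mean(symmetric), 3), "MAD", round(mad(symmetric), 3))
print("skewed data: median", round(median(skewed), 3), "IQR", round(iqr(skewed), 3))
```

Pairing the mean with the MAD and the median with the IQR keeps each measure of variability built on the same idea of center as the measure it accompanies.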

Standards

Addressing
HSS-ID.A.2

Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets.

Regents June 2024 Question 13 (2 pt)
Regents June 2024 Question 16 (2 pt)
Regents June 2025 Question 4 (2 pt)
Regents June 2025 Question 12 (2 pt)
Regents August 2025 Question 14 (2 pt)
Regents January 2025 Question 20 (2 pt)

HSS-ID.A.1

Represent data with plots on the real number line (dot plots, histograms, and box plots).

Regents August 2024 Question 9 (2 pt)
Regents January 2025 Question 16 (2 pt)
Regents January 2026 Question 28 (2 pt)

HSS-ID.A.3

Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).

Building Toward
HSS-ID.A.3

Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).