Comparing and Contrasting Data Distributions

Student Summary

The mean absolute deviation, or MAD, is a measure of variability that is calculated by finding the mean of the distances between each data point and the mean of the data set. Here are two dot plots, each with a mean of 15 centimeters, displaying the lengths of sea scallop shells in centimeters.

<p>Dot plot from 11 to 19 by 1’s. Length in centimeters. Beginning at 11, number of dots above each increment is 0, 1, 2, 3, 5, 3, 2, 1, 0.</p>

<p>Dot plot from 11 to 19 by 1’s. Length in centimeters. Beginning at 11, number of dots above each increment is 0, 0, 2, 4, 5, 4, 2, 0, 0.</p>

Notice that both dot plots show a symmetric distribution, so the mean and the MAD are appropriate choices for describing center and variability. The data in the first dot plot appear to be more spread apart than the data in the second dot plot, so you can say that the first data set appears to have greater variability than the second data set. This is confirmed by the MAD. The MAD of the first data set is approximately 1.18 cm, and the MAD of the second data set is approximately 0.94 cm. This means that the values in the first data set are, on average, about 1.18 cm away from the mean, and the values in the second data set are, on average, about 0.94 cm away from the mean. The greater the MAD of the data, the greater the variability of the data.
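To check these values, here is a minimal Python sketch that rebuilds each data set from the dot counts described above and computes its MAD. The function names `expand_counts` and `mean_absolute_deviation` are illustrative choices, not part of the lesson.

```python
from statistics import mean

def expand_counts(start, counts):
    """Turn dot-plot counts into a list of values, one entry per dot."""
    return [start + i for i, count in enumerate(counts) for _ in range(count)]

def mean_absolute_deviation(values):
    """Mean of the distances between each value and the mean of the data."""
    center = mean(values)
    return mean(abs(v - center) for v in values)

# Dot counts above 11, 12, ..., 19 cm, read from the two dot plots.
first = expand_counts(11, [0, 1, 2, 3, 5, 3, 2, 1, 0])
second = expand_counts(11, [0, 0, 2, 4, 5, 4, 2, 0, 0])

print(round(mean_absolute_deviation(first), 2))   # 1.18
print(round(mean_absolute_deviation(second), 2))  # 0.94
```

Each data set has 17 values. The distances from the mean add up to 20 cm in the first data set and 16 cm in the second, so the MADs are 20/17 and 16/17.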

The interquartile range, or IQR, is a measure of variability that is calculated by subtracting the value for the first quartile, Q1, from the value for the third quartile, Q3. These two box plots represent the distributions of the lengths, in centimeters, of two other groups of sea scallop shells, each with a median of 15 centimeters.

<p>Box plot from 2 to 20 by 1’s. Length in centimeters. Whisker from 3 to 5. Box from 5 to 19 with vertical line at 15. Whisker from 19 to 20.</p>

<p>Box plot from 2 to 20 by 1’s. Length in centimeters. Whisker from 3 to 9. Box from 9 to 19 with vertical line at 15. Whisker from 19 to 20.</p>

Notice that neither box plot shows a symmetric distribution, so the median and the IQR are appropriate choices for describing center and variability for these data sets. The middle half of the data displayed in the first box plot appears to be more spread apart, or to show greater variability, than the middle half of the data displayed in the second box plot. The IQR of the first distribution is 14 cm, and the IQR of the second distribution is 10 cm. The IQR measures the difference between the median of the upper half of the data, Q3, and the median of the lower half of the data, Q1, so it is not affected by the minimum or the maximum value in the data set. It is a measure of the spread of the middle 50% of the data.
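As a rough sketch, the IQR of each box plot can be computed in Python from its five-number summary. The class name `FiveNumberSummary` is an illustrative choice, and the changed whisker ends in `stretched` are hypothetical values used only to show that the IQR ignores the minimum and maximum.

```python
from typing import NamedTuple

class FiveNumberSummary(NamedTuple):
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

def interquartile_range(summary):
    """IQR = Q3 - Q1, the spread of the middle half of the data."""
    return summary.q3 - summary.q1

# Values read from the two box plots (lengths in centimeters).
first_box = FiveNumberSummary(3, 5, 15, 19, 20)
second_box = FiveNumberSummary(3, 9, 15, 19, 20)

print(interquartile_range(first_box))   # 14
print(interquartile_range(second_box))  # 10

# Hypothetical change: move only the whisker ends of the first box plot.
stretched = first_box._replace(minimum=1, maximum=25)
print(interquartile_range(stretched))   # 14
```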

The MAD is calculated using every value in the data set, and the IQR is calculated using only the values for Q1 and Q3.
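One way to see this difference is to change a single extreme value and recompute both measures. The sketch below uses the data from the second dot plot and then replaces one 17 cm shell with a hypothetical 25 cm shell; that modified data set is made up for illustration. The quartiles are found as the medians of the lower and upper halves of the sorted data, leaving out the middle value when the number of values is odd.

```python
from statistics import mean, median

def mean_absolute_deviation(values):
    """Mean of the distances between each value and the mean of the data."""
    center = mean(values)
    return mean(abs(v - center) for v in values)

def interquartile_range(values):
    """Q3 - Q1, with Q1 and Q3 taken as medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2          # the middle value is left out when n is odd
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    return q3 - q1

# Second dot plot: 2 shells at 13 cm, 4 at 14, 5 at 15, 4 at 16, 2 at 17.
shells = [13] * 2 + [14] * 4 + [15] * 5 + [16] * 4 + [17] * 2

# Hypothetical data set: one 17 cm shell replaced by a 25 cm shell.
with_extreme = shells[:-1] + [25]

print(mean_absolute_deviation(shells), interquartile_range(shells))
print(mean_absolute_deviation(with_extreme), interquartile_range(with_extreme))
```

The MAD increases because the new value changes the mean and every distance from it, while the IQR stays the same because Q1 and Q3 do not move.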

Standards

Building On
HSS-ID.A.1

Represent data with plots on the real number line (dot plots, histograms, and box plots).

Addressing
HSS-ID.A.2 (6 questions)

Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (inter-quartile range, sample standard deviation) of two or more different data sets.

Regents June 2024, Question 13 (2 points)
Regents June 2024, Question 16 (2 points)
Regents June 2025, Question 4 (2 points)
Regents June 2025, Question 12 (2 points)
Regents August 2025, Question 14 (2 points)
Regents January 2025, Question 20 (2 points)

Building Toward
HSS-ID.A.3

Interpret differences in shape, center, and spread in the context of the data sets, accounting for possible effects of extreme data points (outliers).