Question A: Over the past 10 years, what is the warmest temperature recorded, in degrees Fahrenheit, for the month of December in Miami, Florida?
Question B: At what temperature does water freeze in Miami, Florida?
Decide if each question is statistical or non-statistical. Explain your reasoning.
If you decide that a question is statistical, describe how you would find the answer. What data would you collect?
Solution
Question A is a statistical question. Sample reasoning: The temperature in Miami in December changes from day to day and from year to year.
Question B is not a statistical question. At sea level, water freezes at 32 degrees Fahrenheit. This is a known fact, so answering it does not require collecting data that vary.
To answer Question A (about the warmest temperature), collect the recorded daily high temperatures for every December over the past 10 years, then find the greatest value in degrees Fahrenheit.
Section A Checkpoint
Problem 1
Classify each set of data as numerical data or categorical data.
The items on a shopping list for the grocery store.
The total cost of all the items on the shopping list for the grocery store.
The numbers on a barcode that can be scanned at the grocery store.
Solution
Categorical
Numerical
Categorical
Problem 2
Write a statistical question about your favorite food. Explain why it is a statistical question.
Solution
Sample response: What percentage of people like pancakes better than waffles? It is a statistical question because answering it requires collecting data from people about their preferences, and I expect the data to include different answers.
Noah gathered information on the home states of the swimmers on a national team. He organized the data in a table. Would a dot plot be appropriate to display his data? Explain your reasoning.
This dot plot shows the ages of students in a swimming class. How many students are in the class?
Based on the dot plot, do you agree with each of the statements? Explain your reasoning.
The frequency of each age represented is never greater than 3.
Half of the students are between 2 and 3 years old.
Solution
No. A dot plot displays numerical data, but the home states of swimmers are categorical data. He could use a bar graph instead.
16 students are in the class.
Agree. Sample reasoning: The number of dots stacked over a number is never greater than 3 for this dot plot.
Agree. On the number line, eight of the 16 data points, or half of the class, are placed to the right of 24 months and to the left of 36 months.
A group of students was asked, “How many children are in your family?” The responses are displayed in the dot plot.
How many students responded to the question?
What percentage of the students have more than one child in the family?
Write a sentence that describes the distribution of the data shown on the dot plot. Use a description of the center and spread in your description.
Solution
20 students. There are 20 dots, and each corresponds to one student in the group.
75%. 15 out of 20 students answered that there are 2 or more children in the family.
Sample response: A typical number of children for this group of families is around 2 because the center is around 2.5 or so, but some families had many more children than others. The distribution is not very spread out, with most families having 1–3 children and only a few having more.
A farmer sells tomatoes in packages of ten. She would like the tomatoes in each package to all be about the same size and close to 5.5 ounces in weight. The farmer is considering two different tomato varieties: Variety A and Variety B. She weighs 25 tomatoes of each variety. These dot plots show her data.
Variety A
Dot plot for Variety A: weight in ounces, labeled from 5.35 to 5.6 by 0.05. Starting at 5.35 and increasing by 0.01, the number of dots above each value is 0, 1, 1, 0, 1, 1, 0, 0, 2, 2, 1, 1, 2, 1, 1, 3, 1, 2, 1, 1, 0, 1, 0, 1, 1.
Variety B
Dot plot for Variety B: weight in ounces, labeled from 5.35 to 5.6 by 0.05. Starting at 5.47 and increasing by 0.01, the number of dots above each value is 1, 1, 7, 8, 4, 2, 1, 1.
What would be a good description for the center of the distribution of weights of Variety A tomatoes, in general? What about for the weight of Variety B tomatoes?
Which tomato variety should the farmer choose? Explain your reasoning.
Solution
In general, Variety A tomatoes are about 5.49 ounces and Variety B tomatoes are about 5.5 ounces.
She should choose Variety B. Sample reasoning: The two varieties of tomatoes have about the same center of their distributions, but there is much less variability in Variety B tomato weights. The weights are much more consistent than the weights for Variety A, so the tomatoes are more likely to be the same size and closer to 5.5 ounces in weight.
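To check the centers and spreads described above, here is a minimal Python sketch that rebuilds each variety's 25 weights from the dot counts in the dot-plot descriptions, assuming each count belongs to consecutive 0.01-ounce steps from the stated starting value. Weights are stored as whole hundredths of an ounce so the sums stay exact; the helper name `expand_dot_plot` is illustrative only.

```python
def expand_dot_plot(start, counts):
    """List every data value, given dot counts on consecutive values start, start + 1, ..."""
    values = []
    for offset, count in enumerate(counts):
        values.extend([start + offset] * count)
    return values


# Weights in hundredths of an ounce (535 means 5.35 oz), one count per 0.01 oz step.
variety_a = expand_dot_plot(535, [0, 1, 1, 0, 1, 1, 0, 0, 2, 2, 1, 1, 2,
                                  1, 1, 3, 1, 2, 1, 1, 0, 1, 0, 1, 1])
variety_b = expand_dot_plot(547, [1, 1, 7, 8, 4, 2, 1, 1])

for name, weights in [("Variety A", variety_a), ("Variety B", variety_b)]:
    mean_oz = sum(weights) / len(weights) / 100
    range_oz = (max(weights) - min(weights)) / 100
    print(f"{name}: {len(weights)} tomatoes, mean {mean_oz:.2f} oz, range {range_oz:.2f} oz")
```

Under this reading of the descriptions, Variety A's mean comes out near 5.48 ounces (close to the estimate of 5.49 in the answer) with a range of 0.23 ounces, while Variety B's mean is about 5.50 ounces with a range of only 0.07 ounces, which is the smaller variability the answer relies on.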
The two histograms show the points scored per game by a basketball player in 2008 and 2016.
Histogram of points per game in 2008, from 0 to 45 by 5s. Starting with the interval from 0 up to but not including 5, the heights of the bars are 0, 1, 10, 11, 7, 1, 2, 1.
Histogram of points per game in 2016, from 0 to 45 by 5s. Starting with the interval from 0 up to but not including 5, the heights of the bars are 2, 7, 6, 9, 5, 3, 2, 0, 0.
Describe the center of each distribution represented by the histograms. Explain your reasoning.
Write 2–3 sentences that describe the spreads of the two distributions, including what spreads might tell us in this context.
Solution
Sample response:
In both seasons, the player typically scored around 15 to 20 points in a game. In each histogram, there seems to be a similar frequency of values on each side of this interval.
The spread of the distribution for 2008 seems smaller than the spread for 2016. In 2008, there were only 5 games in which the player did not score between 10 and 25 points, but in 2016 the data are more spread out, with 14 of the 34 games outside that interval. This means that, from game to game, the player was more consistent in 2008 than in 2016.
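The claims about center and spread can be checked against the bar heights with a short Python sketch. It assumes each listed height belongs to consecutive 5-point intervals starting at 0 (so bin i covers 5i up to but not including 5i + 5), finds which interval holds the middle game or games when the games are ordered by points, and counts the games in intervals outside 10 to 25 points. The function names are hypothetical helpers, not part of the original problem.

```python
# Bar heights from the histogram descriptions; bin i covers [5*i, 5*i + 5) points.
heights_2008 = [0, 1, 10, 11, 7, 1, 2, 1]
heights_2016 = [2, 7, 6, 9, 5, 3, 2, 0, 0]


def bin_of_position(heights, position):
    """Return the (low, high) edges of the bin holding the game at this 1-based position."""
    running = 0
    for i, height in enumerate(heights):
        running += height
        if position <= running:
            return (5 * i, 5 * i + 5)
    raise ValueError("position exceeds the number of games")


def middle_bins(heights):
    """Bins holding the lower-middle and upper-middle games (the same game if the total is odd)."""
    total = sum(heights)
    return bin_of_position(heights, (total + 1) // 2), bin_of_position(heights, total // 2 + 1)


def games_outside(heights, low, high):
    """Count games in bins that do not lie entirely inside [low, high)."""
    inside = sum(h for i, h in enumerate(heights) if low <= 5 * i and 5 * i + 5 <= high)
    return sum(heights) - inside


for year, heights in [(2008, heights_2008), (2016, heights_2016)]:
    print(year, sum(heights), middle_bins(heights), games_outside(heights, 10, 25))
```

In both seasons the middle games fall in the 15 to 20 interval, consistent with the center described above, while 5 of 33 games in 2008 and 14 of 34 games in 2016 fall outside the 10 to 25 intervals.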
Here is a histogram that shows the number of points scored by a college basketball player during the 2008 season. Describe the shape and features of the data. Mention the center and spread as well as any symmetry, gaps, peaks, or other features that you notice.
Solution
Sample response: The distribution is not symmetrical because there is a peak on the left. The histogram shows a gap between 35 and 40, so there is no game in which the player scored 35, 36, 37, 38, or 39 points. There was one game that was unusually high scoring, between 40 and 44 points. The peak is between 15 and 20 points. The center is around 20 points, and the data are spread fairly far above this center.
Section B Checkpoint
Problem 1
Describe the distribution. Include a mention of the center and spread in your description.
Solution
Sample response: The data are fairly evenly distributed between the values 2 and 12. The center of the distribution is around 7 because there are about the same number of points on either side of that value.
Problem 2
A car service company keeps track of how long it takes between when a customer requests a car and when it arrives. The data are summarized in the histogram.
What is a typical amount of time that customers have to wait for a car to arrive? Explain your reasoning.
How many customers in this group had to wait longer than 20 minutes?
Solution
Sample response: 8 minutes. Most wait times are in the 6–10 minute range, so I chose the middle value.
Last week, the daily low temperatures for a city, in degrees Celsius, were 5, 8, 6, 5, 10, 7, and 1. What was the average low temperature? Show your reasoning.
The mean of four numbers is 7. Three of the numbers are 5, 7, and 7. What is the fourth number? Explain your reasoning.
Solution
6 degrees Celsius. The mean is the sum of the temperatures divided by the number of recorded temperatures: (5 + 8 + 6 + 5 + 10 + 7 + 1) ÷ 7 = 42 ÷ 7 = 6.
9. Sample reasoning: The four numbers must balance around the mean of 7. Two of the numbers are 7, and the third number is 2 less than 7, so the fourth number must be 2 more than 7.
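Both answers above can be reproduced with a small Python sketch. The first part is the usual mean (sum divided by count). For the second part, the sketch uses a sum-based approach rather than the balancing argument in the solution: if four numbers have a mean of 7, their total must be 4 × 7 = 28, so the missing number is 28 minus the sum of the known numbers. The helper name `missing_value` is illustrative.

```python
def mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)


def missing_value(known, target_mean, count):
    """Value that makes `count` numbers (the known ones plus one more) have target_mean."""
    if len(known) != count - 1:
        raise ValueError("expected exactly one missing value")
    # The total of all the numbers must equal mean x count.
    return target_mean * count - sum(known)


lows = [5, 8, 6, 5, 10, 7, 1]          # daily low temperatures, degrees Celsius
print(mean(lows))                       # 42 / 7
print(missing_value([5, 7, 7], 7, 4))   # 28 - 19
```

The two approaches agree: balancing distances around 7 and matching the required total of 28 both give 9.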
The three data sets show the number of text messages sent to their parents by Jada, Diego, and Lin over 6 days.
One of the data sets has a mean of 4, one has a mean of 5, and one has a mean of 6.
Jada: 4, 4, 4, 6, 6, 6
Diego: 4, 5, 5, 6, 8, 8
Lin: 1, 1, 2, 2, 9, 9
Which data set has which mean? What does this tell you about the text messages sent by the three students?
Which data set has the greatest variability? Explain your reasoning.
Solution
Jada's mean is 5, since (4 + 4 + 4 + 6 + 6 + 6) ÷ 6 = 30 ÷ 6 = 5. Diego's mean is 6, since (4 + 5 + 5 + 6 + 8 + 8) ÷ 6 = 36 ÷ 6 = 6. Lin's mean is 4, since (1 + 1 + 2 + 2 + 9 + 9) ÷ 6 = 24 ÷ 6 = 4. On average, Diego sent the most text messages to his parents per day, and Lin sent the fewest text messages per day to her parents.
Sample response: Lin's data have the greatest variability. The sum of the distances of her values from the mean, on both sides, is the greatest.
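Matching each data set to its mean is a direct computation. This Python sketch computes the three means from the data listed in the problem, checks that they are exactly the values 4, 5, and 6 named in the problem, and reports who sent the most and fewest messages on average.

```python
texts = {
    "Jada": [4, 4, 4, 6, 6, 6],
    "Diego": [4, 5, 5, 6, 8, 8],
    "Lin": [1, 1, 2, 2, 9, 9],
}


def mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)


means = {name: mean(values) for name, values in texts.items()}
print(means)

# The three means should be exactly the values named in the problem.
assert sorted(means.values()) == [4, 5, 6]

print("Most per day:", max(means, key=means.get))
print("Fewest per day:", min(means, key=means.get))
```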
These three data sets show the number of text messages sent to their parents by Jada, Diego, and Lin over 6 days as well as the mean number of text messages sent to their parents by each student per day.
Jada (mean: 5): 4, 4, 4, 6, 6, 6
Diego (mean: 6): 4, 5, 5, 6, 8, 8
Lin (mean: 4): 1, 1, 2, 2, 9, 9
Predict which data set has the largest MAD and which has the smallest MAD.
Compute the MAD for each data set to check your prediction.
Solution
Lin's data set has the largest MAD, because the data have the most variability. Jada's data set has the smallest MAD, because the data have the least variability.
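Jada's MAD is 1, since (1 + 1 + 1 + 1 + 1 + 1) ÷ 6 = 6 ÷ 6 = 1. Diego's MAD is about 1.33, since (2 + 1 + 1 + 0 + 2 + 2) ÷ 6 = 8 ÷ 6 ≈ 1.33. Lin's MAD is about 3.33, since (3 + 3 + 2 + 2 + 5 + 5) ÷ 6 = 20 ÷ 6 ≈ 3.33.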
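The MAD calculation above follows two steps: find each value's distance from the mean, then average those distances. A minimal Python sketch of the same computation:

```python
def mad(values):
    """Mean absolute deviation: the average distance of the values from their mean."""
    mean = sum(values) / len(values)
    distances = [abs(v - mean) for v in values]
    return sum(distances) / len(values)


texts = {
    "Jada": [4, 4, 4, 6, 6, 6],
    "Diego": [4, 5, 5, 6, 8, 8],
    "Lin": [1, 1, 2, 2, 9, 9],
}

for name, values in texts.items():
    print(f"{name}: MAD = {mad(values):.2f}")
```

This confirms the prediction: Lin's data set has the largest MAD and Jada's has the smallest.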
One hundred sixth-grade students in five different countries are asked about their travel times to school. Their responses are organized into five data sets. The mean and MAD of each data set are shown in the table.
Country: mean (minutes), MAD (minutes)
United States: 9, 4.2
Australia: 18.1, 7.9
South Africa: 23.5, 16.2
Canada: 11, 8
New Zealand: 12.3, 5.5
Which group of students has the greatest variability in their travel times? Explain your reasoning.
Use the mean and MAD for Canada and New Zealand to compare travel times for those students.
The data sets for Australia and Canada have very different means (18.1 and 11 minutes) but very similar MADs. What can you say about the travel times of the students in those two data sets?
Solution
South Africa, because it has the largest MAD.
Sample response: The mean travel times are similar, so students in these countries typically take about the same amount of time to get to school. The MAD for New Zealand students is less than the MAD for Canadian students. This means that the travel times for students in New Zealand are more consistent than those for students in Canada.
Sample response: On average, the students in Australia have a longer commute to school than students in Canada, but the travel times of students in both countries have about the same variability. The data points are, on average, about 8 minutes from the mean.
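The comparisons above only use the table's summary values, so they can be sketched in a few lines of Python: pick the country with the largest MAD, then compare means and MADs for the two pairs asked about. Positive differences mean the first country's value is larger.

```python
# (mean, MAD) in minutes, from the travel-time table.
travel = {
    "United States": (9, 4.2),
    "Australia": (18.1, 7.9),
    "South Africa": (23.5, 16.2),
    "Canada": (11, 8),
    "New Zealand": (12.3, 5.5),
}

# Greatest variability = largest MAD.
most_variable = max(travel, key=lambda country: travel[country][1])
print("Greatest MAD:", most_variable)

for a, b in [("Canada", "New Zealand"), ("Australia", "Canada")]:
    mean_gap = travel[a][0] - travel[b][0]
    mad_gap = travel[a][1] - travel[b][1]
    print(f"{a} vs {b}: mean difference {mean_gap:+.1f} min, MAD difference {mad_gap:+.1f} min")
```

Canada and New Zealand differ by about 1.3 minutes in mean but 2.5 minutes in MAD, while Australia and Canada differ by about 7.1 minutes in mean but only 0.1 minute in MAD, matching the sample responses.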
Section C Checkpoint
Problem 1
A large company uses 2 manufacturing plants to package 50-pound bags of corn seed. The weights of the bags produced in one week are measured and summarized with this information.
Plant A
Mean weight of bags: 51.2 pounds
MAD weight of bags: 1.8 pounds
Plant B
Mean weight of bags: 50.1 pounds
MAD weight of bags: 0.1 pounds
Write 2 sentences comparing the distribution of bag weights for the 2 plants based on the given information.
The company is worried about one of the plants having too many bags that are under the advertised 50 pound weight. Which plant do you think is having this problem? Explain your reasoning.
Solution
Sample response: Based on the mean, Plant A typically makes heavier bags of corn seed, but based on the MAD, its bag weights vary much more. Plant B is more consistent in bag weight and stays closer to the advertised 50 pounds.
Plant A. Sample reasoning: Although its mean weight is greater, the large MAD indicates that it is not uncommon for its bags to weigh less than the advertised 50 pounds (51.2 − 1.8 = 49.4).
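The reasoning above uses a rough screen: look one MAD below the mean and see whether that lands under the advertised weight. This Python sketch applies that screen to both plants. It is only a heuristic taken from the sample reasoning; the MAD does not say exactly how many bags fall below 50 pounds. The names `mean_minus_mad` and `ADVERTISED_LB` are illustrative.

```python
ADVERTISED_LB = 50

# (mean, MAD) in pounds for each plant.
plants = {"Plant A": (51.2, 1.8), "Plant B": (50.1, 0.1)}


def mean_minus_mad(mean, mad):
    """One MAD below the mean, rounded to the data's precision of 0.1 lb."""
    return round(mean - mad, 1)


for name, (mean, mad) in plants.items():
    low = mean_minus_mad(mean, mad)
    status = "below" if low < ADVERTISED_LB else "not below"
    print(f"{name}: mean - MAD = {low} lb, {status} {ADVERTISED_LB} lb")
```

Only Plant A's value (49.4 pounds) drops under 50 pounds, which is why it is the more likely source of underweight bags.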
Sample response: Half of Jada's practices are 20 minutes or shorter and the other half of her practices are 20 minutes or longer. Half of Diego's practices are 22.5 minutes or shorter, and the other half are 22.5 minutes or longer.
Predict if the mean is greater than, less than, or approximately equal to the median. Explain your reasoning.
Which measure of center—the mean or the median—better describes a typical value for the distributions?
Heights of 50 basketball players
Backpack weights of 55 sixth-grade students
Dot plot of backpack weight in kilograms, from 0 to 16 by 2s. Starting at 0, the number of dots above each value from 0 to 9 is 0, 7, 9, 12, 7, 6, 3, 3, 2, 1. There is 1 dot above 16.
Ages of 30 people at a family dinner party
Solution
Sample responses:
Player heights
The mean would be approximately equal to the median, because the data are roughly symmetric.
Since I think the values would be pretty close, either the mean or the median would describe a typical height pretty well.
Backpack weights
The mean would be higher than the median. The value of 16 kilograms would bring the mean up and move it away from the center of the data.
The median would better describe a typical backpack weight, since that value would lie in the center of the large cluster of data points.
People's ages
The mean would be lower than the median, because even though a large fraction of the people at the dinner party are 40 or older, the ages of the people that span from 5 to 40 would bring the average age down.
The median, around 40–45 years old, would better describe the center of the distribution.
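The backpack answer can be checked numerically with a short Python sketch. It uses the dot counts exactly as transcribed in the backpack dot-plot description; note that those counts total 51 dots, fewer than the 55 students named in the title, so the numbers are approximate. The sketch compares the mean and median, then recomputes the mean without the 16-kilogram value.

```python
# Dot counts above 0, 1, ..., 9 kg, plus one dot at 16 kg, as transcribed.
counts = [0, 7, 9, 12, 7, 6, 3, 3, 2, 1]
weights = [kg for kg, count in enumerate(counts) for _ in range(count)] + [16]


def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


mean_all = sum(weights) / len(weights)
mean_without_16 = sum(weights[:-1]) / len(weights[:-1])
print(len(weights), round(mean_all, 2), median(weights), round(mean_without_16, 2))
```

With these counts the mean is about 3.9 kilograms while the median is 3 kilograms, and dropping the single 16-kilogram backpack lowers the mean to about 3.66 kilograms, showing how one far-away value pulls the mean above the median.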
Diego wondered how far sixth-grade students could throw a heavy ball. He decided to collect data to find out. He asked 10 friends to throw the ball as far as they could and measured the distance from the starting line to where the ball landed. The data show the distances he recorded, in feet.
40, 40, 47, 49, 50, 53, 55, 57, 63, 76
Find the median and IQR of the data set.
On a later day, he asked the same group of 10 friends to throw a ball again and collected another set of data. The median of the second data set is 49 feet, and the IQR is 6 feet.
Did the 10 friends, as a group, perform better (throw farther) or worse in the second round compared to the first round? Explain how you know.
Were the distances in the second data set more variable or less variable compared to those in the first round? Explain how you know.
Solution
The median is 51.5 feet, since (50 + 53) ÷ 2 = 51.5. The IQR is 10 feet, because Q1 (the median of the lower five values) is 47, Q3 (the median of the upper five values) is 57, and 57 − 47 = 10.
Worse. Sample reasoning: The median of the second data set is 49 feet, which is 2.5 feet lower than in the first round.
Less variable. Sample reasoning: The IQR of the second data set is smaller, so the values are less spread out.
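The median and IQR above use the "halves" method: split the sorted data into a lower and an upper half (leaving out the middle value when the count is odd) and take the median of each half as Q1 and Q3. This Python sketch implements that method. Library routines such as Python's `statistics.quantiles` use interpolation rules instead and can report somewhat different quartiles for the same data.

```python
def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def median_and_iqr(values):
    """Return (median, Q1, Q3, IQR) using the halves method."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]          # values before the median position
    upper = s[(n + 1) // 2 :]    # values after the median position
    q1, q3 = median(lower), median(upper)
    return median(s), q1, q3, q3 - q1


throws = [40, 40, 47, 49, 50, 53, 55, 57, 63, 76]
first_median, q1, q3, first_iqr = median_and_iqr(throws)
print(first_median, q1, q3, first_iqr)

# Second round, from the summary given in the problem.
second_median, second_iqr = 49, 6
print("Farther in round 2?", second_median > first_median)
print("More variable in round 2?", second_iqr > first_iqr)
```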
Here are two box plots that summarize two data sets.
Two box plots from 0 to 16 by 2s. Box plot A: whisker from 0 to 2, box from 2 to 5 with a vertical line at 3, whisker from 5 to 16. Box plot B: whisker from 0 to 2, box from 2 to 8 with a vertical line at 3.5, whisker from 8 to 15.
Do you agree with each of the following statements?
Both data sets have the same range.
Both data sets have the same minimum value.
The IQR shown in box plot B is twice the IQR shown in box plot A.
Box plot A shows a data set that has a quarter of its values between 2 and 5.
These dot plots show the same data sets as those represented by the box plots. Decide which box plot goes with each dot plot. Explain your reasoning.
Data set 1
Dot plot of data set 1, from 0 to 16 by 2s. Starting at 0, the number of dots above each whole number is 3, 9, 6, 7, 3, 2, 4, 3, 2, 2, 6, 0, 0, 0, 1, 2, 0.
Data set 2
Dot plot of data set 2, from 0 to 16 by 2s. Starting at 0, the number of dots above each whole number is 4, 7, 9, 11, 7, 6, 3, 3, 2, 1, 0, 0, 0, 0, 0, 0, 1.
Solution
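Disagree. The range for box plot A is 16 − 0 = 16, but the range for box plot B is 15 − 0 = 15.
Agree. Both box plots have a minimum value of 0.
Agree. The IQR for box plot B is 8 − 2 = 6, and the IQR for box plot A is 5 − 2 = 3.
Disagree. The box from 2 to 5 in box plot A contains the middle half of the values, not a quarter of them.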
Box plot A goes with Data set 2. Box plot B goes with Data set 1. Sample reasoning:
The maximum values match: data set 2 has a maximum of 16, like box plot A, and data set 1 has a maximum of 15, like box plot B.
The middle half of the points in data set 1 are more spread out compared to those in data set 2, so box plot B, which has a longer box, goes with data set 1.
Three quarters of the points in data set 2 are between 0 and 5, which matches the box and left whisker in Box plot A.
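The matching can also be confirmed by computing each dot plot's five-number summary (minimum, Q1, median, Q3, maximum) and comparing it with the box plots. This Python sketch rebuilds each data set from the dot counts in the descriptions, assuming each count sits above consecutive whole numbers starting at 0, and finds the quartiles with the halves method.

```python
def expand(counts):
    """List every value, given dot counts above 0, 1, 2, ..."""
    return [value for value, count in enumerate(counts) for _ in range(count)]


def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """(min, Q1, median, Q3, max), with quartiles as medians of the lower and upper halves."""
    s = sorted(values)
    n = len(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]
    return (s[0], median(lower), median(s), median(upper), s[-1])


data_set_1 = expand([3, 9, 6, 7, 3, 2, 4, 3, 2, 2, 6, 0, 0, 0, 1, 2, 0])
data_set_2 = expand([4, 7, 9, 11, 7, 6, 3, 3, 2, 1, 0, 0, 0, 0, 0, 0, 1])

print("Data set 1:", five_number_summary(data_set_1))
print("Data set 2:", five_number_summary(data_set_2))
```

Data set 1 gives (0, 2, 3.5, 8, 15), the values drawn in box plot B, and data set 2 gives (0, 2, 3, 5, 16), the values drawn in box plot A.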
Researchers measured the lengths, in feet, of 20 male humpback whales and 20 female humpback whales. Here are two box plots that summarize their data.
Two box plots on a grid from 38 to 56 by 2s, showing length in feet. Male box plot: whisker from 39.2 to 43, box from 43 to 46 with a vertical line at 44.5, whisker from 46 to 48. Female box plot: whisker from 48 to 49, box from 49 to 51.8 with a vertical line at 50.8, whisker from 51.8 to 54.5.
How long is the longest whale measured? Is this whale male or female?
What is a typical length for the male humpback whales in this study?
Do you agree with each of these statements about the whales? Explain your reasoning.
More than half of male humpback whales measured are longer than 46 feet.
The male humpback whales tend to be longer than female humpback whales.
The lengths of the male humpback whales tend to vary more than the lengths of the female humpback whales.
Solution
The longest whale is about 54.5 feet long, and it is a female.
A typical male humpback whale is about 44.5 feet long.
Disagree. Sample explanation: The upper quartile of the data for the male humpbacks is 46 feet, which means only about a quarter of the male whales measured are longer than 46 feet.
Disagree. Sample explanation: The entire distribution for the lengths of female humpbacks is greater than that for male humpbacks, so female humpbacks tend to be longer than their male counterparts.
Agree. Sample explanation: The IQR of the data for male humpbacks is slightly greater than that for female humpbacks, and the range of the data for the males is larger than that for females, so the lengths of male humpbacks tend to vary more.
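The variability comparison above can be checked from the five-number summaries read off the box-plot description. This Python sketch computes each group's IQR (Q3 − Q1) and range (maximum − minimum), rounding to 0.1 foot to match the precision of the description.

```python
# (min, Q1, median, Q3, max) in feet, read from the box-plot description.
whales = {
    "male": (39.2, 43, 44.5, 46, 48),
    "female": (48, 49, 50.8, 51.8, 54.5),
}


def iqr_and_range(summary):
    """Return (IQR, range) rounded to 0.1 ft."""
    low, q1, _median, q3, high = summary
    return round(q3 - q1, 1), round(high - low, 1)


for group, summary in whales.items():
    iqr, spread = iqr_and_range(summary)
    print(f"{group}: IQR = {iqr} ft, range = {spread} ft")
```

The males have an IQR of 3.0 feet and a range of 8.8 feet, while the females have an IQR of 2.8 feet and a range of 6.5 feet, so both measures are larger for the males.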
Section D Checkpoint
Problem 1
In a large city, the median rent paid monthly for an apartment is $2000 and the interquartile range (IQR) is $850. If you are planning to move to this city, what information does each of the values mean to you?
Solution
Sample response: The median rent of $2000 means that half of the apartments in the city cost $2000 or less per month and half cost $2000 or more. The IQR of $850 means that the middle half of rents are within $850 of each other. This means that there is quite a lot of variability for apartments in this city. It tells me that I should expect to pay about $2000 per month for an apartment, but, if I need to, I should be able to find a cheaper one without too much trouble.
Problem 2
A person who recently graduated from college is looking at the salaries for people who work for two different companies. The box plots summarize the information from each company.
Compare the distribution of pay from each company.
Solution
Sample response: Both companies have the same median, so typical workers at the companies are paid similarly. The IQR for Company A is much less than the IQR for Company B, so there is more variability of salaries for the middle half of workers at Company B. The range of salaries at Company A is about $250,000 and only about $150,000 at Company B, so there may be more pay inequality at Company A.
Lin surveys her classmates on the number of hours they spend doing chores each week. She represents her data with a dot plot and a histogram.
Lin thinks that she can find the median, the minimum, and the maximum of the data set using both the dot plot and the histogram. Do you agree? Explain your reasoning.
Should Lin use the mean and MAD, or the median and IQR to summarize her data? Explain your reasoning.
Solution
Sample responses:
Disagree. The dot plot makes it possible to find the median, the minimum, and the maximum fairly easily because it shows each data value individually. The histogram makes it possible to estimate these values, but it is impossible to tell the exact values because the data points are grouped together.
Lin should use the median and IQR because the data are not approximately symmetric and have values far from the center. There are a few larger values that are not similar to most of the other values.