
1.9 Comparing Distributions of a Quantitative Variable

5 min read, June 18, 2024

Jed Quiaoit

Lusine Ghazaryan

We talked a lot about distributions and how to describe, summarize, and represent them in alternative formats; now it's time to put that into practice by comparing multiple sets of data. 🪑

Comparing Groups with Stem-and-Leaf Plots: Warm Up

Before we dive deeper into AP-style questions, which are more descriptive and comprehensive, let's warm up with a familiar graphical display: stem plots! 🌳

Question: The weights of the animals in two groups of eight, Group M and Group N, are recorded, and the data are shown in the stem plots below (weights in kg; the stem is the tens digit and the leaf is the units digit, so 1 | 4 means 14 kg). Use the stem plots to compare the weights of the animals in the two groups.

Group M:

1 | 4

2 | 3 4 8

3 | 2 6 8

4 |

5 | 0

Group N:

1 | 0 

2 | 3 6

3 | 5

4 | 1

5 | 4 7

6 | 2

To compare the two groups, we can look at the distribution of each data set and compare their ranges. From the stem plots, Group M has weights ranging from 14 to 50 kg (a range of 36 kg), while Group N has weights ranging from 10 to 62 kg (a range of 52 kg). Group N has the wider range, and three of its animals (54, 57, and 62 kg) are heavier than the heaviest animal in Group M.

We can also look at how the data are distributed within each group to see if there are any patterns. For example, Group M has a cluster of values in the 20s and 30s (six of its eight weights fall between 23 and 38 kg), while Group N's values are spread more evenly throughout its range. This suggests that more of the animals in Group M are similar in weight, while Group N has a more diverse range of weights.

Overall, the stem plots show that Group N has a wider range of weights compared to Group M, with a more diverse distribution of weights within the group!
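If you want to double-check the summary values above, here is a minimal Python sketch that expands each stem plot into its data values (reading stems as tens of kilograms and leaves as units, as in the key) and computes the minimum, maximum, range, and median. The helper names `stem_plot_values` and `summarize` are our own illustrations, not from any library.

```python
from statistics import median

# Stem plots from the warm-up, written as {stem: [leaves]}.
# Stem = tens of kilograms, leaf = units (so 1 | 4 means 14 kg).
group_m = {1: [4], 2: [3, 4, 8], 3: [2, 6, 8], 4: [], 5: [0]}
group_n = {1: [0], 2: [3, 6], 3: [5], 4: [1], 5: [4, 7], 6: [2]}


def stem_plot_values(plot):
    """Expand a stem-and-leaf plot into a sorted list of data values."""
    return sorted(10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves)


def summarize(values):
    """Return the minimum, maximum, range, and median of a list of values."""
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "median": median(values),
    }


for name, plot in [("Group M", group_m), ("Group N", group_n)]:
    values = stem_plot_values(plot)
    print(name, values, summarize(values))
```

With these stem plots it reports a range of 36 kg and a median of 30 kg for Group M, and a range of 52 kg and a median of 38 kg for Group N, so Group N is also centered somewhat higher, not just more spread out.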

Comparing Groups with Histograms: Practice AP-Style Problem

Records are kept by each state in the United States on the number of pupils enrolled in public schools and the number of teachers employed by public schools for each school year. From these records, the ratio of the number of pupils to the number of teachers (P-T ratio) can be calculated for each state. The histograms below show the P-T ratio for every state during the 2001–2002 school year. The histogram on the left displays the ratios for the 24 states that are west of the Mississippi River, and the histogram on the right displays the ratios for the 26 states that are east of the Mississippi River. 🏫

Source: The College Board (via AP Classroom)

For part (a), the question asks us to estimate the median (not compute it exactly). For the states west of the Mississippi (n = 24), n/2 = 12, so the median is the average of the 12th and 13th values in the ordered list, and both of those values fall in the interval 15–16. For the states east of the Mississippi (n = 26), n/2 = 13, so the median is the average of the 13th and 14th values, and both of these also fall in the interval 15–16. So both groups have a median of at least 15 and at most 16 students per teacher.
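The median-position reasoning in part (a) can be sketched in Python as below. `median_positions` gives which ordered values to average, and `bin_containing` walks cumulative bin counts to find the interval holding a given position. Both helpers are hypothetical illustrations, and since the histogram counts are not reproduced here, the bin counts in the example are made up purely to show the lookup.

```python
def median_positions(n):
    """1-based positions of the ordered value(s) whose average is the median."""
    if n % 2 == 0:
        return (n // 2, n // 2 + 1)
    return (n // 2 + 1,)


def bin_containing(position, bin_counts):
    """Return the label of the bin holding the given 1-based ordered position.

    bin_counts is a list of (label, count) pairs in increasing order of value.
    """
    cumulative = 0
    for label, count in bin_counts:
        cumulative += count
        if position <= cumulative:
            return label
    raise ValueError("position exceeds the total count")


print(median_positions(24))  # (12, 13) for the 24 western states
print(median_positions(26))  # (13, 14) for the 26 eastern states

# Made-up bin counts (NOT the College Board data), just to show the lookup.
toy_bins = [("10-11", 1), ("11-12", 3), ("12-13", 4), ("13-14", 2)]
print([bin_containing(p, toy_bins) for p in median_positions(10)])  # ['12-13', '12-13']
```

When both median positions land in the same bin, as they do for both groups in part (a), the median is pinned to that interval even without the raw values.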

b. Write a few sentences comparing the distributions of P-T ratios for states in the two groups (west and east) during the 2001–2002 school year.

Here, describe three features of each distribution one at a time: shape, center, and spread. Always start with shape. The shapes of the two histograms look different: the histogram for the West is unimodal and skewed to the right, whereas the histogram for the East is unimodal and nearly symmetric.

For the center, we already found in part (a) that the medians of the two distributions are about the same, between 15 and 16 for both.

Aaaaand finally, report the spread! Look at how the values are scattered or concentrated around the center of each distribution. The histograms show that the West values vary more than the East values. Although the data are grouped into intervals, we can still approximate the range: the range for the West is at most 22 – 12 = 10, and the range for the East is at most 19 – 12 = 7. So the East has less variability than the West.
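Here is a small sketch of how bounds on the range can be read off grouped data. It assumes each histogram bin is one unit wide (consistent with the 15–16 interval in part (a)) and that the lowest and highest occupied bins are 12–13 and 21–22 for the West and 12–13 and 18–19 for the East, which is what the 22 – 12 and 19 – 12 calculations imply. `range_bounds` is a hypothetical helper.

```python
def range_bounds(lowest_bin, highest_bin):
    """Bounds on the range when only the occupied bins are known.

    Each bin is (lower_edge, upper_edge). The smallest value lies somewhere
    in lowest_bin and the largest value lies somewhere in highest_bin.
    """
    low_lo, low_hi = lowest_bin
    high_lo, high_hi = highest_bin
    # Largest possible range: the extremes sit at the outer edges of their bins.
    largest_possible = high_hi - low_lo
    # Smallest possible range: the extremes sit at the inner edges (never below 0).
    smallest_possible = max(high_lo - low_hi, 0)
    return smallest_possible, largest_possible


# Assumed occupied bins, inferred from the 22 - 12 and 19 - 12 calculations.
print(range_bounds((12, 13), (21, 22)))  # West: (8, 10)
print(range_bounds((12, 13), (18, 19)))  # East: (5, 7)
```

Under those assumed bins, even the smallest possible West range (8) is larger than the largest possible East range (7), so the conclusion about spread does not depend on where the values sit inside their bins.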

c. Using your answers in parts (a) and (b), explain how you think the mean P-T ratio during the 2001–2002 school year will compare for the two groups (west and east).

The two histograms have different shapes. Since the West distribution is skewed to the right, its mean will likely be greater than its median: the large values in the right tail pull the mean upward, while the median is resistant to them. For the East, since the distribution is fairly symmetric, the mean should be close to the median. Because the two medians are about the same, we can conclude that the mean for the West group will probably be greater than the mean for the East group.
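To see why skew pulls the mean away from the median, here is a quick Python comparison on two made-up lists of P-T ratios (not the actual state data): one symmetric, and one with the same median but a longer right tail.

```python
from statistics import mean, median

# Made-up P-T ratios (NOT the real state data), chosen so both lists share a median.
symmetric = [13, 14, 15, 15, 16, 16, 17, 18]
right_skewed = [13, 14, 15, 15, 16, 16, 19, 22]  # same middle, longer right tail

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(f"{name}: median = {median(data)}, mean = {mean(data)}")
```

The symmetric list has a mean and median of 15.5, while the right-skewed list keeps a median of 15.5 but its mean rises to 16.25, which mirrors the reasoning for the West group.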

Comparing Groups with Box Plots: Practice AP-Style Problem

A team of psychologists studied the concept of visualization in basketball, where players visualize making a basket before shooting the ball. They conducted an experiment in which 20 basketball players with similar abilities were randomly assigned to two groups. The 10 players in group 1 received visualization training, and the 10 players in group 2 did not. 🏀

Each player stood 22 feet from the basket at the same location on the basketball court and was instructed to keep attempting to make the basket until two consecutive baskets were made. The players who received visualization training were instructed to use visualization techniques before attempting to make the basket. The total number of attempts, including the last two attempts, was recorded for each player.

The total numbers of attempts for the 20 players are summarized in the following box plots.

Source: The College Board

We have two groups, with 10 basketball players randomly assigned to each group. 

We learn from the question that group 1 received visualization training but group 2 did not. There are a few things we can compare to answer the question. Both groups have the same minimum number of attempts, but all the other values in the five-number summary differ.

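The first quartile (Q1) is 3 attempts for group 1 and 4 attempts for group 2, so about 25% of the players in group 1 needed 3 or fewer attempts, while about 25% of the players in group 2 needed 4 or fewer.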

Now, look at the median. The median is much lower for group 1 than for group 2. Group 1 has an outlier, but even that value is less than the maximum of group 2. Apart from the minimum, which is the same for both groups, every value in group 1's five-number summary is lower than the corresponding value for group 2, which suggests the training had an effect; however, we are not asked to generalize this finding yet.
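Box plots are built from the five-number summary, and outliers like the one in group 1 are usually flagged with the 1.5 × IQR rule. The Python sketch below computes both, assuming the common "median of each half" method for quartiles (other software may use slightly different quartile rules). The function names are our own, and the attempt counts are made up for illustration; they are not the study's data.

```python
from statistics import median


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the 'median of each half' quartile rule.

    For an odd count, the overall median is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]
    upper_half = data[(n + 1) // 2 :]
    return (data[0], median(lower_half), median(data), median(upper_half), data[-1])


def outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Made-up attempt counts for 10 players (NOT the study's data).
attempts = [2, 3, 3, 4, 5, 6, 6, 7, 8, 20]
print(five_number_summary(attempts))  # (2, 3, 5.5, 7, 20)
print(outliers(attempts))  # [20]
```

Running both functions on each group's data and comparing the tuples side by side is exactly the comparison described above: same minimum, then lower Q1, median, Q3, and maximum for the trained group.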

Finally, to answer the question, it is enough to compare the medians. Because the median number of attempts for players who received visualization training (4) is less than the median for players who did not receive training (7), players who received visualization training tended to need fewer attempts to make two consecutive baskets. Talk about talent and persistence! 🦘
