7 min read • June 18, 2024
Jed Quiaoit
Lusine Ghazaryan
Once we have organized a data set into a display of our choice, the next task is to describe the data: in other words, to say what we see. When looking for trends and patterns, there are three things to describe: shape, center, and spread. ❗
To describe the shape of the display, check the following:
A symmetric distribution is one in which the left and right sides of the display are roughly mirror images of each other around the central value (such as the median or mean). For example, a bell-shaped curve is symmetric because the frequencies fall off at about the same rate on either side of the peak.
To determine whether a histogram is symmetric, inspect its shape visually and see whether the two halves roughly mirror each other. You can also compare the mean and the median: in a roughly symmetric distribution the two are close together, while a noticeable gap between them suggests skew.
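As a rough illustration of the visual check, the sketch below compares the bar heights of a histogram with their own mirror image. The function name `mirror_mismatch` and the bin counts are hypothetical, made up for this example, and the score is only one simple way to put a number on "roughly symmetrical", not a standard statistic. It also assumes the bins are equal-width and span the whole data range, so that the middle of the list is the middle of the display.

```python
def mirror_mismatch(counts):
    """Fraction of observations that would have to move for the
    histogram to match its own mirror image (0.0 = perfectly symmetric)."""
    total = sum(counts)
    # Compare each bar with the bar at the mirrored position.
    diff = sum(abs(a - b) for a, b in zip(counts, reversed(counts)))
    # Each mismatched pair is counted twice, hence the factor of 2.
    return diff / (2 * total)

# Hypothetical bin counts, read left to right off a histogram.
bell_like = [1, 3, 6, 3, 1]
lopsided = [1, 1, 2, 5, 9]

print(mirror_mismatch(bell_like))  # 0.0
print(mirror_mismatch(lopsided))   # well above 0
```

A score near 0 is consistent with a symmetric shape; how large a score counts as "not symmetric" is a judgment call, just as it is when inspecting the display by eye.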
Negative skewness, on the other hand, occurs when the distribution is skewed to the left, with a long tail on the left side and a shorter tail on the right side. This means that the majority of the values are clustered on the right side, with a few values on the left that are much smaller than the rest. (Positive skewness is the mirror image, with the long tail on the right.)
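One way to see skew numerically is to compare the mean with the median, since a long tail tends to pull the mean toward it. The sketch below is a rule of thumb under stated assumptions, not a definitive test: the function name `skew_direction`, the `tolerance` parameter, and the exam scores are all hypothetical, and the comparison can mislead for unusual distributions.

```python
from statistics import mean, median, stdev

def skew_direction(values, tolerance=0.1):
    """Guess the skew direction from the gap between mean and median.

    The gap is measured in standard deviations; `tolerance` is an
    arbitrary cutoff chosen for this illustration.
    """
    gap = mean(values) - median(values)
    scale = stdev(values)
    if scale == 0 or abs(gap) <= tolerance * scale:
        return "roughly symmetric"
    # Mean below median: the low tail is dragging the mean left.
    return "left" if gap < 0 else "right"

# Hypothetical exam scores: most are high, a few are very low.
scores = [35, 60, 82, 85, 88, 90, 91, 93, 95]
print(median(scores))          # 88
print(round(mean(scores), 2))  # 79.89
print(skew_direction(scores))  # left
```

Here the two low scores pull the mean about eight points below the median, which matches the long left tail described above.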
Uniform distributions do not have a distinct mode because all of the values in the distribution occur with roughly the same frequency. The display looks flat: no single value stands out as being more common than any other.
There are a few different ways to identify outliers in a dataset. One is graphical: a display such as a boxplot shows the distribution of the data and makes values that lie far from the rest easy to spot. Another is numerical: you can use the mean and standard deviation and flag values that lie unusually many standard deviations from the mean.
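Both approaches can be sketched in a few lines. The cutoffs used here are assumptions, not something the text above specifies: 1.5 times the IQR beyond the quartiles is a common convention for boxplot fences, and 2 standard deviations from the mean is a common rule of thumb. The function names and the data are hypothetical, and quartile conventions vary slightly between tools, so other software may give slightly different fences.

```python
from statistics import mean, stdev, quantiles

def iqr_outliers(values, k=1.5):
    """Values beyond k * IQR from the quartiles (boxplot-style fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

def sd_outliers(values, k=2):
    """Values more than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [x for x in values if abs(x - m) > k * s]

# Hypothetical data with one unusually large value.
data = [12, 14, 15, 15, 16, 17, 18, 19, 45]
print(iqr_outliers(data))  # [45]
print(sd_outliers(data))   # [45]
```

Note that an extreme value inflates both the mean and the standard deviation, so the second method can miss outliers that the quartile-based fences catch; with `k=3` the value 45 would not be flagged in this data set.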
The center is a useful summary, but it is incomplete unless we report it together with the spread. Several measures can describe the spread (or dispersion) of a dataset, including the range, the standard deviation, and the interquartile range (IQR). 🕸️
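To see why the center alone is not enough, the sketch below summarizes two hypothetical data sets that share the same mean and median but differ greatly in spread. The function name `spread_summary` and the data are made up for illustration, and the sample standard deviation and the default quartile method of Python's `statistics` module are assumed.

```python
from statistics import mean, median, stdev, quantiles

def spread_summary(values):
    """Return the range, sample standard deviation, and IQR of the data."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "range": max(values) - min(values),
        "stdev": stdev(values),
        "iqr": q3 - q1,
    }

# Two hypothetical data sets with the same center (mean = median = 50).
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, values in [("tight", tight), ("wide", wide)]:
    print(name, mean(values), median(values), spread_summary(values))
```

Both sets report a center of 50, yet the range is 4 for the first and 80 for the second, which is exactly the information the center leaves out.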
As we go through the next couple of sections, you'll become more familiar and comfortable with describing distributions in terms of shape, center, and spread... woo-hoo! 🤩
© 2024 Fiveable Inc. All rights reserved.