In the context of data science, distributions refer to the patterns by which data values are spread across a dataset. Understanding the underlying distribution of data is crucial for various statistical analyses and modeling techniques.
Here are some important probability distributions frequently encountered in data science:
1. Normal Distribution (Gaussian Distribution)
The normal distribution, also known as the Gaussian distribution, is a fundamental and widely used probability distribution.
- Symmetrical, bell-shaped curve characterized by its mean (center) and standard deviation (spread).
- The mean (μ) is the center of the distribution, and the standard deviation (σ) controls the spread or dispersion of the distribution.
- The Empirical Rule (68–95–99.7 Rule) states that for a normal distribution, approximately 68% of the data falls within one standard deviation of the mean, 95% within two standard deviations, and 99.7% within three standard deviations.
- The probability density function (PDF) of the…
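
The 68–95–99.7 percentages can be checked numerically. The sketch below is one way to do it, using only the Python standard library. The function names `prob_within` and `simulated_fraction` are made up for this illustration, and the sample size and seed are arbitrary choices. The exact value relies on the standard identity P(|X − μ| < kσ) = erf(k / √2) for a normal distribution.

```python
import math
import random


def prob_within(k: float) -> float:
    """Exact probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


def simulated_fraction(k: float, mu: float = 0.0, sigma: float = 1.0,
                       n: int = 100_000, seed: int = 0) -> float:
    """Fraction of n simulated normal draws that fall within k standard deviations of mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(abs(rng.gauss(mu, sigma) - mu) < k * sigma for _ in range(n))
    return hits / n


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} sd: exact={prob_within(k):.4f}, "
              f"simulated={simulated_fraction(k):.4f}")
```

The exact values round to roughly 0.6827, 0.9545, and 0.9973, which is where the rule's name comes from. The simulated fractions should land close to those figures and get closer as `n` grows.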