
12 Probability Distributions in Data Science

btd
13 min read · Nov 11, 2023


In data science, a distribution describes how the values in a dataset are spread: which values occur and how often. Knowing the underlying distribution matters because many statistical analyses and modeling techniques assume a particular one.

Here are some important probability distributions frequently encountered in data science:

1. Normal Distribution (Gaussian Distribution)

The normal distribution, also known as the Gaussian distribution, is a fundamental and widely used probability distribution.

  • Symmetrical, bell-shaped curve centered on its mean.
  • The mean (μ) sets the center of the distribution, and the standard deviation (σ) controls its spread.
  • Empirical Rule (68–95–99.7 Rule): approximately 68% of the data falls within one standard deviation of the mean, 95% within two, and 99.7% within three.
  • The probability density function (PDF) of the…
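
The empirical rule percentages can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper name `within_k_sigma` is made up for this illustration. It relies on the standard identity that, for a normal distribution, the probability of falling within k standard deviations of the mean equals erf(k / √2), regardless of the values of μ and σ.

```python
import math


def within_k_sigma(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of its mean (hypothetical helper)."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sigma(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

The exact values are slightly above the rounded 68% and 95% figures, which is why the rule is usually stated as an approximation.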
