chapter 3 : Quick Stats (Descriptive Statistics)

What Is an Outlier?

Outliers are data points that are significantly different from most other values in a dataset — they lie far outside the typical range of the data. They may happen due to measurement errors, unusual events, or natural variation.

1. IQR Method (Interquartile Range)

 Definition:

The IQR method detects outliers by measuring the spread of the middle 50% of the data. It uses the difference between the third quartile (Q3) and first quartile (Q1):
IQR = Q3 − Q1.

Then:

  • Lower bound = Q1 − 1.5 × IQR
  • Upper bound = Q3 + 1.5 × IQR
    Any value below the lower bound or above the upper bound is considered an outlier.

Example:

Suppose dataset:
1, 2, 3, 4, 5, 6, 50

  • Q1 (25th percentile) = 2
  • Q3 (75th percentile) = 6
  • IQR = 6 − 2 = 4

Lower bound = 2 − 1.5×4 = -4
Upper bound = 6 + 1.5×4 = 12

Here, 50 lies above 12 → an outlier.


2. Z-Score Method

Definition:

Z-score measures how many standard deviations a data point is from the mean of the dataset.
A high absolute z-score (e.g., greater than 3 or sometimes 2) indicates the value is unusually far from the rest — an outlier.

Formula:

Z=x−μσZ = \frac{x - \mu}{\sigma}Z=σx−μ​

Where:

  • x = data point
  • μ = mean of the dataset
  • σ = standard deviation of the dataset

 Example:

Dataset:
1, 2, 2, 2, 3, 15

Mean ≈ 4.17
Standard deviation ≈ 5.07
Z-score of 15 ≈ (15–4.17) / 5.07 ≈ 2.15 (above typical threshold ~2 or 3) → outlier.