Outliers are data points that are significantly different
from most other values in a dataset — they lie far outside the typical range of
the data. They may happen due to measurement errors, unusual events, or natural
variation.
1. IQR Method (Interquartile Range)
Definition:
The IQR method detects outliers by measuring the
spread of the middle 50% of the data. It uses the difference between the
third quartile (Q3) and first quartile (Q1):
IQR = Q3 − Q1.
Then:
Example:
Suppose dataset:
1, 2, 3, 4, 5, 6, 50
Lower bound = 2 − 1.5×4 = -4
Upper bound = 6 + 1.5×4 = 12
Here, 50 lies above 12 → an outlier.
2. Z-Score Method
Definition:
Z-score measures how many standard deviations
a data point is from the mean of the dataset.
A high absolute z-score (e.g., greater than 3 or sometimes 2)
indicates the value is unusually far from the rest — an outlier.
Formula:
Z=x−μσZ = \frac{x - \mu}{\sigma}Z=σx−μ
Where:
Example:
Dataset:
1, 2, 2, 2, 3, 15
Mean ≈ 4.17
Standard deviation ≈ 5.07
Z-score of 15 ≈ (15–4.17) / 5.07 ≈ 2.15 (above typical threshold ~2 or
3) → outlier.