Outliers


chapter 2 : Data cleaning

Outliers are data points that lie far outside the pattern of the rest of the dataset: values that are significantly higher or lower than most others. They may result from measurement errors, data entry mistakes, natural variation, or rare events, and if they are not handled carefully they can distort key statistics such as the mean and the standard deviation.

Example:
A column with ages: 22, 24, 23, 25, 120
Here, 120 is likely an outlier because it differs drastically from the typical age range.

Why Outliers Matter

Outliers can have a large impact on data analysis because:
• They can skew summary statistics such as the mean.
• They may distort trends, predictions, and modeling results.
• Sometimes outliers represent true, important signals (e.g., exceptional sales records), so they shouldn’t always be removed without thought.
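
To make the first point concrete, here is a small Python sketch that uses only the standard-library statistics module and the ages from the example above. The variable names are illustrative, not part of any course tooling.

```python
import statistics

ages = [22, 24, 23, 25, 120]             # example column from above
typical = [a for a in ages if a != 120]  # same column without the suspect value

mean_all = statistics.mean(ages)
median_all = statistics.median(ages)
mean_typical = statistics.mean(typical)
median_typical = statistics.median(typical)

print(f"with 120:    mean={mean_all}, median={median_all}")
print(f"without 120: mean={mean_typical}, median={median_typical}")
```

A single extreme value moves the mean from 23.5 to 42.8, while the median only shifts from 23.5 to 24.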

How Outliers Are Detected

There are several common ways to identify outliers:

📈 Visualization Methods
• Box plots, histograms, and scatter plots make it easy to spot values that lie far from the bulk of the data.

📊 Statistical Methods
• Interquartile Range (IQR): With IQR = Q3 − Q1, values below Q1 − 1.5×IQR or above Q3 + 1.5×IQR are flagged as outliers.
• Z-Score: The z-score measures how many standard deviations a value lies from the mean. Values whose absolute z-score exceeds a chosen threshold (commonly 2 or 3) are flagged as outliers.
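
The two statistical rules above can be sketched in Python with the standard-library statistics module. The function names, the "inclusive" quartile convention, the use of the population standard deviation, and the longer more_ages list are all assumptions made for illustration; other conventions can give different results on small samples.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Assumption: "inclusive" quartiles; other conventions move the fences.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs(v - mean) / sd > threshold]

ages = [22, 24, 23, 25, 120]  # example column from above
# Hypothetical longer column, made up for this illustration
more_ages = [22, 24, 23, 25, 21, 26, 24, 23, 22, 25, 24, 120]

print(iqr_outliers(ages))                    # [120]
print(zscore_outliers(more_ages))            # [120]
print(zscore_outliers(ages, threshold=2.0))  # []
```

The last line shows a limitation worth knowing: with only five values, 120 inflates the standard deviation so much that its own z-score stays just under 2, so the z-score rule misses it while the IQR rule still flags it.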

How Outliers Are Treated

Deciding what to do with outliers depends on context:
✔ Investigate them manually to check whether they are errors or true values.
✔ Remove outliers only if they are clearly due to mistakes or noise.
✔ Use robust statistics like median or IQR-based methods that are less affected by outliers.
✔ Transform or cap values so extreme points don’t overly drive results.
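
As one possible illustration of capping, the sketch below clips values to the IQR fences and compares the mean and median before and after. The helper name cap_to_iqr_fences and the "inclusive" quartile convention are assumptions for this example, and capping is only one of several reasonable treatments.

```python
import statistics

def cap_to_iqr_fences(values, k=1.5):
    """Clip each value into [Q1 - k*IQR, Q3 + k*IQR]."""
    # Assumption: "inclusive" quartiles, as one common convention.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

ages = [22, 24, 23, 25, 120]  # example column from above
capped = cap_to_iqr_fences(ages)

print(capped)
print("mean:  ", statistics.mean(ages), "->", statistics.mean(capped))
print("median:", statistics.median(ages), "->", statistics.median(capped))
```

Here 120 is pulled down to the upper fence of 28, which brings the mean from 42.8 to about 24.4. The median is 24 in both cases, which is why it is described as a robust statistic.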
