Data Preprocessing and Outlier Detection with various methods

This repository demonstrates end-to-end data preprocessing techniques using Python, including handling missing values, feature encoding, scaling, and outlier detection.

Key Topics Covered

Missing value analysis (missingno, seaborn)
Filling missing values (mean, median, mode)
Data leakage prevention by fitting SimpleImputer on the training data only
Preprocessing with Pipeline and ColumnTransformer
Feature scaling and one-hot encoding
Outlier detection using:
- Boxplot & Scatterplot
- IQR method
- Z-score
- Isolation Forest
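
As a rough illustration of the three numeric outlier methods listed above, here is a minimal sketch. The toy data, the helper names (`iqr_outliers`, `zscore_outliers`, `isolation_forest_outliers`), and the thresholds (1.5 x IQR, |z| > 3, `contamination=0.02`) are assumptions chosen for the example and are not taken from the notebook.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import IsolationForest

# Hypothetical toy data: 200 draws near 50 plus two planted extremes at the end.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(loc=50.0, scale=5.0, size=200), [120.0, -30.0]])


def iqr_outliers(x, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zscore_outliers(x, threshold=3.0):
    """Flag points whose absolute z-score exceeds the threshold."""
    return np.abs(stats.zscore(x)) > threshold


def isolation_forest_outliers(x, contamination=0.02, seed=0):
    """Flag points that IsolationForest labels as anomalies (-1)."""
    # contamination is an assumed share of outliers, not a value from the notebook.
    model = IsolationForest(contamination=contamination, random_state=seed)
    labels = model.fit_predict(x.reshape(-1, 1))
    return labels == -1


iqr_mask = iqr_outliers(values)
z_mask = zscore_outliers(values)
iso_mask = isolation_forest_outliers(values)

# Number of points each method flags; the methods need not agree exactly.
print(iqr_mask.sum(), z_mask.sum(), iso_mask.sum())
```

The IQR and z-score rules look at one column at a time, while Isolation Forest can score several features jointly, so comparing the three masks is usually more informative than trusting any single one.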

Libraries Used

pandas, numpy
seaborn, matplotlib, missingno
scikit-learn
scipy
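
To show how the scikit-learn pieces fit together, the following is a minimal sketch of a leakage-safe preprocessing pipeline. The toy DataFrame and its column names (`age`, `income`, `city`) are hypothetical and not taken from the repository's dataset; the median and most-frequent strategies are just one reasonable choice.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frame; column names are illustrative only.
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 35.0, 52.0, np.nan, 29.0, 41.0],
    "income": [40000.0, 52000.0, np.nan, 61000.0, 75000.0, 48000.0, 39000.0, np.nan],
    "city": ["A", "B", "A", "A", "C", "B", np.nan, "C"],
})
numeric_cols = ["age", "income"]
categorical_cols = ["city"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the mode, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Split first, then fit on the training rows only, so that test-set
# statistics never influence the imputation or scaling parameters.
train_df, test_df = train_test_split(df, test_size=0.25, random_state=0)
X_train = preprocessor.fit_transform(train_df)
X_test = preprocessor.transform(test_df)

print(X_train.shape, X_test.shape)
```

Calling `fit_transform` on the training split and only `transform` on the test split is what keeps the imputed medians, modes, and scaling parameters free of information from the held-out rows.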

File

Data_processing.ipynb – Jupyter Notebook containing the full workflow

Use Case

Useful for beginners and practitioners learning data cleaning, preprocessing pipelines, and outlier handling in machine learning.

⭐ If you find this helpful, feel free to star the repository!
