This repository demonstrates end-to-end data preprocessing techniques using Python, including handling missing values, feature encoding, scaling, and outlier detection.
- Missing value analysis (missingno, seaborn)
- Filling missing values (mean, median, mode)
- Data leakage prevention using `SimpleImputer`
- Preprocessing with `Pipeline` and `ColumnTransformer`
- Feature scaling and one-hot encoding
- Outlier detection using:
  - Boxplot & Scatterplot
  - IQR method
  - Z-score
  - Isolation Forest
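
The three numeric outlier checks listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's own code: the synthetic data, the column name `value`, the 1.5 × IQR fences, the |z| > 3 cutoff, and `contamination=0.01` are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.ensemble import IsolationForest

# Synthetic data (assumed for illustration): 200 typical points
# plus two extreme values appended at the end.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(50, 5, 200), [120.0, -40.0]])
df = pd.DataFrame({"value": values})

# IQR method: flag points more than 1.5 * IQR outside the quartiles.
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
iqr_mask = (df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)

# Z-score: flag points more than 3 standard deviations from the mean.
z_scores = np.abs(stats.zscore(df["value"].to_numpy()))
z_mask = z_scores > 3

# Isolation Forest: fit_predict returns -1 for points the model isolates quickly.
iso = IsolationForest(contamination=0.01, random_state=0)
iso_mask = iso.fit_predict(df[["value"]]) == -1

print("IQR outliers:", int(iqr_mask.sum()))
print("Z-score outliers:", int(z_mask.sum()))
print("Isolation Forest outliers:", int(iso_mask.sum()))
```

The three methods need not agree: the IQR fences and the z-score cutoff are rules of thumb, and the Isolation Forest flags roughly the fraction of points set by `contamination`, so the thresholds should be tuned to the dataset at hand.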

Libraries used:

- pandas, numpy
- seaborn, matplotlib, missingno
- scikit-learn
- scipy

`preprocessing.ipynb`: Jupyter Notebook containing the full workflow
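
As a rough idea of what the leakage-safe part of such a workflow can look like, here is a minimal sketch that combines `SimpleImputer`, `Pipeline`, and `ColumnTransformer`. The toy DataFrame, its column names (`age`, `income`, `city`), and the chosen imputation strategies are hypothetical and are not taken from the notebook. The key point is that the preprocessor is fitted on the training split only, so imputation values and scaling statistics never see the test rows.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with missing values in numeric and categorical columns.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 35, 52, np.nan, 41, 29],
    "income": [40_000, 52_000, np.nan, 61_000, 75_000, 48_000, np.nan, 39_000],
    "city": ["A", "B", np.nan, "A", "C", "B", "A", np.nan],
})
numeric_cols = ["age", "income"]
categorical_cols = ["city"]

# Numeric columns: fill with the median, then standardize.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill with the mode, then one-hot encode.
# handle_unknown="ignore" avoids errors on categories unseen during fit.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])

# Split first, then fit on the training rows only to avoid leakage.
train_df, test_df = train_test_split(df, test_size=0.25, random_state=0)
X_train = preprocessor.fit_transform(train_df)
X_test = preprocessor.transform(test_df)  # reuses statistics learned from train_df

print(X_train.shape, X_test.shape)
```

Calling `fit_transform` on the full dataset before splitting would let test rows influence the medians, modes, and scaling parameters, which is the leakage this pattern is meant to avoid.
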
Useful for beginners and practitioners learning data cleaning, preprocessing pipelines, and outlier handling in machine learning.
⭐ If you find this helpful, feel free to star the repository!