SelvamathanS/Data-preprocessing-with-various-methods

Data Preprocessing and Outlier Detection with Various Methods

This repository demonstrates end-to-end data preprocessing techniques using Python, including handling missing values, feature encoding, scaling, and outlier detection.

Key Topics Covered

  • Missing value analysis (missingno, seaborn)
  • Filling missing values (mean, median, mode)
  • Data leakage prevention by fitting SimpleImputer on the training split only
  • Preprocessing with Pipeline and ColumnTransformer
  • Feature scaling and one-hot encoding
  • Outlier detection using:
    • Boxplot & Scatterplot
    • IQR method
    • Z-score
    • Isolation Forest
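
The topics above can be sketched in a few lines. This is a minimal illustration, not the notebook's actual code: the toy DataFrame, its column names (`age`, `income`, `city`), the injected outlier, and the thresholds (1.5 × IQR, |z| > 3, `contamination=0.01`) are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names and values are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 10, 200),
    "income": rng.normal(50_000, 8_000, 200),
    "city": pd.Series(rng.choice(["A", "B", "C"], 200), dtype=object),
})
df.loc[0, "income"] = 500_000   # inject one obvious outlier
df.loc[5:9, "age"] = np.nan     # inject five missing values

# --- Outlier detection on a single numeric column ---
income = df["income"]

# IQR method: flag points beyond 1.5 * IQR from the quartiles.
q1, q3 = income.quantile([0.25, 0.75])
iqr = q3 - q1
iqr_mask = (income < q1 - 1.5 * iqr) | (income > q3 + 1.5 * iqr)

# Z-score: flag points more than 3 standard deviations from the mean.
z_mask = np.abs(stats.zscore(income.to_numpy())) > 3

# Isolation Forest: fit_predict returns -1 for outliers, 1 for inliers.
iso_labels = IsolationForest(contamination=0.01, random_state=0).fit_predict(
    df[["income"]]
)

# --- Leakage-safe preprocessing ---
# Split first, then fit the imputer, scaler, and encoder on the training
# split only, so no test-set statistics leak into the fitted transformers.
X_train, X_test = train_test_split(df, test_size=0.25, random_state=0)

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_train_t = preprocess.fit_transform(X_train)  # learn statistics on train
X_test_t = preprocess.transform(X_test)        # reuse them on test

print("IQR outliers:", int(iqr_mask.sum()))
print("Z-score outliers:", int(z_mask.sum()))
print("Isolation Forest outliers:", int((iso_labels == -1).sum()))
print("Transformed shapes:", X_train_t.shape, X_test_t.shape)
```

Calling `fit_transform` on the training split and only `transform` on the test split is what keeps the imputation median and scaling statistics free of test-set information. The three outlier methods can disagree on borderline points, so comparing their flags is usually more informative than relying on one.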

Libraries Used

  • pandas, numpy
  • seaborn, matplotlib, missingno
  • scikit-learn
  • scipy

File

  • preprocessing.ipynb – Jupyter Notebook containing the full workflow

Use Case

Useful for beginners and practitioners learning data cleaning, preprocessing pipelines, and outlier handling in machine learning.


⭐ If you find this helpful, feel free to star the repository!
