Enron NLP – Topic Modeling & Email Analysis

Project Overview

This project applies Exploratory Data Analysis (EDA) and Natural Language Processing (NLP) techniques to a large corpus of emails from the Enron Email Dataset.

The primary goals are to:

Identify key discussion topics
Measure topic frequency
Analyze sentiment patterns within corporate email communications

By combining traditional NLP preprocessing with topic modeling and sentiment analysis, this project provides insights into the dominant themes present in the Enron email corpus.

Objectives

Perform exploratory data analysis on email text data
Clean and preprocess unstructured text
Extract latent topics using topic modeling
Evaluate topic quality using coherence scores
Analyze sentiment using VADER
Summarize dominant themes across the dataset

Dataset

Source: Kaggle – Enron Email Dataset
https://www.kaggle.com/datasets/wcukierski/enron-email-dataset

File Used:

emails.csv

This dataset contains roughly 500,000 real emails sent and received by Enron employees prior to the company's collapse.
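Each row of emails.csv holds one raw email, headers and body together, so a first step is usually to split the two. The following is a minimal sketch using Python's standard `email` module. The helper name `parse_message`, the sample message, and the assumption that the raw text sits in a column called `message` are illustrative and not taken from the notebook.

```python
from email import message_from_string

# A made-up message in the same raw format (headers, blank line, body).
SAMPLE_RAW = (
    "Date: Mon, 14 May 2001 16:39:00 -0700\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Forecast\n"
    "\n"
    "Please send the forecast by Friday.\n"
)


def parse_message(raw):
    """Split one raw email string into a few header fields and a body."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        # Keep only the plain-text parts of a multipart message.
        parts = [
            part.get_payload()
            for part in msg.walk()
            if part.get_content_type() == "text/plain"
        ]
        body = "\n".join(parts)
    else:
        body = msg.get_payload()
    return {
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
        "date": msg["Date"],
        "body": body.strip(),
    }


if __name__ == "__main__":
    print(parse_message(SAMPLE_RAW))
```

With pandas, the same helper could be applied column-wise, for example `pd.read_csv("emails.csv")["message"].map(parse_message)`, assuming the column name above.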

Technologies & Libraries Used

Core Libraries

Python
pandas
numpy
re, string
warnings
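As an illustration of how `re` and `string` can be used for text cleaning, here is a minimal sketch. The function name `clean_text` and the specific cleaning steps (lowercasing, dropping URLs, email addresses, punctuation, and digits) are assumptions; the notebook's actual preprocessing may differ.

```python
import re
import string

_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_EMAIL_RE = re.compile(r"\S+@\S+")
# Map every punctuation character to a space so words stay separated.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_text(text):
    """Lowercase the text and strip URLs, addresses, punctuation, and digits."""
    text = text.lower()
    text = _URL_RE.sub(" ", text)
    text = _EMAIL_RE.sub(" ", text)
    text = text.translate(_PUNCT_TABLE)
    text = re.sub(r"\d+", " ", text)
    # Collapse the runs of whitespace left behind by the steps above.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_text("Please review http://example.com by 5pm, thanks!"))
```

URLs and addresses are removed before punctuation so that they are dropped whole rather than broken into fragments.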

NLP & Text Processing

nltk
spaCy
gensim

Topic Modeling

scikit-learn (NMF)
TF-IDF Vectorization
Gensim CoherenceModel

Sentiment Analysis

NLTK VADER SentimentIntensityAnalyzer
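A minimal sketch of scoring text with VADER and turning its compound score into a label is shown below. The helper names and the ±0.05 cutoffs are assumptions (the cutoffs are the thresholds commonly used with VADER); the notebook may label sentiment differently. The NLTK import is kept inside `score_texts` so the labelling helper works without NLTK installed.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_texts(texts):
    """Return the VADER compound score for each text (requires nltk)."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # The lexicon is downloaded once and cached by NLTK.
    nltk.download("vader_lexicon", quiet=True)
    analyzer = SentimentIntensityAnalyzer()
    return [analyzer.polarity_scores(text)["compound"] for text in texts]


if __name__ == "__main__":
    samples = ["Great work on the deal!", "This delay is unacceptable."]
    for text, score in zip(samples, score_texts(samples)):
        print(f"{score:+.3f}  {label_compound(score)}  {text}")
```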

Visualization

matplotlib

Environment & Versions

Key package versions used:

numpy    1.26.4
scipy    1.13.1
gensim   4.3.3
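To compare a local environment against the versions listed above, something like the following sketch could be used. The helper name `check_versions` is hypothetical; it relies only on the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
EXPECTED_VERSIONS = {
    "numpy": "1.26.4",
    "scipy": "1.13.1",
    "gensim": "4.3.3",
}


def check_versions(expected=EXPECTED_VERSIONS):
    """Return {package: (expected, installed or None, versions match?)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "mismatch"
        print(f"{name}: expected {wanted}, found {installed} ({status})")
```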
