StyleSense — Personalization Platform

A full-stack, containerized e-commerce recommendation engine built to production standards.

Live Demo: https://ssusantachary.github.io/StyleSense/

System Design of StyleSense

Recommendation Types (Integrated)

Overview

StyleSense is a fashion e-commerce personalization platform built to demonstrate how large-scale recommendation systems are designed, implemented, and deployed in production environments.

This project covers the complete engineering lifecycle: raw data ingestion, ML embedding generation, a multi-strategy recommendation engine, a FastAPI backend, a React frontend, and fully containerized deployment via Docker Compose.

What's Inside

Layer	Technology	Relevant Link
Backend API	FastAPI	https://fastapi.tiangolo.com/
Backend API	SQLAlchemy	https://www.sqlalchemy.org/
Backend API	Alembic	https://alembic.sqlalchemy.org/
Database	PostgreSQL	https://www.postgresql.org/
Database	pgvector	https://github.com/pgvector/pgvector
ML Embeddings	PyTorch + torchvision (ResNet50)	https://pytorch.org/
Auth	JWT Bearer (`python-jose`)	https://jwt.io/
Frontend	React	https://react.dev/
Frontend	Vite	https://vitejs.dev/
Frontend	Tailwind CSS	https://tailwindcss.com/
Frontend	Zustand	https://github.com/pmndrs/zustand
Deployment	Docker Compose	https://docs.docker.com/compose/

Repository Structure

Ecommerce_Personalisation/
├── backend/                # FastAPI app, services, models, migrations, tests
├── frontend/               # React app (Vite + Tailwind + Zustand)
├── data/
│   ├── fashion.csv         # Raw catalog input
│   ├── processed/          # Cleaned products + embeddings
│   └── simulated/          # Generated users, orders, interactions
├── scripts/                # Data prep, embedding generation, validation
├── docs/                   # Frontend QA checklists and notes
└── docker-compose.yml      # Orchestrates db + backend + frontend

Data at a Glance

Entity	Count
Products (raw)	2,906
Products (cleaned)	2,906
Users	1,200
Orders	15,000
Order items	40,000
Interactions	25,000
Embedding shape	(2906, 2048) float32

Product Schema

Key fields in products_clean.csv:

Identity: id
Taxonomy: masterCategory, subCategory, articleType
Personalization signals: gender, baseColour, season, year, usage
Commerce signals: price, stock_count, avg_rating
Image linkage: image_path

Simulated User Cohorts

simulate_users.py generates structured behavioral cohorts to make recommendation evaluation meaningful:

Nike lovers
Dress buyers
Shoe collectors
Seasonal shoppers
General shoppers

Recommendation Engine — 10 Scenarios

1. History — Purchase-Based Personalization

Endpoint: GET /api/recommend/history

Question: Based on what this user bought, what should they see next?

Scores candidates against sub-category, article-type, and color preferences extracted from past orders (weights 3.0, 2.0, and 1.0), plus a small boost of 0.5 × average rating.

def score_history_candidate(product, sub_pref, type_pref, color_pref):
    sub_score = sub_pref.get(product.sub_category or "", 0.0)
    type_score = type_pref.get(product.article_type or "", 0.0)
    color_score = color_pref.get(product.base_colour or "", 0.0)
    rating_score = float(product.avg_rating or 0.0)
    return sub_score * 3.0 + type_score * 2.0 + color_score * 1.0 + rating_score * 0.5
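
The source does not show how the preference maps passed to `score_history_candidate` are built. The following is a minimal sketch under one assumption: each preference is the share of the user's purchased items that carry a given attribute value. The helper name `build_preference` and the sample history are hypothetical.

```python
from collections import Counter

def build_preference(values):
    """Map each attribute value to its share of the user's purchases (hypothetical helper)."""
    counts = Counter(v for v in values if v)  # skip missing or empty attributes
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}

# Illustrative purchase history: sub-categories of four past order items.
purchased_sub_categories = ["Topwear", "Topwear", "Shoes", "Topwear"]
sub_pref = build_preference(purchased_sub_categories)
```

With shares in the range 0 to 1, the 3.0 / 2.0 / 1.0 weights in the scoring function directly control how much each attribute can contribute relative to the rating term.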

2. Collaborative — Item-Item Co-Purchase

Endpoint: GET /api/recommend/collaborative/{product_id}

Question: People who bought this also bought what?

Uses popularity dampening via square root normalization to prevent globally popular items from dominating every recommendation surface.

from math import sqrt

def collaborative_score(co_count: int, popularity: int) -> float:
    return co_count / sqrt(max(popularity, 1))
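
To illustrate where `co_count` and `popularity` could come from, here is a sketch that counts co-occurrences across order baskets and ranks candidates with the same dampened score. The basket format, the helper names, and the choice to measure popularity as the number of orders containing an item are assumptions, not details taken from the project.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def co_purchase_counts(baskets):
    """Count how often each ordered pair of products appears in the same order."""
    pair_counts = Counter()
    popularity = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate within an order
        popularity.update(items)
        for a, b in combinations(items, 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return pair_counts, popularity

def recommend_also_bought(product_id, baskets, top_k=3):
    """Rank co-purchased items by co_count / sqrt(popularity), excluding the input item."""
    pair_counts, popularity = co_purchase_counts(baskets)
    scores = {
        other: count / sqrt(max(popularity[other], 1))
        for (anchor, other), count in pair_counts.items()
        if anchor == product_id
    }
    return sorted(scores, key=lambda pid: (-scores[pid], pid))[:top_k]

# Items 2 and 3 are each bought twice with item 1, but item 3 is far more popular overall.
baskets = [[1, 2], [1, 2, 3], [3, 4], [3, 5], [3, 6], [1, 3]]
ranked = recommend_also_bought(1, baskets)
```

In this toy data both candidates have a co-purchase count of 2, yet item 2 ranks first because item 3 appears in five orders and is dampened more strongly.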

3. Restock — Buy Again

Endpoint: GET /api/recommend/restock

Question: What previously purchased items should this user repurchase?

Identifies replenishable items using repeat-purchase signals filtered by delivery status and live stock availability.

def is_restock_candidate(purchase_count: int, order_status: str, in_stock: bool) -> bool:
    return order_status == "delivered" and in_stock and purchase_count >= 2
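
A sketch of how the repeat-purchase signal could be aggregated before applying a rule like the one above. It assumes order items arrive as `(product_id, order_status)` pairs and stock as a `product_id -> count` mapping; both shapes and the function name are hypothetical.

```python
from collections import Counter

def restock_candidates(order_items, stock):
    """Return products delivered at least twice that are currently in stock."""
    delivered = Counter(pid for pid, status in order_items if status == "delivered")
    return sorted(
        pid for pid, count in delivered.items()
        if count >= 2 and stock.get(pid, 0) > 0
    )

order_items = [
    (1, "delivered"), (1, "delivered"),   # repeat purchase, in stock
    (2, "delivered"), (2, "cancelled"),   # only one delivered purchase
    (3, "delivered"), (3, "delivered"),   # repeat purchase, out of stock
]
stock = {1: 5, 2: 3, 3: 0}
candidates = restock_candidates(order_items, stock)
```

Only product 1 qualifies: product 2 has a single delivered purchase and product 3 has no stock.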

4. Visual Alternatives — Embedding Similarity

Endpoint: GET /api/recommend/alternatives/{product_id}

Question: What products look visually similar to this item?

Leverages L2-normalized ResNet50 embeddings. Because every vector has unit length, cosine similarity reduces to a plain dot product, so no norms need to be computed at query time.

import numpy as np

def visual_similarity(query_vec: np.ndarray, candidate_vec: np.ndarray) -> float:
    # Both vectors are L2-normalized, so their dot product equals their cosine similarity.
    return float(np.dot(query_vec, candidate_vec))
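
The ranking step around this similarity can be sketched in plain Python. The version below is a dependency-free illustration on 3-d toy vectors, not the project's code: it normalizes each vector, scores every other product by dot product, and keeps the top `k`. The names `l2_normalize` and `top_k_similar` are hypothetical, and a real service would more likely do this as one matrix-vector product over the in-memory embedding matrix.

```python
from math import sqrt

def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def top_k_similar(query_id, embeddings, k=2):
    """Rank all other products by dot product with the query embedding."""
    query = embeddings[query_id]
    scores = {
        pid: sum(q * c for q, c in zip(query, vec))
        for pid, vec in embeddings.items()
        if pid != query_id  # never recommend the item itself
    }
    return sorted(scores, key=lambda pid: -scores[pid])[:k]

# Toy 3-d vectors standing in for 2048-d ResNet50 embeddings.
raw = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [3.0, 0.0, 0.2],
}
embeddings = {pid: l2_normalize(vec) for pid, vec in raw.items()}
neighbours = top_k_similar("a", embeddings)
```

Note that `d` has a much larger raw magnitude than `a`, but after normalization only direction matters, which is why it still ranks as the closest match.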

5. Cart Add-ons — Cross-Sell

Endpoint: GET /api/recommend/cart-addons

Question: Given what's in the cart, what complementary items should we surface?

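Weighted blend of co-purchase signals (0.6), price attractiveness (0.2), and rating quality (0.2). Candidates that are not cross-sell items, such as same-category duplicates of what is already in the cart, have their score multiplied by 0.7 to favor cross-category discovery.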

def cart_addon_score(co_purchase_score, price_discount_factor, rating_factor, is_cross_sell):
    score = co_purchase_score * 0.6 + price_discount_factor * 0.2 + rating_factor * 0.2
    if not is_cross_sell:
        score *= 0.7
    return score
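
A short worked example of the penalty. The scoring function is repeated so the snippet runs on its own; the `is_cross_sell` rule shown here (a candidate counts as cross-sell when its category is not already in the cart) and the input values are assumptions for illustration.

```python
def cart_addon_score(co_purchase_score, price_discount_factor, rating_factor, is_cross_sell):
    score = co_purchase_score * 0.6 + price_discount_factor * 0.2 + rating_factor * 0.2
    if not is_cross_sell:
        score *= 0.7
    return score

def is_cross_sell(candidate_category, cart_categories):
    """Hypothetical rule: cross-sell means the category is not already in the cart."""
    return candidate_category not in cart_categories

cart_categories = {"Shoes"}
# Two candidates with identical signals; only their categories differ.
socks = cart_addon_score(0.8, 0.5, 0.9, is_cross_sell("Socks", cart_categories))
sneakers = cart_addon_score(0.8, 0.5, 0.9, is_cross_sell("Shoes", cart_categories))
```

Both candidates start from 0.8 × 0.6 + 0.5 × 0.2 + 0.9 × 0.2 = 0.76; the same-category sneakers drop to 0.76 × 0.7 = 0.532.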

6. Trending — Platform-Wide Signals

Endpoint: GET /api/recommend/trending

Question: What is hot platform-wide right now?

Scores each product over a rolling 30-day window as a weighted sum of order volume (weight 0.7) and interaction count (weight 0.3), plus an additive rating boost of 0.2 × rating.

def trending_score(order_qty: float, interaction_count: float, rating: float) -> float:
    return order_qty * 0.7 + interaction_count * 0.3 + rating * 0.2
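
The windowing itself is not shown in the source. One way to sketch it, assuming order events are `(product_id, quantity, timestamp)` tuples and interaction events are `(product_id, timestamp)` tuples (both shapes are hypothetical):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def trending_scores(order_events, interaction_events, ratings, now, window_days=30):
    """Aggregate events inside the window, then apply the 0.7 / 0.3 / 0.2 weights."""
    cutoff = now - timedelta(days=window_days)
    order_qty = defaultdict(float)
    interactions = defaultdict(float)
    for product_id, qty, ts in order_events:
        if ts >= cutoff:
            order_qty[product_id] += qty
    for product_id, ts in interaction_events:
        if ts >= cutoff:
            interactions[product_id] += 1
    product_ids = set(order_qty) | set(interactions)  # only products active in the window
    return {
        pid: order_qty[pid] * 0.7 + interactions[pid] * 0.3 + ratings.get(pid, 0.0) * 0.2
        for pid in product_ids
    }

now = datetime(2024, 6, 30)
order_events = [
    (1, 2, datetime(2024, 6, 20)),
    (1, 1, datetime(2024, 5, 1)),   # older than 30 days, ignored
    (2, 1, datetime(2024, 6, 25)),
]
interaction_events = [
    (2, datetime(2024, 6, 28)),
    (2, datetime(2024, 6, 29)),
    (3, datetime(2024, 4, 1)),      # older than 30 days, ignored
]
scores = trending_scores(order_events, interaction_events, {1: 4.0, 2: 5.0}, now)
```

Because raw counts are unbounded while ratings top out at 5, the rating term acts mostly as a tie-breaker once order and interaction volumes grow.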

7. Interaction-Based — Explicit Intent Signals

Endpoint: GET /api/recommend/interaction-based

Question: What should we show based on likes, bookmarks, and reviews?

Differentiates signal strength by interaction type. When a rating is present, each star above or below the neutral rating of 3 shifts the weight up or down by 10%, and the final weight is floored at 0.2.

INTERACTION_WEIGHT = {"like": 1.0, "bookmark": 1.2, "review": 1.5}

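def interaction_weight(kind: str, rating: int | None) -> float:
    # Unknown interaction kinds fall back to a neutral base weight of 1.0.
    base = INTERACTION_WEIGHT.get(kind, 1.0)
    # Ratings scale the base weight by 10% per star away from 3; no rating means no change.
    review_boost = (1.0 + (float(rating) - 3.0) / 10.0) if rating is not None else 1.0
    return max(0.2, base * review_boost)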
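
To show how these weights might feed a recommendation, the sketch below sums them per category into an affinity map. The weight function is repeated so the snippet is runnable; the `(category, kind, rating)` tuple shape and the `category_affinity` helper are assumptions.

```python
INTERACTION_WEIGHT = {"like": 1.0, "bookmark": 1.2, "review": 1.5}

def interaction_weight(kind, rating=None):
    base = INTERACTION_WEIGHT.get(kind, 1.0)
    review_boost = (1.0 + (float(rating) - 3.0) / 10.0) if rating is not None else 1.0
    return max(0.2, base * review_boost)

def category_affinity(interactions):
    """Sum interaction weights per category (hypothetical aggregation step)."""
    affinity = {}
    for category, kind, rating in interactions:
        affinity[category] = affinity.get(category, 0.0) + interaction_weight(kind, rating)
    return affinity

interactions = [
    ("Shoes", "like", None),      # 1.0
    ("Shoes", "review", 5),       # 1.5 * 1.2 = 1.8
    ("Dresses", "review", 1),     # 1.5 * 0.8 = 1.2
    ("Dresses", "bookmark", None) # 1.2
]
affinity = category_affinity(interactions)
```

A one-star review still contributes a positive weight here, so a design that wants negative reviews to suppress a category would need a different aggregation.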

8. Seasonal — Context-Aware Catalog

Endpoint: GET /api/recommend/seasonal

Question: What's relevant for the current season and user profile?

Maps the current month to a season and boosts catalog items matching both the season tag and the user's established category preferences.

def current_season(month: int) -> str:
    if month in {12, 1, 2}: return "Winter"
    if month in {3, 4, 5}:  return "Spring"
    if month in {6, 7, 8}:  return "Summer"
    return "Fall"
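
The boost described above could be combined with the season mapping roughly as follows. The weights 2.0 and 1.0, the function name `seasonal_score`, and the shape of `category_pref` are placeholders chosen for illustration; the source does not state the actual values.

```python
from datetime import date

def current_season(month: int) -> str:
    if month in {12, 1, 2}: return "Winter"
    if month in {3, 4, 5}:  return "Spring"
    if month in {6, 7, 8}:  return "Summer"
    return "Fall"

def seasonal_score(product_season, product_category, category_pref, today):
    """Hypothetical blend: season match dominates, category preference refines."""
    season_match = 1.0 if product_season == current_season(today.month) else 0.0
    category_fit = category_pref.get(product_category, 0.0)
    return season_match * 2.0 + category_fit * 1.0

category_pref = {"Topwear": 0.5}
july = date(2024, 7, 1)
in_season = seasonal_score("Summer", "Topwear", category_pref, july)
off_season = seasonal_score("Winter", "Topwear", category_pref, july)
```

Note that the month-to-season mapping assumes Northern Hemisphere seasons.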

9. Budget — Spend Zone Matching

Endpoint: GET /api/recommend/budget

Question: What good options exist within this user's spend comfort zone?

Derives a budget band from historical average unit spend, then scores candidates by closeness to band center, affordability, and rating.

def budget_score(price, avg_price, max_price, rating):
    closeness = 1.0 - min(abs(price - avg_price) / max(avg_price, 1e-9), 1.0)
    affordability = 1.0 - min(price / max(max_price, 1.0), 1.0)
    rating_score = rating / 5.0
    return closeness * 0.5 + affordability * 0.2 + rating_score * 0.3
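
A sketch of deriving the band and applying the score. The band width (average ± 50%), the use of the band's upper edge as `max_price`, and the helper `budget_band` are assumptions; the source only says the band comes from historical average unit spend.

```python
def budget_band(unit_prices, tolerance=0.5):
    """Return (low, average, high) around the user's average unit spend (hypothetical)."""
    avg = sum(unit_prices) / len(unit_prices)
    return avg * (1 - tolerance), avg, avg * (1 + tolerance)

def budget_score(price, avg_price, max_price, rating):
    closeness = 1.0 - min(abs(price - avg_price) / max(avg_price, 1e-9), 1.0)
    affordability = 1.0 - min(price / max(max_price, 1.0), 1.0)
    rating_score = rating / 5.0
    return closeness * 0.5 + affordability * 0.2 + rating_score * 0.3

low, avg_price, high = budget_band([40.0, 60.0, 50.0])  # band is 25 to 75 around 50

candidates = {"tee": (45.0, 4.0), "jacket": (120.0, 5.0), "socks": (10.0, 4.5)}
in_band = {
    name: budget_score(price, avg_price, high, rating)
    for name, (price, rating) in candidates.items()
    if low <= price <= high  # drop items outside the comfort zone
}
```

For the tee: closeness is 1 - 5/50 = 0.9, affordability is 1 - 45/75 = 0.4, and the rating term is 0.8, giving 0.45 + 0.08 + 0.24 = 0.77.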

10. New Arrivals — Freshness + Fit

Endpoint: GET /api/recommend/new-arrivals

Question: What are the latest products most likely to match this user?

Blends recency (normalized catalog year, plus normalized product ID as a proxy for how recently the item was added), category fit against user preferences, and rating quality.

def new_arrival_score(year_norm, id_norm, category_match, rating_score):
    return year_norm * 1.8 + id_norm * 0.8 + category_match * 1.2 + rating_score * 0.6
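
The normalized inputs are not defined in the source. Assuming plain min-max scaling over the candidate set, a binary category match, and rating divided by 5, the ranking could look like this (product dictionaries and helper names are hypothetical):

```python
def min_max(value, lo, hi):
    """Scale value into [0, 1]; a degenerate range maps to 0."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0

def new_arrival_score(year_norm, id_norm, category_match, rating_score):
    return year_norm * 1.8 + id_norm * 0.8 + category_match * 1.2 + rating_score * 0.6

def rank_new_arrivals(products, preferred_categories):
    years = [p["year"] for p in products]
    ids = [p["id"] for p in products]
    scored = []
    for p in products:
        score = new_arrival_score(
            min_max(p["year"], min(years), max(years)),
            min_max(p["id"], min(ids), max(ids)),
            1.0 if p["sub_category"] in preferred_categories else 0.0,
            p["avg_rating"] / 5.0,
        )
        scored.append((score, p["id"]))
    return [pid for score, pid in sorted(scored, reverse=True)]

products = [
    {"id": 10, "year": 2016, "sub_category": "Shoes", "avg_rating": 5.0},
    {"id": 20, "year": 2018, "sub_category": "Bags", "avg_rating": 4.0},
    {"id": 30, "year": 2018, "sub_category": "Shoes", "avg_rating": 3.0},
]
ranking = rank_new_arrivals(products, {"Shoes"})
```

The older, top-rated shoe (id 10) ranks last: with a recency weight of 1.8 on year alone, freshness outweighs both category fit and rating.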

ML & Embeddings Pipeline

Pipeline: `scripts/generate_embeddings.py`

Load product images from catalog
Pass through pretrained ResNet50 (classification head removed)
Extract 2048-d penultimate layer activations
Apply L2 normalization per vector
Persist to:
- data/processed/embeddings.npy
- data/processed/embedding_ids.json

macOS note: Run with --num-workers 0 to avoid torch_shm_manager permission errors.

Backend Architecture

Stack

FastAPI — async, high-throughput API framework
SQLAlchemy ORM — declarative models, session management
Alembic — versioned schema migrations
PostgreSQL + pgvector — relational store with native vector similarity search
JWT — stateless auth via python-jose

Core Data Models

users
products — includes embedding vector(2048)
orders / order_items
cart_items
interactions

Startup Lifecycle (`backend/app/main.py`)

On every cold start:

Execute Alembic migrations (alembic upgrade head)
Seed the database from CSV files if empty
Sync PK sequences to prevent duplicate key errors post-bulk-seed
Load embedding matrix into in-memory store
Mount product images at /images as static files
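
The sequence-sync step is worth a concrete illustration, since bulk-seeding rows with explicit IDs leaves PostgreSQL sequences behind the table contents. The sketch below only builds the SQL text using PostgreSQL's `setval` and `pg_get_serial_sequence` functions; the helper name, the table list, and the assumption that each table has a serial or identity column named `id` are hypothetical.

```python
def sequence_sync_sql(table: str, pk: str = "id") -> str:
    """Build a statement that moves a table's PK sequence to its current MAX(pk)."""
    if not (table.isidentifier() and pk.isidentifier()):
        raise ValueError("table and pk must be plain identifiers")
    # The third setval argument keeps an empty table starting at 1 instead of 2.
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{pk}'), "
        f"COALESCE(MAX({pk}), 1), MAX({pk}) IS NOT NULL) FROM {table}"
    )

# Assumed table names, taken from the core data models listed above.
statements = [sequence_sync_sql(t) for t in ("users", "products", "orders")]
```

Each statement would be executed once after seeding, for example through a SQLAlchemy connection.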

API Domains

Prefix	Responsibility
`/api/auth`	Login, register, token refresh
`/api/products`	Catalog browse and search
`/api/cart`	Cart CRUD
`/api/orders`	Order placement and history
`/api/interactions`	Like, bookmark, review
`/api/recommend`	All 10 recommendation endpoints

Frontend Architecture

Stack

React + Vite — fast dev server and optimized production build
Tailwind CSS — utility-first styling
Zustand — lightweight global state
Axios — HTTP client with JWT interceptor

Pages

Home
Catalog
Product Detail
Cart (slide-out panel)
Checkout
Order Success
Dashboard
Login / Register

Setup and Deployment

Prerequisites

Tool	Version
Python	3.12+
Node.js	20+
Docker Desktop	Latest
Kaggle account	Required for dataset

Step 1 — Environment Variables

Create .env in project root:

POSTGRES_DB=stylesense
POSTGRES_USER=admin
POSTGRES_PASSWORD=admin123
DATABASE_URL=postgresql://admin:admin123@localhost:5432/stylesense
JWT_SECRET=change-this-to-a-64-char-random-string
VITE_API_URL=http://localhost:8000/api
CORS_ORIGINS=http://localhost:3000,http://localhost:5173

Step 2 — Download Dataset

Dataset: Fashion Images on Kaggle

Via Kaggle CLI (recommended):

pip install --upgrade kaggle
mkdir -p ~/.kaggle
# Place your kaggle.json token at ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json

mkdir -p data/raw_kaggle
kaggle datasets download -d vikashrajluhaniwal/fashion-images -p data/raw_kaggle --unzip
cp data/raw_kaggle/styles.csv data/fashion.csv

Step 3 — Install Dependencies

# Python
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
pip install torch torchvision pillow

# Node
cd frontend && npm install && cd ..

Step 4 — Generate Data Artifacts

source .venv/bin/activate

python3 scripts/clean_data.py \
  --input-csv data/fashion.csv \
  --data-dir data \
  --output-csv data/processed/products_clean.csv

python3 scripts/simulate_users.py \
  --products-csv data/processed/products_clean.csv \
  --out-dir data/simulated

python3 scripts/generate_embeddings.py \
  --products-csv data/processed/products_clean.csv \
  --data-dir data \
  --output-npy data/processed/embeddings.npy \
  --ids-json data/processed/embedding_ids.json \
  --num-workers 0

Step 5 — Start Services

Docker Compose (recommended):

docker compose up --build

Service	URL
Frontend	http://localhost:3000
Backend	http://localhost:8000
Database	localhost:5432

Local processes:

# DB only via Docker
docker compose up -d db

# Backend
source .venv/bin/activate && cd backend
uvicorn app.main:app --reload --port 8000

# Frontend
cd frontend && npm run dev

Verification

# Health checks
curl http://localhost:8000/health
curl http://localhost:8000/api/recommend/health

# Login with seeded user
curl -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"isha.mehta1@stylesense.com","password":"password123"}'

# Public recommendation
curl http://localhost:8000/api/recommend/trending

# Protected recommendation
curl http://localhost:8000/api/recommend/history \
  -H "Authorization: Bearer <ACCESS_TOKEN>"
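
The same login and protected-endpoint calls can be scripted with the Python standard library. This sketch only builds the requests; the helper names are hypothetical, and the name of the token field in the login response is not shown in the source, so extracting it is left to the caller.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000/api"

def build_login_request(base_url, email, password):
    """POST /auth/login with a JSON body, mirroring the curl call above."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return Request(
        f"{base_url}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def build_authed_request(base_url, path, token):
    """GET a protected endpoint with a JWT bearer token."""
    return Request(f"{base_url}{path}", headers={"Authorization": f"Bearer {token}"})

login_req = build_login_request(BASE_URL, "isha.mehta1@stylesense.com", "password123")
history_req = build_authed_request(BASE_URL, "/recommend/history", "<ACCESS_TOKEN>")
```

With the backend running, each request can be sent with `urllib.request.urlopen(req)` and the response body parsed with `json.load`.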

Tests

cd backend && pytest

Recommendation Quality Validation

python3 scripts/validate_recommendations.py --base-url http://localhost:8000/api

Validation checks: history preference match, restock non-empty behavior, collaborative input exclusion, visual similarity threshold, cart add-ons cross-category distribution.

Common Issues

Issue	Fix
`Cannot connect to Docker daemon`	Start Docker Desktop, then retry `docker compose up --build`
TLS timeout pulling `node:20-alpine`	Run `docker pull node:20-alpine` manually first
Duplicate order ID on checkout	Already fixed — sequence sync runs at startup in `seed.py`
`torch_shm_manager` permission error (macOS)	Use `--num-workers 0` when running `generate_embeddings.py`

Recommendation System Theory

Types

Collaborative Filtering — recommends based on similar user behavior.

User A liked: [X, Y, Z]  →  Recommend Z to User B who liked X and Y
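
The example above can be made concrete with a toy user-based recommender. This is a generic textbook-style sketch using Jaccard overlap between users' liked sets, not StyleSense's implementation (which, per the engine section, is item-item co-purchase).

```python
def recommend_from_similar_users(target, likes):
    """Score unseen items by the Jaccard similarity of the users who liked them."""
    target_items = likes[target]
    scores = {}
    for user, items in likes.items():
        if user == target:
            continue
        overlap = len(target_items & items)
        if overlap == 0:
            continue  # users with nothing in common contribute nothing
        similarity = overlap / len(target_items | items)
        for item in items - target_items:
            scores[item] = scores.get(item, 0.0) + similarity
    return sorted(scores, key=lambda item: (-scores[item], item))

likes = {"A": {"X", "Y", "Z"}, "B": {"X", "Y"}, "C": {"W"}}
suggestions = recommend_from_similar_users("B", likes)
```

User B shares X and Y with user A, so A's remaining item Z is suggested; user C has no overlap with B and is ignored.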

Content-Based Filtering — recommends items similar to what the user previously liked, based on item features.

Liked: "Inception" (Sci-Fi, Thriller)  →  Recommend "Interstellar"

Hybrid Filtering — combines both methods. Reduces cold-start problems, improves ranking quality. Used by Netflix, Spotify, Amazon.

Knowledge-Based — applies explicit user constraints and domain rules. Best for high-consideration purchases (cars, real estate).

Demographic-Based — uses profile attributes (age, gender, location) to segment recommendations.

Type Comparison

Type	Signal Source	Best Fit
Collaborative	Similar users	Streaming, social platforms
Content-Based	Item features	News, music, articles
Hybrid	Both	Large consumer platforms
Knowledge-Based	Rules + domain knowledge	Cars, real estate
Demographic-Based	User profile attributes	E-commerce segmentation

Real-World Examples

Netflix — hybrid (watch history + similar users + content signals)
Amazon — collaborative + content + contextual signals
Spotify — hybrid (listening behavior + audio features)
YouTube — collaborative + content + freshness + trending layers

Why This Project Uses a Hybrid Approach

A single method doesn't hold up across all user states:

New users have no history → cold-start problem
Popular items dominate collaborative signals → diversity collapse
Pure content similarity misses emerging trends

StyleSense combines behavior-driven methods (history, collaborative, restock, trending), attribute-driven methods (seasonal, budget, new arrivals), explicit intent signals (interaction-based, cart add-ons), and visual embedding similarity (alternatives) to handle all these failure modes.

Theory-to-Code Mapping

Theory	Implementation
Collaborative filtering	`services/recommendation/collaborative.py`
Content-based filtering	`history.py`, `interaction_based.py`
Embedding similarity	`visual.py` + `ml/embeddings_store.py`
Knowledge-like constraints	`budget.py`, `seasonal.py`
Hybrid surface	Multi-scenario tabs in frontend Dashboard

Screenshots

Home Page

Product Detail Page

Recommendation Types

Orders Page

Demo Walkthrough

Seeded test account:

email:    isha.mehta1@stylesense.com
password: password123

Suggested demo flow:

Login and inspect JWT token
Hit /api/recommend/trending — no auth required, quick signal check
Hit /api/recommend/history — auth required, shows personalization
Open Product Detail → visual alternatives tab
Add items to cart → cart add-ons surface
Complete checkout → inspect Order History with item-level detail
