
SSusantAchary/StyleSense


StyleSense — Personalization Platform

Production-grade e-commerce recommendation engine. Full-stack. Containerized. Battle-tested.


Live Demo: https://ssusantachary.github.io/StyleSense/

System Design of StyleSense

System Design

Recommendation Types (Integrated)

Recommendation Intuition


Overview

StyleSense is a fashion e-commerce personalization platform built to demonstrate how large-scale recommendation systems are designed, implemented, and deployed in production environments.

This project covers the complete engineering lifecycle: raw data ingestion, ML embedding generation, a multi-strategy recommendation engine, a FastAPI backend, a React frontend, and fully containerized deployment via Docker Compose.


What's Inside


| Layer | Technology | Link |
|---|---|---|
| Backend API | FastAPI | https://fastapi.tiangolo.com/ |
| Backend API | SQLAlchemy | https://www.sqlalchemy.org/ |
| Backend API | Alembic | https://alembic.sqlalchemy.org/ |
| Database | PostgreSQL | https://www.postgresql.org/ |
| Database | pgvector | https://github.com/pgvector/pgvector |
| ML Embeddings | PyTorch + torchvision (ResNet50) | https://pytorch.org/ |
| Auth | JWT Bearer (python-jose) | https://jwt.io/ |
| Frontend | React | https://react.dev/ |
| Frontend | Vite | https://vitejs.dev/ |
| Frontend | Tailwind CSS | https://tailwindcss.com/ |
| Frontend | Zustand | https://github.com/pmndrs/zustand |
| Deployment | Docker Compose | https://docs.docker.com/compose/ |

Repository Structure

Ecommerce_Personalisation/
├── backend/                # FastAPI app, services, models, migrations, tests
├── frontend/               # React app (Vite + Tailwind + Zustand)
├── data/
│   ├── fashion.csv         # Raw catalog input
│   ├── processed/          # Cleaned products + embeddings
│   └── simulated/          # Generated users, orders, interactions
├── scripts/                # Data prep, embedding generation, validation
├── docs/                   # Frontend QA checklists and notes
└── docker-compose.yml      # Orchestrates db + backend + frontend

Data at a Glance

| Entity | Count |
|---|---|
| Products (raw) | 2,906 |
| Products (cleaned) | 2,906 |
| Users | 1,200 |
| Orders | 15,000 |
| Order items | 40,000 |
| Interactions | 25,000 |
| Embedding matrix shape | (2906, 2048), float32 |

Product Schema

Key fields in products_clean.csv:

  • Identity: id
  • Taxonomy: masterCategory, subCategory, articleType
  • Personalization signals: gender, baseColour, season, year, usage
  • Commerce signals: price, stock_count, avg_rating
  • Image linkage: image_path

Simulated User Cohorts

simulate_users.py generates structured behavioral cohorts to make recommendation evaluation meaningful:

  • Nike lovers
  • Dress buyers
  • Shoe collectors
  • Seasonal shoppers
  • General shoppers

Recommendation Engine — 10 Scenarios

1. History — Purchase-Based Personalization

Endpoint: GET /api/recommend/history

Question: Based on what this user bought, what should they see next?

Scores each candidate against sub-category, article-type, and color preference weights derived from the user's past orders. Average rating adds a small boost.

def score_history_candidate(product, sub_pref, type_pref, color_pref):
    sub_score = sub_pref.get(product.sub_category or "", 0.0)
    type_score = type_pref.get(product.article_type or "", 0.0)
    color_score = color_pref.get(product.base_colour or "", 0.0)
    rating_score = float(product.avg_rating or 0.0)
    return sub_score * 3.0 + type_score * 2.0 + color_score * 1.0 + rating_score * 0.5

2. Collaborative — Item-Item Co-Purchase

Endpoint: GET /api/recommend/collaborative/{product_id}

Question: People who bought this also bought what?

Divides each candidate's co-purchase count by the square root of its overall popularity. This keeps globally popular items from dominating every recommendation surface.

from math import sqrt

def collaborative_score(co_count: int, popularity: int) -> float:
    return co_count / sqrt(max(popularity, 1))
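The dampening is easiest to see end to end. The sketch below assumes orders are available as sets of product IDs, and the also_bought helper is hypothetical. It counts co-purchases, divides each count by the square root of the candidate's order count, and excludes the input product itself.

```python
from collections import Counter
from math import sqrt


def collaborative_score(co_count: int, popularity: int) -> float:
    return co_count / sqrt(max(popularity, 1))


def also_bought(orders: list[set[int]], product_id: int, k: int = 5) -> list[int]:
    """Rank products co-purchased with product_id, dampened by overall popularity."""
    popularity: Counter[int] = Counter()  # number of orders containing each product
    co_counts: Counter[int] = Counter()   # orders containing both product_id and the candidate
    for basket in orders:
        popularity.update(basket)
        if product_id in basket:
            co_counts.update(p for p in basket if p != product_id)  # input item excluded
    scores = {p: collaborative_score(c, popularity[p]) for p, c in co_counts.items()}
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


# Product 3 is bought with product 1 twice, but it appears in nine orders overall.
orders = [{1, 2}, {1, 3}, {1, 3}] + [{3, n} for n in range(4, 11)]
print(also_bought(orders, 1))  # [2, 3]
```

Product 3 has the higher raw co-count. Its score, however, is 2 / sqrt(9) ≈ 0.67, which falls below product 2's 1 / sqrt(1) = 1.0.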

3. Restock — Buy Again

Endpoint: GET /api/recommend/restock

Question: What previously purchased items should this user repurchase?

Identifies replenishable items using repeat-purchase signals filtered by delivery status and live stock availability.

def is_restock_candidate(purchase_count: int, order_status: str, in_stock: bool) -> bool:
    return order_status == "delivered" and in_stock and purchase_count >= 2
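The next sketch applies the same three conditions across a whole order history: the order was delivered, the item is in stock, and the user bought it at least twice. The OrderLine shape and the restock_candidates helper are hypothetical. Counting only delivered lines toward the repeat threshold is one reasonable interpretation, not necessarily the repository's.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class OrderLine:
    # Hypothetical row: one order_items entry joined with its parent order's status.
    product_id: int
    status: str


def restock_candidates(lines: list[OrderLine], stock: dict[int, int]) -> list[int]:
    # Only delivered purchases count toward the repeat-purchase threshold.
    delivered = Counter(line.product_id for line in lines if line.status == "delivered")
    # Keep products bought at least twice that are currently in stock.
    return sorted(pid for pid, n in delivered.items() if n >= 2 and stock.get(pid, 0) > 0)


lines = [
    OrderLine(10, "delivered"), OrderLine(10, "delivered"),
    OrderLine(11, "delivered"), OrderLine(11, "cancelled"),
    OrderLine(12, "delivered"), OrderLine(12, "delivered"),
]
print(restock_candidates(lines, stock={10: 5, 11: 3, 12: 0}))  # [10]
```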

4. Visual Alternatives — Embedding Similarity

Endpoint: GET /api/recommend/alternatives/{product_id}

Question: What products look visually similar to this item?

Uses L2-normalized ResNet50 embeddings. Because every vector has unit length, cosine similarity reduces to a plain dot product, so no norms need to be computed at query time.

def visual_similarity(query_vec, candidate_vec):
    # Vectors are L2-normalized — dot product equals cosine similarity
    return float((query_vec * candidate_vec).sum())
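Ranking the whole catalog is a single matrix-vector product over the normalized embedding matrix. The following is a sketch that assumes the matrix rows line up with a list of product IDs. The helper names are hypothetical, and the 2-d vectors stand in for 2048-d embeddings.

```python
import numpy as np


def l2_normalize(matrix: np.ndarray) -> np.ndarray:
    # Divide each row by its Euclidean norm; the epsilon guards against all-zero rows.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.maximum(norms, 1e-12)


def visual_alternatives(embeddings: np.ndarray, ids: list[int], product_id: int, k: int = 5):
    """Return up to k (id, cosine similarity) pairs, most similar first."""
    idx = ids.index(product_id)
    sims = embeddings @ embeddings[idx]  # one dot product per catalog row
    sims[idx] = -np.inf                  # never recommend the query product itself
    top = np.argsort(-sims)[: min(k, len(ids) - 1)]
    return [(ids[i], float(sims[i])) for i in top]


emb = l2_normalize(np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]))
ids = [101, 102, 103, 104]
print(visual_alternatives(emb, ids, 101, k=2))  # 102 first, then 104
```

At 2,906 × 2,048, a brute-force product like this is cheap. The same ranking could also run inside PostgreSQL, since pgvector provides a cosine-distance operator (<=>) and a negative-inner-product operator (<#>).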

5. Cart Add-ons — Cross-Sell

Endpoint: GET /api/recommend/cart-addons

Question: Given what's in the cart, what complementary items should we surface?

Weighted blend of co-purchase signals, price attractiveness, and rating quality. Same-category duplicates are penalized to maximize cross-category discovery.

def cart_addon_score(co_purchase_score, price_discount_factor, rating_factor, is_cross_sell):
    score = co_purchase_score * 0.6 + price_discount_factor * 0.2 + rating_factor * 0.2
    if not is_cross_sell:
        score *= 0.7
    return score
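The following sketch shows how the 0.7 penalty reorders candidates. It assumes that "cross-sell" means the candidate's subCategory is not already in the cart. That rule, the rank_addons helper, and the candidate fields are hypothetical.

```python
def cart_addon_score(co_purchase_score, price_discount_factor, rating_factor, is_cross_sell):
    score = co_purchase_score * 0.6 + price_discount_factor * 0.2 + rating_factor * 0.2
    if not is_cross_sell:
        score *= 0.7
    return score


def rank_addons(cart_categories: set[str], candidates: list[dict], k: int = 3) -> list[int]:
    scored = []
    for c in candidates:
        # Hypothetical rule: cross-sell means the subCategory is not already in the cart.
        cross_sell = c["sub_category"] not in cart_categories
        score = cart_addon_score(c["co_purchase"], c["discount"], c["rating"], cross_sell)
        scored.append((score, c["id"]))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [pid for _, pid in scored[:k]]


candidates = [
    {"id": 1, "sub_category": "Topwear", "co_purchase": 0.9, "discount": 0.5, "rating": 0.8},
    {"id": 2, "sub_category": "Shoes", "co_purchase": 0.7, "discount": 0.5, "rating": 0.8},
    {"id": 3, "sub_category": "Bags", "co_purchase": 0.5, "discount": 0.2, "rating": 0.6},
]
print(rank_addons({"Topwear"}, candidates))  # [2, 1, 3]
```

Item 1 has the strongest co-purchase signal. Because it duplicates a category already in the cart, however, its score drops from 0.80 to 0.56, which is below the cross-sell item 2 (0.68).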

6. Trending — Platform-Wide Signals

Endpoint: GET /api/recommend/trending

Question: What is hot platform-wide right now?

Scores products over a rolling 30-day window. Order quantity is weighted 0.7, interaction count is weighted 0.3, and average rating adds a 0.2-weighted boost.

def trending_score(order_qty: float, interaction_count: float, rating: float) -> float:
    return order_qty * 0.7 + interaction_count * 0.3 + rating * 0.2
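The following is a sketch of the window itself, assuming orders and interactions carry timestamps. The trending helper and the tuple shapes are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def trending_score(order_qty: float, interaction_count: float, rating: float) -> float:
    return order_qty * 0.7 + interaction_count * 0.3 + rating * 0.2


def trending(orders, interactions, ratings, now: datetime, days: int = 30, k: int = 5) -> list[int]:
    """orders: (product_id, qty, timestamp); interactions: (product_id, timestamp)."""
    cutoff = now - timedelta(days=days)
    qty: dict[int, float] = defaultdict(float)
    hits: dict[int, float] = defaultdict(float)
    for pid, q, ts in orders:
        if ts >= cutoff:  # events older than the window are ignored entirely
            qty[pid] += q
    for pid, ts in interactions:
        if ts >= cutoff:
            hits[pid] += 1
    scored = {p: trending_score(qty[p], hits[p], ratings.get(p, 0.0)) for p in set(qty) | set(hits)}
    return sorted(scored, key=lambda p: (-scored[p], p))[:k]


now = datetime(2024, 6, 30)
orders = [
    (1, 3, now - timedelta(days=2)),
    (2, 10, now - timedelta(days=45)),  # outside the 30-day window
    (2, 1, now - timedelta(days=1)),
    (3, 2, now - timedelta(days=5)),
]
interactions = [(2, now - timedelta(days=1)), (2, now - timedelta(days=3)), (3, now - timedelta(days=10))]
print(trending(orders, interactions, {1: 4.0, 2: 4.5, 3: 3.0}, now))  # [1, 3, 2]
```

Product 2's 10-unit order falls outside the window, which is why product 2 ranks last. Note also that the quantity and count terms are raw, whereas the rating term contributes at most 1.0. At real order volumes, the rating boost therefore appears to act mostly as a tie-breaker.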

7. Interaction-Based — Explicit Intent Signals

Endpoint: GET /api/recommend/interaction-based

Question: What should we show based on likes, bookmarks, and reviews?

Weights each signal by interaction type: like 1.0, bookmark 1.2, review 1.5. For reviews, each star above or below the neutral rating of 3 raises or lowers the weight by 10%, and the result is floored at 0.2.

INTERACTION_WEIGHT = {"like": 1.0, "bookmark": 1.2, "review": 1.5}

def interaction_weight(kind: str, rating: int | None) -> float:
    base = INTERACTION_WEIGHT.get(kind, 1.0)
    review_boost = 1.0 + ((float(rating) - 3.0) / 10.0) if rating is not None else 1.0
    return max(0.2, base * review_boost)

8. Seasonal — Context-Aware Catalog

Endpoint: GET /api/recommend/seasonal

Question: What's relevant for the current season and user profile?

Maps the current month to a season and boosts catalog items matching both the season tag and the user's established category preferences.

def current_season(month: int) -> str:
    if month in {12, 1, 2}: return "Winter"
    if month in {3, 4, 5}:  return "Spring"
    if month in {6, 7, 8}:  return "Summer"
    return "Fall"
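The next sketch combines the season tag with category preference. The seasonal_score helper and its weights (2.0 for a season match, 1.5 for category fit, 0.5 for normalized rating) are assumptions for illustration, not the repository's values. Note that the month mapping follows Northern Hemisphere seasons.

```python
from datetime import date


def current_season(month: int) -> str:
    if month in {12, 1, 2}:
        return "Winter"
    if month in {3, 4, 5}:
        return "Spring"
    if month in {6, 7, 8}:
        return "Summer"
    return "Fall"


def seasonal_score(season: str, sub_category: str, category_pref: dict[str, float],
                   rating: float, today: date) -> float:
    # Hypothetical weights: season match dominates, category fit next, rating breaks ties.
    season_match = 1.0 if season == current_season(today.month) else 0.0
    return season_match * 2.0 + category_pref.get(sub_category, 0.0) * 1.5 + (rating / 5.0) * 0.5


prefs = {"Topwear": 0.6, "Shoes": 0.4}
jan = date(2024, 1, 15)
print(seasonal_score("Winter", "Topwear", prefs, 4.0, jan))  # ≈ 3.3
print(seasonal_score("Summer", "Topwear", prefs, 4.0, jan))  # ≈ 1.3
```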

9. Budget — Spend Zone Matching

Endpoint: GET /api/recommend/budget

Question: What good options exist within this user's spend comfort zone?

Derives a budget band from historical average unit spend, then scores candidates by closeness to band center, affordability, and rating.

def budget_score(price, avg_price, max_price, rating):
    closeness = 1.0 - min(abs(price - avg_price) / max(avg_price, 1e-9), 1.0)
    affordability = 1.0 - min(price / max(max_price, 1.0), 1.0)
    rating_score = rating / 5.0
    return closeness * 0.5 + affordability * 0.2 + rating_score * 0.3
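The following sketch derives the band from past unit prices. It assumes the band centre is the average unit spend and the ceiling (max_price) is 1.5 times that average. The budget_band helper and the 1.5 headroom factor are hypothetical.

```python
from statistics import mean


def budget_score(price, avg_price, max_price, rating):
    closeness = 1.0 - min(abs(price - avg_price) / max(avg_price, 1e-9), 1.0)
    affordability = 1.0 - min(price / max(max_price, 1.0), 1.0)
    rating_score = rating / 5.0
    return closeness * 0.5 + affordability * 0.2 + rating_score * 0.3


def budget_band(unit_prices: list[float], headroom: float = 1.5) -> tuple[float, float]:
    # Hypothetical band: centre on average unit spend, cap at `headroom` times that average.
    avg_price = float(mean(unit_prices))
    return avg_price, avg_price * headroom


avg_price, max_price = budget_band([40.0, 60.0, 50.0])  # (50.0, 75.0)
for price, rating in [(50.0, 4.0), (70.0, 5.0), (20.0, 3.0)]:
    print(price, round(budget_score(price, avg_price, max_price, rating), 3))
```

The on-budget item scores highest (about 0.81) even though the 70.0 item has a perfect rating. This is because closeness to the band centre carries half of the total weight.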

10. New Arrivals — Freshness + Fit

Endpoint: GET /api/recommend/new-arrivals

Question: What are the latest products most likely to match this user?

Blends recency (year normalization + ID proxy), category fit against user preferences, and rating quality.

def new_arrival_score(year_norm, id_norm, category_match, rating_score):
    return year_norm * 1.8 + id_norm * 0.8 + category_match * 1.2 + rating_score * 0.6
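Here is a sketch of the normalization step, assuming min-max scaling of year and product ID across the candidate set. It also assumes that rating_score is avg_rating / 5 and that category_match is 1 when the subCategory is among the user's preferred ones. The min_max and rank_new_arrivals helpers are hypothetical.

```python
def new_arrival_score(year_norm, id_norm, category_match, rating_score):
    return year_norm * 1.8 + id_norm * 0.8 + category_match * 1.2 + rating_score * 0.6


def min_max(values: list[float]) -> list[float]:
    # Scale to [0, 1]; a constant column maps to 0 so it contributes nothing.
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def rank_new_arrivals(products: list[dict], preferred: set[str]) -> list[int]:
    years = min_max([p["year"] for p in products])
    ids = min_max([p["id"] for p in products])  # higher ID treated as a newer listing
    scored = []
    for p, y, i in zip(products, years, ids):
        match = 1.0 if p["sub_category"] in preferred else 0.0
        scored.append((new_arrival_score(y, i, match, p["avg_rating"] / 5.0), p["id"]))
    return [pid for _, pid in sorted(scored, key=lambda t: (-t[0], t[1]))]


products = [
    {"id": 100, "year": 2012, "sub_category": "Shoes", "avg_rating": 4.0},
    {"id": 2000, "year": 2017, "sub_category": "Topwear", "avg_rating": 3.5},
    {"id": 3000, "year": 2017, "sub_category": "Shoes", "avg_rating": 4.5},
]
print(rank_new_arrivals(products, preferred={"Shoes"}))  # [3000, 2000, 100]
```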

ML & Embeddings Pipeline

Pipeline: scripts/generate_embeddings.py

  1. Load product images from catalog
  2. Pass through pretrained ResNet50 (classification head removed)
  3. Extract 2048-d penultimate layer activations
  4. Apply L2 normalization per vector
  5. Persist to:
    • data/processed/embeddings.npy
    • data/processed/embedding_ids.json

macOS note: Run with --num-workers 0 to avoid torch_shm_manager permission errors.
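Steps 4 and 5 can be sketched with NumPy alone. Steps 1 to 3 are omitted here. A common torchvision approach for those steps is to load resnet50 with pretrained weights and replace model.fc with torch.nn.Identity(), so that the forward pass returns the 2048-d pooled features. In the sketch, random features stand in for those activations. The save_embeddings and load_embeddings helpers are hypothetical, not the script's actual functions.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_embeddings(features: np.ndarray, ids: list[int], out_dir: Path) -> None:
    # Step 4: L2-normalize each row so cosine similarity becomes a dot product.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normalized = (features / np.maximum(norms, 1e-12)).astype(np.float32)
    # Step 5: persist the matrix plus a row-aligned list of product IDs.
    out_dir.mkdir(parents=True, exist_ok=True)
    np.save(out_dir / "embeddings.npy", normalized)
    (out_dir / "embedding_ids.json").write_text(json.dumps(ids))


def load_embeddings(out_dir: Path) -> tuple[np.ndarray, list[int]]:
    matrix = np.load(out_dir / "embeddings.npy")
    ids = json.loads((out_dir / "embedding_ids.json").read_text())
    if matrix.shape[0] != len(ids):
        raise ValueError("embedding rows and product IDs are misaligned")
    return matrix, ids


# Random features stand in for the 2048-d ResNet50 activations of three products.
features = np.random.default_rng(0).normal(size=(3, 2048))
with tempfile.TemporaryDirectory() as tmp:
    save_embeddings(features, [11, 12, 13], Path(tmp))
    emb, emb_ids = load_embeddings(Path(tmp))
print(emb.shape, emb.dtype)  # (3, 2048) float32
```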


Backend Architecture

Stack

  • FastAPI — async, high-throughput API framework
  • SQLAlchemy ORM — declarative models, session management
  • Alembic — versioned schema migrations
  • PostgreSQL + pgvector — relational store with native vector similarity search
  • JWT — stateless auth via python-jose
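This project uses python-jose for JWT handling, where the usual calls are jose.jwt.encode(claims, key, algorithm="HS256") and jose.jwt.decode(token, key, algorithms=["HS256"]). To show what an HS256 bearer token actually contains, here is a standard-library sketch of signing and verification. It is a swapped-in illustration, not python-jose and not the project's code. The encode_hs256 and decode_hs256 names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWT segments use unpadded base64url.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def encode_hs256(claims: dict, secret: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    signature = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_hs256(token: str, secret: str) -> dict:
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):  # constant-time compare
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


token = encode_hs256({"sub": "isha.mehta1@stylesense.com", "exp": int(time.time()) + 900}, "dev-secret")
print(decode_hs256(token, "dev-secret")["sub"])
```

Because the server only needs the shared secret to verify the signature and the expiry, no session lookup is required. This is what "stateless" means here.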

Core Data Models

  • users
  • products — includes embedding vector(2048)
  • orders / order_items
  • cart_items
  • interactions

Startup Lifecycle (backend/app/main.py)

On every cold start:

  1. Execute Alembic migrations (alembic upgrade head)
  2. Seed the database from CSV files if empty
  3. Sync PK sequences to prevent duplicate key errors post-bulk-seed
  4. Load embedding matrix into in-memory store
  5. Mount product images at /images as static files
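Step 3 is needed because inserting rows with explicit id values does not advance a PostgreSQL serial sequence. After a bulk seed, the first application insert would therefore draw id 1 and hit a duplicate key error. The following is a sketch of the SQL such a sync could run. The sequence_sync_sql helper is hypothetical, the table list comes from Core Data Models above, and it is an assumption that each table has a serial id column. Each statement could be executed through SQLAlchemy with connection.execute(text(sql)).

```python
def sequence_sync_sql(table: str, pk: str = "id") -> str:
    """PostgreSQL statement that moves a table's PK sequence to match MAX(pk).

    `table` and `pk` are interpolated directly, so pass only trusted, hard-coded names.
    """
    max_pk = f"(SELECT MAX({pk}) FROM {table})"
    # is_called = false on an empty table makes the next nextval() return 1.
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{pk}'), "
        f"COALESCE({max_pk}, 1), {max_pk} IS NOT NULL)"
    )


for table in ("users", "products", "orders", "order_items", "cart_items", "interactions"):
    print(sequence_sync_sql(table))
```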

API Domains

| Prefix | Responsibility |
|---|---|
| /api/auth | Login, register, token refresh |
| /api/products | Catalog browse and search |
| /api/cart | Cart CRUD |
| /api/orders | Order placement and history |
| /api/interactions | Like, bookmark, review |
| /api/recommend | All 10 recommendation endpoints |

Frontend Architecture

Stack

  • React + Vite — fast dev server and optimized production build
  • Tailwind CSS — utility-first styling
  • Zustand — lightweight global state
  • Axios — HTTP client with JWT interceptor

Pages

  • Home
  • Catalog
  • Product Detail
  • Cart (slide-out panel)
  • Checkout
  • Order Success
  • Dashboard
  • Login / Register

Setup and Deployment

Prerequisites

| Tool | Version |
|---|---|
| Python | 3.12+ |
| Node.js | 20+ |
| Docker Desktop | Latest |
| Kaggle account | Required for dataset download |

Step 1 — Environment Variables

Create .env in project root:

POSTGRES_DB=stylesense
POSTGRES_USER=admin
POSTGRES_PASSWORD=admin123
DATABASE_URL=postgresql://admin:admin123@localhost:5432/stylesense
JWT_SECRET=change-this-to-a-64-char-random-string
VITE_API_URL=http://localhost:8000/api
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
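The following is a sketch of how the backend could read these values and reject the sample secret. The Settings class and the load_settings function are hypothetical, and the project may load its configuration differently. To generate a real secret, python3 -c "import secrets; print(secrets.token_hex(32))" prints 64 hex characters.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    jwt_secret: str
    cors_origins: tuple[str, ...]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    secret = env["JWT_SECRET"]
    # Refuse to start with the placeholder from the sample .env or a short secret.
    if secret.startswith("change-this") or len(secret) < 32:
        raise RuntimeError("JWT_SECRET looks like a placeholder; generate a random value")
    # CORS_ORIGINS is comma-separated; tolerate stray spaces and empty entries.
    origins = tuple(o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip())
    return Settings(env["DATABASE_URL"], secret, origins)
```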

Step 2 — Download Dataset

Dataset: Fashion Images on Kaggle

Via Kaggle CLI (recommended):

pip install --upgrade kaggle
mkdir -p ~/.kaggle
# Place your kaggle.json token at ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json

mkdir -p data/raw_kaggle
kaggle datasets download -d vikashrajluhaniwal/fashion-images -p data/raw_kaggle --unzip
cp data/raw_kaggle/styles.csv data/fashion.csv

Step 3 — Install Dependencies

# Python
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
pip install torch torchvision pillow

# Node
cd frontend && npm install && cd ..

Step 4 — Generate Data Artifacts

source .venv/bin/activate

python3 scripts/clean_data.py \
  --input-csv data/fashion.csv \
  --data-dir data \
  --output-csv data/processed/products_clean.csv

python3 scripts/simulate_users.py \
  --products-csv data/processed/products_clean.csv \
  --out-dir data/simulated

python3 scripts/generate_embeddings.py \
  --products-csv data/processed/products_clean.csv \
  --data-dir data \
  --output-npy data/processed/embeddings.npy \
  --ids-json data/processed/embedding_ids.json \
  --num-workers 0

Step 5 — Start Services

Docker Compose (recommended):

docker compose up --build

| Service | URL |
|---|---|
| Frontend | http://localhost:3000 |
| Backend | http://localhost:8000 |
| Database | localhost:5432 |

Local processes:

# DB only via Docker
docker compose up -d db

# Backend
source .venv/bin/activate && cd backend
uvicorn app.main:app --reload --port 8000

# Frontend
cd frontend && npm run dev

Verification

# Health checks
curl http://localhost:8000/health
curl http://localhost:8000/api/recommend/health

# Login with seeded user
curl -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"isha.mehta1@stylesense.com","password":"password123"}'

# Public recommendation
curl http://localhost:8000/api/recommend/trending

# Protected recommendation
curl http://localhost:8000/api/recommend/history \
  -H "Authorization: Bearer <ACCESS_TOKEN>"

Tests

cd backend && pytest

Recommendation Quality Validation

python3 scripts/validate_recommendations.py --base-url http://localhost:8000/api

Validation checks: history preference match, restock non-empty behavior, collaborative input exclusion, visual similarity threshold, cart add-ons cross-category distribution.


Common Issues

| Issue | Fix |
|---|---|
| Cannot connect to Docker daemon | Start Docker Desktop, then retry docker compose up --build |
| TLS timeout pulling node:20-alpine | Run docker pull node:20-alpine manually first |
| Duplicate order ID on checkout | Already fixed: sequence sync runs at startup in seed.py |
| torch_shm_manager permission error (macOS) | Use --num-workers 0 when running generate_embeddings.py |

Recommendation System Theory

Types

Collaborative Filtering — recommends based on similar user behavior.

User A liked: [X, Y, Z]  →  Recommend Z to User B who liked X and Y
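Here is the arrow example above as a runnable user-based sketch. Jaccard overlap of liked-item sets is an assumed similarity measure, and user_based_recommend is a hypothetical name. Each unseen item is scored by the summed similarity of the users who liked it.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_based_recommend(target: str, likes: dict[str, set[str]], k: int = 3) -> list[str]:
    """Score items the target has not seen by the similarity of the users who liked them."""
    seen = likes[target]
    scores: dict[str, float] = {}
    for other, items in likes.items():
        if other == target:
            continue
        similarity = jaccard(seen, items)
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted((i for i in scores if scores[i] > 0), key=lambda i: (-scores[i], i))
    return ranked[:k]


likes = {"A": {"X", "Y", "Z"}, "B": {"X", "Y"}, "C": {"W"}}
print(user_based_recommend("B", likes))  # ['Z']
```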

Content-Based Filtering — recommends items similar to what the user previously liked, based on item features.

Liked: "Inception" (Sci-Fi, Thriller)  →  Recommend "Interstellar"
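A minimal content-based counterpart follows. It builds a feature profile from liked titles and ranks the rest by Jaccard overlap. The feature tags (including a director tag) and the content_based helper are illustrative assumptions. In StyleSense, the features would be product attributes such as subCategory, baseColour, and usage.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_based(liked: list[str], catalog: dict[str, set[str]], k: int = 2) -> list[str]:
    # Build a profile from the features of everything the user liked.
    profile = set().union(*(catalog[title] for title in liked))
    scores = {t: jaccard(profile, feats) for t, feats in catalog.items() if t not in liked}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


catalog = {
    "Inception": {"Sci-Fi", "Thriller", "Nolan"},
    "Interstellar": {"Sci-Fi", "Drama", "Nolan"},
    "Shutter Island": {"Thriller", "Mystery"},
    "The Notebook": {"Romance", "Drama"},
}
print(content_based(["Inception"], catalog))  # ['Interstellar', 'Shutter Island']
```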

Hybrid Filtering — combines both methods. Reduces cold-start problems, improves ranking quality. Used by Netflix, Spotify, Amazon.

Knowledge-Based — applies explicit user constraints and domain rules. Best for high-consideration purchases (cars, real estate).

Demographic-Based — uses profile attributes (age, gender, location) to segment recommendations.

Type Comparison

| Type | Signal Source | Best Fit |
|---|---|---|
| Collaborative | Similar users | Streaming, social platforms |
| Content-Based | Item features | News, music, articles |
| Hybrid | Both | Large consumer platforms |
| Knowledge-Based | Rules + domain knowledge | Cars, real estate |
| Demographic-Based | User profile attributes | E-commerce segmentation |

Real-World Examples

  • Netflix — hybrid (watch history + similar users + content signals)
  • Amazon — collaborative + content + contextual signals
  • Spotify — hybrid (listening behavior + audio features)
  • YouTube — collaborative + content + freshness + trending layers

Why This Project Uses a Hybrid Approach

A single method doesn't hold up across all user states:

  • New users have no history → cold-start problem
  • Popular items dominate collaborative signals → diversity collapse
  • Pure content similarity misses emerging trends

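StyleSense combines four kinds of method: behavior-driven (history, collaborative, restock, trending), attribute-driven (seasonal, budget, new arrivals), explicit intent signals (interaction-based, cart add-ons), and visual embedding similarity (alternatives). Together they mitigate these failure modes. Trending requires no user history, square-root dampening limits popularity bias in the collaborative scores, and the 30-day trending window surfaces emerging items.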

Theory-to-Code Mapping

| Theory | Implementation |
|---|---|
| Collaborative filtering | services/recommendation/collaborative.py |
| Content-based filtering | history.py, interaction_based.py |
| Embedding similarity | visual.py + ml/embeddings_store.py |
| Knowledge-like constraints | budget.py, seasonal.py |
| Hybrid surface | Multi-scenario tabs in frontend Dashboard |

Screenshots

[Screenshot: Home Page]

[Screenshot: Product Detail Page]

[Screenshot: Recommendation Types]

[Screenshot: Orders Page]


Demo Walkthrough

Seeded test account:

email:    isha.mehta1@stylesense.com
password: password123

Suggested demo flow:

  1. Login and inspect JWT token
  2. Hit /api/recommend/trending — no auth required, quick signal check
  3. Hit /api/recommend/history — auth required, shows personalization
  4. Open Product Detail → visual alternatives tab
  5. Add items to cart → cart add-ons surface
  6. Complete checkout → inspect Order History with item-level detail

About

🛍️StyleSense✨ AI-powered fashion e-commerce with a 10-strategy hybrid rec engine — visual embeddings 🧠, collaborative filtering 🤝, trending 📈, seasonal 📅 & budget-aware 💰 recs. Stack: ⚡FastAPI · 🐘PostgreSQL · pgvector · ⚛️React · 🐳Docker. Built for scale. 🚀
