Production-grade e-commerce recommendation engine. Full-stack. Containerized. Battle-tested.
Live Demo: https://ssusantachary.github.io/StyleSense/
StyleSense is a fashion e-commerce personalization platform built to demonstrate how large-scale recommendation systems are designed, implemented, and deployed in production environments.
This project covers the complete engineering lifecycle: raw data ingestion, ML embedding generation, a multi-strategy recommendation engine, a FastAPI backend, a React frontend, and fully containerized deployment via Docker Compose.
| Layer | Technology | Relevant Link | Icon |
|---|---|---|---|
| Backend API | FastAPI | https://fastapi.tiangolo.com/ | |
| Backend API | SQLAlchemy | https://www.sqlalchemy.org/ | |
| Backend API | Alembic | https://alembic.sqlalchemy.org/ | |
| Database | PostgreSQL | https://www.postgresql.org/ | |
| Database | pgvector | https://github.com/pgvector/pgvector | |
| ML Embeddings | PyTorch + torchvision (ResNet50) | https://pytorch.org/ | |
| Auth | JWT Bearer (python-jose) | https://jwt.io/ | |
| Frontend | React | https://react.dev/ | |
| Frontend | Vite | https://vitejs.dev/ | |
| Frontend | Tailwind CSS | https://tailwindcss.com/ | |
| Frontend | Zustand | https://github.com/pmndrs/zustand | |
| Deployment | Docker Compose | https://docs.docker.com/compose/ | |
Ecommerce_Personalisation/
├── backend/ # FastAPI app, services, models, migrations, tests
├── frontend/ # React app (Vite + Tailwind + Zustand)
├── data/
│ ├── fashion.csv # Raw catalog input
│ ├── processed/ # Cleaned products + embeddings
│ └── simulated/ # Generated users, orders, interactions
├── scripts/ # Data prep, embedding generation, validation
├── docs/ # Frontend QA checklists and notes
└── docker-compose.yml # Orchestrates db + backend + frontend
| Entity | Count |
|---|---|
| Products (raw) | 2,906 |
| Products (cleaned) | 2,906 |
| Users | 1,200 |
| Orders | 15,000 |
| Order items | 40,000 |
| Interactions | 25,000 |
| Embedding shape | (2906, 2048) float32 |
Key fields in `products_clean.csv`:
- Identity: `id`
- Taxonomy: `masterCategory`, `subCategory`, `articleType`
- Personalization signals: `gender`, `baseColour`, `season`, `year`, `usage`
- Commerce signals: `price`, `stock_count`, `avg_rating`
- Image linkage: `image_path`
simulate_users.py generates structured behavioral cohorts to make recommendation evaluation meaningful:
- Nike lovers
- Dress buyers
- Shoe collectors
- Seasonal shoppers
- General shoppers
Endpoint: GET /api/recommend/history
Question: Based on what this user bought, what should they see next?
Scores candidates by matching against learned sub-category, article type, and color preferences extracted from past orders.
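The preference maps that feed the scorer have to be derived from past orders first. The sketch below is one plausible way to do that, assuming each purchase is a plain dict and each preference is a relative frequency. `build_preference` and the sample rows are hypothetical, not the project's actual code.

```python
from collections import Counter


def build_preference(purchases, field):
    """Return {value: share of purchases} for one product attribute."""
    counts = Counter(p[field] for p in purchases if p.get(field))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: n / total for value, n in counts.items()}


# Hypothetical order history: three past purchases.
purchases = [
    {"sub_category": "Shoes", "article_type": "Sports Shoes", "base_colour": "Black"},
    {"sub_category": "Shoes", "article_type": "Casual Shoes", "base_colour": "White"},
    {"sub_category": "Topwear", "article_type": "Tshirts", "base_colour": "Black"},
]

sub_pref = build_preference(purchases, "sub_category")
color_pref = build_preference(purchases, "base_colour")
```

With shares in the range 0 to 1, the 3.0 / 2.0 / 1.0 weights in the scorer decide how much each attribute contributes.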
```python
def score_history_candidate(product, sub_pref, type_pref, color_pref):
    sub_score = sub_pref.get(product.sub_category or "", 0.0)
    type_score = type_pref.get(product.article_type or "", 0.0)
    color_score = color_pref.get(product.base_colour or "", 0.0)
    rating_score = float(product.avg_rating or 0.0)
    return sub_score * 3.0 + type_score * 2.0 + color_score * 1.0 + rating_score * 0.5
```

Endpoint: GET /api/recommend/collaborative/{product_id}
Question: People who bought this also bought what?
Uses popularity dampening via square root normalization to prevent globally popular items from dominating every recommendation surface.
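To see why the dampening matters, it helps to compute co-purchase counts on a toy dataset. This is a minimal sketch, assuming orders are lists of product IDs held in memory. `co_purchase_counts`, `recommend_with`, and the baskets are hypothetical illustrations rather than the service's real query.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def co_purchase_counts(orders):
    """Count how often each ordered pair of products shares an order."""
    pair_counts = Counter()
    popularity = Counter()
    for items in orders:
        unique = sorted(set(items))
        popularity.update(unique)
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return pair_counts, popularity


def recommend_with(product_id, orders, top_k=3):
    """Rank co-purchased items by co_count / sqrt(popularity)."""
    pair_counts, popularity = co_purchase_counts(orders)
    scores = {
        other: count / sqrt(max(popularity[other], 1))
        for (seed, other), count in pair_counts.items()
        if seed == product_id
    }
    return sorted(scores, key=lambda pid: (-scores[pid], pid))[:top_k]


# Hypothetical baskets: product 9 is in every order, product 2 is niche.
orders = [[1, 2, 9], [1, 2, 9], [3, 9], [4, 9], [5, 9], [6, 9]]
ranked = recommend_with(1, orders)
```

Products 2 and 9 are each bought with product 1 twice, but 9 appears in all six orders, so its dampened score is lower and 2 ranks first.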
```python
from math import sqrt


def collaborative_score(co_count: int, popularity: int) -> float:
    return co_count / sqrt(max(popularity, 1))
```

Endpoint: GET /api/recommend/restock
Question: What previously purchased items should this user repurchase?
Identifies replenishable items using repeat-purchase signals filtered by delivery status and live stock availability.
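The repeat-purchase signal could be gathered as follows. This is only a sketch, assuming order rows arrive as `(product_id, order_status)` tuples. The helper name `delivered_purchase_counts` and the `"cancelled"` status are assumptions made for illustration.

```python
from collections import Counter


def delivered_purchase_counts(order_rows):
    """Count delivered purchases per product; other statuses are skipped."""
    counts = Counter()
    for product_id, status in order_rows:
        if status == "delivered":
            counts[product_id] += 1
    return counts


# Hypothetical (product_id, order_status) rows for one user.
rows = [(7, "delivered"), (7, "delivered"), (7, "cancelled"), (8, "delivered")]
counts = delivered_purchase_counts(rows)
repeat_buys = sorted(pid for pid, n in counts.items() if n >= 2)
```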
```python
def is_restock_candidate(purchase_count: int, order_status: str, in_stock: bool) -> bool:
    return order_status == "delivered" and in_stock and purchase_count >= 2
```

Endpoint: GET /api/recommend/alternatives/{product_id}
Question: What products look visually similar to this item?
Leverages L2-normalized ResNet50 embeddings. Cosine similarity between normalized vectors reduces to a dot product — efficient and accurate.
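The equivalence can be checked on small vectors. The sketch below uses plain Python lists and 3-d toy vectors in place of the 2048-d embeddings. `l2_normalize` and `cosine` are illustrative helpers, not functions from the project.

```python
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit length; leave all-zero vectors unchanged."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity computed the long way, for comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


a, b = [3.0, 4.0, 0.0], [4.0, 3.0, 0.0]
dot_of_normalized = sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

Because normalization happens once at embedding time, each query only pays for the dot product.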
```python
def visual_similarity(query_vec, candidate_vec):
    # Vectors are L2-normalized, so the dot product equals cosine similarity
    return float((query_vec * candidate_vec).sum())
```

Endpoint: GET /api/recommend/cart-addons
Question: Given what's in the cart, what complementary items should we surface?
Weighted blend of co-purchase signals, price attractiveness, and rating quality. Same-category duplicates are penalized to maximize cross-category discovery.
```python
def cart_addon_score(co_purchase_score, price_discount_factor, rating_factor, is_cross_sell):
    score = co_purchase_score * 0.6 + price_discount_factor * 0.2 + rating_factor * 0.2
    if not is_cross_sell:
        score *= 0.7
    return score
```

Endpoint: GET /api/recommend/trending
Question: What is hot platform-wide right now?
Rolling 30-day window combining order volume (70%) and interaction count (30%) with a rating boost.
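The rolling window itself might look like this. It is a sketch, assuming events are `(product_id, timestamp, quantity)` tuples already loaded in memory, whereas the real service presumably filters in SQL. `window_totals` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def window_totals(events, now, days=30):
    """Sum quantities per product for events inside the trailing window."""
    cutoff = now - timedelta(days=days)
    totals = defaultdict(float)
    for product_id, timestamp, qty in events:
        if cutoff <= timestamp <= now:
            totals[product_id] += qty
    return dict(totals)


now = datetime(2024, 6, 30)
# Hypothetical (product_id, timestamp, quantity) order rows.
order_events = [
    (1, datetime(2024, 6, 20), 2),
    (1, datetime(2024, 4, 1), 50),  # outside the 30-day window, ignored
    (2, datetime(2024, 6, 29), 1),
]
recent = window_totals(order_events, now)
```

The same helper can be applied to interaction events to produce the second input of the trending score.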
```python
def trending_score(order_qty: float, interaction_count: float, rating: float) -> float:
    return order_qty * 0.7 + interaction_count * 0.3 + rating * 0.2
```

Endpoint: GET /api/recommend/interaction-based
Question: What should we show based on likes, bookmarks, and reviews?
Differentiates signal strength by interaction type. Review ratings above/below neutral shift the weight up or down.
```python
INTERACTION_WEIGHT = {"like": 1.0, "bookmark": 1.2, "review": 1.5}


def interaction_weight(kind: str, rating: int | None) -> float:
    base = INTERACTION_WEIGHT.get(kind, 1.0)
    review_boost = 1.0 + ((float(rating) - 3.0) / 10.0) if rating is not None else 1.0
    return max(0.2, base * review_boost)
```

Endpoint: GET /api/recommend/seasonal
Question: What's relevant for the current season and user profile?
Maps the current month to a season and boosts catalog items matching both the season tag and the user's established category preferences.
```python
def current_season(month: int) -> str:
    if month in {12, 1, 2}: return "Winter"
    if month in {3, 4, 5}: return "Spring"
    if month in {6, 7, 8}: return "Summer"
    return "Fall"
```

Endpoint: GET /api/recommend/budget
Question: What good options exist within this user's spend comfort zone?
Derives a budget band from historical average unit spend, then scores candidates by closeness to band center, affordability, and rating.
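How the band is derived is not spelled out here, so the following is only one plausible reading: take the mean historical unit price as the band center and allow a symmetric tolerance around it. `budget_band` and its `spread` parameter are assumptions made for illustration.

```python
def budget_band(unit_prices, spread=0.5):
    """Return (low, avg, high) around the mean historical unit price.

    `spread` is an assumed tolerance: 0.5 means +/-50% of the average.
    """
    if not unit_prices:
        return None
    avg = sum(unit_prices) / len(unit_prices)
    return avg * (1 - spread), avg, avg * (1 + spread)


# Hypothetical unit prices from one user's past order items.
low, avg_price, max_price = budget_band([20.0, 40.0, 60.0])
```

Under this reading, `avg_price` and `max_price` are the values a scorer would receive as the band center and upper edge.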
```python
def budget_score(price, avg_price, max_price, rating):
    closeness = 1.0 - min(abs(price - avg_price) / max(avg_price, 1e-9), 1.0)
    affordability = 1.0 - min(price / max(max_price, 1.0), 1.0)
    rating_score = rating / 5.0
    return closeness * 0.5 + affordability * 0.2 + rating_score * 0.3
```

Endpoint: GET /api/recommend/new-arrivals
Question: What are the latest products most likely to match this user?
Blends recency (year normalization + ID proxy), category fit against user preferences, and rating quality.
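The recency inputs need to be on a common scale before they are weighted. One simple option is min-max scaling, sketched below on the assumption that a higher product ID means a later catalog entry. `min_max` and the sample rows are hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


# Hypothetical catalog rows: (product id, year).
catalog = [(101, 2015), (250, 2017), (399, 2019)]
id_norm = min_max([pid for pid, _ in catalog])
year_norm = min_max([year for _, year in catalog])
```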
```python
def new_arrival_score(year_norm, id_norm, category_match, rating_score):
    return year_norm * 1.8 + id_norm * 0.8 + category_match * 1.2 + rating_score * 0.6
```

- Load product images from catalog
- Pass through pretrained ResNet50 (classification head removed)
- Extract 2048-d penultimate layer activations
- Apply L2 normalization per vector
- Persist to:
  - `data/processed/embeddings.npy`
  - `data/processed/embedding_ids.json`

macOS note: Run with `--num-workers 0` to avoid `torch_shm_manager` permission errors.
- FastAPI: async, high-throughput API framework
- SQLAlchemy ORM: declarative models, session management
- Alembic: versioned schema migrations
- PostgreSQL + pgvector: relational store with native vector similarity search
- JWT: stateless auth via `python-jose`
- `users`
- `products` (includes `embedding vector(2048)`)
- `orders` / `order_items`
- `cart_items`
- `interactions`
On every cold start:
- Execute Alembic migrations (`alembic upgrade head`)
- Seed the database from CSV files if empty
- Sync PK sequences to prevent duplicate key errors post-bulk-seed
- Load embedding matrix into in-memory store
- Mount product images at `/images` as static files
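The sequence sync step is needed because bulk-inserted rows carry explicit IDs, which leaves each PostgreSQL sequence behind the table's highest key. A minimal sketch of the statements involved is shown below, assuming integer primary keys named `id` backed by serial or identity sequences. `sequence_sync_sql` is a hypothetical helper, not the code in `seed.py`.

```python
def sequence_sync_sql(table, pk="id"):
    """Build a PostgreSQL statement that moves a sequence up to MAX(pk)."""
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{pk}'), "
        f"COALESCE(MAX({pk}), 1)) FROM {table};"
    )


# Table names come from the data model; the `id` column name is assumed.
statements = [sequence_sync_sql(t) for t in ("users", "orders", "order_items")]
```

The strings are built with f-strings, so the helper should only ever receive trusted, hard-coded table names.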
| Prefix | Responsibility |
|---|---|
| `/api/auth` | Login, register, token refresh |
| `/api/products` | Catalog browse and search |
| `/api/cart` | Cart CRUD |
| `/api/orders` | Order placement and history |
| `/api/interactions` | Like, bookmark, review |
| `/api/recommend` | All 10 recommendation endpoints |
- React + Vite — fast dev server and optimized production build
- Tailwind CSS — utility-first styling
- Zustand — lightweight global state
- Axios — HTTP client with JWT interceptor
- Home
- Catalog
- Product Detail
- Cart (slide-out panel)
- Checkout
- Order Success
- Dashboard
- Login / Register
| Tool | Version |
|---|---|
| Python | 3.12+ |
| Node.js | 20+ |
| Docker Desktop | Latest |
| Kaggle account | Required for dataset |
Create .env in project root:
```
POSTGRES_DB=stylesense
POSTGRES_USER=admin
POSTGRES_PASSWORD=admin123
DATABASE_URL=postgresql://admin:admin123@localhost:5432/stylesense
JWT_SECRET=change-this-to-a-64-char-random-string
VITE_API_URL=http://localhost:8000/api
CORS_ORIGINS=http://localhost:3000,http://localhost:5173
```

Dataset: Fashion Images on Kaggle
Via Kaggle CLI (recommended):
```bash
pip install --upgrade kaggle
mkdir -p ~/.kaggle
# Place your kaggle.json token at ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
mkdir -p data/raw_kaggle
kaggle datasets download -d vikashrajluhaniwal/fashion-images -p data/raw_kaggle --unzip
cp data/raw_kaggle/styles.csv data/fashion.csv
```

```bash
# Python
python3 -m venv .venv
source .venv/bin/activate
pip install -r backend/requirements.txt
pip install torch torchvision pillow
# Node
cd frontend && npm install && cd ..
```

```bash
source .venv/bin/activate
python3 scripts/clean_data.py \
  --input-csv data/fashion.csv \
  --data-dir data \
  --output-csv data/processed/products_clean.csv
python3 scripts/simulate_users.py \
  --products-csv data/processed/products_clean.csv \
  --out-dir data/simulated
python3 scripts/generate_embeddings.py \
  --products-csv data/processed/products_clean.csv \
  --data-dir data \
  --output-npy data/processed/embeddings.npy \
  --ids-json data/processed/embedding_ids.json \
  --num-workers 0
```

Docker Compose (recommended):
```bash
docker compose up --build
```

| Service | URL |
|---|---|
| Frontend | http://localhost:3000 |
| Backend | http://localhost:8000 |
| Database | localhost:5432 |
Local processes:
```bash
# DB only via Docker
docker compose up -d db
# Backend
source .venv/bin/activate && cd backend
uvicorn app.main:app --reload --port 8000
# Frontend
cd frontend && npm run dev
```

```bash
# Health checks
curl http://localhost:8000/health
curl http://localhost:8000/api/recommend/health
# Login with seeded user
curl -X POST http://localhost:8000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"isha.mehta1@stylesense.com","password":"password123"}'
# Public recommendation
curl http://localhost:8000/api/recommend/trending
# Protected recommendation
curl http://localhost:8000/api/recommend/history \
  -H "Authorization: Bearer <ACCESS_TOKEN>"
```

```bash
cd backend && pytest
```

```bash
python3 scripts/validate_recommendations.py --base-url http://localhost:8000/api
```

Validation checks: history preference match, restock non-empty behavior, collaborative input exclusion, visual similarity threshold, cart add-ons cross-category distribution.
| Issue | Fix |
|---|---|
| `Cannot connect to Docker daemon` | Start Docker Desktop, then retry `docker compose up --build` |
| TLS timeout pulling `node:20-alpine` | Run `docker pull node:20-alpine` manually first |
| Duplicate order ID on checkout | Already fixed: sequence sync runs at startup in `seed.py` |
| `torch_shm_manager` permission error (macOS) | Use `--num-workers 0` when running `generate_embeddings.py` |
Collaborative Filtering — recommends based on similar user behavior.
User A liked: [X, Y, Z] → Recommend Z to User B who liked X and Y
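The arrow example above can be reproduced in a few lines. This is a toy user-based sketch that weights neighbors by Jaccard similarity, which is an assumption chosen for brevity and not the method StyleSense uses. `recommend_from_neighbors` and the `likes` data are hypothetical.

```python
def recommend_from_neighbors(target, likes, top_k=3):
    """Suggest items liked by users who overlap with the target user."""
    seen = likes[target]
    scores = {}
    for user, items in likes.items():
        if user == target:
            continue
        overlap = len(seen & items)
        if overlap == 0:
            continue  # no shared taste, so this user contributes nothing
        similarity = overlap / len(seen | items)  # Jaccard similarity
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_k]


likes = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Q"},
}
suggestions = recommend_from_neighbors("B", likes)
```

User C shares nothing with B, so only A's extra item Z is suggested.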
Content-Based Filtering — recommends items similar to what the user previously liked, based on item features.
Liked: "Inception" (Sci-Fi, Thriller) → Recommend "Interstellar"
Hybrid Filtering — combines both methods. Reduces cold-start problems, improves ranking quality. Used by Netflix, Spotify, Amazon.
Knowledge-Based — applies explicit user constraints and domain rules. Best for high-consideration purchases (cars, real estate).
Demographic-Based — uses profile attributes (age, gender, location) to segment recommendations.
| Type | Signal Source | Best Fit |
|---|---|---|
| Collaborative | Similar users | Streaming, social platforms |
| Content-Based | Item features | News, music, articles |
| Hybrid | Both | Large consumer platforms |
| Knowledge-Based | Rules + domain knowledge | Cars, real estate |
| Demographic-Based | User profile attributes | E-commerce segmentation |
- Netflix — hybrid (watch history + similar users + content signals)
- Amazon — collaborative + content + contextual signals
- Spotify — hybrid (listening behavior + audio features)
- YouTube — collaborative + content + freshness + trending layers
A single method doesn't hold up across all user states:
- New users have no history → cold-start problem
- Popular items dominate collaborative signals → diversity collapse
- Pure content similarity misses emerging trends
StyleSense combines behavior-driven methods (history, collaborative, restock, trending), attribute-driven methods (seasonal, budget, new arrivals), explicit intent signals (interaction-based, cart add-ons), and visual embedding similarity (alternatives) to handle all these failure modes.
| Theory | Implementation |
|---|---|
| Collaborative filtering | services/recommendation/collaborative.py |
| Content-based filtering | history.py, interaction_based.py |
| Embedding similarity | visual.py + ml/embeddings_store.py |
| Knowledge-like constraints | budget.py, seasonal.py |
| Hybrid surface | Multi-scenario tabs in frontend Dashboard |
Seeded test account:
email: isha.mehta1@stylesense.com
password: password123
Suggested demo flow:
- Login and inspect JWT token
- Hit `/api/recommend/trending` (no auth required, quick signal check)
- Hit `/api/recommend/history` (auth required, shows personalization)
- Open Product Detail → visual alternatives tab
- Add items to cart → cart add-ons surface
- Complete checkout → inspect Order History with item-level detail






