A clean, maintainable implementation of a RAG (Retrieval-Augmented Generation) service demonstrating object-oriented design principles and production-ready architecture patterns.
This project implements a layered architecture with clear separation of concerns:
```
┌─────────────────────────────────────────┐
│  API Layer (api/)                       │
│  - FastAPI endpoints                    │
│  - Pydantic request/response models     │
│  - HTTP concerns only                   │
└────────────────┬────────────────────────┘
                 │
┌────────────────▼────────────────────────┐
│  Workflow Layer (workflows/)            │
│  - RAG orchestration using LangGraph    │
│  - State management                     │
│  - Coordinates services                 │
└────────────────┬────────────────────────┘
                 │
┌────────────────▼────────────────────────┐
│  Services Layer (services/)             │
│  - EmbeddingService: text → vectors     │
│  - DocumentStore: Qdrant + fallback     │
│  - Business logic encapsulation         │
└─────────────────────────────────────────┘
```
```
.
├── api/
│   ├── __init__.py              # API package exports
│   ├── models.py                # Pydantic schemas
│   └── routes.py                # Endpoint handlers
├── services/
│   ├── __init__.py              # Services package exports
│   ├── embedding_service.py     # Embedding logic
│   └── document_store.py        # Storage logic (Qdrant + fallback)
├── workflows/
│   ├── __init__.py              # Workflows package exports
│   └── rag_workflow.py          # LangGraph orchestration
├── config.py                    # Centralized configuration
├── main.py                      # Application entry point
├── .env.example                 # Environment variables template
├── .gitignore                   # Git ignore rules
├── notes.md                     # Refactoring design decisions
└── README.md                    # This file
```
- Python 3.8+
- (Optional) Qdrant instance running locally
- Clone the repository

  ```bash
  git clone https://github.com/wafiyanwarul/associate-ai-engineer-test.git
  cd associate-ai-engineer-test
  ```

- Create virtual environment

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies

  ```bash
  pip install fastapi uvicorn pydantic qdrant-client langgraph
  ```

- Configure environment (optional)

  ```bash
  cp .env.example .env
  # Edit .env if needed (defaults work fine for local development)
  ```

- Run the application

  ```bash
  uvicorn main:app --reload
  ```

The API will be available at http://127.0.0.1:8000
POST /add
Add a document to the knowledge base.
```bash
curl -X POST http://127.0.0.1:8000/add \
  -H "Content-Type: application/json" \
  -d '{"text":"LangGraph is awesome for workflows"}'
```

Response:

```json
{
  "id": 0,
  "status": "added"
}
```

POST /ask
Query the RAG system.
```bash
curl -X POST http://127.0.0.1:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question":"what is langgraph?"}'
```

Response:

```json
{
  "question": "what is langgraph?",
  "answer": "I found this: 'LangGraph is awesome for workflows'",
  "context_used": [
    "LangGraph is awesome for workflows"
  ],
  "latency_sec": 0.023
}
```

GET /status
Check system health and configuration.
```bash
curl http://127.0.0.1:8000/status
```

Response:

```json
{
  "qdrant_ready": false,
  "storage_type": "in-memory",
  "document_count": 1,
  "graph_ready": true
}
```

GET /docs
Interactive API documentation (Swagger UI) available at http://127.0.0.1:8000/docs
Configuration is managed through config.py and can be customized via environment variables:
| Variable | Default | Description |
|---|---|---|
| `QDRANT_URL` | `http://localhost:6333` | Qdrant server URL |
| `QDRANT_COLLECTION` | `demo_collection` | Collection name in Qdrant |
| `EMBEDDING_DIMENSION` | `128` | Vector embedding dimension |
| `SEARCH_LIMIT` | `2` | Max documents returned per search |
Each layer has a single, well-defined responsibility:
- API layer handles HTTP
- Workflow layer orchestrates operations
- Services layer implements business logic
Dependencies are explicitly passed through constructors, making the code testable and the dependency graph clear.
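In simplified form (class and method names here are illustrative, not the project's exact API), constructor injection looks like this:

```python
class EmbeddingService:
    """Demo embedder producing fixed-size fake vectors."""

    def __init__(self, dimension: int = 128):
        self.dimension = dimension

    def embed(self, text: str) -> list[float]:
        # Fake embedding: deterministic values derived from the text
        return [float((len(text) + i) % 100) / 100 for i in range(self.dimension)]


class RAGWorkflow:
    """Receives its dependencies explicitly instead of importing globals."""

    def __init__(self, embedder: EmbeddingService):
        # The dependency arrives through the constructor, so tests
        # can pass a fake or differently-configured instance.
        self.embedder = embedder


workflow = RAGWorkflow(embedder=EmbeddingService(dimension=64))
vector = workflow.embedder.embed("hello")
```

Swapping the embedder (for tests, or for a real model later) requires no change inside `RAGWorkflow`.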
If Qdrant is unavailable, the system automatically falls back to in-memory storage without crashing.
All environment-specific values are centralized and can be changed without modifying code.
The architecture supports easy unit testing:
```python
# Example: Testing EmbeddingService independently
from services import EmbeddingService

def test_embedding_dimension():
    service = EmbeddingService(dimension=64)
    result = service.embed("test")
    assert len(result) == 64


# Example: Testing DocumentStore with mock Qdrant
from services import DocumentStore

def test_document_store_fallback():
    # Force fallback by using an invalid URL
    store = DocumentStore(qdrant_url="http://invalid:9999")
    assert not store.using_qdrant
    # Should still work with in-memory storage
    success = store.add_document(0, "test", [0.1] * 128)
    assert success
```

| Aspect | Before | After |
|---|---|---|
| Structure | Single 100-line file | Modular 4-layer architecture |
| Configuration | Hardcoded values | Centralized config with env support |
| Dependencies | Global state | Explicit dependency injection |
| Testability | Difficult (global state) | Easy (isolated components) |
| Maintainability | Mixed concerns | Clear separation of concerns |
| Error Handling | Basic try-catch | Graceful degradation + clear error messages |
| Documentation | Minimal | Comprehensive (docstrings + README) |
While this remains a demo with fake embeddings, the architecture is production-ready:
- ✅ Scalable: Each layer can be scaled independently
- ✅ Maintainable: Clear structure for team development
- ✅ Testable: Components can be unit tested in isolation
- ✅ Flexible: Easy to swap implementations (e.g., real embedding models)
- ✅ Observable: Structured logging and error handling
- ✅ Configurable: Environment-based configuration
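The "easy to swap implementations" point can be made concrete with a small structural interface (a sketch; the project's real classes may differ):

```python
from typing import Protocol


class Embedder(Protocol):
    """Anything with an `embed` method satisfies this interface."""

    def embed(self, text: str) -> list[float]: ...


class FakeEmbedder:
    """Demo embedder: fixed-size vector derived from character codes."""

    def __init__(self, dimension: int = 128):
        self.dimension = dimension

    def embed(self, text: str) -> list[float]:
        return [float(ord(text[i % len(text)])) for i in range(self.dimension)]


def index_document(embedder: Embedder, text: str) -> list[float]:
    # The caller only depends on the Embedder protocol: a fake today,
    # a real model wrapper tomorrow, with no changes here.
    return embedder.embed(text)


vector = index_document(FakeEmbedder(dimension=8), "hello")
```

Any class exposing the same `embed` signature slots in without touching the workflow code.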
To deploy this to production with real AI capabilities:
- Replace fake embeddings: Swap `EmbeddingService` with a real model (e.g., `sentence-transformers`)
- Add authentication: Implement API key or OAuth
- Add persistence: Configure Qdrant with persistent storage
- Add monitoring: Integrate Prometheus/Grafana
- Add rate limiting: Prevent abuse
- Add caching: Cache frequent queries
- Add comprehensive tests: Unit, integration, and E2E tests
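The caching step, for example, could start as simply as memoizing answers per question (a hypothetical helper, not part of this project; a production setup would also want TTLs and invalidation):

```python
from functools import lru_cache


@lru_cache(maxsize=256)
def cached_answer(question: str) -> str:
    # Stand-in for the full RAG pipeline; results are kept
    # per unique question string up to maxsize entries.
    return f"answer for: {question}"


cached_answer("what is langgraph?")  # computed
cached_answer("what is langgraph?")  # served from the cache
```

`cached_answer.cache_info()` reports hits and misses, which is also useful input for the monitoring step above.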
- Design decisions: See `notes.md` for a detailed explanation of architectural choices
- API documentation: Visit the `/docs` endpoint for the interactive API explorer
This is a technical assessment project. For production use, consider:
- Adding proper error handling for edge cases
- Implementing comprehensive test coverage
- Adding monitoring and observability
- Using production-grade embedding models
This is a demo project for educational purposes.