Advanced Data Analysis Toolkit

Overview

The Advanced Data Analysis Toolkit analyzes and visualizes relational databases. It allows users to connect to databases, explore schemas, perform in-depth data analysis, and generate reusable reports. The toolkit is built with advanced Python techniques and modern development practices, with the goals of scalability, maintainability, and ease of use.

Features

Reports

Reports are composed of two main sections:

Generic Section

Schema Diagram: Visualizes tables, primary keys, foreign keys, and their relationships.
Data Quality Overview: Summarizes missing values, null distributions, and column completeness.
Descriptive Statistics: Provides ranges, percentiles, distributions, and outlier detection for numerical fields.
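As a rough illustration of what the Data Quality Overview summarizes, the sketch below computes per-column completeness (the share of non-NULL values) for a SQLite table. It is a minimal sketch, not the toolkit's implementation: the function names `column_completeness` and `demo_connection` and the `customer` table are hypothetical and exist only for this example.

```python
import sqlite3


def column_completeness(conn: sqlite3.Connection, table: str) -> dict[str, float]:
    """Return the fraction of non-NULL values for each column of `table`.

    Hypothetical helper for illustration; `table` must be a trusted name
    because it is interpolated into the SQL text.
    """
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    if total == 0:
        return {col: 0.0 for col in columns}
    completeness = {}
    for col in columns:
        # COUNT(column) counts only rows where the column is not NULL.
        non_null = conn.execute(f'SELECT COUNT("{col}") FROM "{table}"').fetchone()[0]
        completeness[col] = non_null / total
    return completeness


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory table with some missing emails (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO customer (name, email) VALUES (?, ?)",
        [("Ada", "ada@example.com"), ("Bob", None), ("Cy", None), ("Di", "di@example.com")],
    )
    return conn
```

Calling `column_completeness(demo_connection(), "customer")` reports full completeness for `id` and `name` and half for `email`, since two of the four rows have no email.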

Dynamic Section

Prompt-Driven Queries: User prompts are transformed into SQL queries using a Retrieval-Augmented Generation (RAG) model.
- Includes a random question generator for exploratory analysis.
Result Tables: Displays outputs from executed queries.
Visualizations: Presents graphical representations of query results.

Example dynamic analysis questions for the Chinook database:

How many employees are there in each age group?
Which are the top 10 most frequently used genres (with usage counts)?
Which genre has generated the highest total revenue?
What are the total revenues and number of tracks sold for each genre (subplots/bar plots with two axes)?
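To make the last two questions concrete, here is a hand-written query of the kind such a prompt might be translated into. This is only a sketch: it is not output from the toolkit's RAG model, the in-memory tables are tiny stand-ins that keep just a subset of the Chinook columns (Genre, Track, InvoiceLine), and the sample rows are invented for the example.

```python
import sqlite3

# Revenue and units sold per genre, highest revenue first.
# Table and column names follow the public Chinook schema.
REVENUE_BY_GENRE_SQL = """
SELECT g.Name AS Genre,
       SUM(il.UnitPrice * il.Quantity) AS Revenue,
       SUM(il.Quantity) AS TracksSold
FROM InvoiceLine AS il
JOIN Track AS t ON t.TrackId = il.TrackId
JOIN Genre AS g ON g.GenreId = t.GenreId
GROUP BY g.GenreId, g.Name
ORDER BY Revenue DESC
"""


def build_sample_db() -> sqlite3.Connection:
    """Create a miniature stand-in for Chinook with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, Name TEXT, GenreId INTEGER);
        CREATE TABLE InvoiceLine (
            InvoiceLineId INTEGER PRIMARY KEY,
            TrackId INTEGER,
            UnitPrice REAL,
            Quantity INTEGER
        );
        INSERT INTO Genre VALUES (1, 'Rock'), (2, 'Jazz');
        INSERT INTO Track VALUES (1, 'Track A', 1), (2, 'Track B', 1), (3, 'Track C', 2);
        INSERT INTO InvoiceLine VALUES (1, 1, 1.0, 2), (2, 2, 1.0, 1), (3, 3, 2.0, 1);
        """
    )
    return conn


def revenue_by_genre(conn: sqlite3.Connection) -> list[tuple[str, float, int]]:
    """Run the query and return (genre, revenue, tracks sold) rows."""
    return conn.execute(REVENUE_BY_GENRE_SQL).fetchall()
```

The first row of `revenue_by_genre(build_sample_db())` answers the "highest total revenue" question; the full result is the data a two-axis bar plot of revenue and tracks sold would be drawn from.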

Supported Databases

Currently, the toolkit supports:

SQLite

Database type and connection URL can be configured in the settings.

Visualizations & Exporting

Diagrams and result tables can be expanded into preview windows for enhanced visibility. The preview supports zooming, panning, and uses SVG rendering for high-quality graphics.

All images and tables can be exported as SVG or CSV files.
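For readers who want to reproduce a CSV export of a result table outside the application, the sketch below serializes headers and rows with Python's standard `csv` module. The function name `rows_to_csv` is hypothetical, and the toolkit's own export code may differ.

```python
import csv
import io
from collections.abc import Iterable, Sequence


def rows_to_csv(headers: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Serialize a result table to CSV text (hypothetical helper).

    csv.writer quotes fields that contain commas, quotes, or newlines,
    so values such as "Jazz, Fusion" survive a round trip.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned string can be written to a `.csv` file opened with `newline=""`, which is what the `csv` module documentation recommends to avoid doubled line endings on Windows.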

Usage

Using `just`

The justfile is used for task automation. Key tasks include:

Linting:
- just lint: Runs style, type, dead-code, and dependency-vulnerability checks using ruff, black, mypy, vulture, and pip-audit.
- just lint-fix: Automatically fixes code style issues.
- just lint-full: Runs both lint-fix and lint.
Testing:
- just test *args: Runs tests with optional arguments.
- just coverage: Generates a coverage report.
Serving the Application:
- just serve: Starts the server in production mode.
- just serve-dev: Starts the server in development mode with auto-reload.
- just serve-presentation: Starts the server with presentation settings (optimized for generative models).
Fetching Sample Data:
- just fetch-chinook: Downloads and sets up the Chinook sample database.

Using Docker Compose

The toolkit can be deployed using Docker Compose. The application can be started in three modes:

docker compose up toolkit-server
docker compose up toolkit-dev
docker compose up toolkit-presentation

These correspond to the serve, serve-dev, and serve-presentation just commands, respectively.

Each command launches the server and makes it accessible at the configured port.

Running the CLI

The toolkit can also be run from the command line. Example usage:

python ./src/driver.py --help

Logging and Verbosity

The CLI supports standard logging and verbosity flags. Use -v to increase verbosity:

No -v: warnings and errors only
-v: info messages
-vv or more: debug messages
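The mapping above is commonly implemented with a counting flag in `argparse`. The following is a minimal sketch of that pattern, not the code in src/driver.py: the function names and the `verbosity` attribute are assumptions made for illustration.

```python
import argparse
import logging


def verbosity_to_level(count: int) -> int:
    """Map the number of -v flags to a logging level."""
    if count <= 0:
        return logging.WARNING  # no -v: warnings and errors only
    if count == 1:
        return logging.INFO  # -v: info messages
    return logging.DEBUG  # -vv or more: debug messages


def build_parser() -> argparse.ArgumentParser:
    """Build a stub parser that only knows the -v flag."""
    parser = argparse.ArgumentParser(description="Illustrative CLI stub")
    # action="count" increments the value each time -v appears (-vv gives 2).
    parser.add_argument("-v", action="count", default=0, dest="verbosity",
                        help="increase log verbosity (repeatable)")
    return parser


def configure_logging(argv: list[str]) -> int:
    """Parse argv, configure the root logger, and return the chosen level."""
    args = build_parser().parse_args(argv)
    level = verbosity_to_level(args.verbosity)
    logging.basicConfig(level=level)
    return level
```

For example, `configure_logging(["-vv"])` selects `logging.DEBUG`, while `configure_logging([])` keeps the default of warnings and errors.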

Architecture

Module Dependency Graphs

For the sake of clarity, the dependencies related to logging have been omitted from the graph.

Frontend-Backend Interaction

The frontend and backend communicate via a RESTful API. The frontend sends requests to the backend for data retrieval, report generation, and other operations. The backend processes these requests, interacts with the database, and returns the results to the frontend for display. Frontend components are hosted as static files served by the backend.

Advanced Python Techniques

This project employs several advanced Python techniques and tools:

Unit Testing: Comprehensive test coverage using pytest.
Pytest Fixtures: Reusable, isolated test setup using @pytest.fixture.
Mocking: Controlled dependency isolation with pytest monkeypatching.
Dataclasses: Structured, lightweight data containers using the @dataclass decorator.
Type Hinting: Explicit static typing with Python type annotations to improve readability and tooling support.
Exception Handling: Explicit and robust error handling to ensure reliability.
Separation of Concerns: Clear boundaries between API access, retry logic, and application logic.
Configuration via Environment Variables: Runtime configuration using environment variables and .env files.
Logging: Structured logging for debugging and operational visibility.
Python Environment Management: Dependency and virtual environment management using uv.
Containerization: Application containerization using Docker and Docker Compose for reproducible environments.
CI Pipeline: Automated linting checks executed as part of a continuous integration pipeline.
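Several of these techniques combine naturally. The sketch below shows a typed, immutable dataclass populated from environment variables, with explicit error handling for missing or malformed values. It is an illustration under stated assumptions rather than the project's actual settings code: the names `Settings`, `ConfigError`, `load_settings`, `TOOLKIT_DATABASE_URL`, and `TOOLKIT_PORT` are all hypothetical.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


class ConfigError(ValueError):
    """Raised when a required setting is missing or malformed."""


@dataclass(frozen=True)
class Settings:
    """Immutable runtime configuration (field names are illustrative)."""

    database_url: str
    port: int = 8000


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables.

    Accepting the mapping as a parameter keeps the function easy to test
    without touching the real process environment.
    """
    try:
        database_url = env["TOOLKIT_DATABASE_URL"]  # hypothetical variable name
    except KeyError as exc:
        raise ConfigError("TOOLKIT_DATABASE_URL is not set") from exc
    raw_port = env.get("TOOLKIT_PORT", "8000")  # hypothetical variable name
    try:
        port = int(raw_port)
    except ValueError as exc:
        raise ConfigError(f"TOOLKIT_PORT must be an integer, got {raw_port!r}") from exc
    return Settings(database_url=database_url, port=port)
```

Passing the environment in as an argument is one way to keep configuration loading separate from application logic; in a pytest suite, the same function could instead be exercised with `monkeypatch.setenv`.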

Development Environment

The development environment for the Advanced Data Analysis Toolkit is designed to ensure consistency, ease of use, and scalability. Below are the key components and tools used in the development process:

Package Management with `uv`

The project uses uv for package management, which provides:

Isolated Environments: Each project has its own virtual environment to avoid dependency conflicts.
Dependency Caching: Speeds up installation by caching dependencies.
Reproducible Builds: Ensures that the same dependencies are installed across different environments.

Dev Containers

The project includes a .devcontainer configuration, which provides:

Pre-installed Dependencies: Ensures all required libraries and tools are available.
Docker-based Isolation: Guarantees a consistent environment across different machines.
Simplified Onboarding: New contributors can quickly set up their environment.

Justfile

The justfile is used for task automation, allowing developers to run common tasks such as linting, testing, and serving the application with simple commands. This streamlines the development workflow and reduces the potential for errors. The available commands are listed in the "Usage" section above.

Dockerfile

The Dockerfile defines a multi-stage build process:

Builder Stage:
- Uses the uv image for dependency management and bytecode compilation.
- Installs necessary system dependencies like libcairo2 and graphviz.
- Caches dependencies for faster builds.
Runtime Stage:
- Uses a slim Python image for a lightweight runtime environment.
- Copies the application and dependencies from the builder stage.
- Exposes port 8000 for the application.

Linting and Formatting

ruff: Ensures code adheres to style guidelines.
black: Automatically formats code for consistency.
mypy: Performs static type checking.
vulture: Identifies unused code.
pip-audit: Checks for vulnerabilities in dependencies.

Version Control

Git Hooks: Pre-commit hooks are used for linting and testing.
Pull Requests: All changes are introduced via pull requests, requiring approval from at least one other team member.
Copilot Auto-Reviews: GitHub Copilot is configured to provide automated code reviews, assisting in maintaining code quality and consistency.
