Video-to-3D scene compiler with live AI-driven agent simulation. Upload a short video of any indoor space and watch it come alive as an interactive 3D scene populated with autonomous agents.
Video Upload → Gemini Analysis → ADK Compile Pipeline → CompiledScenePackage JSON → Three.js Simulation
- Upload a short video (up to 20s) of an indoor space
- Gemini 3.1 Pro analyzes the video and detects objects, people, zones, and spatial layout
- ADK pipeline compiles the analysis into a structured 3D scene with agents, furniture, and navigation graphs
- Three.js frontend renders the scene and runs a live simulation with utility-based AI agents
The server is stateless at runtime; the browser owns all live world state.
| Layer | Choice |
|---|---|
| Frontend | Vite + React 19 + Three.js r183 + R3F v9.5 |
| State | Zustand |
| 3D Renderer | WebGPURenderer (auto-fallback to WebGL 2) |
| UI | shadcn/ui + Tailwind CSS |
| Backend | Node.js + TypeScript |
| AI | Gemini 3.1 Pro/Flash-Lite via @google/genai |
| Orchestration | @google/adk v0.4.0 |
| Validation | Zod |
| Testing | Vitest |
- Node.js 20+
- A Google AI API key (paid account)
# Clone and install
git clone <repo-url> && cd next-state
npm install
# Configure environment
cp .env.example .env # or create .env manually

Add your API key to .env:
GEMINI_API_KEY=<your-key>
PORT=3001
NODE_ENV=development
CORS_ORIGIN=http://localhost:5173

# Terminal 1: Backend (port 3001)
cd server && npm run dev
# Terminal 2: Frontend (port 5173)
cd client && npm run dev

Open http://localhost:5173 and upload a video.
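
The four variables in `.env` have to be read and checked somewhere at server startup. The following is a minimal sketch of how that could look, not the project's actual code: `ServerConfig` and `loadConfig` are hypothetical names, and the fallback values simply mirror the example `.env` above. The real server may well do this with a Zod schema instead.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ServerConfig {
  geminiApiKey: string;
  port: number;
  nodeEnv: string;
  corsOrigin: string;
}

// Read configuration from an env-like map (e.g. process.env).
// Taking the map as a parameter keeps the function easy to test.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const geminiApiKey = env["GEMINI_API_KEY"];
  if (!geminiApiKey) {
    throw new Error("GEMINI_API_KEY is required");
  }

  // Assumed default: the port used in the example .env.
  const port = Number(env["PORT"] ?? "3001");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }

  return {
    geminiApiKey,
    port,
    nodeEnv: env["NODE_ENV"] ?? "development",
    corsOrigin: env["CORS_ORIGIN"] ?? "http://localhost:5173",
  };
}
```

Failing fast on a missing key gives a clearer error at startup than a rejected Gemini request later in the compile pipeline.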
client/              React + Three.js frontend
  src/
    scene/           3D rendering (agents, environment, furniture, labels)
    store/           Zustand state management
    components/      UI panels (inspector, debug overlay, upload)
server/              Node.js API server
  src/
    adk/             Gemini ADK compile pipeline
    agents/          Structuring, style extraction
    prompts/         LLM prompt templates
shared/              Shared TypeScript types and Zod schemas
- Procedural 3D agents with animated walking, sitting, talking, fidgeting
- Agent props: laptops, phones, cups rendered on agent bodies
- Thought bubbles: pop up when agents change their intent
- Click-to-inspect: click any agent to see their mind state, traits, goals
- Furniture inference: deterministic infill adds chairs, tables, and venue-appropriate objects
- Density-aware population: scenes auto-populate with synthetic agents based on crowd density
- Style extraction: colors, materials, and lighting matched from the source video
- Utility-based AI: agents use softmax sampling over utility scores for natural behavior
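
To illustrate the last item, here is a minimal sketch of softmax sampling over utility scores. It is not the project's implementation: `ScoredAction`, `softmax`, `sampleAction`, and the `temperature` parameter are all assumptions introduced for this example.

```typescript
// Hypothetical action shape; the real agent types may differ.
interface ScoredAction {
  name: string;
  utility: number;
}

// Convert utility scores into a probability distribution.
// temperature must be > 0: low values approach greedy selection,
// high values approach a uniform choice.
function softmax(utilities: number[], temperature = 1): number[] {
  if (utilities.length === 0) return [];
  const scaled = utilities.map((u) => u / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Sample one action in proportion to its softmax probability.
// `rng` is injectable so that runs can be made reproducible.
function sampleAction(
  actions: ScoredAction[],
  temperature = 1,
  rng: () => number = Math.random,
): ScoredAction {
  if (actions.length === 0) throw new Error("no actions to sample from");
  const probs = softmax(actions.map((a) => a.utility), temperature);
  const r = rng();
  let cumulative = 0;
  for (let i = 0; i < actions.length; i++) {
    cumulative += probs[i];
    if (r < cumulative) return actions[i];
  }
  // Guard against floating-point rounding leaving r just above the final sum.
  return actions[actions.length - 1];
}
```

Sampling rather than always taking the highest-utility action is what lets two agents with similar scores behave differently, which tends to read as more natural than a strict argmax.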
| Method | Path | Description |
|---|---|---|
| POST | /api/upload-video | Upload video to Gemini Files API |
| POST | /api/compile-scene | Start ADK compile pipeline |
| GET | /api/compile-progress/:jobId | SSE stream of compile steps |
| GET | /api/scene/:sceneId | Full CompiledScenePackage JSON |
| POST | /api/agent-refresh | Sparse cognitive update |
| POST | /api/intervention | World-state mutation |
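
A client following compile progress needs to build the progress URL and decode the SSE stream. The sketch below shows one way to do both, under stated assumptions: the `CompileProgress` payload shape and the helper names are hypothetical, each event is assumed to carry JSON in its `data:` field, and lines are assumed to end in `\n`. The actual event format is not documented here.

```typescript
// Hypothetical progress payload; the real event shape may differ.
interface CompileProgress {
  step: string;
  done: boolean;
}

// Build the progress URL from the route listed in the table above.
function progressUrl(baseUrl: string, jobId: string): string {
  return `${baseUrl}/api/compile-progress/${encodeURIComponent(jobId)}`;
}

// Parse all complete SSE events from a text buffer.
// Events are separated by a blank line; each "data:" line adds to the payload.
// Returns the parsed events plus any trailing partial event to keep buffering.
function parseSseBuffer(buffer: string): { events: CompileProgress[]; rest: string } {
  const parts = buffer.split("\n\n");
  const rest = parts.pop() ?? ""; // last piece is incomplete (or empty)
  const events: CompileProgress[] = [];
  for (const part of parts) {
    const data = part
      .split("\n")
      .filter((line) => line.slice(0, 5) === "data:")
      .map((line) => line.slice(5).trim()) // drop the field name and surrounding whitespace
      .join("\n");
    if (data) events.push(JSON.parse(data) as CompileProgress);
  }
  return { events, rest };
}
```

In the browser, `new EventSource(progressUrl(base, jobId))` handles this parsing natively. A manual parser like this is mainly useful when reading the stream through `fetch` or when testing.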
cd server && npm run test
cd client && npm run test

Private. All rights reserved.