A clean, enterprise-ready AI agent for troubleshooting Kubernetes and Istio issues. It is designed to be LLM-agnostic and to work with your internal infrastructure.
- Contributing Guide: How to contribute to the project
- Changelog: Version history and release notes
- Local Setup & Examples: Example configs and scripts for running with HuggingFace and other LLMs locally
git clone <repository>
cd k8s-istio-agent
pip install -r requirements.txt

# Check system requirements for HuggingFace
python huggingface_example.py --check
# Create configuration
cp config.yaml.example config.yaml
# Edit config.yaml with your LLM provider settings

# Interactive CLI
python main.py interactive
# Single query
python main.py query "My pods are not starting"
# Web interface
python main.py web

┌─────────────────────────────────────────┐
│             User Interface              │
│      (CLI, Web UI, API, Slack Bot)      │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│            Agent Controller             │
│   (Task routing, context management)    │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│          LLM Abstraction Layer          │
│      (Provider-agnostic interface)      │
└───┬─────────┬─────────┬─────────┬───────┘
    │         │         │         │
┌───▼───┐ ┌───▼────┐ ┌──▼───┐ ┌───▼──────┐
│OpenAI │ │Hugging │ │Azure │ │ Internal │
│  API  │ │  Face  │ │OpenAI│ │ Wrappers │
└───────┘ └────────┘ └──────┘ └──────────┘
┌─────────────────────────────────────────┐
│              Tool Registry              │
│   (Kubernetes, Istio, Observability)    │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│            Execution Engine             │
│    (Safe command execution, context)    │
└─────────────────────────────────────────┘
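The LLM Abstraction Layer in the diagram can be pictured as a small interface plus a lookup from the `provider` value in the configuration to a concrete class. The following is a minimal sketch under that assumption; `LLMProvider`, `register_provider`, `create_provider`, and the `echo` stand-in provider are hypothetical names chosen for illustration and are not taken from the project's `llm/provider.py`.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class LLMProvider(ABC):
    """Hypothetical base class for a provider-agnostic interface."""

    def __init__(self, config: dict) -> None:
        self.config = config

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's completion for a prompt."""


# Maps the `provider` string from the configuration to a provider class.
_PROVIDERS: Dict[str, Type[LLMProvider]] = {}


def register_provider(name: str) -> Callable[[Type[LLMProvider]], Type[LLMProvider]]:
    """Class decorator that records a provider under the given name."""
    def decorator(cls: Type[LLMProvider]) -> Type[LLMProvider]:
        _PROVIDERS[name] = cls
        return cls
    return decorator


@register_provider("echo")
class EchoProvider(LLMProvider):
    """Stand-in provider for illustration; it makes no network calls."""

    def generate(self, prompt: str) -> str:
        prefix = self.config.get("prefix", "echo")
        return f"{prefix}: {prompt}"


def create_provider(llm_config: dict) -> LLMProvider:
    """Build a provider from a dict shaped like the `llm:` config section."""
    name = llm_config["provider"]
    if name not in _PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {name!r}")
    return _PROVIDERS[name](llm_config.get("config", {}))


provider = create_provider({"provider": "echo", "config": {"prefix": "agent"}})
print(provider.generate("My pods are not starting"))
```

With this shape, swapping providers only changes the `provider` and `config` values; the calling code keeps using `generate`.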
llm:
  provider: "huggingface"
  config:
    model_name: "microsoft/DialoGPT-medium"
    device: "auto"
    load_in_8bit: true  # Memory optimization
    max_length: 1024
    temperature: 0.7

llm:
  provider: "openai"
  config:
    api_key: "${OPENAI_API_KEY}"
    model: "gpt-4"

llm:
  provider: "azure"
  config:
    azure_endpoint: "https://your-resource.openai.azure.com/"
    api_key: "${AZURE_OPENAI_API_KEY}"
    api_version: "2023-12-01-preview"
    deployment_name: "gpt-4"

llm:
  provider: "internal"
  config:
    endpoint: "https://internal-llm.company.com"
    api_key: "${INTERNAL_API_KEY}"
    model: "company-gpt-4"

- kubectl: Safe kubectl commands (get, describe, logs, etc.)
- logs: Advanced log analysis with pattern matching
- resources: Resource usage and health monitoring
- istio_proxy: Proxy status and configuration analysis
- istio_config: Configuration validation and analysis
- traffic: Traffic flow and routing analysis
- prometheus: Metrics querying and analysis
- grafana: Dashboard integration (coming soon)
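A registry for tools like the ones listed above could be as simple as a dictionary keyed by tool name. This is a rough sketch only: the `Tool` dataclass, the `ToolRegistry` class, and the placeholder `logs` implementation are assumptions made for illustration and do not reflect the actual contents of `tools/base.py` or `tools/registry.py`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record: a name, a description, and a callable."""
    name: str
    description: str
    run: Callable[[str], str]


class ToolRegistry:
    """Hypothetical registry that looks tools up by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"Tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def names(self) -> List[str]:
        return sorted(self._tools)

    def execute(self, name: str, argument: str) -> str:
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")
        return self._tools[name].run(argument)


def count_error_lines(log_text: str) -> str:
    """Placeholder log analysis: count lines containing 'ERROR'."""
    hits = [line for line in log_text.splitlines() if "ERROR" in line]
    return f"{len(hits)} error line(s)"


registry = ToolRegistry()
registry.register(Tool("logs", "Log analysis with pattern matching", count_error_lines))
print(registry.names())
print(registry.execute("logs", "INFO start\nERROR pull failed\nERROR backoff"))
```

Keeping the registry separate from the tools means the agent only needs tool names and descriptions to decide what to call.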
🤖 What can I help you troubleshoot?
My frontend pods are not starting
🔍 I'll investigate the pod startup issues. Let me check the current status:
kubectl get pods -n frontend
✅ Found pods in ImagePullBackOff status. Let me get more details:
kubectl describe pod frontend-deployment-abc123 -n frontend
📋 Root cause: Image pull authentication failure
💡 Solution: Configure imagePullSecrets for private registry access

🤖 What can I help you troubleshoot?
Traffic between services is failing intermittently
🔍 Let me check your Istio service mesh configuration:
istio_proxy status
✅ Found stale EDS configuration on backend service
🔍 Checking traffic patterns:
prometheus query 'istio_requests_total{destination_service_name="backend"}'
💡 Recommendation: Restart affected pods to refresh Envoy configuration
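The first session above finds pods stuck in ImagePullBackOff. One way a tool might detect that state is to parse the JSON printed by `kubectl get pods -n frontend -o json` and collect each container's waiting reason. The sketch below assumes that approach; `find_waiting_containers`, the sample payload, and the second pod name are hypothetical, while the field path (`status.containerStatuses[].state.waiting.reason`) follows the Kubernetes Pod API.

```python
import json
from typing import Dict, List


def find_waiting_containers(pod_list_json: str) -> Dict[str, List[str]]:
    """Map each pod name to the waiting reasons of its containers."""
    pods = json.loads(pod_list_json).get("items", [])
    problems: Dict[str, List[str]] = {}
    for pod in pods:
        name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses", [])
        reasons = [
            status["state"]["waiting"]["reason"]
            for status in statuses
            if "reason" in status.get("state", {}).get("waiting", {})
        ]
        if reasons:
            problems[name] = reasons
    return problems


# Hand-written sample shaped like `kubectl get pods -o json` output.
SAMPLE = json.dumps({
    "items": [
        {
            "metadata": {"name": "frontend-deployment-abc123"},
            "status": {"containerStatuses": [
                {"name": "web", "state": {"waiting": {"reason": "ImagePullBackOff"}}},
            ]},
        },
        {
            "metadata": {"name": "frontend-deployment-def456"},
            "status": {"containerStatuses": [
                {"name": "web", "state": {"running": {}}},
            ]},
        },
    ]
})

print(find_waiting_containers(SAMPLE))
```

Pods whose containers are all running or terminated are left out of the result, so an empty dictionary means nothing is waiting.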
python main.py interactive --config config_dev.yaml

# Build and deploy
docker build -t k8s-istio-agent:latest .
kubectl apply -f k8s/

az container create \
  --resource-group myResourceGroup \
  --name k8s-agent \
  --image k8s-istio-agent:latest \
  --environment-variables LLM_API_KEY=$API_KEY

- RBAC: Minimal required permissions
- Read-only Operations: No destructive commands
- Safe Command Filtering: Whitelist of allowed operations
- Secret Management: Secure credential handling
- Non-root Container: Security-hardened deployment
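The "Safe Command Filtering" item above can be illustrated with an allowlist check that runs before any command is executed. This is a sketch under assumptions, not the project's actual filter: `ALLOWED_KUBECTL_VERBS` contains only the verbs named in the tool list (get, describe, logs), `is_safe_kubectl` is a hypothetical helper, and the check requires the verb to come directly after `kubectl`.

```python
import shlex

# Assumption: only the read-only verbs named in the tool list are allowed.
ALLOWED_KUBECTL_VERBS = frozenset({"get", "describe", "logs"})

# Characters that could chain or redirect commands if a shell were involved.
SHELL_METACHARACTERS = frozenset(";|&`$<>\n")


def is_safe_kubectl(command: str) -> bool:
    """Return True only for `kubectl <allowed-verb> ...` with no shell tricks."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar malformed input are rejected outright.
        return False
    if len(tokens) < 2 or tokens[0] != "kubectl":
        return False
    return tokens[1] in ALLOWED_KUBECTL_VERBS


for candidate in (
    "kubectl get pods -n frontend",
    "kubectl delete pod frontend-deployment-abc123 -n frontend",
    "kubectl get pods; rm -rf /",
):
    print(candidate, "->", is_safe_kubectl(candidate))
```

An allowlist fails closed: any verb that is not explicitly listed is rejected, which is the safer default for an agent that generates its own commands. A filter like this complements RBAC and does not replace it.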
# 1. User reports issue
user_query = "My application has high latency"
# 2. Agent investigates systematically
agent.process_query(user_query)
# - Checks pod status and resource usage
# - Analyzes application logs for errors
# - Reviews Istio metrics for service mesh issues
# - Examines Prometheus metrics for performance bottlenecks
# 3. Provides actionable recommendations
# - Identifies resource constraints
# - Suggests configuration optimizations
# - Recommends scaling strategies

- ✓ No Autogen Dependency: Clean, purpose-built architecture
- ✓ Enterprise Ready: Works with internal LLM wrappers
- ✓ LLM Flexibility: Easy provider swapping
- ✓ Focused Scope: Kubernetes/Istio specific
- ✓ Simple Codebase: Easy to understand and extend
- ✓ Systematic Approach: Follows best practice workflows
- ✓ Knowledge Retention: Learns from patterns
- ✓ Faster Resolution: Automated investigation steps
- ✓ Comprehensive Analysis: Multi-tool correlation
k8s-istio-agent/
├── main.py                   # Entry point
├── requirements.txt          # Dependencies
├── config.yaml               # Configuration
├── Dockerfile                # Container build
├── llm/                      # LLM providers
│   ├── provider.py           # Base abstractions
│   └── providers/            # Specific implementations
├── tools/                    # Tool system
│   ├── base.py               # Tool abstractions
│   ├── registry.py           # Tool management
│   ├── kubernetes/           # K8s tools
│   ├── istio/                # Istio tools
│   └── observability/        # Monitoring tools
├── agent/                    # Agent logic
│   └── controller.py         # Main orchestration
├── web/                      # Web interface
│   ├── app.py                # FastAPI application
│   └── templates/            # HTML templates
├── k8s/                      # Kubernetes manifests
│   ├── deployment.yaml       # Deployment manifest
│   ├── service.yaml          # Service manifest
│   └── rbac.yaml             # RBAC manifest
└── tests/                    # Test suite
    ├── __init__.py           # Test package
    ├── conftest.py           # Test configuration
    ├── unit/                 # Unit tests
    │   ├── test_llm.py       # LLM provider tests
    │   ├── test_tools.py     # Tool tests
    │   └── test_agent.py     # Agent logic tests
    ├── integration/          # Integration tests
    │   ├── test_k8s.py       # Kubernetes integration
    │   ├── test_istio.py     # Istio integration
    │   └── test_web.py       # Web interface tests
    └── e2e/                  # End-to-end tests
        ├── test_flows.py     # Common workflows
        └── test_scenarios.py # Real-world scenarios
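The configuration examples in this README use placeholders such as `${OPENAI_API_KEY}` for secrets. Whether and how the agent resolves them is not shown here, so the following is only a sketch of one plausible approach: walk the parsed configuration and substitute `${NAME}` from the environment. `expand_env` is a hypothetical helper, and the configuration is given as an already-parsed dictionary to keep the example free of third-party dependencies.

```python
import os
import re
from typing import Any, Mapping

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace ${NAME} placeholders in strings, dicts, and lists."""
    if isinstance(value, str):
        def substitute(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in env:
                raise KeyError(f"Environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(substitute, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    # Numbers, booleans, and None pass through unchanged.
    return value


parsed_config = {
    "llm": {
        "provider": "openai",
        "config": {"api_key": "${OPENAI_API_KEY}", "model": "gpt-4"},
    }
}

# A fake environment is passed in so the example runs without real secrets.
resolved = expand_env(parsed_config, {"OPENAI_API_KEY": "sk-example"})
print(resolved["llm"]["config"]["api_key"])
```

Raising on a missing variable, rather than substituting an empty string, surfaces a misconfigured deployment at startup instead of at the first LLM call.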
llm:
  provider: "huggingface"
  config:
    model_name: "microsoft/DialoGPT-small"
    device: "cpu"
    torch_dtype: "float32"

llm:
  provider: "huggingface"
  config:
    model_name: "microsoft/DialoGPT-medium"
    device: "auto"
    load_in_8bit: true

llm:
  provider: "internal"
  config:
    endpoint: "https://langchain-api.company.com"
    api_key: "${COMPANY_LLM_KEY}"

az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --node-count 3 \
  --enable-addons monitoring \
  --generate-ssh-keys

istioctl install --set values.defaultRevision=default
kubectl label namespace default istio-injection=enabled

kubectl apply -f k8s/

kubectl port-forward svc/k8s-istio-agent 8080:80 -n troubleshooting
# Open http://localhost:8080

- Add New Tools: Extend the tools/ directory
- New LLM Providers: Add to llm/providers/
- Enhanced Workflows: Modify agent/controller.py
- UI Improvements: Update web/ components
Create a PR and we will review it.