atresonia/k8s-istio-agent


Kubernetes & Istio AI Troubleshooting Agent

A clean, enterprise-ready AI agent for troubleshooting Kubernetes and Istio issues. Designed to be LLM-agnostic and work with your internal infrastructure.

Documentation

Quick Start

1. Installation

git clone <repository>
cd k8s-istio-agent
pip install -r requirements.txt

2. Configuration

# Check system requirements for HuggingFace
python huggingface_example.py --check

# Create configuration
cp config.yaml.example config.yaml
# Edit config.yaml with your LLM provider settings

3. Run Locally

# Interactive CLI
python main.py interactive

# Single query
python main.py query "My pods are not starting"

# Web interface
python main.py web

Architecture

┌─────────────────────────────────────────┐
│              User Interface             │
│    (CLI, Web UI, API, Slack Bot)        │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│           Agent Controller              │
│   (Task routing, context management)    │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼──────────────────────────┐
│          LLM Abstraction Layer         │
│     (Provider-agnostic interface)      │
└───┬─────────┬─────────┬─────────┬──────┘
    │         │         │         │
┌───▼───┐ ┌───▼────┐ ┌──▼───┐ ┌───▼──────┐
│OpenAI │ │Hugging │ │Azure │ │ Internal │
│ API   │ │ Face   │ │OpenAI│ │ Wrappers │
└───────┘ └────────┘ └──────┘ └──────────┘

┌─────────────────────────────────────────┐
│            Tool Registry                │
│    (Kubernetes, Istio, Observability)   │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│          Execution Engine               │
│    (Safe command execution, context)    │
└─────────────────────────────────────────┘

LLM Provider Support

HuggingFace (Local Models)

llm:
  provider: "huggingface"
  config:
    model_name: "microsoft/DialoGPT-medium"
    device: "auto"
    load_in_8bit: true  # Memory optimization
    max_length: 1024
    temperature: 0.7

OpenAI

llm:
  provider: "openai"
  config:
    api_key: "${OPENAI_API_KEY}"
    model: "gpt-4"

Azure OpenAI

llm:
  provider: "azure"
  config:
    azure_endpoint: "https://your-resource.openai.azure.com/"
    api_key: "${AZURE_OPENAI_API_KEY}"
    api_version: "2023-12-01-preview"
    deployment_name: "gpt-4"

Internal/Enterprise

llm:
  provider: "internal"
  config:
    endpoint: "https://internal-llm.company.com"
    api_key: "${INTERNAL_API_KEY}"
    model: "company-gpt-4"
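
All four configurations share the same shape: a `provider` name plus a provider-specific `config` mapping. The repository's actual base class lives in llm/provider.py and is not reproduced here; the following is only a minimal sketch of how a provider-agnostic layer could dispatch on that name. `LLMProvider`, `register`, `create_provider`, and the `echo` provider are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class LLMProvider(ABC):
    """Hypothetical base class; the real one is in llm/provider.py."""

    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's reply to a prompt."""


# Maps the `provider:` value from config.yaml to an implementation class.
_PROVIDERS: Dict[str, Type[LLMProvider]] = {}


def register(name: str):
    """Class decorator that records a provider under the given name."""
    def decorator(cls: Type[LLMProvider]) -> Type[LLMProvider]:
        _PROVIDERS[name] = cls
        return cls
    return decorator


@register("echo")
class EchoProvider(LLMProvider):
    """Stand-in provider so the sketch runs without a model or network."""

    def generate(self, prompt: str) -> str:
        return f"[{self.config.get('model', 'echo')}] {prompt}"


def create_provider(llm_config: dict) -> LLMProvider:
    """Build a provider from the `llm:` section of the configuration."""
    name = llm_config["provider"]
    if name not in _PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {name!r}")
    return _PROVIDERS[name](llm_config.get("config", {}))


if __name__ == "__main__":
    provider = create_provider({"provider": "echo", "config": {"model": "demo"}})
    print(provider.generate("My pods are not starting"))
```

With this layout, swapping providers means changing only the `provider` value and its `config` block; the calling code keeps using `generate`.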

🛠️ Available Tools

Kubernetes Tools

  • kubectl: Safe kubectl commands (get, describe, logs, etc.)
  • logs: Advanced log analysis with pattern matching
  • resources: Resource usage and health monitoring
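
The pattern set used by the `logs` tool is not documented in this README. As a rough illustration of pattern-based log analysis, the sketch below counts log lines that match a few named regular expressions; the pattern names and expressions are assumptions, not the tool's real rules.

```python
import re
from collections import Counter

# Illustrative patterns only; the tool's real pattern set is not shown here.
PATTERNS = {
    "oom": re.compile(r"OOMKilled|out of memory", re.IGNORECASE),
    "connection": re.compile(r"connection (refused|reset|timed out)", re.IGNORECASE),
    "error": re.compile(r"\b(ERROR|FATAL)\b"),
}


def summarize_logs(lines):
    """Count how many log lines match each named pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        "2024-01-01 ERROR connection refused to backend:8080",
        "2024-01-01 INFO server started",
        "2024-01-01 FATAL out of memory",
    ]
    print(summarize_logs(sample))
```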

Istio Tools

  • istio_proxy: Proxy status and configuration analysis
  • istio_config: Configuration validation and analysis
  • traffic: Traffic flow and routing analysis

Observability Tools (placeholder, not yet implemented)

  • prometheus: Metrics querying and analysis
  • grafana: Dashboard integration (coming soon)

📋 Use Cases

Pod Troubleshooting

🤖 What can I help you troubleshoot?
My frontend pods are not starting

🔍 I'll investigate the pod startup issues. Let me check the current status:

kubectl get pods -n frontend

✅ Found pods in ImagePullBackOff status. Let me get more details:

kubectl describe pod frontend-deployment-abc123 -n frontend

🔍 Root cause: Image pull authentication failure
💡 Solution: Configure imagePullSecrets for private registry access
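
The first step of this session, spotting pods stuck in a state such as ImagePullBackOff, can be approximated in code. The sketch below assumes the agent has the parsed output of `kubectl get pods -o json` as a dict; `find_waiting_containers` is a hypothetical helper, not a function from this repository.

```python
def find_waiting_containers(pod_list: dict):
    """Yield (pod, container, reason) for containers in a waiting state.

    Expects the dict obtained by parsing `kubectl get pods -o json`.
    """
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        # containerStatuses is absent until the kubelet reports on the pod.
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            waiting = status.get("state", {}).get("waiting")
            if waiting:
                yield pod_name, status["name"], waiting.get("reason", "Unknown")


if __name__ == "__main__":
    sample = {
        "items": [
            {
                "metadata": {"name": "frontend-deployment-abc123"},
                "status": {
                    "containerStatuses": [
                        {"name": "web",
                         "state": {"waiting": {"reason": "ImagePullBackOff"}}}
                    ]
                },
            }
        ]
    }
    for pod, container, reason in find_waiting_containers(sample):
        print(f"{pod}/{container}: {reason}")
```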

Service Mesh Issues

🤖 What can I help you troubleshoot?
Traffic between services is failing intermittently

🔍 Let me check your Istio service mesh configuration:

istio_proxy status

✅ Found stale EDS configuration on backend service
🔍 Checking traffic patterns:

prometheus query 'istio_requests_total{destination_service_name="backend"}'

💡 Recommendation: Restart affected pods to refresh Envoy configuration

Deployment Options

Option 1: Local Development

python main.py interactive --config config_dev.yaml

Option 2: In-Cluster Deployment

# Build and deploy
docker build -t k8s-istio-agent:latest .
kubectl apply -f k8s/

Option 3: Azure Container Instance

az container create \
  --resource-group myResourceGroup \
  --name k8s-agent \
  --image k8s-istio-agent:latest \
  --environment-variables LLM_API_KEY=$API_KEY

Security Features

  • RBAC: Minimal required permissions
  • Read-only Operations: No destructive commands
  • Safe Command Filtering: Whitelist of allowed operations
  • Secret Management: Secure credential handling
  • Non-root Container: Security-hardened deployment
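
To make the read-only and allowlist points concrete, here is a minimal sketch of a command filter. It is not the repository's implementation: the verb set is taken from the kubectl tool description above (get, describe, logs), and `is_safe_kubectl` is a hypothetical name.

```python
import shlex

# Read-only verbs named in the kubectl tool description; the repository's
# full allowlist is not shown in this README, so this set is an assumption.
ALLOWED_VERBS = {"get", "describe", "logs"}

# Characters that could chain, substitute, or redirect commands in a shell.
FORBIDDEN_CHARS = set(";|&`$><\n")


def is_safe_kubectl(command: str) -> bool:
    """Return True only for `kubectl <allowed verb> ...` with no shell tricks."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if len(tokens) < 2 or tokens[0] != "kubectl":
        return False
    return tokens[1] in ALLOWED_VERBS


if __name__ == "__main__":
    for cmd in ("kubectl get pods -n frontend", "kubectl delete pod frontend"):
        print(cmd, "->", is_safe_kubectl(cmd))
```

The check is deliberately strict: a command that puts global flags before the verb (for example `kubectl -n frontend get pods`) is rejected rather than parsed, which errs on the side of refusing.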

📊 Example Troubleshooting Workflow

# 1. User reports issue
user_query = "My application has high latency"

# 2. Agent investigates systematically
agent.process_query(user_query)
# - Checks pod status and resource usage
# - Analyzes application logs for errors
# - Reviews Istio metrics for service mesh issues
# - Examines Prometheus metrics for performance bottlenecks

# 3. Provides actionable recommendations
# - Identifies resource constraints
# - Suggests configuration optimizations
# - Recommends scaling strategies

Key Advantages

vs. kagent

  • ✅ No Autogen Dependency: Clean, purpose-built architecture
  • ✅ Enterprise Ready: Works with internal LLM wrappers
  • ✅ LLM Flexibility: Easy provider swapping
  • ✅ Focused Scope: Kubernetes/Istio specific
  • ✅ Simple Codebase: Easy to understand and extend

vs. Manual Troubleshooting

  • ✅ Systematic Approach: Follows best practice workflows
  • ✅ Knowledge Retention: Learns from patterns
  • ✅ Faster Resolution: Automated investigation steps
  • ✅ Comprehensive Analysis: Multi-tool correlation

📁 Project Structure

k8s-istio-agent/
├── main.py                 # Entry point
├── requirements.txt        # Dependencies
├── config.yaml             # Configuration
├── Dockerfile              # Container build
├── llm/                    # LLM providers
│   ├── provider.py         # Base abstractions
│   └── providers/          # Specific implementations
├── tools/                  # Tool system
│   ├── base.py             # Tool abstractions
│   ├── registry.py         # Tool management
│   ├── kubernetes/         # K8s tools
│   ├── istio/              # Istio tools
│   └── observability/      # Monitoring tools
├── agent/                  # Agent logic
│   └── controller.py       # Main orchestration
├── web/                    # Web interface
│   ├── app.py              # FastAPI application
│   └── templates/          # HTML templates
├── k8s/                    # Kubernetes manifests
│   ├── deployment.yaml     # Deployment manifest
│   ├── service.yaml        # Service manifest
│   └── rbac.yaml           # RBAC manifest
└── tests/                  # Test suite
    ├── __init__.py         # Test package
    ├── conftest.py         # Test configuration
    ├── unit/               # Unit tests
    │   ├── test_llm.py        # LLM provider tests
    │   ├── test_tools.py      # Tool tests
    │   └── test_agent.py      # Agent logic tests
    ├── integration/        # Integration tests
    │   ├── test_k8s.py        # Kubernetes integration
    │   ├── test_istio.py      # Istio integration
    │   └── test_web.py        # Web interface tests
    └── e2e/                # End-to-end tests
        ├── test_flows.py      # Common workflows
        └── test_scenarios.py  # Real-world scenarios



Configuration Examples

Lightweight (CPU-only)

llm:
  provider: "huggingface"
  config:
    model_name: "microsoft/DialoGPT-small"
    device: "cpu"
    torch_dtype: "float32"

Production (Quantized)

llm:
  provider: "huggingface"
  config:
    model_name: "microsoft/DialoGPT-medium"
    device: "auto"
    load_in_8bit: true

Enterprise (Internal)

llm:
  provider: "internal"
  config:
    endpoint: "https://langchain-api.company.com"
    api_key: "${COMPANY_LLM_KEY}"
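
The `${COMPANY_LLM_KEY}` style placeholders suggest that environment variables are substituted when the configuration is loaded. How this repository does that is not shown in the README; the following is a minimal sketch, assuming the YAML has already been parsed into plain dicts, lists, and strings. `expand_env` is a hypothetical helper name.

```python
import os
import re

# Matches ${VAR_NAME} placeholders such as ${COMPANY_LLM_KEY}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${VAR} placeholders in a parsed config structure.

    Raises KeyError if a referenced variable is not set, so a missing
    credential fails at load time instead of at the first API call.
    """
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda match: env[match.group(1)], value)
    return value


if __name__ == "__main__":
    config = {
        "llm": {
            "provider": "internal",
            "config": {
                "endpoint": "https://langchain-api.company.com",
                "api_key": "${COMPANY_LLM_KEY}",
            },
        }
    }
    print(expand_env(config, env={"COMPANY_LLM_KEY": "example-key"}))
```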

Azure POC Setup

1. Create AKS Cluster

az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --node-count 3 \
  --enable-addons monitoring \
  --generate-ssh-keys

2. Install Istio

istioctl install --set values.defaultRevision=default
kubectl label namespace default istio-injection=enabled

3. Deploy Agent

kubectl apply -f k8s/

4. Access Web Interface

kubectl port-forward svc/k8s-istio-agent 8080:80 -n troubleshooting
# Open http://localhost:8080

Contributing

  1. Add New Tools: Extend tools/ directory
  2. New LLM Providers: Add to llm/providers/
  3. Enhanced Workflows: Modify agent/controller.py
  4. UI Improvements: Update web/ components

Create a PR and we will review it.
