{"data":{"kind":"file","path":"README.md","version_id":"jawq0grwz8gwmhebnt03p8xj","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2186,"modified_at":"2026-05-11T13:37:16.451000","content_hash":"499815907cc694d3bc7ab2f6865f296ccdb9edee1263ecf0354176be75cdf423"},"entries":[],"content":"# DevOps Troubleshoot\n\nA multi-turn DevOps troubleshooting environment that tests infrastructure problem-solving skills with simulated diagnostic tools.\n\n## Overview\n\nThis environment presents real-world DevOps scenarios including Docker crashes, Kubernetes failures, DNS issues, disk problems, memory leaks, SSL errors, and more. The model must use diagnostic tools to investigate the issue, identify the root cause, and provide actionable fix steps.\n\n## Features\n\n- **15+ infrastructure problems** across Docker, Kubernetes, DNS, disk, memory, SSL, monitoring, and more\n- **3 diagnostic tools**: `check_logs`, `check_config`, `check_metrics` — all pattern-based (no subprocess)\n- **Multi-turn interaction** with up to 5 turns for investigation\n- **Weighted reward functions**: keyword matching (30%), root cause accuracy (40%), fix completeness (30%)\n\n## Tools\n\n| Tool | Description |\n|------|-------------|\n| `check_logs(service, log_type)` | Returns simulated log snippets matching common error patterns |\n| `check_config(service)` | Returns configuration analysis for known services |\n| `check_metrics(resource)` | Returns system metrics (CPU, memory, disk, network) |\n\n## Reward Functions\n\n1. **keyword_reward** (0.3) — Checks diagnostic keywords from the problem appear in the response\n2. **root_cause_reward** (0.4) — Validates the identified root cause matches the problem category\n3. **fix_completeness_reward** (0.3) — Checks for structured fix steps and tool usage\n\n## Installation\n\n```bash\nprime env install devops-troubleshoot\n```\n\n## Evaluation\n\n```bash\nprime eval run devops-troubleshoot -m gpt-4.1-mini\n```\n\n## Categories\n\n- Docker issues (OOM, build failures)\n- Kubernetes crashes (CrashLoopBackOff, probe failures)\n- DNS failures (NXDOMAIN, propagation)\n- Disk problems (full disk, deleted files)\n- Memory leaks (heap, GC, cache)\n- SSL errors (expired certificates)\n- Web server timeouts (Nginx proxy)\n- Database connections (pool exhaustion)\n- Cache issues (Redis eviction)\n- Monitoring (Prometheus scraping)\n- CI/CD (Jenkins, Docker push)\n- Infrastructure-as-Code (Terraform state)\n\n## Tags\n\n`multi-turn` `devops` `tool-use` `infrastructure` `train` `eval`\n","encoding":"utf-8","truncated":false,"total_bytes":2186},"status":null}