# FrontierSWE

FrontierSWE is a benchmark for software engineering tasks at the edge of human ability.

See the [leaderboard](https://www.frontierswe.com) for current results.

Each task provides a Docker image, a verification harness, and a reward
function. Agents run inside Modal sandboxes.

The current environment drives the sandbox with **Qwen Code** over an
OpenAI-compatible tunnel. This avoids the empty-assistant-message regression we hit
with newer OpenCode/Codex CLIs: those CLIs now speak the OpenAI `/responses` API,
while `verifiers` interception still expects `/chat/completions`.

## Package layout

The environment entrypoint is `frontier_swe.py` (import target: `frontier_swe`).
`modal_sandbox.py` provides the Modal sandbox client used at runtime.
Task definitions live under `tasks/`.

## Required auth / environment variables

Evals need the following credentials available to the process:

- **Modal auth**: configure Modal with `modal setup`, or export `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET`.
- **Task-specific secrets**: `frogsgame-rl` requires `TINKER_API_KEY`.
- **Model provider auth**: if you are not using Prime Inference / `prime login`, export the normal provider key for your chosen model, such as `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`.

The task images under `ghcr.io/proximal-labs/frontier-swe/...` are public. The Modal integration imports them with `add_python="3.12"` because the base task images do not provide a default `python` binary for Modal's build step.

## Networking note

`frontier_swe` proxies model calls through a public Prime tunnel URL, so
sandbox egress must stay enabled even for tasks whose original Harbor config sets
`allow_internet = false`. Otherwise the agent cannot reach the inference proxy,
and the rollout stalls before any assistant message is recorded.

## Running the eval

Run a single rollout against the published upstream environment:

```bash
prime eval run proximal/frontier-swe \
    --model anthropic/claude-sonnet-4.6 \
    --num-examples 1 \
    --rollouts-per-example 1 \
    -t 8192 \
    --max-concurrent 1 \
    --env-args '{"tasks":"git-to-zig"}'
```

Run the full task suite with 5 rollouts per example:

```bash
prime eval run proximal/frontier-swe \
    --model anthropic/claude-sonnet-4.6 \
    --rollouts-per-example 5 \
    -t 8192
```

For hosted smoke checks with Prime Inference, `openai/gpt-4.1-mini` now starts
cleanly at both `--max-tokens 2048` and `--max-tokens 8192`. The earlier
empty-assistant transcript was caused by a harness/proxy compatibility problem,
not by `max_tokens`.
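The credentials listed under "Required auth / environment variables" can be checked before a rollout starts, which avoids discovering a missing secret only after the sandbox is up. Below is a minimal, hypothetical Bash sketch; the `frontier_swe_preflight` function is not part of this repository. It makes two assumptions. First, `modal setup` stores its token in `~/.modal.toml`. Second, a run without a `tasks` env-arg (the full suite) includes `frogsgame-rl`. The sketch does not check model-provider keys, because those are only needed when you are not using Prime Inference.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not shipped with FrontierSWE).
# Usage: frontier_swe_preflight [task-name]
# Prints what is missing to stderr and returns non-zero if a required
# credential is absent; returns 0 otherwise.
frontier_swe_preflight() {
  local task="${1:-}"
  local missing=0

  # Modal auth: either both exported tokens, or the config file written by
  # `modal setup` (ASSUMPTION: located at ~/.modal.toml).
  if { [ -z "${MODAL_TOKEN_ID:-}" ] || [ -z "${MODAL_TOKEN_SECRET:-}" ]; } \
    && [ ! -f "${HOME}/.modal.toml" ]; then
    echo "missing Modal auth: run 'modal setup' or export MODAL_TOKEN_ID and MODAL_TOKEN_SECRET" >&2
    missing=1
  fi

  # Task-specific secret: frogsgame-rl needs TINKER_API_KEY.
  # ASSUMPTION: an empty task name means the full suite, which includes frogsgame-rl.
  case "$task" in
    "" | frogsgame-rl)
      if [ -z "${TINKER_API_KEY:-}" ]; then
        echo "missing TINKER_API_KEY (required by frogsgame-rl)" >&2
        missing=1
      fi
      ;;
  esac

  # Provider keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, ...) are deliberately not
  # checked: they are only required when not using Prime Inference / `prime login`.
  return "$missing"
}
```

Chaining it in front of the `prime eval run` commands above with `&&` keeps a run from starting when a credential is missing.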