{"data":{"kind":"file","path":"README.md","version_id":"vbf38cb1nvufyuomlsu9sra3","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4379,"modified_at":"2026-02-07T15:07:24.728000","content_hash":"c3e1e60f3d607416ff1de9f479abb2f69032ee293181ad3f4db2a1b3eda7fc06"},"entries":[],"content":"# nsa-codebreaker\n\n### Overview\n- **Environment ID**: `nsa-codebreaker`\n- **Short description**: NSA Codebreaker Challenge Eval\n- **Tags**: ctf, security, reverse-engineering, cryptography, forensics\n\n### Datasets\n- **Primary dataset(s)**: NSA Codebreaker 2024 challenges (6 tasks, 9-1000 points)\n- **Additional dataset(s)**: CBC25 / NSA Codebreaker 2025 (7 tasks)\n- **Source links**: Public writeups from competition participants\n- **Split sizes**: 2024 has 6 tasks; 2025 has 7 tasks (filter by task number)\n\n### Task\n- **Type**: Multi-turn sandbox with shell access\n- **Parser**: Custom submit() tool for answer submission\n- **Rubric overview**: Per-task verification (string match, JSON match, functional tests)\n\n### Docker Images\n\nThree tiered images:\n- `nancyjlau/codebreaker-2024-base` - Tasks 1-4 (Python, ZFS, Ghidra, gRPC)\n- `nancyjlau/codebreaker-2024-crypto` - Task 5 (+ gocryptfs, gmpy2)\n- `nancyjlau/codebreaker-2024-full` - Task 6 (+ Go, delve, binutils)\n\n2025 image:\n- `cm3lf0pl100ccpissq9t6yh2c/cbc25-forensics:2026-01-24` - Tasks 1-7 (forensics toolkit, Ghidra, JADX)\n- Local runs: tag your local build as `cm3lf0pl100ccpissq9t6yh2c/cbc25-forensics:2026-01-24`\n\n### Quickstart\n\n```bash\n# All tasks\nuv run vf-eval nsa-codebreaker -m gpt-4.1 -n 6 -r 1\n\n# Specific task only\nuv run vf-eval nsa-codebreaker -m gpt-4.1 -a '{\"task_filter\": [1]}'\n\n# 2025 tasks\nuv run vf-eval nsa-codebreaker -m gpt-4.1 -a '{\"years\": [2025]}'\n```\n\n**Note on artifacts:** The published environment wheel does not bundle large task artifacts. On first use it downloads the full artifact bundle (~160MB) into `~/.cache/nsa_codebreaker/` (override with `NSA_CODEBREAKER_CACHE_DIR` / `NSA_CODEBREAKER_ASSET_WHEEL_URL` / `NSA_CODEBREAKER_ASSET_WHEEL`).\n\n### Validation\n\nOffline checks (no model/sandbox required):\n\n```bash\n.venv/bin/python -m unittest discover -s tests\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `years` | int\\|list[int] | `2024` | Challenge year(s) |\n| `task_filter` | list[int] | `None` | Filter to specific task numbers |\n| `timeout_minutes` | int | `60` | Sandbox timeout per task |\n| `timeout_per_command_seconds` | int | `120` | Per-command timeout |\n\n### Tasks (2024)\n\n| Task | Points | Category | Verification |\n|------|--------|----------|--------------|\n| 1 - No Token Left Behind | 9 | Forensics | String match |\n| 2 - Driving Me Crazy | 30 | ZFS Forensics | Hash set |\n| 3 - How did they get in? | 200 | Reverse Engineering | JSON match |\n| 4 - LLMs never lie | 200 | Log Analysis | String match |\n| 5 - The #153 | 450 | Cryptography | String match |\n| 6 - It's always DNS | 1000 | Exploitation | Functional |\n\n### Tasks (2025)\n\n| Task | Category | Verification |\n|------|----------|--------------|\n| 1 - Getting Started | Forensics | String match |\n| 2 - The hunt continues | Network Forensics | Multiline set match |\n| 3 - Digging deeper | Reverse Engineering | Multiline set match |\n| 4 - Unpacking Insight | Malware Analysis | String match |\n| 5 - Putting it all together | Cryptanalysis | String match |\n| 6 - Crossing the Channel | Vulnerability Research | Multiline set match |\n| 7 - Finale | Vulnerability Research, Exploitation | Harbor test |\n\n### Harbor-Style Verification\n\nSome tasks use `verification: \"harbor_test\"`. For these, the environment uploads a task-local test script and executes it inside the sandbox. The test script must write a reward file to `/logs/verifier/`:\n\n- `/logs/verifier/reward.txt` contains a single numeric value (1 for success, 0 for failure).\n- `/logs/verifier/reward.json` may include multiple numeric metrics; if `reward` exists it is used, otherwise a single numeric field is accepted.\n\nThe submitted answer is provided to the script at `/tmp/submission.txt` (unless you submit a sandbox path).\n\nFor \"file\" submissions:\n- Preferred: create the file inside the sandbox and submit its path (e.g. `@/tmp/payload.zip` or `/tmp/payload.zip`). Raw zip bytes are accepted by the verifier.\n- Alternative: submit the file bytes as a base64-encoded string (preferred) or a hex-encoded string.\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | 1.0 if correct, 0.0 otherwise |\n| `submitted` | First 100 chars of submitted answer |\n| `match` | Boolean for string/JSON match tasks |\n| `dns_success` | Boolean for 2024 Task 6 DNS verification |\n","encoding":"utf-8","truncated":false,"total_bytes":4379},"status":null}