{"data":{"kind":"file","path":"README.md","version_id":"dhyi0jr0n0lydkrkymh5xtvt","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2376,"modified_at":"2026-05-29T17:36:04.792000","content_hash":"cf1ed01193aca52212d43170b1f6bdf8251ea7e003ce55d41f5357d6550085f5"},"entries":[],"content":"# harbor-terminal-bench\n\nPrime Verifiers environment wrapping Harbor Terminal-Bench tasks.\n\n## Dataset\n\nDownloaded locally with:\n\n```bash\nharbor datasets download terminal-bench@2.0 -o data/terminal-bench-2\n```\n\nThe Harbor CLI stores tasks as `hash/task-name/{instruction.md,task.toml,environment,tests}`. This environment creates symlinks under `.cache/harbor-terminal-bench-flat` so Prime's built-in `HarborTaskset` can consume them.\n\n## Quick smoke eval\n\n```bash\nprime eval run ./environments/harbor_terminal_bench \\\n  -m openai/gpt-4.1-mini \\\n  -n 1 -r 1 \\\n  -a '{\"dataset_path\":\"./data/terminal-bench-2\",\"max_examples\":1,\"max_turns\":4}'\n```\n\n## Environment args\n\n- `dataset_path`: local Harbor dataset path or Harbor registry id; default uses `./data/terminal-bench-2` if present, otherwise `terminal-bench@2.0`.\n- `task_names`: explicit list of task names.\n- `split_file`: newline-delimited task-name file.\n- `split`: built-in split: `train` (60), `dev` (14), `holdout` (15), or `all` (89).\n- `max_examples`: cap used for smoke tests.\n- `harness`: one of `mini-swe-agent`, `opencode`, `pi`, `rlm`, `roder-zero`; default `mini-swe-agent`.\n- `max_turns`: max turns passed to compatible harnesses; default `80`.\n- `scope`: Prime sandbox scope: `rollout`, `group`, or `global`; default `rollout`.\n- `roder_binary_url`: public Linux AMD64 Roder binary URL for `roder-zero`; defaults to `https://dl.roder.sh/zero-coder-roder-x86_64-unknown-linux-gnu`.\n- `roder_config_url`: public zero-coder config URL for `roder-zero`; defaults to `https://dl.roder.sh/zero-coder-roder.config.toml`.\n- `roder_soft_timeout_sec`: optional soft timeout for `roder exec`; default `900`.\n- `task_ledger_required`: whether `roder-zero` passes `--task-ledger-required`; default `true`.\n\nRewards are produced by `verifiers.v1.HarborTaskset`: upload `/tests`, run `bash test.sh`, then parse `/logs/verifier/reward.txt` or `/logs/verifier/reward.json`.\n\n## Roder Zero harness\n\nThe `roder-zero` harness installs Roder inside each Linux AMD64 sandbox from the public `dl.roder.sh` zero-coder binary, writes an isolated Verifiers-compatible config, then runs:\n\n```bash\nroder exec --json --profile eval --mode bypass --skip-git-repo-check --task-ledger-required -\n```\n\nThe RL smoke config at `configs/rl/harbor-terminal-bench-smoke.toml` is configured to use this harness for Harbor Prime Intellect runs.\n","encoding":"utf-8","truncated":false,"total_bytes":2376},"status":null}