{"data":{"kind":"file","path":"README.md","version_id":"v60zwkfu40mygv1pi7kkp076","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4112,"modified_at":"2026-01-30T22:34:53.878000","content_hash":"5c25cb34b2014cfd492094e3fdf4f8831aaa7c936e81181912c0e14277c3cc48"},"entries":[],"content":"# harbor-tasks\n\n### Overview\n- **Environment ID**: `harbor-tasks`\n- **Short description**: Harbor tasks environment with Terminus agent and Prime Intellect INTELLECT model support\n- **Tags**: eval, harbor, terminus, rl\n\n### Datasets\n- **Primary dataset(s)**: Harbor tasks in the `datasets-batch-2/` directory\n- **Task format**: Harbor-compatible (instruction.md, task.toml, environment/, tests/, solution/)\n- **Tasks included**:\n  - ws-session-persistence\n  - ws-transaction-rollback\n  - xml-external-entity-injection\n  - xml-json-to-csv-hard\n  - xpath-injection-vulnerability\n  - xxe-vulnerability\n  - yaml-to-dotenv\n  - zap-report-to-risk-policy-mapper\n  - zod-coercion-bug\n  - zod-superrefine-early-exit\n  - zone-file-axfr-parser\n\n### Task\n- **Type**: multiturn, cli_agent, harbor\n- **Agent**: Terminus-2 (Harbor agent framework)\n- **Model**: Prime Intellect INTELLECT-3 (via Prime Tunnel)\n- **Rubric overview**: Binary reward from test.sh execution\n\n### Quickstart\n\nRun an evaluation with default settings:\n\n```bash\n# Install the environment first\nprime env install harbor-tasks\n\n# Run evaluation (local)\nprime eval run harbor-tasks\n```\n\nRun with specific tasks and configuration:\n\n```bash\nprime eval run harbor-tasks \\\n  -m PrimeIntellect/INTELLECT-3 \\\n  -n 5 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"tasks\": [\"ws-session-persistence\", \"yaml-to-dotenv\"]}'\n```\n\nRun training:\n\n```bash\n# Create the training config first (configs/lab/harbor-tasks-lite.toml), then launch training\nprime rl run configs/lab/harbor-tasks-lite.toml\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `dataset_path` | str 
| `\"./datasets-batch-2\"` | Path to the Harbor tasks directory |\n| `tasks` | list[str] | None | Specific task names to load (None = all) |\n| `num_tasks` | int | None | Maximum number of tasks to load |\n| `agent_name` | str | `\"terminus-2\"` | Harbor agent name |\n| `model_name` | str | `\"openai/gpt-4\"` | Model name (routed via Prime Tunnel) |\n| `agent_workdir` | str | `\"/app\"` | Working directory for the agent |\n| `timeout_minutes` | int | 60 | Timeout for agent execution, in minutes |\n| `cpu_cores` | int | 2 | CPU cores allocated to the sandbox |\n| `memory_gb` | int | 4 | Sandbox memory in GB |\n| `disk_size_gb` | int | 10 | Sandbox disk size in GB |\n| `include_tests_for_agent` | bool | False | Whether the agent can see the tests |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Binary reward (0.0 or 1.0) from test.sh execution |\n\n### How It Works\n\n1. Creates a Prime Sandbox for each Harbor task\n2. Installs dependencies (curl, git, uv, Python 3.12)\n3. Installs the Harbor library via uv\n4. Configures the Terminus agent to reach the Prime Intellect model via Prime Tunnel\n5. Runs the Terminus agent on the task instruction\n6. Intercepts the agent's API requests and routes them through Prime Tunnel\n7. 
Computes the reward by running the Harbor test scripts\n\n### Requirements\n\n- Prime Sandboxes API access (set `PRIME_API_KEY` or run `prime login`)\n- A Harbor tasks directory in which each task has the following structure:\n  - `instruction.md` - Task description\n  - `task.toml` - Task configuration\n  - `environment/Dockerfile` - Environment setup\n  - `tests/test.sh` - Test script (writes to `/logs/verifier/reward.txt`)\n  - `solution/solve.sh` - Oracle solution (uploaded after the agent completes)\n\n### Reward Computation\n\n- Runs `tests/test.sh` after the agent completes\n- Reads the reward from `/logs/verifier/reward.txt` (primary) or `/logs/verifier/reward.json` (fallback)\n- Returns the reward as a float (typically 0.0 or 1.0)\n\n### Model Configuration\n\nThe environment is designed to work with Prime Intellect models:\n\n```toml\n# In training config\nmodel = \"PrimeIntellect/INTELLECT-3\"\n\n[env.args]\nagent_name = \"terminus-2\"\nmodel_name = \"openai/gpt-4\"  # Routed to Prime Intellect via tunnel\n```\n\nThe `model_name` argument is passed to the Harbor agent, which uses it for API calls. The Prime Tunnel intercepts these calls and routes them to the actual model specified in the training config.\n\n### Notes\n\n- Agent logs are saved to `/logs/` in the sandbox\n- The environment uses Harbor's agent framework, not custom agents\n- Supports both evaluation and RL training\n- Compatible with the Prime Hosted Training platform\n","encoding":"utf-8","truncated":false,"total_bytes":4112},"status":null}