{"data":{"kind":"file","path":"README.md","version_id":"znkcygj9rz3950s1eg6nqm3h","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3247,"modified_at":"2026-01-10T07:28:07.221000","content_hash":"dc8e82639e6c75fac1733965fefa70d35f74c44a19dae1188b2d9a99a3c05d2b"},"entries":[],"content":"# dummy-harbor-env\n\n<a href=\"https://github.com/PrimeIntellect-ai/verifiers/tree/main/environments/dummy_harbor_env\">\n<img src=\"https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white\" alt=\"Source Code\">\n</a>\n\n### Overview\n\n- **Environment ID**: `dummy-harbor-env`\n- **Short description**: Minimal Harbor environment for testing the CLI agent interception framework\n- **Tags**: `dummy`, `testing`, `cli-agent`, `harbor`\n\n### Datasets\n\n- **Primary dataset**: Harbor-format tasks in `tasks/` directory\n- **Source**: Bundled with environment\n- **Tasks**: 1 dummy task (`hello-world`)\n\n### Task\n\n- **Type**: single-turn (via HarborEnv)\n- **Base class**: `HarborEnv` (extends `CliAgentEnv`)\n- **Rubric overview**:\n  - Reward computed by `tests/test.sh` which runs pytest on `test_state.py`\n  - Returns 1.0 if `/app/hello.txt` contains \"Hello, world!\", 0.0 otherwise\n\n### Quickstart\n\nRun an evaluation with default settings:\n\n```bash\nprime eval run dummy-harbor-env\n```\n\nConfigure model and sampling:\n\n```bash\nprime eval run dummy-harbor-env -m gpt-4.1-mini -n 1 -r 1\n```\n\n### How It Works\n\nThis environment demonstrates the HarborEnv/CliAgentEnv data flow:\n\n1. **Harbor Task Loading**: Task is loaded from `tasks/hello-world/` with `task.toml`, `instruction.md`, and `tests/`\n2. **Sandbox Creation**: A Docker sandbox is created with the task instruction uploaded to `/task/`\n3. **Agent Execution**: A Python script reads the instruction and makes an OpenAI API call\n4. **Interception**: The API call is intercepted by CliAgentEnv's HTTP proxy server (via Cloudflare tunnel)\n5. **LLM Response**: The LLM returns a bash command to complete the task\n6. **Execution**: The agent executes the command in `/app`\n7. **Testing**: Harbor's `tests/test.sh` runs pytest to verify the result\n\n### Agent Script Details\n\nThe embedded agent script:\n\n- Reads task instruction from `/task/instruction.md`\n- Asks the LLM for a bash command to complete the task\n- Executes the returned command in `/app`\n\nFor the `hello-world` task, the LLM should respond with something like:\n```bash\necho \"Hello, world!\" > hello.txt\n```\n\n### Environment Arguments\n\n| Argument          | Type                | Default            | Description                              |\n| ----------------- | ------------------- | ------------------ | ---------------------------------------- |\n| `dataset_path`    | `str \\| Path`       | `./tasks`          | Path to Harbor-format tasks directory    |\n| `tasks`           | `list[str] \\| None` | `None`             | Specific task names to load (None = all) |\n| `agent_workdir`   | `str`               | `/app`             | Working directory for agent in sandbox   |\n| `docker_image`    | `str`               | `python:3.11-slim` | Docker image for sandbox                 |\n| `timeout_seconds` | `float`             | `300.0`            | Overall rollout timeout                  |\n| `max_turns`       | `int`               | `-1`               | Max turns (-1 = unlimited)               |\n\n### Metrics\n\n| Metric   | Meaning                                              |\n| -------- | ---------------------------------------------------- |\n| `reward` | 1.0 if pytest passes (hello.txt correct), 0.0 otherwise |\n","encoding":"utf-8","truncated":false,"total_bytes":3247},"status":null}