# cwe-agent-prod-1

### Overview
- **Environment ID**: `cwe-agent-prod-1`
- **Short description**: Exact-match CWE classification over a frozen Top-25 benchmark.
- **Tags**: security, cwe, classification, eval

### Datasets
- **Primary dataset(s)**: bundled `top25-test.jsonl`, one example per 2025 MITRE Top-25 CWE
- **Source links**: repo-local benchmark derived from the project evaluation set
- **Split sizes**: eval 25 examples by default

### Task
- **Type**: single-turn
- **Output format expectations**: return exactly one CWE ID such as `CWE-918`
- **Rubric overview**: exact CWE ID match extracted from the model completion

### Quickstart
Run an evaluation with default settings:

```bash
prime eval run ./environments/cwe_agent_prod_1
```

Configure model and sampling:

```bash
prime eval run ./environments/cwe_agent_prod_1 \
  -m gpt-4.1-mini \
  -n 10 \
  -r 1 \
  -a '{"num_examples": 10}'
```

Notes:
- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.
- Use `prime env push ./environments/cwe_agent_prod_1` when you are ready to publish it to Env Hub.

### Environment Arguments

| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `dataset_path` | str or null | `null` | Optional override for the eval JSONL file |
| `num_examples` | int | `-1` | Limit rows for quick test runs, `-1` uses all rows |

### Metrics

| Metric | Meaning |
| ------ | ------- |
| `reward` | Exact-match reward: `1.0` if predicted CWE ID matches the target, else `0.0` |