# agentclinic-nejm

### Overview
- **Environment ID**: `agentclinic-nejm`
- **Short description**: AgentClinic environment built on image-based NEJM clinical cases for specialized medical evaluation
- **Tags**: medical, multiturn, evaluation, verifiable-reward, nejm, imaging

### Datasets
- **Primary dataset(s)**: `agentclinic_nejm_extended.jsonl` (120 NEJM medical cases with images)
- **Source links**: AgentClinic project, NEJM Clinical Images
- **Split sizes**: 120 cases total

### Task
- **Type**: multiturn (conversational medical diagnosis)
- **Parser**: Boxed answer extraction (the final diagnosis is read from `\boxed{...}`)
- **Rubric overview**: Accuracy against the target diagnosis, using exact and fuzzy matching

### Quickstart
Run an evaluation with default settings:

```bash
vf-eval agentclinic-nejm
```

Configure model and sampling (here: 20 examples, 3 rollouts per example, at most 1024 tokens, temperature 0.7):

```bash
vf-eval agentclinic-nejm -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7
```

From a local checkout, install the environment and evaluate all 120 cases against an OpenAI-compatible endpoint (Mistral here). `-b` sets the API base URL and `-k` names the environment variable that holds the API key:

```bash
uv pip install -e .

uv run --active -m verifiers.scripts.eval \
  -m mistral-small-latest \
  -b https://api.mistral.ai/v1 \
  -k MISTRAL_API_KEY \
  agentclinic_nejm -n 120 --max-concurrent 4 --rollouts-per-example 3 -T 0.0 -s
```

### Environment Arguments

| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `dataset_path` | str | `None` | Path to the JSONL dataset file |
| `use_think` | bool | `False` | Whether to prompt for step-by-step reasoning (think mode) |
| `max_turns` | int | `10` | Maximum number of turns per conversation |

### Metrics

| Metric | Meaning |
| ------ | ------- |
| `accuracy` | 1.0 if the extracted answer matches the target diagnosis (exact or fuzzy match), else 0.0 |

### Special Features
- **Image-based cases**: Supports medical image analysis
- **Specialty tagging**: Each case is tagged with its medical specialty
- **Multiturn interaction**: The agent can request additional information
- **Comprehensive prompts**: Include patient information, physical exam findings, and test results
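
### Scoring Sketch

The boxed-answer parsing and exact/fuzzy accuracy described under Task and Metrics can be illustrated with a minimal sketch. This is not the environment's actual rubric: the helper names (`extract_boxed`, `normalize`, `accuracy`), the normalization rules, and the use of `difflib.SequenceMatcher` with a 0.8 threshold as the "fuzzy" criterion are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical threshold: the real rubric's fuzzy criterion is not documented here.
FUZZY_THRESHOLD = 0.8


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.

    Scans character by character and tracks brace depth so nested
    braces such as \\boxed{a {b}} are handled.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
    return None  # braces never closed


def normalize(s: str) -> str:
    # Lowercase, replace punctuation with spaces, collapse whitespace.
    s = re.sub(r"[^a-z0-9\s]", " ", s.lower())
    return " ".join(s.split())


def accuracy(completion: str, target: str) -> float:
    """Binary score: 1.0 on an exact or fuzzy match, else 0.0 (hypothetical)."""
    pred = extract_boxed(completion)
    if pred is None:
        return 0.0
    p, t = normalize(pred), normalize(target)
    if not p:
        return 0.0
    if p == t:  # exact match after normalization
        return 1.0
    # Fuzzy match: character-level similarity ratio in [0, 1].
    ratio = SequenceMatcher(None, p, t).ratio()
    return 1.0 if ratio >= FUZZY_THRESHOLD else 0.0
```

Under these assumptions, `accuracy("Final answer: \\boxed{Sarcoidoses}", "Sarcoidosis")` scores 1.0 through the fuzzy path (ratio about 0.91), while a missing box or an unrelated diagnosis scores 0.0. A character-level ratio tolerates typos and pluralization but not synonyms, which is one reason a real rubric might use a different matcher.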