{"data":{"kind":"file","path":"README.md","version_id":"toawb88q60qdrths0key8ldd","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1278,"modified_at":"2025-09-27T18:59:40.903000","content_hash":"69dcdb11b5c958441793e3866fad029c2f6759b754754a1b8928aee4764d0717"},"entries":[],"content":"# agentclinic-extended\n\n### Overview\n- **Environment ID**: `agentclinic-extended`\n- **Short description**: Extended AgentClinic environment with enhanced multiturn capabilities for MEDQA cases\n- **Tags**: medical, multiturn, evaluation, verifiable-reward\n\n### Datasets\n- **Primary dataset(s)**: agentclinic_medqa_extended.jsonl (214 medical cases)\n- **Source links**: AgentClinic project\n- **Split sizes**: 214 cases total\n\n### Task\n- **Type**: single-turn (simplified for initial implementation)\n- **Parser**: Boxed answer extraction\n- **Rubric overview**: Accuracy-based evaluation with exact and fuzzy matching\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nvf-eval agentclinic-extended\n```\n\nConfigure model and sampling:\n\n```bash\nvf-eval agentclinic-extended -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `dataset_path` | str | `None` | Path to the JSONL dataset file |\n| `use_think` | bool | `False` | Whether to use think mode (step-by-step reasoning) |\n| `max_turns` | int | `10` | Maximum number of turns per conversation |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `accuracy` | Exact match on target diagnosis (0.0 or 1.0) |\n\n","encoding":"utf-8","truncated":false,"total_bytes":1278},"status":null}