{"data":{"kind":"file","path":"README.md","version_id":"iww5x8awcpanjsmhqf8kn14b","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1414,"modified_at":"2026-02-20T23:11:44.209000","content_hash":"fcf9250b084b13b3759fa764397a19d7530b0d61f2fc26a76048d97115683a01"},"entries":[],"content":"# MetaMedQA\n\nEvaluation environment for the MetaMedQA dataset.\n\n## Overview\n- **Environment ID**: `metamedqa`\n- **Short description**: Single-turn medical multiple-choice QA drawn from multiple medical exam sources\n- **Tags**: medical, single-turn, multiple-choice, eval\n\n## Datasets\n- **Primary dataset(s)**: MetaMedQA\n- **Source links**: [maximegmd/MetaMedQA](https://huggingface.co/datasets/maximegmd/MetaMedQA)\n- **Split sizes**: Uses provided test split\n\n## Task\n- **Type**: single-turn\n- **Rubric overview**: Binary scoring (1.0 / 0.0) based on correct letter or answer text match\n\n## Quickstart\nRun an evaluation with default settings:\n\n```bash\nprime eval run metamedqa -m \"openai/gpt-5-mini\" -n 5 -s\n```\n\nConfigure model and sampling:\n\n```bash\nmedarc-eval metamedqa -m \"openai/gpt-5-mini\" -n 20 --shuffle-answers --shuffle-seed 1618\n```\n\n## Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `split` | str | `\"test\"` | Dataset split to use |\n| `shuffle_answers` | bool | `False` | Whether to shuffle answer choices |\n| `shuffle_seed` | int \\| None | `1618` | Seed for deterministic answer shuffling |\n\n## Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `accuracy` | (weight 1.0): 1.0 if parsed letter matches the gold letter, else 0.0 |\n\n## Authors\nThis environment has been put together by:\n\nAymane Ouraq - ([@aymaneo](https://github.com/aymaneo))\n","encoding":"utf-8","truncated":false,"total_bytes":1414},"status":null}