{"data":{"kind":"file","path":"README.md","version_id":"tsijvmppsv7c0o2swqom2f1w","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1553,"modified_at":"2026-02-20T23:11:44.201000","content_hash":"f746b02cd135a0e08b397e40e42384231477f6fd80cb5037e040c9a9d70a971c"},"entries":[],"content":"# medconceptsqa\n\n## Overview\n- **Environment ID**: `medconceptsqa`\n- **Short description**: MedConcepts QA - an MCQ dataset involving medical codes.\n- **Tags**: medical, clinical, single-turn, multiple-choice, classification, test\n\n## Datasets\n- **Primary dataset(s)**: `medconceptsqa`\n- **Source links**: [Paper](https://www.sciencedirect.com/science/article/pii/S0010482524011740), [GitHub](https://github.com/nadavlab/MedConceptsQA/tree/master), [HF Dataset](https://huggingface.co/datasets/ofir408/MedConceptsQA)\n- **Split sizes**: 60 (dev / few-shot), 820k (test)\n\n## Task\n- **Type**: single-turn\n- **Rubric overview**: Binary scoring based on correct answer choice\n\n## Quickstart\nRun an evaluation with default settings:\n\n```bash\nprime eval run medconceptsqa -m \"openai/gpt-5-mini\" -n 5 -s\n```\n\nConfigure model and sampling:\n\n```bash\nmedarc-eval medconceptsqa -m \"openai/gpt-5-mini\" -n 20 --num-few-shot 4\n```\n\nNotes:\n- Use direct environment flags with `medarc-eval` (for example, `--split validation` or `--judge-model gpt-5-mini`).\n\n## Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_few_shot` | int | `0` | Number of few-shot examples to include in the prompt |\n| `use_think` | bool | `False` | Whether to use `<think>...</think>` formatting with `ThinkParser` |\n\n## Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `accuracy` | Exact match on target answer |\n\n## Authors\nThis environment has been put together by:\n\nAnish Mahishi - ([@macandro96](https://github.com/macandro96))\n","encoding":"utf-8","truncated":false,"total_bytes":1553},"status":null}