{"data":{"kind":"file","path":"README.md","version_id":"rk1wsz41c5rgwgkdisjvzov8","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3145,"modified_at":"2026-02-20T23:11:44.196000","content_hash":"8c3e17f3d4a3f773a3a96034b23760454ce3ef480e1af064448fae811b9ac553"},"entries":[],"content":"# MEDMCQA\n\nEvaluation environment for the MEDMCQA dataset.\n\n## Overview\n- **Environment ID:** `med_mcqa`\n- **Short description:** Single-turn medical multiple-choice QA\n- **Tags:** medical, single-turn, multiple-choice, train, eval\n\n## Datasets\n- **Primary dataset(s):** MedMCQA (HF datasets)\n- **Source links:** [lighteval/med_mcqa](https://huggingface.co/datasets/lighteval/med_mcqa)\n- **Split sizes:** Uses provided train and validation splits\n\n## Task\n- **Type:** Single-turn\n- **Parser:** `Parser` (standard) or `ThinkParser` (if using reasoning mode) depending on `use_think`\n- **Rubric overview:** Binary scoring (1.0 / 0.0), based on correct letter or answer text match.  \n- **Reward function:** `accuracy` — returns 1.0 if the predicted answer matches, else 0.0.\n\n## Model Input Format\nEach example is formatted as a single-turn user message: \n\n```\nGive a letter answer among A, B, C or D.\nQuestion: {question}\nA. {opa}\nB. {opb}\nC. {opc}\nD. 
{opd}\nAnswer:\n```\n\nThe model should respond with a letter choice (A–D).\n\n## Quickstart\nRun an evaluation with default settings:\n\n```bash\nprime eval run med_mcqa -m \"openai/gpt-5-mini\" -n 5 -s\n```\n\n## Usage\nTo run an evaluation using `medarc-eval` with the OpenAI API:\n\n```bash\nexport OPENAI_API_KEY=sk-...\nmedarc-eval med_mcqa -m \"openai/gpt-5-mini\" -n 5 -s\n\n# Shuffled-answers example (seed 1618) with reasoning mode enabled (`--use-think`).\nmedarc-eval med_mcqa -m \"openai/gpt-5-mini\" -n 5 -s --shuffle-answers --shuffle-seed 1618 --use-think\n```\nReplace the `sk-...` placeholder with your actual API key.\n\n## Authors\nThis environment was put together by:\n\nRatna Sagari Grandhi - ([@sagarigrandhi](https://github.com/sagarigrandhi))\n\n## Credits\nDataset:\n\n```bibtex\n@InProceedings{pmlr-v174-pal22a,\n  title = \t {MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering},\n  author =       {Pal, Ankit and Umapathi, Logesh Kumar and Sankarasubbu, Malaikannan},\n  booktitle = \t {Proceedings of the Conference on Health, Inference, and Learning},\n  pages = \t {248--260},\n  year = \t {2022},\n  editor = \t {Flores, Gerardo and Chen, George H and Pollard, Tom and Ho, Joyce C and Naumann, Tristan},\n  volume = \t {174},\n  series = \t {Proceedings of Machine Learning Research},\n  month = \t {07--08 Apr},\n  publisher =    {PMLR},\n  pdf = \t {https://proceedings.mlr.press/v174/pal22a/pal22a.pdf},\n  url = \t {https://proceedings.mlr.press/v174/pal22a.html},\n  abstract = \t {This paper introduces MedMCQA, a new large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. More than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects are collected with an average token length of 12.77 and high topical diversity. 
Each sample contains a question, correct answer(s), and other options which requires a deeper language understanding as it tests the 10+ reasoning abilities of a model across a wide range of medical subjects & topics. A detailed explanation of the solution, along with the above information, is provided in this study.}\n}\n```\n","encoding":"utf-8","truncated":false,"total_bytes":3145},"status":null}