{"data":{"kind":"file","path":"README.md","version_id":"i9x0xk9i6ahuqkdoygn27s57","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2940,"modified_at":"2025-12-24T05:43:51.758000","content_hash":"ca9776c112b5c27ac333aff3cd15f5942a5c23b75687d911f205054f44d19d8c"},"entries":[],"content":"# MEDMCQA\n\nEvaluation environment for the MEDMCQA dataset.\n\n### Overview\n- **Environment ID:** `med_mcqa`\n- **Short description:** Single-turn medical multiple-choice QA\n- **Tags:** medical, single-turn, multiple-choice, train, eval\n\n### Datasets\n- **Primary dataset(s):** MedMCQA (HF datasets)\n- **Source links:** [lighteval/med_mcqa](https://huggingface.co/datasets/lighteval/med_mcqa)\n- **Split sizes:** Uses provided train and validation splits\n\n### Task\n- **Type:** Single-turn\n- **Parser:** `Parser` (standard) or `ThinkParser` (if using reasoning mode) depending on `use_think`\n- **Rubric overview:** Binary scoring (1.0 / 0.0), based on correct letter or answer text match.  \n- **Reward function:** `accuracy` — returns 1.0 if the predicted answer matches, else 0.0.\n\n### Model Input Format\nEach example is formatted as a single-turn user message: \n\n```\nGive a letter answer among A, B, C or D.\nQuestion: {question}\nA. {opa}\nB. {opb}\nC. {opc}\nD. 
{opd}\nAnswer:\n```\n\nThe model should respond with a letter choice (A–D).\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval med_mcqa\n```\n\n### Usage\nTo run an evaluation using `vf-eval` with the OpenAI API:\n\n```bash\nexport OPENAI_API_KEY=sk-...\nuv run vf-eval \\\n  -m gpt-4.1-mini \\\n  -n 5 \\\n  -s \\\n  med_mcqa\n```\nReplace `sk-...` with your actual API key.\n\n### Authors\nThis environment has been put together by:\n\nRatna Sagari Grandhi - ([@sagarigrandhi](https://github.com/sagarigrandhi))\n\n### Credits \nDataset:\n\n```bibtex\n@InProceedings{pmlr-v174-pal22a,\n  title = \t {MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering},\n  author =       {Pal, Ankit and Umapathi, Logesh Kumar and Sankarasubbu, Malaikannan},\n  booktitle = \t {Proceedings of the Conference on Health, Inference, and Learning},\n  pages = \t {248--260},\n  year = \t {2022},\n  editor = \t {Flores, Gerardo and Chen, George H and Pollard, Tom and Ho, Joyce C and Naumann, Tristan},\n  volume = \t {174},\n  series = \t {Proceedings of Machine Learning Research},\n  month = \t {07--08 Apr},\n  publisher =    {PMLR},\n  pdf = \t {https://proceedings.mlr.press/v174/pal22a/pal22a.pdf},\n  url = \t {https://proceedings.mlr.press/v174/pal22a.html},\n  abstract = \t {This paper introduces MedMCQA, a new large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. More than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects are collected with an average token length of 12.77 and high topical diversity. Each sample contains a question, correct answer(s), and other options which requires a deeper language understanding as it tests the 10+ reasoning abilities of a model across a wide range of medical subjects & topics. 
A detailed explanation of the solution, along with the above information, is provided in this study.}\n}\n```\n","encoding":"utf-8","truncated":false,"total_bytes":2940},"status":null}