{"data":{"kind":"file","path":"README.md","version_id":"zqql58hhmsyv25k2m1voswue","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1626,"modified_at":"2026-01-11T21:00:37.896000","content_hash":"0dfc26f17179868b22547df8e51c24bd348b70fb1f8482fbfd6e25bb1dfa3f43"},"entries":[],"content":"# OpenMed MedXpertQA\n\nExpert-level medical question-answering environment using the [TsinghuaC3I/MedXpertQA](https://huggingface.co/datasets/TsinghuaC3I/MedXpertQA) benchmark.\n\n## Overview\n\nMedXpertQA is a significantly harder medical benchmark than MedMCQA, featuring:\n\n- **4,460 expert-level questions** from specialty board exams\n- **10 answer options (A-J)** vs 4 in standard MCQs - much harder\n- **Filtered to remove easy/similar questions** - only challenging cases remain\n- **Deep clinical reasoning required** - questions span diagnosis, treatment, and basic medicine\n\n## Difficulty\n\n| Model | MedMCQA Accuracy | MedXpertQA Accuracy |\n|-------|------------------|---------------------|\n| GPT-4.1-mini | ~60-70% | ~40% |\n| Random baseline | 25% | 10% |\n\n## Dataset Split\n\n- **Train**: 1,960 examples (80% of test set)\n- **Eval**: 490 examples (20% of test set)\n\n## Reward Structure\n\n- **80% Accuracy**: Exact match on the final letter (A-J) parsed from `\\boxed{}`\n- **20% Format**: Proper `\\boxed{}` usage with sufficient reasoning (100+ chars, 4+ sentences for full credit)\n\n## Usage\n\n```python\nfrom OpenMed_MedXpertQA import load_environment\n\nenv = load_environment()\nprint(f\"Train: {len(env.dataset)} examples\")\nprint(f\"Eval: {len(env.eval_dataset)} examples\")\n```\n\n## Citation\n\n```bibtex\n@article{zuo2025medxpertqa,\n  title={MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding},\n  author={Zuo, Yuxin and Qu, Shang and Li, Yifei and Chen, Zhangren and Zhu, Xuekai and Hua, Ermo and Zhang, Kaiyan and Ding, Ning and Zhou, Bowen},\n  journal={arXiv preprint arXiv:2501.18362},\n  year={2025}\n}\n```\n","encoding":"utf-8","truncated":false,"total_bytes":1626},"status":null}