{"data":{"kind":"file","path":"README.md","version_id":"v1bl1a0elmurasng7q4kbuop","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1569,"modified_at":"2026-02-08T18:08:22.896000","content_hash":"20488547532d7b8c7f841343ef135b25434cbbe0a4f58a3a5a3603f79c1ad886"},"entries":[],"content":"# OpenMed MedAbstracts\n\nMedical abstract disease category classification environment for RL fine-tuning.\n\n## Task\n\nGiven a PubMed abstract, classify its primary disease category as one of five classes:\n- **A. Neoplasms** - Cancer, tumors, oncology\n- **B. Digestive System Diseases** - GI disorders\n- **C. Nervous System Diseases** - Neurological conditions\n- **D. Cardiovascular Diseases** - Heart and vascular diseases\n- **E. General Pathological Conditions** - Other pathologies\n\n## Dataset\n\n- **Source**: `TimSchopf/medical_abstracts`\n- **Size**: 14,438 abstracts (train: 11,550, test: 2,888)\n- **Labels**: 5 disease categories\n- **Distribution**: Neoplasms 22%, Digestive 10%, Nervous 13%, Cardiovascular 21%, General 33%\n\n## Reward Structure\n\n| Reward | Weight | Description |\n|--------|--------|-------------|\n| Accuracy | 45% | Exact category match |\n| Partial Match | 20% | Credit for valid category prediction |\n| Thinking | 20% | Quality of medical reasoning in `<think>` tags |\n| Format | 15% | Proper `\\boxed{}` or `\\|answer\\|` format |\n\n## Example\n\n**Input**: \"This study investigated the role of EGFR mutations in non-small cell lung cancer...\"\n\n**Expected Output**:\n```\n<think>\nThe abstract discusses EGFR mutations in lung cancer (NSCLC), which is a malignant\nneoplasm. 
The focus is on oncology and tumor biology.\n</think>\n\\boxed{neoplasms}\n```\n\n## Citation\n\n```\n@misc{schopf-medical-abstracts,\n  title={Medical Abstracts Dataset},\n  author={Tim Schopf},\n  publisher={HuggingFace},\n  url={https://huggingface.co/datasets/TimSchopf/medical_abstracts}\n}\n```\n","encoding":"utf-8","truncated":false,"total_bytes":1569},"status":null}