{"data":{"kind":"file","path":"README.md","version_id":"mbfr2lb3fhw8pt9fsszjlosl","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2006,"modified_at":"2026-01-15T13:29:35.942000","content_hash":"d154a1d076202290e8ae83398633ba6b4d2d5fcdc60c0fdc3d1af6002e2eb23e"},"entries":[],"content":"# OpenMed_DDXPlus\n\nDifferential diagnosis environment using the DDXPlus dataset for training medical reasoning models.\n\n## Overview\n\nThis environment trains models to perform differential diagnosis from patient symptoms and clinical findings. Given patient demographics, evidence (symptoms and antecedents), and a differential diagnosis list, the model must identify the most likely pathology.\n\n## Dataset\n\n- **Source**: DDXPlus (Hellisotherpeople/DDXPlus on HuggingFace)\n- **Size**: ~1.3M training examples, validation set available\n- **Task**: Multi-class classification (49 pathologies)\n- **Evidence Types**: 223 evidences (208 binary, 10 categorical, 5 multi-choice)\n\n## Task Format\n\n**Input**: Patient presentation with:\n- Demographics (age, sex)\n- Initial presenting symptom\n- Additional clinical findings\n- Differential diagnosis probabilities\n\n**Output**: Model provides reasoning in `<think>` tags and final diagnosis in `\\boxed{diagnosis_name}`\n\n## Example\n\n```\nPatient: 45-year-old M\nInitial presentation: chest pain\nAdditional findings:\n- shortness of breath\n- sweating: profuse\n- pain radiation: left arm\n\nDifferential diagnosis possibilities:\n- Myocardial infarction (45.2%)\n- Unstable angina (25.8%)\n- Panic attack (15.3%)\n...\n\n<think>\nGiven the patient's age, sex, and presentation with chest pain radiating to left arm with profuse sweating and dyspnea, this is highly suggestive of acute coronary syndrome. The differential shows myocardial infarction as most likely at 45.2%. The classic presentation with radiation and diaphoresis supports this diagnosis.\n</think>\n\n\\boxed{Myocardial infarction}\n```\n\n## Reward Structure\n\n- **80% Accuracy**: Correct pathology prediction\n- **15% Thinking**: Quality of clinical reasoning\n- **5% Format**: Proper use of `\\boxed{}` notation\n\n## Use Cases\n\n- Training diagnostic reasoning models\n- Medical education and simulation\n- Clinical decision support development\n- Differential diagnosis research\n\n## License\n\nDataset released under CC-BY license.\n","encoding":"utf-8","truncated":false,"total_bytes":2006},"status":null}