{"data":{"kind":"file","path":"README.md","version_id":"oee2nr7bcswtrwh2bozfanfd","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3323,"modified_at":"2026-02-02T16:20:45.299000","content_hash":"a18d1fdd231f87db16c837cfaf09106ddb27b260ec6e9154e37fe54a0bcd2f1d"},"entries":[],"content":"# OpenMed SymptomDx Environment\n\nSymptom-to-diagnosis prediction environment for RL fine-tuning using [gretelai/symptom_to_diagnosis](https://huggingface.co/datasets/gretelai/symptom_to_diagnosis) - mapping natural language symptom descriptions to 22 common diagnoses.\n\n## Task Description\n\nGiven a patient's symptom description in natural language, predict the most likely diagnosis. This tests clinical pattern recognition and differential diagnosis reasoning.\n\n## Diagnoses (22 Classes)\n\n| Category | Diagnoses |\n|----------|-----------|\n| Infectious | malaria, dengue, typhoid, pneumonia, common cold, chicken pox, impetigo, fungal infection |\n| Chronic | diabetes, hypertension, bronchial asthma, arthritis, psoriasis |\n| Gastrointestinal | gastroesophageal reflux disease, peptic ulcer disease, jaundice |\n| Neurological | migraine, cervical spondylosis |\n| Other | drug reaction, allergy, urinary tract infection, varicose veins |\n\n## Dataset\n\n- **Source**: [gretelai/symptom_to_diagnosis](https://huggingface.co/datasets/gretelai/symptom_to_diagnosis)\n- **Train**: 853 examples\n- **Test**: 212 examples\n- **Format**: Symptom text → Diagnosis label\n- **Balance**: ~40 examples per diagnosis\n\n## Reward Structure\n\n| Component | Weight | Description |\n|-----------|--------|-------------|\n| Exact Accuracy | 50% | Exact diagnosis match |\n| Partial Match | 20% | Credit for valid but wrong diagnosis |\n| Thinking | 20% | Clinical reasoning quality |\n| Format | 10% | Proper `\\boxed{}` usage |\n\n### Diagnosis Normalization\n- Handles common aliases (UTI → urinary tract infection, GERD → gastroesophageal reflux disease)\n- Case-insensitive matching\n- Partial string matching for minor variations\n\n## Example\n\n**Input:**\n```\nPatient presents with the following symptoms:\n\n\"I've been having headaches and migraines, and I can't sleep. My whole body\nshakes and twitches. Sometimes I feel lightheaded and dizzy.\"\n\nBased on these symptoms, what is the most likely diagnosis?\n```\n\n**Expected Output:**\n```\n<think>\nKey symptoms identified:\n- Headaches/migraines\n- Insomnia (can't sleep)\n- Body shaking and twitching\n- Lightheadedness and dizziness\n\nAnalysis:\n- These symptoms together (tremors, insomnia, headaches, dizziness) suggest\n  a systemic reaction rather than a primary neurological condition\n- Migraine alone wouldn't cause body shaking/twitching\n- The combination of symptoms is consistent with drug reaction or withdrawal\n- Not typical presentation of other conditions in the differential\n\nThe symptom pattern suggests drug reaction.\n</think>\n\\boxed{drug reaction}\n```\n\n## Usage\n\n```python\nfrom OpenMed_SymptomDx import load_environment\n\nenv = load_environment()\n```\n\n## Why Symptom-to-Diagnosis for Medical RL?\n\n1. **Clinical relevance**: Core physician skill\n2. **Verifiable answers**: Clear ground truth diagnoses\n3. **Pattern recognition**: Tests understanding of symptom clusters\n4. **Differential reasoning**: Encourages ruling out alternatives\n\n## Citation\n\n```bibtex\n@misc{gretelai_symptom_diagnosis,\n  title={Symptom to Diagnosis Dataset},\n  author={{Gretel.ai}},\n  year={2023},\n  howpublished={HuggingFace Datasets},\n  url={https://huggingface.co/datasets/gretelai/symptom_to_diagnosis},\n  note={Adapted from Symptom2Disease (Kaggle)}\n}\n```\n\n## License\n\nApache 2.0 (following Gretel dataset license)\n","encoding":"utf-8","truncated":false,"total_bytes":3323},"status":null}