# OpenMed_ICD10

Medical coding environment using synthetic EHR data for training ICD-10-CM code assignment models.

## Overview

This environment is used to train models to assign accurate ICD-10-CM diagnostic codes from clinical notes. Given a patient's clinical documentation, the model must identify all relevant diagnoses and assign the standardized codes used in healthcare billing and record-keeping.

## Dataset

- **Source**: FiscaAI/synth-ehr-icd10cm-prompt on Hugging Face
- **Size**: 366k synthetic EHR-code pairs
- **Task**: Multi-label classification (ICD-10-CM codes)
- **Code Format**: ICD-10-CM diagnostic codes (e.g., M24.541, E11.9)

## Task Format

**Input**: Clinical note describing the patient's presentation, symptoms, and diagnoses

**Output**: The model gives its reasoning in `<think>` tags and its codes in `\boxed{CODE1, CODE2, ...}`

## Example

```
Clinical Note:
The patient has a history of right hand injury due to a fall three months ago.
An X-ray revealed a fracture of the fifth metacarpal bone. The patient was treated
with casting. The patient presents today with persistent pain and stiffness in the
right hand, limiting range of motion. Physical examination reveals contracture of
the right hand.

<think>
The clinical note documents a contracture of the right hand, which is the current
presenting condition. The previous fracture is mentioned as history, but the primary
diagnosis for this visit is the contracture. The specific location is the right hand.
ICD-10-CM code M24.541 represents "Contracture, right hand", which is the most
specific code for this condition.
</think>

\boxed{M24.541}
```

## Reward Structure

- **80% Accuracy**: Correct ICD-10-CM codes (with partial credit for hierarchical matches)
- **15% Thinking**: Quality of the coding reasoning and justification
- **5% Format**: Proper use of `\boxed{}` notation

## Accuracy Scoring

The accuracy component combines three mechanisms:
- **Exact match**: Full score when the predicted code set matches the reference set exactly
- **Partial credit**: Jaccard similarity between the predicted and reference sets when only some codes are correct
- **Hierarchical matching**: Credit for parent/child code relationships
  - E.g., M24.5 matches M24.541 (less specific, but the correct category)

## Use Cases

- Training medical coding automation models
- Healthcare billing optimization
- Clinical documentation improvement
- EHR code suggestion systems
- Medical coding education

## Real-World Impact

Medical coding is a multi-billion-dollar industry within healthcare. Accurate automated coding can:
- Reduce billing errors and claim denials
- Speed up reimbursement cycles
- Improve clinical documentation quality
- Reduce manual coding workload
- Ensure compliance with coding standards

## License

The dataset is synthetic and publicly available on Hugging Face.
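## Scoring Sketch

The Reward Structure and Accuracy Scoring sections can be illustrated with a short Python sketch. This is a hypothetical illustration under stated assumptions, not the environment's actual implementation: the function names (`extract_codes`, `pair_credit`, `accuracy_score`, `reward`), the 0.5 credit for a parent/child match, the exact-first greedy pairing, the soft-Jaccard formula that folds hierarchical credit into the intersection, and the externally supplied `thinking_score` are all assumptions, since this README does not say how those pieces are computed or combined.

```python
from __future__ import annotations

import re

# Weights from the Reward Structure section.
WEIGHTS = {"accuracy": 0.80, "thinking": 0.15, "format": 0.05}

# Assumption: a parent/child match (e.g. M24.5 vs M24.541) earns half credit.
HIERARCHY_CREDIT = 0.5

# Matches a literal \boxed{...} and captures its contents.
BOXED_RE = re.compile(r"\\boxed\{([^}]*)\}")


def extract_codes(completion: str) -> list[str] | None:
    """Return the codes in the last \\boxed{...}, or None if there is no box."""
    matches = BOXED_RE.findall(completion)
    if not matches:
        return None
    codes = [c.strip().upper() for c in matches[-1].split(",")]
    return [c for c in codes if c]


def _key(code: str) -> str:
    # Drop the dot so "M24.5" becomes a string prefix of "M24.541".
    return code.replace(".", "")


def pair_credit(pred: str, gold: str) -> float:
    """1.0 for an exact match, HIERARCHY_CREDIT for a parent/child match, else 0.0."""
    p, g = _key(pred), _key(gold)
    if p == g:
        return 1.0
    # Require at least a 3-character category so a chapter letter alone never matches.
    if min(len(p), len(g)) >= 3 and (g.startswith(p) or p.startswith(g)):
        return HIERARCHY_CREDIT
    return 0.0


def accuracy_score(pred: set[str], gold: set[str]) -> float:
    """Soft Jaccard: exact matches count 1, hierarchical matches count HIERARCHY_CREDIT."""
    if not pred or not gold:
        return 0.0
    if pred == gold:
        return 1.0
    # Exact matches first, so a prefix match never steals a code that matches exactly.
    exact = pred & gold
    overlap = float(len(exact))
    unused = pred - exact
    # Greedy one-to-one pairing of the remaining gold codes with leftover predictions.
    for g in sorted(gold - exact):
        best = max(sorted(unused), key=lambda p: pair_credit(p, g), default=None)
        if best is not None and pair_credit(best, g) > 0:
            overlap += pair_credit(best, g)
            unused.remove(best)
    return overlap / (len(pred) + len(gold) - overlap)


def reward(completion: str, gold_codes: list[str], thinking_score: float) -> float:
    """Weighted reward; thinking_score in [0, 1] is assumed to be scored elsewhere."""
    codes = extract_codes(completion)
    format_score = 1.0 if codes else 0.0
    gold = {c.strip().upper() for c in gold_codes}
    acc = accuracy_score(set(codes or []), gold)
    return (
        WEIGHTS["accuracy"] * acc
        + WEIGHTS["thinking"] * thinking_score
        + WEIGHTS["format"] * format_score
    )


if __name__ == "__main__":
    output = "<think>Contracture of the right hand.</think>\n\\boxed{M24.5}"
    print(round(reward(output, ["M24.541"], thinking_score=1.0), 3))
```

Counting a hierarchical match as a fractional intersection keeps the accuracy at 1.0 for an exact set match and lets it fall smoothly as codes become less specific or go missing. Under these assumptions, predicting M24.5 for a reference of M24.541 scores 0.5 / (1 + 1 - 0.5) ≈ 0.33 on accuracy.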