{"data":{"kind":"file","path":"README.md","version_id":"flx6wwu6g4lnvau6zeqj07n8","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4240,"modified_at":"2026-02-02T13:44:50.296000","content_hash":"79e0dbc938ab4548f6c29aa408c9d3c0f2d53474dc03bf48e5b496f808a46947"},"entries":[],"content":"# OpenMed CochranePLS Environment\n\nMedical text simplification environment for RL fine-tuning using [Cochrane Plain Language Summaries](https://huggingface.co/datasets/GEM/cochrane-simplification) - converting complex medical abstracts to accessible plain language.\n\n## Task Description\n\nGiven a complex medical abstract from a Cochrane systematic review, generate a simplified plain language summary that preserves key information while being accessible to non-medical readers.\n\nThis is a challenging task requiring:\n- Understanding complex medical content\n- Identifying key findings and conclusions\n- Translating jargon to simple language\n- Maintaining accuracy while improving readability\n\n## Dataset\n\n- **Source**: [GEM/cochrane-simplification](https://huggingface.co/datasets/GEM/cochrane-simplification)\n- **Domain**: Cochrane systematic review abstracts\n- **Train**: 3,570 pairs\n- **Validation**: 411 pairs\n- **Test**: 480 pairs\n- **Format**: Source (complex) → Target (simplified)\n\n## Reward Structure\n\n| Component | Weight | Description |\n|-----------|--------|-------------|\n| Content Preservation | 40% | Word overlap with target summary |\n| Simplification | 30% | Reading level similarity to target |\n| Thinking | 20% | Analysis quality in `<think>` tags |\n| Format | 10% | Proper `<summary>` tag usage |\n\n### Content Preservation\nMeasures how well the generated summary captures the same content as the reference summary using word overlap metrics.\n\n### Simplification Quality\nEstimates reading level based on:\n- Average word length (shorter = simpler)\n- Average sentence length (shorter = simpler)\n- Length ratio compared to target\n\n## Example\n\n**Input:**\n```\nPlease simplify the following medical abstract into plain language:\n\nMedical Abstract:\nTwo trials met the inclusion criteria. One compared 2% ketanserin ointment\nin polyethylene glycol (PEG) with PEG alone, used twice a day by 40\nparticipants with arterial leg ulcers, for eight weeks or until healing.\nThe other compared topical application of blood-derived concentrated\ngrowth factor (CGF) with standard dressing in 61 participants. Both trials\nwere small and at high risk of bias. There was insufficient evidence to\ndetermine the effect of either intervention on ulcer healing.\n```\n\n**Expected Output:**\n```\n<think>\nThis abstract discusses two clinical trials for treating arterial leg ulcers.\nKey findings: Both studies were small with quality issues, and neither showed\nclear evidence that the treatments work.\nTerms to simplify: \"polyethylene glycol\" → \"base cream\", \"arterial leg ulcers\"\n→ \"leg wounds from poor blood flow\", \"high risk of bias\" → \"quality concerns\"\n</think>\n<summary>\nWe found two small studies testing treatments for leg wounds caused by poor\nblood flow.\n\nOne study tested a medicated cream on 40 people. The other tested a treatment\nmade from blood products on 61 people.\n\nBoth studies had quality problems that make their results less reliable.\n\nWe cannot say whether these treatments help leg wounds heal because there\nisn't enough good evidence yet.\n</summary>\n```\n\n## Usage\n\n```python\nfrom OpenMed_CochranePLS import load_environment\n\nenv = load_environment()\n```\n\n## Why Text Simplification for Medical RL?\n\n1. **Patient communication**: Helps models learn to communicate medical information clearly\n2. **Health literacy**: Addresses the gap between medical complexity and public understanding\n3. **Measurable quality**: Can evaluate content preservation + simplification\n4. **Real-world impact**: Directly applicable to patient education materials\n\n## Citation\n\n```bibtex\n@inproceedings{gehrmann2021gem,\n    title={The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics},\n    author={Gehrmann, Sebastian and others},\n    booktitle={Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021)},\n    year={2021},\n    publisher={Association for Computational Linguistics}\n}\n\n@article{devaraj2021paragraph,\n    title={Paragraph-level Simplification of Medical Texts},\n    author={Devaraj, Ashwin and Marshall, Iain and Wallace, Byron and Li, Junyi Jessy},\n    journal={Proceedings of NAACL},\n    year={2021}\n}\n```\n\n## License\n\nCC-BY-4.0 (following GEM benchmark license)\n","encoding":"utf-8","truncated":false,"total_bytes":4240},"status":null}