{"data":{"kind":"file","path":"README.md","version_id":"sjbk5zen7zcae5abijk934pe","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3993,"modified_at":"2026-02-01T16:44:47.466000","content_hash":"0eefea053aff4373b6cd7338aa03a836571e3e99a0f6c821b8dbb3374ae89420"},"entries":[],"content":"# OpenMed MedCalc Environment\n\nMedical calculation environment for RL fine-tuning using [MedCalc-Bench](https://huggingface.co/datasets/ncbi/MedCalc-Bench) - the first benchmark for evaluating LLMs as clinical calculators.\n\n## Task Description\n\nGiven a patient note and clinical question, extract relevant values and compute the correct medical calculation. This tests both information extraction AND mathematical reasoning abilities.\n\n## Calculator Categories\n\n| Category | Examples |\n|----------|----------|\n| Lab Tests | Creatinine Clearance, GFR, Anion Gap |\n| Clinical Assessment | BMI, BSA, Ideal Body Weight |\n| Risk Calculation | CHA₂DS₂-VASc, Wells Score, HEART Score |\n| Dosing | Drug dosing adjustments, Weight-based dosing |\n| Diagnostic Criteria | SIRS Criteria, qSOFA, Rome IV |\n| Prognostic Scores | APACHE II, MELD Score, Glasgow Coma Scale |\n\n## Dataset\n\n- **Source**: [ncbi/MedCalc-Bench](https://huggingface.co/datasets/ncbi/MedCalc-Bench)\n- **Paper**: [NeurIPS 2024 Datasets Track](https://arxiv.org/abs/2406.12036)\n- **Train**: 10,500 instances\n- **Test**: 1,100 instances\n- **Calculators**: 55 different medical calculators\n- **Format**: Patient note + question → numerical answer\n\n## Smart Verification\n\nUnlike simple exact-match, this environment uses **numerical tolerance verification**:\n\n```python\n# Each instance has tolerance bounds\nGround_Truth_Answer: \"141.04 mL/min\"\nLower_Limit: \"133.99\"\nUpper_Limit: \"148.09\"\n\n# Reward logic:\n# - 1.0 if within [lower, upper] bounds\n# - 1.0 if within 5% of ground truth\n# - 0.75 if within 10%\n# - 0.5 if within 25%\n# - 0.25 if within 50%\n# - 0.0 otherwise\n```\n\nThis is 
critical for medical calculations where small rounding differences shouldn't be penalized.\n\n## Reward Structure\n\n| Component | Weight | Description |\n|-----------|--------|-------------|\n| Numerical Accuracy | 60% | Value within tolerance bounds (not exact match!) |\n| Thinking | 20% | Quality of calculation steps in `<think>` tags |\n| Format | 10% | Proper `\\boxed{}` answer format |\n| Units | 10% | Including appropriate medical units |\n\n## Example\n\n**Input:**\n```\nCalculator: Creatinine Clearance (Cockcroft-Gault Equation)\nCategory: lab test\n\nPatient Note:\nA 65-year-old male presents with chronic kidney disease. His weight is 70 kg,\nheight 170 cm. Laboratory results show serum creatinine of 1.8 mg/dL.\nPast medical history includes hypertension and type 2 diabetes.\n\nQuestion: What is the patient's creatinine clearance using the Cockcroft-Gault equation?\n\nCompute the answer and show your calculation steps.\n```\n\n**Expected Output:**\n```\n<think>\nFrom the patient note, I extract the relevant values:\n- Age: 65 years\n- Sex: Male\n- Weight: 70 kg\n- Serum Creatinine: 1.8 mg/dL\n\nThe Cockcroft-Gault equation for creatinine clearance is:\nCrCl = [(140 - Age) × Weight × (0.85 if female)] / (72 × Serum Creatinine)\n\nFor a male patient:\nCrCl = [(140 - 65) × 70] / (72 × 1.8)\nCrCl = [75 × 70] / 129.6\nCrCl = 5250 / 129.6\nCrCl = 40.51 mL/min\n\nThis indicates moderate-to-severe renal impairment (CKD Stage 3b).\n</think>\n\\boxed{40.51 mL/min}\n```\n\n## Usage\n\n```python\nfrom OpenMed_MedCalc import load_environment\n\nenv = load_environment()\n```\n\n## Why Numerical Tolerance Matters\n\nMedical calculations often involve:\n- Different rounding conventions (2 vs 3 decimal places)\n- Unit conversions with slight variations\n- Formula variants (e.g., different GFR equations)\n\nExact string matching would incorrectly penalize `40.5 mL/min` vs `40.51 mL/min` vs `40.509 mL/min` when all are clinically equivalent.\n\n## 
Citation\n\n```bibtex\n@inproceedings{khandekar2024medcalcbench,\n  title={MedCalc-Bench: Evaluating Large Language Models for Medical Calculations},\n  author={Khandekar, Nikhil and Jin, Qiao and Xiong, Guangzhi and others},\n  booktitle={Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks},\n  year={2024}\n}\n```\n\n## License\n\nCC-BY-SA-4.0 (following MedCalc-Bench license)\n","encoding":"utf-8","truncated":false,"total_bytes":3993},"status":null}