{"data":{"kind":"file","path":"README.md","version_id":"cuvm3pbmxc7c2jfq6v05rl6a","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3954,"modified_at":"2026-02-01T20:26:19.385000","content_hash":"5983886ea31e8b2d5f4adc50711ad78e0952f2eee76b34e9cf27705d77148554"},"entries":[],"content":"# OpenMed SciFact Environment\n\nScientific claim verification environment for RL fine-tuning using [SciFact](https://huggingface.co/datasets/allenai/scifact) - expert-written biomedical claims paired with evidence abstracts from scientific literature.\n\n## Task Description\n\nGiven a scientific claim and an evidence abstract from a research paper, determine whether the evidence **SUPPORTS** or **CONTRADICTS** the claim. This task requires careful analysis of scientific text and evidence-based reasoning.\n\n## Labels\n\n| Label | Description |\n|-------|-------------|\n| SUPPORT | Evidence directly confirms the claim |\n| CONTRADICT | Evidence directly refutes or negates the claim |\n\n## Dataset\n\n- **Source**: [allenai/scifact](https://huggingface.co/datasets/allenai/scifact)\n- **Paper**: [Wadden et al., EMNLP 2020](https://aclanthology.org/2020.emnlp-main.609/)\n- **Train**: 1,261 claims\n- **Validation**: 450 claims\n- **Corpus**: 5,183 scientific abstracts\n- **Format**: Claim + evidence abstract → verdict\n\n## Reward Structure\n\n| Component | Weight | Description |\n|-----------|--------|-------------|\n| Accuracy | 80% | Exact match on verdict (SUPPORT/CONTRADICT) |\n| Thinking | 15% | Quality of evidence analysis in `<think>` tags |\n| Format | 5% | Proper `\\boxed{}` answer format |\n\n### Thinking Quality Bonuses\n- Quoting evidence from abstract\n- Referencing specific sentence numbers\n- Using reasoning vocabulary (evidence, confirms, refutes, etc.)\n\n## Example\n\n**Input:**\n```\nScientific Claim:\n\"Vitamin D deficiency is associated with increased susceptibility to infection.\"\n\nEvidence from: Vitamin D and Immune Function\n\nAbstract:\n[0] Vitamin D has emerged as an important regulator of immune function.\n[1] The vitamin D receptor is expressed in most immune cells.\n[2] Epidemiological studies have linked vitamin D deficiency with increased rates of respiratory and other infections.\n[3] Mechanistically, vitamin D enhances antimicrobial peptide production.\n[4] Supplementation trials have shown mixed results on infection prevention.\n\nBased on the evidence in the abstract, does it SUPPORT or CONTRADICT the claim?\n```\n\n**Expected Output:**\n```\n<think>\nThe claim states that vitamin D deficiency is associated with increased susceptibility to infection.\n\nAnalyzing the evidence:\n- Sentence [2] directly addresses this: \"Epidemiological studies have linked vitamin D deficiency with increased rates of respiratory and other infections.\"\n- This provides direct epidemiological support for the association between deficiency and infection susceptibility.\n- Sentence [3] provides a mechanistic explanation (antimicrobial peptides).\n\nThe evidence in sentence [2] explicitly supports the claim's assertion about the association between vitamin D deficiency and infection susceptibility.\n</think>\n\\boxed{SUPPORT}\n```\n\n## Usage\n\n```python\nfrom OpenMed_SciFact import load_environment\n\nenv = load_environment()\n```\n\n## Why SciFact for Medical RL?\n\n1. **Evidence-based reasoning**: Requires analyzing scientific text, not just pattern matching\n2. **Expert annotations**: Claims written by domain experts with verified evidence\n3. **Biomedical focus**: Covers health and medical science literature\n4. **Clear verdicts**: Binary classification with unambiguous ground truth\n5. **Rationale available**: Evidence sentences identified for each claim\n\n## Citation\n\n```bibtex\n@inproceedings{wadden-etal-2020-fact,\n    title = \"Fact or Fiction: Verifying Scientific Claims\",\n    author = \"Wadden, David and Lin, Shanchuan and Lo, Kyle and Wang, Lucy Lu and van Zuylen, Madeleine and Cohan, Arman and Hajishirzi, Hannaneh\",\n    booktitle = \"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)\",\n    year = \"2020\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2020.emnlp-main.609\",\n    pages = \"7534--7550\",\n}\n```\n\n## License\n\nCC BY-NC 2.0 (following SciFact license)\n","encoding":"utf-8","truncated":false,"total_bytes":3954},"status":null}