{"data":{"kind":"file","path":"README.md","version_id":"qghjbdowxa4x9q52mne8b4xb","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3952,"modified_at":"2026-02-03T23:19:12.938000","content_hash":"529d4a31c619fd58d046a4ad2b00de616bc89912a87341fc80fc0fb602563782"},"entries":[],"content":"# OpenMed HoC (Hallmarks of Cancer) Environment\n\nMulti-label classification environment for cancer hallmark identification using [bigbio/hallmarks_of_cancer](https://huggingface.co/datasets/bigbio/hallmarks_of_cancer) - classifying PubMed sentences by the cancer hallmarks they describe.\n\n## Task Description\n\nGiven a biomedical sentence from PubMed, identify which of the 10 Hallmarks of Cancer are described. This is a multi-label classification task where sentences may describe 0, 1, or multiple hallmarks.\n\n## The 10 Hallmarks of Cancer\n\n| Hallmark | Description |\n|----------|-------------|\n| Sustaining proliferative signaling | Uncontrolled cell growth through growth factor signaling |\n| Evading growth suppressors | Bypassing tumor suppressor mechanisms |\n| Resisting cell death | Avoiding apoptosis and other cell death pathways |\n| Enabling replicative immortality | Telomere maintenance and unlimited replication |\n| Inducing angiogenesis | Promoting blood vessel formation |\n| Activating invasion and metastasis | Spreading to distant tissues |\n| Genomic instability and mutation | DNA damage and mutation accumulation |\n| Tumor promoting inflammation | Inflammation that promotes tumor growth |\n| Cellular energetics | Altered metabolism (Warburg effect) |\n| Avoiding immune destruction | Evading immune system detection |\n\n## Dataset\n\n- **Source**: [bigbio/hallmarks_of_cancer](https://huggingface.co/datasets/bigbio/hallmarks_of_cancer)\n- **Train**: ~14K sentences\n- **Validation**: ~1.8K sentences\n- **Test**: ~1.8K sentences\n- **Labels**: Multi-label (0-10 hallmarks per sentence)\n\n## Reward Structure\n\n| Component | Weight | 
Description |\n|-----------|--------|-------------|\n| F1 Score | 40% | Harmonic mean of precision and recall over the predicted hallmarks |\n| Partial Match | 25% | Recall-based credit for each correct hallmark |\n| Valid Prediction | 20% | Reward for outputting valid hallmark names |\n| Thinking | 15% | Encourages biological reasoning |\n\n### Why F1 for Multi-Label?\n\nF1 score suits multi-label classification because it:\n- Penalizes false positives (predicting hallmarks not present)\n- Penalizes false negatives (missing hallmarks that are present)\n- Balances precision and recall equally\n- Handles varying numbers of labels per example\n\n## Example\n\n**Input:**\n```\nAnalyze the following biomedical sentence and identify which hallmarks of cancer are present:\n\n\"The oncogene promotes cell proliferation through activation of the MAPK signaling pathway,\nwhile simultaneously inhibiting apoptosis via BCL-2 upregulation.\"\n\nList all applicable hallmarks of cancer.\n```\n\n**Expected Output:**\n```\n<think>\nThis sentence describes two distinct cancer-related processes:\n1. Cell proliferation through MAPK pathway - this relates to growth signaling\n2. Inhibition of apoptosis via BCL-2 - this relates to resisting cell death\n\nBoth are classic hallmarks of cancer mechanisms.\n</think>\n<answer>\nsustaining proliferative signaling\nresisting cell death\n</answer>\n```\n\n## Usage\n\n```python\nfrom OpenMed_HoC import load_environment\n\nenv = load_environment()\n```\n\n## Why Hallmarks of Cancer for Medical RL?\n\n1. **Foundational cancer biology**: The hallmarks framework is central to oncology\n2. **Multi-label complexity**: Tests understanding of multiple concurrent processes\n3. **Verifiable ground truth**: Expert-annotated labels allow precise evaluation\n4. 
**Clinical relevance**: Understanding hallmarks informs cancer treatment strategies\n\n## Citation\n\n```bibtex\n@article{baker2016automatic,\n  title={Automatic semantic classification of scientific literature according to the hallmarks of cancer},\n  author={Baker, Simon and Silins, Ilona and Guo, Yufan and Ali, Imran and H{\\\"o}gberg, Johan and Stenius, Ulla and Korhonen, Anna},\n  journal={Bioinformatics},\n  volume={32},\n  number={3},\n  pages={432--440},\n  year={2016},\n  publisher={Oxford University Press}\n}\n```\n\n## License\n\nDataset license follows bigbio/hallmarks_of_cancer terms.\n","encoding":"utf-8","truncated":false,"total_bytes":3952},"status":null}