# OpenMed DrugProt Environment

A drug-protein relation extraction environment for reinforcement learning (RL) fine-tuning, built on the BioCreative VII Track 1 (DrugProt) dataset.

## Task Description

Given a biomedical text with marked chemical and gene/protein entities, identify the relation type between them and answer with the corresponding class letter:

| Class | Label | Description |
|-------|-------|-------------|
| A | INDIRECT-DOWNREGULATOR | Chemical indirectly decreases protein activity or expression |
| B | INDIRECT-UPREGULATOR | Chemical indirectly increases protein activity or expression |
| C | DIRECT-REGULATOR | Chemical directly regulates the protein (mechanism unspecified) |
| D | ACTIVATOR | Chemical activates the protein |
| E | INHIBITOR | Chemical inhibits the protein |
| F | AGONIST | Chemical acts as an agonist of the receptor/protein |
| G | AGONIST-ACTIVATOR | Chemical is both an agonist and an activator |
| H | AGONIST-INHIBITOR | Chemical is an agonist but inhibits downstream effects |
| I | ANTAGONIST | Chemical acts as an antagonist of the receptor/protein |
| J | PRODUCT-OF | Chemical is a product of the enzyme |
| K | SUBSTRATE | Chemical is a substrate of the enzyme |
| L | SUBSTRATE_PRODUCT-OF | Chemical is both a substrate and a product of the enzyme |
| M | PART-OF | Chemical is part of the protein complex |

## Dataset

- **Source**: [OpenMed/drugprot-parquet](https://huggingface.co/datasets/OpenMed/drugprot-parquet)
- **Original**: [BioCreative VII Track 1](https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-1/)
- **Train**: ~4,000 examples
- **Validation**: ~1,000 examples

## Reward Structure

| Component | Weight | Description |
|-----------|--------|-------------|
| Accuracy | 70% | Exact match on the relation class (A-M) |
| Reasoning | 20% | Quality of the biomedical reasoning inside `<think>` tags |
| Format | 10% | Final answer wrapped in `\boxed{}` |

## Example

**Input:**
```
Biomedical Text:
The results suggest that aspirin inhibits COX-2 activity in endothelial cells.

Entities to analyze:
- Drug/Chemical: "aspirin"
- Gene/Protein: "COX-2"
```

**Expected Output:**
```
<think>
The text describes aspirin and COX-2. The phrase "inhibits COX-2 activity" indicates
that aspirin reduces the function of the COX-2 protein itself rather than its expression.
This is a direct inhibitory relationship in which the chemical (aspirin) decreases the
protein's activity.
</think>
\boxed{E}
```

## Usage

```python
from OpenMed_DrugProt import load_environment

env = load_environment()
```

## Citation

```bibtex
@inproceedings{miranda2021overview,
  title={Overview of DrugProt BioCreative VII track: quality evaluation and large scale text mining of drug-gene/protein relations},
  author={Miranda, Antonio and Mehryary, Farrokh and Luoma, Jouni and Pyysalo, Sampo and Valencia, Alfonso and Krallinger, Martin},
  booktitle={Proceedings of the BioCreative VII challenge evaluation workshop},
  year={2021}
}
```

## License

BioCreative VII Challenge License
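## Reward Sketch

The reward table above gives only the component weights. The sketch below shows one plausible way to combine them: it takes the letter from the last `\boxed{}` in a completion, gives accuracy credit for an exact match with the gold class, format credit whenever a valid A-M letter is boxed, and reasoning credit only when a non-empty `<think>` block is present. This is not the environment's actual implementation. The names `extract_answer`, `has_reasoning`, and `compute_reward` are hypothetical, and `reasoning_score` is an assumed, externally supplied score in [0, 1] standing in for however reasoning quality is really judged.

```python
import math
import re
from typing import Optional

# Class letters and labels from the task table (A-M).
LABELS = {
    "A": "INDIRECT-DOWNREGULATOR",
    "B": "INDIRECT-UPREGULATOR",
    "C": "DIRECT-REGULATOR",
    "D": "ACTIVATOR",
    "E": "INHIBITOR",
    "F": "AGONIST",
    "G": "AGONIST-ACTIVATOR",
    "H": "AGONIST-INHIBITOR",
    "I": "ANTAGONIST",
    "J": "PRODUCT-OF",
    "K": "SUBSTRATE",
    "L": "SUBSTRATE_PRODUCT-OF",
    "M": "PART-OF",
}

# Weights from the reward table; the additive combination is an assumption.
WEIGHTS = {"accuracy": 0.7, "reasoning": 0.2, "format": 0.1}

# Matches a literal \boxed{X} where X is a single class letter (case-insensitive).
BOXED_RE = re.compile(r"\\boxed\{\s*([A-Ma-m])\s*\}")
# Captures the body of a <think>...</think> block, spanning newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def extract_answer(completion: str) -> Optional[str]:
    """Return the upper-cased letter from the last boxed answer, or None."""
    matches = BOXED_RE.findall(completion)
    return matches[-1].upper() if matches else None


def has_reasoning(completion: str) -> bool:
    """True if the completion contains a non-empty <think> block."""
    match = THINK_RE.search(completion)
    return bool(match and match.group(1).strip())


def compute_reward(completion: str, gold: str, reasoning_score: float) -> float:
    """Hypothetical weighted reward in [0, 1].

    reasoning_score is an assumed external quality score in [0, 1]
    (e.g. from a judge); it only counts when a <think> block exists.
    """
    answer = extract_answer(completion)
    accuracy = 1.0 if answer is not None and answer == gold.strip().upper() else 0.0
    fmt = 1.0 if answer is not None else 0.0
    reasoning = min(max(reasoning_score, 0.0), 1.0) if has_reasoning(completion) else 0.0
    return (
        WEIGHTS["accuracy"] * accuracy
        + WEIGHTS["reasoning"] * reasoning
        + WEIGHTS["format"] * fmt
    )


if __name__ == "__main__":
    sample = "<think>\nAspirin acts on COX-2 directly.\n</think>\n\\boxed{E}"
    print(extract_answer(sample), LABELS[extract_answer(sample)])  # E INHIBITOR
    print(math.isclose(compute_reward(sample, "E", 1.0), 1.0))  # True
    print(round(compute_reward(sample, "A", 1.0), 2))  # 0.3
```

Comparing with `math.isclose` rather than `==` avoids spurious failures from floating-point rounding when the weighted components are summed.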