{"data":{"kind":"file","path":"README.md","version_id":"ephyefrm7yg2k4wpo6rjfx95","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1593,"modified_at":"2026-02-15T13:29:32.330000","content_hash":"1d5789952e305f70a97480b8226145f1a94d50ffce586ceb6011ba78b3df4b80"},"entries":[],"content":"# OpenMed ClinVar\n\nClinical variant pathogenicity classification using ClinVar data.\n\n## Task\n\nGiven a genetic variant's context (gene, variant type, associated condition, mechanism of pathogenicity, review status), predict its clinical significance classification according to ACMG/AMP guidelines.\n\n**Format**: 5-choice MCQ (A-E) with `\\boxed{}` output\n\n## Dataset\n\n- **Source**: [tylergolato/clinvar-variant-impact](https://huggingface.co/datasets/tylergolato/clinvar-variant-impact)\n- **Size**: 1.3M+ variant records (subsampled to 50K train / 5K eval)\n- **Classes**: Pathogenic, Likely Pathogenic, Uncertain (VUS), Likely Benign, Benign\n\n## Reward Structure\n\n| Component | Weight | Description |\n|-----------|--------|-------------|\n| Accuracy | 0.55 | Exact match on classification letter |\n| Adjacent | 0.20 | Partial credit for adjacent classes on pathogenicity spectrum |\n| Thinking | 0.15 | Encourages genetic reasoning in `<think>` tags |\n| Format | 0.10 | Proper `\\boxed{}` with valid letter |\n\nThe adjacent reward gives 0.4 for one step away on the pathogenicity spectrum (e.g., predicting Likely Pathogenic when answer is Pathogenic) and 0.1 for two steps away.\n\n## Example\n\n**Input**: Gene: BRCA1, Variant: c.5266dupC, Variant type: single nucleotide variant, Associated condition: Hereditary breast and ovarian cancer syndrome, Mechanism: loss of function\n\n**Output**: `\\boxed{A}` (Pathogenic)\n\n## Usage\n\n```toml\n[[env]]\nid = \"maziyar/OpenMed_ClinVar\"\n```\n\n## Citation\n\nClinVar database: Landrum MJ, et al. ClinVar: improvements to accessing data. Nucleic Acids Research. 2020.\n","encoding":"utf-8","truncated":false,"total_bytes":1593},"status":null}