{"data":{"kind":"file","path":"README.md","version_id":"w6wj8ha7mpsnbhluikwqnact","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3498,"modified_at":"2026-01-31T08:16:12.652000","content_hash":"8fd4c76966cc4ac7642856c2e4f4362b81b8b016bdab5b306d6fd1948413777f"},"entries":[],"content":"# Geroprotector Ranking Environment\n\nTrain models to rank geroprotector compounds by translation potential using real preclinical data.\n\n## v0.7: Preference-Weighted Utility\n\n**Key insight**: There is no single \"best\" compound. Different stakeholders weight dimensions differently:\n- A **conservative clinician** prioritizes safety > feasibility > evidence\n- An **aggressive researcher** prioritizes evidence > mechanism > feasibility\n- A **drug developer** prioritizes feasibility > safety > evidence\n\nBy sampling preference regimes, we force the model to learn the underlying dimensions rather than memorize a fixed ranking.\n\n### How It Works\n\n1. Each episode samples a **preference regime** (unknown to the model)\n2. Compounds are scored on **5 dimensions**:\n   - Evidence strength (lifespan data quality)\n   - Human feasibility (approval status, dosing)\n   - Safety profile (risk level, inverse)\n   - Mechanistic plausibility (aging pathway overlap)\n   - Uncertainty (data missingness)\n3. Ground truth = `argmax(utility)` where utility depends on the regime\n4. The model sees dimension scores but not the preference weights\n\n### Example: Same Pair, Different Winners\n\n| Regime | Rapamycin | Metformin | Winner |\n|--------|-----------|-----------|--------|\n| Evidence-focused | 9.00 | 8.34 | Rapamycin |\n| Safety-first | 7.26 | 7.97 | Metformin |\n| Conservative | 7.67 | 8.13 | Metformin |\n\nThis prevents memorization: the model must understand *why* each dimension matters.\n\n## Quick Start\n\n```python\nfrom geroprotector_ranking import load_environment\n\n# Default: preference mode (v0.7)\nenv = load_environment(\n    mode=\"preference\",\n    split=\"train\",\n    num_examples=10000,\n)\n\n# Legacy modes still available\nenv = load_environment(mode=\"dense\")    # Fixed score ranking\nenv = load_environment(mode=\"active\")   # Information acquisition\nenv = load_environment(mode=\"v06\")      # Previous version\n```\n\n## Modes\n\n| Mode | Description | Use Case |\n|------|-------------|----------|\n| `preference` | Preference-weighted utility (v0.7) | Best for learning generalizable reasoning |\n| `dense` | Fixed score comparison | Baseline |\n| `active` | Reveal tools with cost | Information acquisition |\n| `listwise` | Rank K compounds | Full ranking |\n| `v06` | Previous anti-overfitting interventions | Comparison |\n\n## Named Regimes\n\n- **conservative**: High safety, moderate feasibility\n- **aggressive**: High evidence, high mechanism\n- **translational**: Balanced across dimensions\n- **academic**: High evidence focus\n- **regulatory**: Safety-first\n- **evidence_focused**: Maximum evidence weight\n- **safety_first**: Maximum safety weight\n\n## Data Sources\n\n- **DrugAge**: 870 compounds, lifespan effects\n- **NIA ITP**: Gold-standard validation\n- **Curated targets**: 40+ compounds with mechanism data\n- **Academic evidence**: Citation counts\n\n## Training\n\n```bash\nprime rl run tylergolato/geroprotector-ranking \\\n    --env-args '{\"mode\": \"preference\", \"split\": \"train\"}' \\\n    --model Qwen/Qwen3-4B-Instruct-2507\n```\n\n## Evaluation\n\n```bash\nprime eval run tylergolato/geroprotector-ranking \\\n    --env-args '{\"mode\": \"preference\", \"split\": \"test\"}'\n```\n\n## Metrics\n\n- **Accuracy**: Correct choice rate\n- **Parse success**: Response format compliance\n- **Per-regime accuracy**: Breakdown by preference type\n\n## Version History\n\n- **v0.7**: Preference-weighted utility (prevents memorization)\n- **v0.6**: Anti-shortcut interventions (swap invariance, delta regression)\n- **v0.5**: Multi-mode environment\n","encoding":"utf-8","truncated":false,"total_bytes":3498},"status":null}