{"data":{"kind":"file","path":"README.md","version_id":"h4mzg94a2fa1mv1zht4u5t6x","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":7272,"modified_at":"2026-02-01T14:32:43.279000","content_hash":"6ffeee8d46d790bf8f5c0e9303d06194d9637da9940a0b9ade718fa13aa0cf6c"},"entries":[],"content":"# Active Perturbation Selection (LINCS L1000 Analogue)\n\nA multi-turn POMDP for budgeted information acquisition: the agent selects sequential \"assays\" that reveal noisy, partial observations of a hidden biological mechanism class, then decides when to stop and guess.\n\n**v0.4**: Major redesign to fix rapid saturation. No candidates, dense IG reward, reduced information per measurement.\n\n## What This Trains\n\n- **Active experimental design** — adaptive measurement selection under cost\n- **Optimal stopping under uncertainty** — decide when to commit vs gather more evidence  \n- **Information-theoretic reasoning** — maximize information gain per measurement\n- **Belief tracking** — maintain uncertainty estimates without hints\n- **Format-following behavior** — verifiable XML action parsing\n\n## Biological Framing\n\nInspired by LINCS L1000 perturbational profiling:\n\n| Environment | Biology Analogue |\n|-------------|------------------|\n| Hidden class | Cell-state / MoA cluster |\n| Actions | Choosing perturbation/assay panels |\n| Observations | Partial gene-expression readouts + noise |\n\n**Goal**: Identify the latent mechanism using as few experiments as possible.\n\nThis release uses **real LINCS L1000 MoA embeddings** (515 mechanism classes from 214k signatures). Supports episode-randomized cost/budget to train non-trivial stopping policies.\n\n## Task\n\nAt each step, output one XML action:\n\n```xml\n<action>measure</action>\n<id>MASK_ID</id>\n```\nor\n```xml\n<action>guess</action>\n<id>CLASS_ID</id>\n```\n\n**Budget**: At most `budget_T` measurements per episode. 
When the budget is exhausted, the environment forces a random guess.\n\n## Reward\n\n| Component | Value |\n|-----------|-------|\n| Correct guess | +`terminal_correct` (default 1.0) |\n| Incorrect guess | +`terminal_incorrect` (default 0.0) |\n| Per measurement | -`cost_weight` × `step_cost` |\n| Entropy reduction | +`ig_weight` × ΔH / log K |\n\nAn invalid action format incurs a penalty and terminates the episode.\n\n## Key Parameters (v0.4)\n\n| Param | Default | Description |\n|-------|---------|-------------|\n| `top_k_classes` | 150 | Number of MoA classes (v0.4: increased from 50) |\n| `budget_T` | 6 | Max measurements per episode |\n| `sigma` | 1.5 | Observation noise (v0.4: increased from 1.0) |\n| `centroid_sigma` | 0.8 | Within-class variability (v0.4: increased from 0.5) |\n| `dims_per_mask` | 2 | Dimensions revealed per measurement (v0.4: reduced from 8) |\n| `step_cost` | 0.05 | Cost per measurement |\n\n### Reward Structure (v0.4)\n\nDense information gain plus terminal correctness:\n\n| Param | Default | Description |\n|-------|---------|-------------|\n| `ig_weight` | 1.0 | Weight for normalized IG per step |\n| `cost_weight` | 0.05 | Weight for step cost |\n| `terminal_correct` | 1.0 | Reward for correct final guess |\n| `terminal_incorrect` | 0.0 | Reward for incorrect guess |\n| `min_measures` | 0 | Minimum measurements before guessing allowed |\n\n- **Step reward**: `ig_weight * (ΔH / log K) - cost_weight * step_cost`, where ΔH is the entropy reduction from the measurement and K is the number of classes\n- **Total reward**: `sum(step_rewards) + terminal_reward`\n\n### Episode-Randomized Parameters\n\n| Param | Example | Description |\n|-------|---------|-------------|\n| `budget_T_dist` | `[4, 5, 6, 7]` | Sample budget per episode |\n| `step_cost_dist` | `[0.03, 0.05, 0.08]` | Sample cost per episode |\n\n## Quickstart\n\n```bash\n# Evaluation (uses real LINCS data by default)\nprime eval run tylergolato/lincs-active-probing -m openai/gpt-4o-mini\n\n# Training with episode-randomized cost/budget\nprime rl run configs/lab/lincs-active-probing.toml\n```\n\n### Recommended Training Config (v0.4.0)\n\n```toml\n[[env]]\nid = \"tylergolato/lincs-active-probing\"\n\n[env.args]\nuse_lincs = 
true\ntop_k_classes = 150\ndims_per_mask = 2\nsigma = 1.5\ncentroid_sigma = 0.8\nbudget_T_dist = [4, 5, 6, 7]\nstep_cost_dist = [0.03, 0.05, 0.08]\n# v0.4 reward settings\nig_weight = 1.0\ncost_weight = 0.05\nterminal_correct = 1.0\nterminal_incorrect = 0.0\nmin_measures = 2\nnum_examples = 2000\n```\n\n## LINCS Data\n\nThe environment includes **real L1000 signature data** with **515 MoA classes** derived from 214,020 signatures (978 landmark genes).\n\n```bash\n# Use real LINCS MoA data (default)\nprime eval run tylergolato/lincs-active-probing -m openai/gpt-4o-mini \\\n    --env-args '{\"use_lincs\": true, \"top_k_classes\": 30}'\n```\n\n### Data Summary\n\n| File | Shape | Description |\n|------|-------|-------------|\n| `signature_vectors.npy` | (214020, 978) | Per-signature gene expression vectors |\n| `signature_meta.parquet` | 214020 rows | Metadata: sig_id, cell_line, dose, time, moa |\n| `embeddings.npy` | (214020, 32) | PCA-reduced signatures |\n| `class_means.npy` | (515, 32) | MoA centroids (used by environment) |\n| `class_names.json` | 515 | All MoA class names |\n\n### Top MoA Classes\n\n| Rank | Mechanism of Action | Signatures |\n|------|---------------------|------------|\n| 1 | HDAC inhibitor | 10,445 |\n| 2 | PDGFR inhibitor | 6,397 |\n| 3 | Dopamine receptor antagonist | 6,097 |\n| 4 | FLT3 inhibitor | 6,029 |\n| 5 | VEGFR inhibitor | 6,009 |\n| ... | **515 total classes** | |\n\n### Data Pipeline\n\n```bash\n# Full pipeline: downloads GEO metadata, generates signature vectors\npython data/download_l1000_signatures.py\n```\n\n**Data sources**:\n- [GEO GSE92742](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92742) - L1000 Phase 1 signature metadata\n- [LINCS Data Portal](https://clue.io) - Compound MoA annotations\n\nThe packaged environment includes centroids and embeddings. 
Full signature vectors (798 MB) can be regenerated locally.\n\n---\n\n## Design Notes (for developers)\n\n<details>\n<summary>Click to expand</summary>\n\n### POMDP Structure\n\n- **Hidden state**: True class z ∈ {0..K-1}, fixed per episode\n- **Belief state**: Bayesian posterior over classes (updated internally)\n- **Observations**: Masked embedding dims + Gaussian noise\n- **Episode variability**: Budget and cost sampled per episode (v0.2.0+)\n\n### Preventing Reward Ceiling / Policy Collapse\n\nWith fixed budget and cost, the optimal policy often becomes \"measure exactly N times, then guess\", which is a fixed-length test. Once the agent finds this policy, variance collapses and learning stalls.\n\n**v0.2-v0.3 attempts (insufficient):**\n- Episode-randomized budget/cost\n- Within-class variability\n- Candidate hints with controlled recall\n\n**v0.4 solution:**\nThe fundamental problem was information leakage through candidates, combined with too much information per measurement.\n\n1. **Remove candidates entirely**: no hints; the agent must track belief internally\n2. **Reduce information per measurement**: `dims_per_mask=2` (was 8)\n3. **Dense IG reward**: each step is rewarded with normalized IG minus the weighted step cost, in addition to the terminal reward\n4. **More classes + more noise**: `K=150`, `sigma=1.5`, `centroid_sigma=0.8`\n\n### Key Diagnostic Metrics (v0.4)\n\n| Metric | Target | Meaning |\n|--------|--------|---------|\n| `top1_prob_after_2_metric` | 0.2-0.5 | Task difficulty (if ~1.0, too easy) |\n| `entropy_after_2_metric` | >2.0 | Uncertainty after 2 measurements |\n| `total_ig_metric` | varies | Total information gained |\n| `num_measures_metric` | varies | Should NOT be constant |\n\n**Success criteria:**\n- `top1_prob_after_2_metric` is NOT near 1.0\n- `num_measures_metric` distribution has variance (policy adapts)\n- Reward improves alongside IG improvement, not just accuracy\n\n### Real-World Capability\n\nTrains budgeted information acquisition and stopping policies under partial observability. This maps to active learning and experimental design workflows.\n\n</details>\n","encoding":"utf-8","truncated":false,"total_bytes":7272},"status":null}