# GenomeGym-BRCA1

A Prime Verifiers environment for safe, research-only BRCA1 variant-effect
prediction, inspired by Evo 2.

Each episode gives the agent one public BRCA1 single-nucleotide variant and asks
for a calibrated `<p_lof>` output: the probability that the variant is
loss-of-function (LOF). The agent may call two read-only tools, both scoped to
the episode's variant ID:

- `score_variant_with_evo2(variant_id)`
- `get_variant_metadata(variant_id)`

The main reward is deterministic: it compares the agent's calibrated LOF
probability with the hidden `LOF` vs. `FUNC_INT` label.

## Safety

Research-only. No medical advice. No biological design. No wet-lab protocols.
Tools are read-only and scoped to the current rollout's `variant_id`.

## Install

From this directory:

```bash
uv venv
uv pip install -e . -e ../verifiers
```

For one-off local runs without activating the venv, use the same workflow as
the smoke tests:

```bash
uv run --with-editable . --with-editable ../verifiers python -c "import genomegym_brca1; print(genomegym_brca1.__version__)"
```

## Smoke Load

The default mode uses the packaged Findlay et al. BRCA1 records and the
Modal-generated Evo 2 score cache. A separate `fixture` mode uses a tiny
synthetic dataset and synthetic Evo 2-like scores for tests and smoke evals.

```bash
uv run --with-editable . --with-editable ../verifiers python - <<'PY'
from genomegym_brca1 import load_environment

# Synthetic fixture data only; these are not Evo 2 results.
env = load_environment(data_source="fixture")
# Dataset and eval-dataset sizes, then the first registered tool's name.
print(len(env.get_dataset()), len(env.get_eval_dataset()))
print(env.tool_defs[0].name)
PY
```

## Evaluate

Run the deterministic full-data baselines:

```bash
uv run --with-editable . --with-editable ../verifiers python scripts/run_baselines.py
```

Run the unit tests:

```bash
uv run --with-editable . --with-editable ../verifiers --with pytest --with pytest-asyncio python -m pytest -q
```

Once the package is installed, you can run a small local eval with an API model
against the full packaged environment:

```bash
prime eval run genomegym-brca1 -m openai/gpt-4.1-mini -n 5 -r 3
```

To pay for local evaluation with OpenRouter credits, export an OpenRouter API
key and select the OpenRouter provider:

```bash
export OPENROUTER_API_KEY=...

prime eval run genomegym-brca1 \
  --provider openrouter \
  -m openai/gpt-4.1-mini \
  -n 5 -r 1 \
  --skip-upload
```

Omit `--skip-upload` if you want Prime to store the local evaluation results.

## Prime Environments Hub

The package is self-contained for Hub use: it bundles the full public BRCA1
variant records and the cached Evo 2 scores, so hosted evals do not need the
local `../evo2` checkout.

Publish privately first:

```bash
prime env push --path . --visibility PRIVATE
```

After publishing, inspect and install it:

```bash
prime env info alfaxad/genomegym-brca1
prime env install alfaxad/genomegym-brca1@latest
```

Run a hosted evaluation with an external OpenAI-compatible endpoint such as
OpenRouter:

```bash
prime eval run alfaxad/genomegym-brca1 \
  --hosted \
  --api-base-url https://openrouter.ai/api/v1 \
  --api-key-var OPENROUTER_API_KEY \
  --custom-secrets '{"OPENROUTER_API_KEY":"..."}' \
  -m openai/gpt-4.1-mini \
  -n 5 -r 1 \
  --follow
```

Hosted RL training runs on Prime Hosted Training models and is billed through
Prime, not OpenRouter. A minimal smoke-training config:

```toml
model = "Qwen/Qwen3.5-0.8B"
max_steps = 50
batch_size = 128
rollouts_per_example = 8

[sampling]
max_tokens = 512

[[env]]
id = "alfaxad/genomegym-brca1"
args = { data_source = "full" }
```

With that config saved as `configs/lab/genomegym-brca1-smoke.toml`, launch it
with:

```bash
prime train configs/lab/genomegym-brca1-smoke.toml
```

## Full BRCA1 Data

The implementation can also read the Findlay et al. supplementary Excel file
from a local Evo 2 checkout, where the BRCA1 notebook keeps a copy:

```text
../evo2/notebooks/brca1/41586_2018_461_MOESM3_ESM.xlsx
```

The packaged environment includes:

```text
data/variants/brca1_findlay_variants.jsonl
data/cache/brca1_evo2_7b_8k_scores.jsonl
```

To regenerate the cached score file and then evaluate against it:

```bash
python scripts/precompute_evo2_scores.py \
  --model evo2_7b_base \
  --window-size 8192 \
  --out data/cache/brca1_evo2_7b_8k_scores.jsonl

# Evaluate against the regenerated cache.
prime eval run genomegym-brca1 -m openai/gpt-4.1-mini -n 50 -r 3
```

Do not report scores from `fixture` mode as Evo 2 results.

## Modal Evo 2 Scoring

If local CUDA hardware is unavailable, use the Modal runner. Start with a small
smoke run:

```bash
modal run scripts/modal_precompute_evo2_scores.py --limit 2 --out /tmp/brca1_modal_smoke.jsonl
```

Then generate the full cache:

```bash
modal run scripts/modal_precompute_evo2_scores.py \
  --model-name evo2_7b_base \
  --window-size 8192 \
  --batch-size 1 \
  --out data/cache/brca1_evo2_7b_8k_scores.jsonl
```

The Modal job uses a persistent volume named `genomegym-brca1-evo2-cache` to
reuse the Hugging Face and model caches across runs.
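After regenerating the cache, locally or on Modal, it can help to confirm that every packaged variant has exactly one cached score before running evals. The sketch below is a hypothetical helper, not part of the package: `read_ids`, `compare_ids`, `check_cache`, and `ID_KEY` are illustrative names, and it assumes both JSONL files store one JSON object per line with a `variant_id` key. That field name is an assumption about the schema, so adjust `ID_KEY` if the scripts write something else.

```python
"""Hypothetical coverage check for a regenerated Evo 2 score cache.

Assumes both JSONL files hold one JSON object per line with a "variant_id"
key. That field name is an assumption, not a documented schema; adjust ID_KEY
to match what scripts/precompute_evo2_scores.py actually writes.
"""
import json
from collections import Counter
from collections.abc import Iterable
from pathlib import Path

VARIANTS = Path("data/variants/brca1_findlay_variants.jsonl")
CACHE = Path("data/cache/brca1_evo2_7b_8k_scores.jsonl")
ID_KEY = "variant_id"  # assumed field name


def read_ids(path: Path, key: str = ID_KEY) -> list[str]:
    """Return the ID of every non-blank JSONL record, in file order."""
    ids = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                # str() so integer and string IDs compare consistently.
                ids.append(str(json.loads(line)[key]))
    return ids


def compare_ids(variant_ids: Iterable[str], cached_ids: Iterable[str]) -> dict[str, list[str]]:
    """Report variants with no cached score, stray cache rows, and duplicates."""
    expected = set(variant_ids)
    counts = Counter(cached_ids)
    cached = set(counts)
    return {
        "missing": sorted(expected - cached),
        "extra": sorted(cached - expected),
        "duplicated": sorted(i for i, n in counts.items() if n > 1),
    }


def check_cache(variants_path: Path = VARIANTS, cache_path: Path = CACHE) -> dict[str, list[str]]:
    """Read both files and compare their ID sets."""
    return compare_ids(read_ids(variants_path), read_ids(cache_path))
```

Run `check_cache()` from the package root; an all-empty report means every packaged variant has exactly one cached score. Keeping `compare_ids` free of file I/O makes the comparison easy to unit-test.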