{"data":{"kind":"file","path":"README.md","version_id":"uvqcyn4ilxe8d4npgeuqrhzs","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4074,"modified_at":"2026-05-11T14:27:45.481000","content_hash":"3abe220b5c6d542a2114fde73a1d44af2d642d71624f6e4adc78406877a9c0f1"},"entries":[],"content":"# science-sim\n\n`science-sim` is the reusable Scientific Decision Environment framework layer for Prime/Verifiers RL work.\n\nIt is not the materials, biology, SciML, HPC, or lab-ops benchmark itself. Those should be separate environments that implement this framework contract through their own domain adapters.\n\nFor the high-level RL objective, reward design, constraint handling, and anti-hacking rules, see [RL_DESIGN.md](./RL_DESIGN.md).\n\n## Overview\n\n- **Environment ID**: `science-sim`\n- **Type**: multi-turn StatefulToolEnv\n- **Purpose**: define and smoke-test the shared scientific tool-use loop\n- **Current adapter**: `framework_smoke`, a neutral synthetic adapter used only for install/eval validation\n\n## Framework Boundary\n\nScience Sim separates three concerns:\n\n```text\nScientific Reality\n  -> hidden simulator, task truth, validity constraints\n\nScientific Interface\n  -> shared tools, observations, experiment reports, final decision schema\n\nScientific Curriculum\n  -> difficulty, noise, partial observability, budget pressure, horizon length\n```\n\nField-specific environments should keep those boundaries intact.\n\nExamples of separate downstream environments:\n\n- `science-sim-sciml`\n- `science-sim-bio`\n- `science-sim-materials`\n- `science-sim-hpc`\n- `science-sim-labops`\n\n## Shared Tool Contract\n\nEvery downstream Scientific Decision Environment should expose this high-level loop:\n\n```text\ninspect_problem()\n  -> observe objective, constraints, budget, observables\n\nsearch_candidates()\n  -> find hypotheses, interventions, designs, policies, or experiments\n\nrank_candidates()\n  -> return a task-specific shortlist for measurement planning\n\nevaluate_candidate()\n  -> run a cheap measurement or simulator pass\n\nrun_experiment()\n  -> spend more budget to reduce uncertainty\n\ncompare_candidates()\n  -> compare measured candidates under the objective\n\nsubmit_decision()\n  -> submit the final structured scientific decision\n```\n\nThe current smoke environment exposes:\n\n- `inspect_problem(problem_id)`\n- `search_candidates(max_cost=10000, min_prior_score=0.0, require_feasible_hint=True)`\n- `rank_candidates(top_k=4)`\n- `evaluate_candidate(candidate_id, experiment_type=\"cheap_measurement\")`\n- `run_experiment(candidate_id, experiment_type=\"full_experiment\")`\n- `compare_candidates(candidate_ids)`\n- `submit_decision(candidate_id, decision, expected_utility, rationale)`\n- `inspect_framework_contract()`\n\n## Dataset\n\nThe bundled dataset is deterministic and synthetic. It exists to verify that the framework loads, supports tool-use rollouts, and produces non-binary reward signal.\n\nDefault split sizes:\n\n| Split | Count |\n| --- | ---: |\n| train | 48 |\n| eval | 16 |\n\n## Reward\n\nThe rubric combines:\n\n- exact final decision accuracy\n- structured field accuracy\n- regret against hidden utility ranking\n- required tool usage\n- workflow efficiency\n- evidence that the selected candidate was measured\n\nThis keeps the framework trainable while preserving continuous scientific decision signal.\n\n## Quickstart\n\nInstall locally:\n\n```bash\nprime --plain env install science-sim\n```\n\nRun a small evaluation:\n\n```bash\nprime --plain eval run science-sim -m qwen3-30b-i -n 5 -r 1\n```\n\nRun a harder smoke evaluation:\n\n```bash\nprime --plain eval run science-sim -m qwen3-30b-i -n 10 -r 3 -a '{\"curriculum\":\"hard\"}'\n```\n\n## Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | --- | --- | --- |\n| `train_size` | int | `48` | Number of synthetic train cases |\n| `eval_size` | int | `16` | Number of synthetic eval cases |\n| `seed` | int | `37` | Deterministic data seed |\n| `curriculum` | str | `\"mixed\"` | One of `easy`, `mixed`, `hard` |\n| `adapter_name` | str | `\"framework_smoke\"` | Adapter used by this framework smoke env |\n| `max_turns` | int | `10` | Maximum tool-use turns per rollout |\n\n## Downstream Adapter Rule\n\nMaterials, biology, SciML, HPC, and lab-ops should be separate Prime environments. Each one should implement its own hidden scientific reality and adapter-specific candidate/experiment logic while preserving the shared Science Sim interface.\n","encoding":"utf-8","truncated":false,"total_bytes":4074},"status":null}