# Research Hypothesis Analysis

A synthetic multi-turn Bayesian environment for testing:

1. experiment selection under uncertainty
2. belief revision under conflicting evidence

The agent sees the following:

- a research question
- three hypotheses
- a prior over them
- five candidate experiments
- a budget of three evidence turns

The environment hides the true hypothesis, the likelihood tables, and the exact posterior updates.

Tool actions:

- `run_experiment(experiment_id)`
- `report_belief(belief={"H1": ..., "H2": ..., "H3": ...}, stop=true|false)`

## Overview

- Environment ID: `research-hypothesis-analysis`
- Type: multi-turn tool environment
- Hypotheses: `3`
- Candidate experiments: `5`
- Evidence budget: `3`
- Episode modes: `70%` active, `30%` passive

## Frozen Data

The frozen dataset lives in `research_hypothesis_analysis/data/`:

- `train.jsonl`: `4000` examples
- `dev.jsonl`: `500` examples
- `test.jsonl`: `500` examples

It is generated by `/Users/jarrodbarnes/ai-scientist-training/scripts/generate_dataset.py`.

## Environment Args

| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `split` | `str` | `"train"` | Dataset split to load |
| `max_examples` | `int` | `-1` | Optional cap on loaded examples (`-1` means no cap) |
| `seed` | `int` | `0` | Shuffle seed for dataset order |
| `trajectory_dump_path` | `str \| None` | `None` | Optional path for writing finished trajectories as JSONL rows |

## Metrics

For hosted GRPO analysis, the environment logs the total `reward` together with its separate components and several diagnostic metrics:

- `reward`
- `mean_experiment_reward`
- `mean_calibration_reward`
- `final_map_bonus`
- `extra_turn_penalty_total`
- `invalid_belief_penalty_total`
- `malformed_action_penalty_total`
- `mean_regret`
- `mean_brier`
- `final_map_correct`

## Quickstart

Install the environment locally:

```bash
cd /Users/jarrodbarnes/ai-scientist-training
prime env install research-hypothesis-analysis -p ./environments
```

Run a smoke evaluation on a single dev example:

```bash
cd /Users/jarrodbarnes/ai-scientist-training
prime eval run research-hypothesis-analysis \
  --env-dir-path ./environments \
  --env-args '{"split":"dev","max_examples":1}' \
  --provider prime \
  --model 'Qwen/Qwen3-30B-A3B-Instruct-2507' \
  --num-examples 1 \
  --rollouts-per-example 1 \
  --max-concurrent 1 \
  --sampling-args '{"temperature":0.3,"max_tokens":1024}' \
  --state-columns episode_summary,trajectory_log,posterior_trace \
  --save-results
```
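The environment hides its likelihood tables, posterior updates, and exact reward formulas. The Python sketch below is therefore only an illustration of the standard quantities that the metrics names suggest, not the environment's implementation. It assumes the following:

- Each experiment is described by a hypothetical table mapping each outcome to `P(outcome | hypothesis)`.
- Beliefs are scored with the unnormalized multi-class Brier score.
- Experiments are ranked by expected information gain, which is one common selection criterion.

The environment's actual calibration reward and regret definitions may differ.

```python
import math

# Illustrative sketch only: the real likelihood tables and scoring rules are hidden.
# A belief (or prior) maps hypothesis IDs such as "H1" to probabilities.
Belief = dict[str, float]


def bayes_update(prior: Belief, likelihood: Belief) -> Belief:
    """Posterior P(H | outcome) from a prior and P(outcome | H) for each hypothesis."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(outcome): the normalizing constant
    if evidence <= 0:
        raise ValueError("observed outcome has zero probability under the prior")
    return {h: p / evidence for h, p in unnormalized.items()}


def entropy(belief: Belief) -> float:
    """Shannon entropy in bits; zero-probability hypotheses contribute nothing."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def expected_information_gain(prior: Belief, outcome_table: dict[str, Belief]) -> float:
    """Prior entropy minus expected posterior entropy for one experiment.

    outcome_table[outcome][h] is a HYPOTHETICAL likelihood P(outcome | h); for each
    hypothesis these values should sum to 1 across outcomes.
    """
    expected_posterior_entropy = 0.0
    for likelihood in outcome_table.values():
        p_outcome = sum(prior[h] * likelihood[h] for h in prior)
        if p_outcome > 0:  # outcomes that cannot occur carry no weight
            expected_posterior_entropy += p_outcome * entropy(bayes_update(prior, likelihood))
    return entropy(prior) - expected_posterior_entropy


def brier_score(belief: Belief, true_hypothesis: str) -> float:
    """Multi-class Brier score against a one-hot target (0 is perfect, 2 is worst)."""
    return sum((p - (1.0 if h == true_hypothesis else 0.0)) ** 2 for h, p in belief.items())


def map_hypothesis(belief: Belief) -> str:
    """Maximum a posteriori hypothesis: the one with the highest probability."""
    return max(belief, key=belief.get)


if __name__ == "__main__":
    prior = {"H1": 0.5, "H2": 0.3, "H3": 0.2}
    # Invented likelihoods of one observed outcome under each hypothesis.
    posterior = bayes_update(prior, {"H1": 0.2, "H2": 0.6, "H3": 0.5})
    print(posterior)  # posterior is 5/19, 9/19, 5/19 for H1, H2, H3 (up to float rounding)
    print(map_hypothesis(posterior), brier_score(posterior, "H2"))  # H2, about 0.416
```

Expected information gain behaves as follows at the two extremes:

- It is zero for an experiment whose outcome likelihoods are identical across hypotheses.
- It equals the prior entropy for an experiment whose outcomes each identify exactly one hypothesis.

This range makes it a natural yardstick for the experiment-selection half of the task.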