# rubric_discovery

### Overview

- **Environment ID**: `rubric_discovery`
- **Short description**: RLM environment in which the model infers a grading rubric from scored `(input, response, score)` examples.
- **Tags**: multi-turn,tool-use,rlm,meta-learning,rubric

### Task

- **Type**: multi-turn + tool use (`RLMEnv`)
- **Goal**: synthesize `rubric_fn(input_text: str, response: str) -> float`, a function that scores responses consistently with the example scores
- **Root tools**:
  - `get_rubric_run_result(fn_code, examples, timeout_s=...)` (runs the candidate rubric on `examples`)
  - `validate_rubric(fn_code)` (AST-only validation, no code execution)
  - built-in RLM tools (`call_python_repl`, `llm_batch`)

Both tools take `fn_code` as a non-empty source-code string containing `def rubric_fn(...)`.
Passing a function object instead is rejected with an explicit error, because recovering source code from live REPL objects is brittle.

### Dataset Contract

Each dataset row contains:
- `train_examples` (shown in the prompt; never used for reward)
- `test_examples` (held out; used only for reward)
- metadata fields (`source_env`, `category`, `rubric_type`, `task_hint`)

### Reward

The reward is a weighted sum of four components (the weights in parentheses sum to 1.0):

- `generalization_reward` (0.50): agreement on held-out `test_examples`, i.e. the fraction with `|pred - gold| <= eval_margin`
- `calibration_reward` (0.25): `1 - MAE` between predicted and gold scores
- `discrimination_reward` (0.15): variance of the rubric's scores (a rubric that returns a constant scores zero here)
- `iteration_reward` (0.10): iterative use of the REPL and tools

For scoring, the rubric is extracted from `state["final_answer"]` first; the raw completion is used only as a fallback for compatibility.

### Quickstart

Install from the repository root:

```bash
uv pip install -e ./environments/rubric_discovery
```

Run one debug rollout:

```bash
uv run vf-eval --env rubric_discovery -d -v -n1 -r1
```

Run several rollouts (5 examples, 3 rollouts each) and save the results:

```bash
uv run vf-eval --env rubric_discovery -n5 -r3 -s
```

### Environment Arguments

`load_environment(config=None, **env_args)` accepts arguments as direct keyword arguments or through an optional `config` mapping.

| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `max_turns` | int | 10 | Maximum RLM iterations |
| `sub_llm_max_turns` | int | 5 | Maximum turns per sub-LLM call |
| `rlm_model` | str | `"gpt-4.1-mini"` | Sub-LLM model used by `llm_batch` |
| `execution_backend` | `"subprocess" \| "sandbox"` | `"subprocess"` | Backend that executes candidate rubrics in rewards and tools |
| `rlm_execution_backend` | `"local" \| "sandbox"` | `"local"` | Backend for RLM REPL execution |
| `eval_margin` | float | 0.3 | Agreement threshold: a prediction agrees when `\|pred - gold\| <= eval_margin` |
| `eval_timeout_s` | int | 10 | Timeout for rubric evaluation, in seconds |
| `dataset_path` | str \| None | None | JSONL dataset path that overrides the default dataset |
| `categories` | list[str] \| None | None | Keep only rows whose `category` is in this list |
| `max_examples` | int \| None | None | Maximum number of rows to load |
| `shuffle` | bool | False | Shuffle the dataset |
| `seed` | int \| None | None | Random seed for shuffling |

Pass arguments to `vf-eval` as a JSON object with `-a`:

```bash
uv run vf-eval --env rubric_discovery -a '{"max_turns": 8, "execution_backend": "subprocess"}'
```

### Dataset Generation

Generate a small starter dataset:

```bash
cd environments/rubric_discovery
uv run python -m scripts.generate_dataset \
  --config-path scripts/source_envs_small.yaml \
  --target-size 4 \
  --responses-per-example 2 \
  --model-name gpt-4.1-mini
```

### Development Checks

Lint and check formatting:

```bash
uv run ruff check ./environments/rubric_discovery
uv run ruff format --check ./environments/rubric_discovery
```
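### Example: a candidate `fn_code`

The `fn_code` contract under **Task** can be illustrated with a minimal candidate. This is a sketch, not part of the environment: the word-overlap heuristic is a hypothetical scoring rule, and `load_rubric` is a hypothetical helper for trying a candidate in a local Python session. Inside the environment, candidates are run by `get_rubric_run_result` on the configured `execution_backend`, not by `exec` in your own process.

```python
# Hypothetical candidate, passed to the tools as a source string (not a function object).
FN_CODE = '''
def rubric_fn(input_text: str, response: str) -> float:
    # Hypothetical heuristic: fraction of input words echoed in the response.
    if not response.strip():
        return 0.0
    input_words = set(input_text.lower().split())
    if not input_words:
        return 0.5
    response_words = set(response.lower().split())
    overlap = len(input_words & response_words) / len(input_words)
    return max(0.0, min(1.0, overlap))
'''


def load_rubric(fn_code):
    """Hypothetical local helper: exec fn_code in a fresh namespace and return rubric_fn."""
    # Mirror the documented contract: only non-empty source strings are accepted.
    if not isinstance(fn_code, str) or not fn_code.strip():
        raise TypeError("fn_code must be a non-empty source-code string")
    namespace = {}
    exec(fn_code, namespace)  # only for trusted code in a local session
    rubric_fn = namespace.get("rubric_fn")
    if not callable(rubric_fn):
        raise ValueError("fn_code does not define rubric_fn")
    return rubric_fn


rubric_fn = load_rubric(FN_CODE)
print(rubric_fn("What is the capital of France?", "The capital of France is Paris."))
# prints 0.6666666666666666 (4 of the 6 input words appear in the response)
```

Keeping the candidate as a string is what lets the same code be validated statically and shipped to a subprocess or sandbox unchanged.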
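### Example: static validation of `fn_code`

`validate_rubric` is documented as AST-only: it inspects the parsed source without running it. The sketch below shows what such a check can look like with Python's standard `ast` module. It is a hypothetical re-implementation (`validate_rubric_sketch` is an illustrative name); the specific rules, a top-level `def rubric_fn` with exactly two positional parameters, are assumptions based on the documented signature, and the real tool's checks and messages may differ.

```python
import ast


def validate_rubric_sketch(fn_code):
    """Hypothetical AST-only check; returns a list of problems (empty means valid)."""
    if not isinstance(fn_code, str) or not fn_code.strip():
        return ["fn_code must be a non-empty source-code string"]
    try:
        tree = ast.parse(fn_code)  # parses only; nothing is executed
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    # Look for a module-level `def rubric_fn`.
    defs = [
        node
        for node in tree.body
        if isinstance(node, ast.FunctionDef) and node.name == "rubric_fn"
    ]
    if not defs:
        return ["no top-level `def rubric_fn(...)` found"]
    # Assumed rule: exactly two positional parameters (input_text, response).
    params = [arg.arg for arg in defs[0].args.args]
    if len(params) != 2:
        return [f"rubric_fn should take 2 parameters (input_text, response), got {len(params)}"]
    return []


print(validate_rubric_sketch("def rubric_fn(input_text, response):\n    return 1.0\n"))  # prints []
print(validate_rubric_sketch("def score(x):\n    return 0.0\n"))
```

Because nothing is executed, a check like this is safe to run on arbitrary model output before spending a timeout-bounded execution on it.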
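### Example: a dataset row and row selection

The **Dataset Contract** and the `categories`, `max_examples`, `shuffle`, and `seed` arguments can be pictured with the sketch below. Everything concrete in it is hypothetical: the per-example keys (`input`, `response`, `score`), the metadata values, the helper name `select_rows`, and its order of operations (filter by category, then shuffle, then truncate) are assumptions, not the environment's actual loader.

```python
import json
import random

# Hypothetical row: keys inside each example and all metadata values are illustrative.
ROW = {
    "train_examples": [  # shown in the prompt
        {"input": "2 + 2 = ?", "response": "4", "score": 1.0},
        {"input": "2 + 2 = ?", "response": "5", "score": 0.0},
    ],
    "test_examples": [  # held out for the reward
        {"input": "3 + 3 = ?", "response": "6", "score": 1.0},
    ],
    "source_env": "example_math_env",
    "category": "math",
    "rubric_type": "binary",
    "task_hint": "Exact-answer arithmetic questions.",
}


def select_rows(rows, categories=None, max_examples=None, shuffle=False, seed=None):
    """Hypothetical selection: category filter, optional seeded shuffle, then truncation."""
    if categories is not None:
        rows = [row for row in rows if row["category"] in categories]
    else:
        rows = list(rows)  # copy so shuffling never mutates the caller's list
    if shuffle:
        random.Random(seed).shuffle(rows)  # the same seed gives the same order
    if max_examples is not None:
        rows = rows[:max_examples]
    return rows


line = json.dumps(ROW)  # one line of a JSONL dataset file
rows = [json.loads(line), {**ROW, "category": "code"}]
print(len(select_rows(rows, categories=["math"])))  # prints 1
```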
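### Example: combining the reward components

The weights under **Reward** sum to 1.0, so the total can be read as a weighted sum of four component scores. The sketch below computes the two components whose definitions are stated explicitly (agreement within `eval_margin`, and `1 - MAE`) and takes the other two as precomputed inputs, because the exact normalization of `discrimination_reward` and the scoring of `iteration_reward` are not specified here. The function names are hypothetical, and clipping `1 - MAE` at 0 is an assumption.

```python
# Component weights as listed under Reward.
WEIGHTS = {
    "generalization": 0.50,
    "calibration": 0.25,
    "discrimination": 0.15,
    "iteration": 0.10,
}


def agreement(preds, golds, margin=0.3):
    """Fraction of examples where |pred - gold| <= margin (eval_margin defaults to 0.3)."""
    hits = [abs(p - g) <= margin for p, g in zip(preds, golds)]
    return sum(hits) / len(hits)


def calibration(preds, golds):
    """1 - mean absolute error, clipped at 0 (the clipping is an assumption)."""
    errors = [abs(p - g) for p, g in zip(preds, golds)]
    return max(0.0, 1.0 - sum(errors) / len(errors))


def total_reward(preds, golds, discrimination, iteration, margin=0.3):
    """Weighted sum; discrimination and iteration are assumed pre-scored in [0, 1]."""
    components = {
        "generalization": agreement(preds, golds, margin),
        "calibration": calibration(preds, golds),
        "discrimination": discrimination,
        "iteration": iteration,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Predictions on three held-out examples; absolute errors are 0.1, 0.2 and 0.5.
preds, golds = [0.9, 0.2, 0.6], [1.0, 0.0, 0.1]
print(round(total_reward(preds, golds, discrimination=0.5, iteration=1.0), 4))  # prints 0.6917
```

With the default margin of 0.3, two of the three predictions agree, so generalization contributes 0.5 × 2/3 while the 0.5 error still counts fully against calibration.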