# sensible-thinker

This addon environment aims to make a policy model's reasoning more sensible by asking an auxiliary model to understand the thinking process generated by the policy model.
The auxiliary (judge) model should be smaller than the policy model. We think this mimics a teacher-student process: the larger policy model explains, and the smaller judge model has to follow the explanation.

### How it works

1. The **policy model** generates an answer along with a hidden chain-of-thought inside `<think>...</think>` tags.
2. The **sensible-thinker environment** extracts this chain-of-thought and sends it to a smaller **judge model**.
3. The judge model analyzes the reasoning and predicts a final answer in boxed form (`\boxed{...}`).
4. This process is repeated several times (`n_rolls`) to reduce randomness.
5. The predicted answers are compared against the gold answer, and the **reward** is the fraction of correct matches.
6. This reward is added to the base environment's rubric with a fixed weight (default `0.5`).

In short: the addon encourages models not only to output the right answer but also to produce reasoning that is **interpretable and verifiable by another model**.

### Overview
- **Environment ID**: `sensible-thinker`
- **Short description**: "Use LLM to Judge on thinking process"
- **Tags**: enhance, addon, sensible

### Datasets
- **Primary dataset(s)**: reuses the dataset of the base environment; currently only `gsm8k` is supported

### Quickstart
Install the base environment and the addon environment:

```bash
uv run vf-install gsm8k
uv run vf-install sensible-thinker
```

Configure the model and sampling, and pass the base environment and judge settings:

```bash
uv run vf-eval sensible-thinker -m gpt-4.1 -n 20 -r 3 -t 1024 -T 0.7 \
  -a '{"base_env": "gsm8k", "judge_model_name": "gpt-4o-mini", "judge_base_url": "https://api.openai.com/v1", "judge_api_key": "..."}'
```

Notes:
- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.
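### Reward sketch

The reward pipeline described in "How it works" can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the environment's actual implementation. The helper names (`extract_thinking`, `extract_boxed`, `sensible_reward`, `combined_reward`), the `judge` callable standing in for a request to the judge model, the exact string comparison against the gold answer, and treating the base rubric's score as a single already-weighted number are all assumptions.

```python
import re
from typing import Callable, Optional

# Hypothetical helpers illustrating the reward described in "How it works".
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_thinking(completion: str) -> Optional[str]:
    """Return the text inside the first <think>...</think> block, or None."""
    match = THINK_RE.search(completion)
    return match.group(1).strip() if match else None


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} (no nested braces), or None."""
    matches = BOXED_RE.findall(text)
    return matches[-1].strip() if matches else None


def sensible_reward(
    completion: str,
    gold: str,
    judge: Callable[[str], str],  # assumed: sends the thinking to the judge model, returns its reply
    n_rolls: int,
) -> float:
    """Fraction of judge rolls whose boxed answer exactly matches the gold answer."""
    if n_rolls <= 0:
        raise ValueError("n_rolls must be positive")
    thinking = extract_thinking(completion)
    if thinking is None:
        return 0.0  # assumption: a completion without a <think> block earns no reward
    correct = 0
    for _ in range(n_rolls):
        predicted = extract_boxed(judge(thinking))
        if predicted is not None and predicted == gold.strip():
            correct += 1
    return correct / n_rolls


def combined_reward(base_score: float, sensible: float, weight: float = 0.5) -> float:
    """Base rubric score (assumed already weighted) plus the weighted sensible reward."""
    return base_score + weight * sensible


if __name__ == "__main__":
    # Fake judge replies: two of the four rolls recover the gold answer "7".
    replies = iter([r"\boxed{7}", r"\boxed{8}", r"\boxed{7}", "no answer"])
    completion = "<think>3 apples plus 4 apples is 7 apples.</think> The answer is 7."
    score = sensible_reward(completion, "7", lambda thinking: next(replies), n_rolls=4)
    print(score)                        # 0.5
    print(combined_reward(1.0, score))  # 1.25
```

Exact string matching keeps the sketch simple; a real implementation would likely normalize numeric answers (for example `7.0` versus `7`) before comparing them with the gold answer.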