{"data":{"kind":"file","path":"README.md","version_id":"dm5v12a6u7j0acn9ksneelg1","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1465,"modified_at":"2026-03-03T23:24:34.271000","content_hash":"372b8e0e32b2c0402e27c0800a47209632c5629c8be0e6110bd814bc729f6c83"},"entries":[],"content":"# Math Reasoning (SingleTurnEnv + MathRubric)\n\nA `SingleTurnEnv` paired with `MathRubric` for GSM8K-style math reasoning.\n\n## What it does\n\nLoads the OpenAI GSM8K dataset (`openai/gsm8k`, `main` config) and maps it to `{question, answer}`.\nThe model must answer in `\\boxed{}` format. `vf.MathRubric` extracts and verifies the final answer.\n\nIf the Hugging Face download is unavailable, it falls back to a small synthetic arithmetic dataset so the environment remains runnable.\n\n## Setup\n\n```bash\nuv sync\n```\n\n## Quick eval (no training)\n\n```bash\nprime eval run config.toml\n```\n\n## Training run\n\n```bash\nprime rl run config.toml\n```\n\nExpected mean reward by training step:\n\n| Step | Mean Reward | Notes |\n|------|-------------|-------|\n| 0    | ~0.90       | Strong base for small math |\n| 5    | ~0.97       | Near saturation |\n| 10+  | 1.00        | Use a harder dataset |\n\n## When to use this pattern\n\n- **Sanity checking** a new model or training setup\n- **Template** for any single-turn Q&A task\n- **Baseline** before introducing tools or multi-turn reasoning\n\n## Tips for harder math\n\n- If you are running on the synthetic fallback dataset, switch to **GSM8K**: `from datasets import load_dataset; ds = load_dataset(\"openai/gsm8k\", \"main\", split=\"train\")`\n- For MATH competition problems: `load_dataset(\"lighteval/MATH\", split=\"train\")`\n- Increase `max_tokens` to 1024+ for chain-of-thought reasoning\n- Target difficulty: the model should solve 30–70% of problems before training starts; adjust problem difficulty accordingly\n","encoding":"utf-8","truncated":false,"total_bytes":1465},"status":null}