{"data":{"kind":"file","path":"README.md","version_id":"r2ka09ia5df6j3gmno1yqihj","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1441,"modified_at":"2026-02-03T19:58:13.960000","content_hash":"4c5d4eb14b6cf81cc5842f9ac8ec750771100cc37e1062d436ec366dfd608f79"},"entries":[],"content":"# toy-math\n\n### Overview\n- **Environment ID**: `toy-math`\n- **Short description**: A simple single-turn arithmetic environment for testing and demonstration\n- **Tags**: single-turn, math, arithmetic, train, eval\n\n### Datasets\n- **Primary dataset(s)**: Built-in dataset of 5 basic arithmetic questions\n- **Source links**: Inline (hardcoded in `toy_math.py`)\n- **Split sizes**: 5 examples (train and eval share the same dataset)\n\n### Task\n- **Type**: single-turn\n- **Parser**: None (extracts last numeric value from response)\n- **Rubric overview**: Single reward function `correct_answer` that compares the extracted numeric answer against the ground truth\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nprime eval run toy-math -m gpt-4.1-nano\n```\n\nConfigure model and sampling:\n\n```bash\nprime eval run toy-math \\\n  -m gpt-4.1-mini \\\n  -n 5 -r 3 -t 512 -T 0.7\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_examples` | int | `-1` | Limit on dataset size (-1 for all 5 examples) |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward (1.0 if correct, 0.0 otherwise) |\n| `correct_answer` | Exact match on target numeric answer |\n\n### Example Questions\n\n| Question | Answer |\n| -------- | ------ |\n| What is 2 + 2? | 4 |\n| What is 7 * 8? | 56 |\n| What is 100 - 37? | 63 |\n| What is 9 + 14? | 23 |\n| What is 15 * 3? | 45 |\n","encoding":"utf-8","truncated":false,"total_bytes":1441},"status":null}