{"data":{"kind":"file","path":"README.md","version_id":"ykts0g97rlisv1nus5xncv6o","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1938,"modified_at":"2026-02-06T11:15:14.817000","content_hash":"20a9757060dd5f16a21891eb5f677be116444ca9a9dbe08ff5930ea5ad561a5b"},"entries":[],"content":"# math-python\n\n<a href=\"https://github.com/PrimeIntellect-ai/verifiers/tree/main/environments/math_python\">\n<img src=\"https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white\" alt=\"Source Code\">\n</a>\n\n### Overview\n- **Environment ID**: `math-python`\n- **Short description**: Tool-using math environment requiring Python tool calls to compute answers (via `PythonEnv` + `prime` sandboxes); graded by symbolic equivalence.\n- **Tags**: math, tools, python, single-turn, boxed-answer\n\n### Datasets\n- **Primary dataset(s)**: Example `math` dataset via `load_example_dataset`\n- **Source links**: Uses example loader in `verifiers.utils.data_utils`\n- **Split sizes**: Configurable via args; defaults to `train` split and all examples\n\n### Task\n- **Type**: tool use (single-turn ToolEnv)\n- **Rubric overview**: Correctness by `math_verify.parse` + `verify`; logs auxiliary metrics (#turns, #tool calls, #errors)\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nprime eval run math-python\n```\n\nConfigure model and sampling:\n\n```bash\nprime eval run math-python \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"dataset_name\": \"math\", \"dataset_split\": \"train\", \"num_train_examples\": -1}'\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `dataset_name` | str | `\"math\"` | Example dataset to load |\n| `dataset_split` | str | `\"train\"` | Split to load |\n| `num_train_examples` | int | `-1` | Limit dataset size (`-1` for all) |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `correct_answer_reward_func` | 1.0 if symbolic verification passes, else 0.0 |\n| `num_turns` | Number of assistant messages in completion |\n| `num_tool_calls` | Number of tool messages in completion |\n| `num_errors` | Count of tool error messages |\n","encoding":"utf-8","truncated":false,"total_bytes":1938},"status":null}