{"data":{"kind":"file","path":"README.md","version_id":"zdked0sn3lp7j4o3rxx2a4cj","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1725,"modified_at":"2026-02-06T11:15:16.036000","content_hash":"74ea228d3b00e7968054382555014e9f393b5dd82920e74692da9ccbe3c666a9"},"entries":[],"content":"# doublecheck\n\n<a href=\"https://github.com/PrimeIntellect-ai/verifiers/tree/main/environments/doublecheck\">\n<img src=\"https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white\" alt=\"Source Code\">\n</a>\n\n### Overview\n- **Environment ID**: `doublecheck`\n- **Short description**: Two-turn math QA that asks the model to answer, then prompts “Are you sure?”; scored with a math rubric.\n- **Tags**: math, multi-turn, xml, think-answer, verification\n\n### Datasets\n- **Primary dataset(s)**: `math` (example dataset loaded via `load_example_dataset`)\n- **Source links**: Uses the example loader in `verifiers.utils.data_utils`\n- **Split sizes**: Configurable via args; defaults to `train` split and all examples\n\n### Task\n- **Type**: multi-turn\n- **Rubric overview**: `MathRubric` combining exact/equivalence math grading and a small format component\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nprime eval run doublecheck\n```\n\nConfigure model and sampling:\n\n```bash\nprime eval run doublecheck \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"dataset_name\": \"math\", \"dataset_split\": \"train\", \"num_train_examples\": -1}'\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `dataset_name` | str | `\"math\"` | Example dataset name for math problems |\n| `dataset_split` | str | `\"train\"` | Dataset split to load |\n| `num_train_examples` | int | `-1` | Limit on dataset size (`-1` for all) |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Math answer correctness (symbolic/numeric equivalence) | \n","encoding":"utf-8","truncated":false,"total_bytes":1725},"status":null}