{"data":{"kind":"file","path":"README.md","version_id":"z4hrqwrg1xdgbcfyalcp8yq2","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2773,"modified_at":"2025-12-10T14:14:06.529000","content_hash":"341e862dade227ced676f71d27173700cd44d5fc7262ebdd08f501b94c519b8d"},"entries":[],"content":"# MathOlympiadBench\n\n### Overview\n- **Environment ID**: `MathOlympiadBench`\n- **Short description**: Theorem proving environment for Mathematical Olympiad problems using Lean 4 verification\n- **Tags**: theorem-proving, lean4, math-olympiad, mathematics, eval\n\n### Datasets\n- **Primary dataset(s)**: MathOlympiadBench - 360 human-verified mathematical competition problem formalizations\n- **Source links**: [https://huggingface.co/datasets/Goedel-LM/MathOlympiadBench](https://huggingface.co/datasets/Goedel-LM/MathOlympiadBench)\n- **Split sizes**: Configurable filtering by mathematical domain, problem count, solved status\n\n### Task\n- **Type**: single-turn\n- **Parser**: Custom Lean 4 code block extractor (```lean4 or ```lean)\n- **Rubric overview**: Binary verification (1.0/0.0) via kimina-lean-server compilation\n\n### Prerequisites\n- **Docker**: Required for kimina-lean-server (default) OR\n- **Local Setup**: kimina-lean-server installed locally (set `use_docker=false`)\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval MathOlympiadBench\n```\n\nConfigure model, sampling, and filtering:\n\n```bash\nuv run vf-eval MathOlympiadBench \\\n  -m gpt-4o-mini \\\n  -n 20 -r 1 -t 4096 -T 0.7 \\\n  -a '{\"max_examples\": 10, \"tags\": [\"number theory\"], \"solved_only\": false}'\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `cache_path` | str | `~/.cache/verifiers/environments/matholympiadbench` | Local cache for dataset |\n| `tags` | list | `None` | Filter by mathematical domain tags, e.g., `[\"number theory\", \"algebra\"]` |\n| `max_examples` | int | `-1` | Limit dataset size (-1 for all) |\n| `solved_only` | bool | `None` | Filter to only solved/unsolved problems (None for all) |\n| `server_port` | int | `8000` | Port for kimina-lean-server |\n| `server_memory` | str | `\"4g\"` | Memory limit for kimina-lean-server |\n| `use_docker` | bool | `True` | Use Docker for kimina-lean-server (vs local install) |\n\n### Available Filters\n**Mathematical Domains**:\n- `number theory`\n- `algebra` \n- `geometry`\n- `analysis`\n- `combinatorics`\n- And other mathematical competition categories\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | 1.0 if proof compiles successfully, 0.0 otherwise |\n| `verification_success` | Boolean compilation success |\n| `parse_success` | Boolean code block extraction success |\n| `error` | Compilation error message if failed |\n| `compilation_output` | Full compiler output for debugging |\n\n### Setup Notes\n- First run automatically downloads MathOlympiadBench dataset from Hugging Face\n- kimina-lean-server starts automatically via Docker\n- Requires internet connection for initial setup\n- Each problem includes formal statement and Lean 4 theorem code","encoding":"utf-8","truncated":false,"total_bytes":2773},"status":null}