{"data":{"kind":"file","path":"README.md","version_id":"ogwtx65wzpti8qq7q3omc0vp","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4174,"modified_at":"2026-05-18T11:52:45.635000","content_hash":"ac67b0d56c8452e3bbefab1821c7686c04bb7850d242a42a809c4a3b5b7503e5"},"entries":[],"content":"# CurveBench-Hard Environment\n\nA [Prime Intellect Environments Hub](https://app.primeintellect.ai/dashboard/environments/amirmohseni/curvebench-hard-env) environment for evaluating vision-language models on exact topological reasoning over nested Jordan curves (hard split).\n\n- **Paper:** [CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves](https://arxiv.org/abs/2605.14068)\n- **Dataset:** [AmirMohseni/CurveBench](https://huggingface.co/datasets/AmirMohseni/CurveBench)\n- **Collection:** [AmirMohseni/curvebench](https://huggingface.co/collections/AmirMohseni/curvebench)\n- **GitHub:** [Amir-Mohseni/CurveBench](https://github.com/Amir-Mohseni/CurveBench)\n\n---\n\n## Task Description\n\nGiven an image of pairwise non-intersecting Jordan curves, the model must recover the full rooted containment tree, in which each node is a region of the plane and each edge connects a region to the one directly containing it.\n\n- The root node (0) represents the unbounded outer region\n- Each curve-enclosed region is assigned a unique node number\n- Edges represent containment: parent region directly contains child region\n\n---\n\n## Dataset\n\nThis environment uses the [CurveBench](https://huggingface.co/datasets/AmirMohseni/CurveBench) hard split, with four visually distinct categories:\n\n| Split           | Examples | Description |\n|-----------------|----------|-------------|\n| `polygon`       | 199      | Piecewise-linear polygon boundaries |\n| `topographical` | 100      | Topographic-map-inspired contour arrangements |\n| `maze`          | 100      | Labyrinthine, deeply nested curves |\n| `counting`      | 57       | High-density curve 
configurations |\n| `combined`      | 456      | All categories merged |\n\n---\n\n## Expected Response Format\n\nModels should respond inside `<answer>...</answer>` tags. The first line is the number of nodes (excluding the root), followed by one edge per line as `u v` (child, parent):\n\n```\n<answer>\n3\n1 0\n2 0\n3 1\n</answer>\n```\n\n---\n\n## Scoring\n\n| Reward              | Weight | Description |\n|---------------------|--------|-------------|\n| `tree_reward`       | 0.7    | 1.0 if the predicted tree is isomorphic to the ground truth (up to node relabelling) |\n| `node_count_reward` | 0.3    | 1.0 if the predicted node count matches the ground truth |\n\n---\n\n## Running Evaluations\n\n### Install the environment\n\n```bash\nprime env install amirmohseni/curvebench-hard-env\n```\n\n### Evaluate a model\n\nRun against the full combined split with 4 rollouts per example:\n\n```bash\nprime eval run amirmohseni/curvebench-hard-env \\\n  -m \"google/gemma-3-27b-it\" \\\n  -n -1 \\\n  -a '{\"split\": \"combined\"}' \\\n  -r 4\n```\n\n### Evaluate a specific category\n\n```bash\n# Maze only (most spatially demanding)\nprime eval run amirmohseni/curvebench-hard-env \\\n  -m \"google/gemma-3-27b-it\" \\\n  -n -1 \\\n  -a '{\"split\": \"maze\"}' \\\n  -r 4\n```\n\n### Evaluate a custom endpoint\n\n```bash\nprime eval run amirmohseni/curvebench-hard-env \\\n  -m \"your-model-name\" \\\n  -b \"https://your-endpoint.example.com/v1\" \\\n  -k \"YOUR_API_KEY_ENV_VAR\" \\\n  -n -1 \\\n  -a '{\"split\": \"combined\"}' \\\n  -r 4\n```\n\n### CLI flags\n\n| Flag | Description |\n|------|-------------|\n| `-m` | Model name |\n| `-b` | API base URL (for custom endpoints) |\n| `-k` | Environment variable name containing the API key |\n| `-n` | Number of examples (`-1` for all) |\n| `-r` | Number of rollouts per example |\n| `-a` | JSON string of arguments passed to `load_environment()` (e.g. 
split) |\n\n---\n\n## Using as a Python Library\n\n```python\nfrom curvebench_hard_env import load_environment\n\nenv = load_environment()              # defaults to \"combined\" split\nenv = load_environment(split=\"maze\")  # use a specific category\n```\n\n---\n\n## Citation\n\nIf you use this environment in your research, please cite:\n\n```bibtex\n@misc{mohseni2026curvebench,\n      title={CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves},\n      author={Amirreza Mohseni and Mona Mohammadi and Morteza Saghafian and Naser Talebizadeh Sardari},\n      year={2026},\n      eprint={2605.14068},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https://arxiv.org/abs/2605.14068},\n}\n```\n","encoding":"utf-8","truncated":false,"total_bytes":4174},"status":null}