{"data":{"kind":"file","path":"README.md","version_id":"kfvlouho1eth0m0frt1yma3t","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1067,"modified_at":"2025-09-12T21:45:06.077000","content_hash":"d18e37ebea02f48e45bfcc1cab5ee241a8fce911d5c353aee2bb3f44ef500fef"},"entries":[],"content":"# gridgame\r\n\r\n### Overview\r\n- **Environment ID**: `gridgame`\r\n- **Short description**: Tests a model's ability to identify the grid locations of objects.\r\n- **Tags**: multimodal,train\r\n\r\n### Datasets\r\n- **Primary dataset(s)**: gridgame\r\n- **Source links**: https://huggingface.co/datasets/camelCase12/gridgame\r\n- **Split sizes**: 100\r\n\r\n### Task\r\n- **Type**: single-turn\r\n- **Parser**: extract_boxed_answer\r\n- **Rubric overview**: Reward is based on the Manhattan distance between the predicted and correct grid locations.\r\n\r\n### Quickstart\r\nRun an evaluation with default settings:\r\n\r\n```bash\r\nuv run vf-eval gridgame\r\n```\r\n\r\nConfigure model and sampling:\r\n\r\n```bash\r\nuv run vf-eval gridgame -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7 -a '{\"split\": \"easy\"}'\r\n```\r\n\r\nThere are 3 splits: easy (the default, 50 problems), medium (10 problems), and hard (5 problems).\r\n\r\n### Metrics\r\nThe rubric emits the following metrics:\r\n\r\n| Metric | Meaning |\r\n| ------ | ------- |\r\n| `reward` | Main scalar reward (weighted sum of criteria) |\r\n| `accuracy` | Exact match on target answer |","encoding":"utf-8","truncated":false,"total_bytes":1067},"status":null}