{"data":{"kind":"file","path":"README.md","version_id":"dw88mj2r9txbrwfhyd4jld7p","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3116,"modified_at":"2025-12-25T03:38:49.958000","content_hash":"97d15e68fa21437ccabc33e3cd358f8ce3fb03df79ced37826d122710ed9c66c"},"entries":[],"content":"# blind-cartographer\n\n> Implemented by: [@LatentLich](https://twitter.com/LatentLich)\n>\n> Inspired by [@arithmoquine](https://twitter.com/arithmoquine)'s experiment, [\"How Does A Blind Model See The Earth?\"](https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth)\n\n### Overview\n- **Environment ID**: `blind-cartographer`\n- **Short description**: A GIS-based environment where models classify geographic grid cells as land or water.\n- **Tags**: eval, gis, geography, classification, single-turn\n\n### Datasets\n- **Primary dataset(s)**: `oliveirabruno01/blind-cartographer-data` on Hugging Face. A grid of global coordinates sampled from a high-resolution raster map.\n- **Source links**: [oliveirabruno01/blind-cartographer-data](https://huggingface.co/datasets/oliveirabruno01/blind-cartographer-data)\n- **Split sizes**: The script dynamically creates a stratified train/eval split from the full dataset (default: 90% train, 10% eval).\n\n### Task\n\n- **Type**: single-turn\n- **Parser**: Standard `vf.Parser` using `extract_boxed_answer`.\n- **Rubric overview**: Calculates `pixel_accuracy_reward` (correctness) and `format_reward` (compliance with boxed format).\n\n### Quickstart\nRun an evaluation with default settings (4.0 degree resolution):\n\n```bash\nuv run vf-eval blind-cartographer\n```\n\nConfigure the model, number of samples, and grid resolution:\n\n```bash\nuv run vf-eval blind-cartographer \\\n  -m gpt-4.1-mini \\\n  -n 256 \\\n  -a '{\"resolution\": 10.0}'\n```\n\n### Environment Arguments\nThe behavior of the environment can be configured via arguments passed with `-a` or `--env-args`.\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `resolution` | float | `4.0` | The size of each grid cell in degrees. **Valid range: (0, 180]**. A warning is issued for values < 0.5 with the default dataset. |\n| `dataset_repo_id` | str | `\"oliveirabruno01/blind-cartographer-data\"` | The Hugging Face repository ID for the source data. |\n| `dataset_split` | str | `\"train\"` | The split of the dataset to use for building both the training and evaluation datasets. |\n| `seed` | int | `3301` | The random seed for shuffling and data splitting. |\n| `eval_set_size` | float | `0.1` | The proportion of the dataset to hold out for the fixed evaluation set. **Valid range: (0, 1)**. |\n| `system_prompt` | str | `None` | A custom system prompt to use for the environment. If not provided, the default system prompt is used. |\n\n### Metrics\nThe rubric emits two primary metrics for evaluating performance.\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | The main scalar reward, equivalent to `pixel_accuracy_reward`. |\n| `pixel_accuracy_reward` | Returns 1.0 if the model's prediction for a single cell matches the ground truth, else 0.0. This is the metric used for training. |\n\n## Changelog\n\n### v0.1.1\n- **Refactor**: Updated for Verifiers v0.1.8+ compatibility using standard `vf.SingleTurnEnv`.\n- **Output Format**: Changed response requirement from \"Land/Water\" text to binary `\\boxed{1}` or `\\boxed{0}`.\n- **Metrics**: Removed batch IoU metric calculation; added `format_reward`.","encoding":"utf-8","truncated":false,"total_bytes":3116},"status":null}