{"data":{"kind":"file","path":"README.md","version_id":"iw59v9gkdappojpg5qn141xh","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1978,"modified_at":"2026-01-23T06:27:20.453000","content_hash":"17d31e31f9c0f65235b529b8ed88464bc8bb03de42b6e6da9e33e4015d4b8a7a"},"entries":[],"content":"# Vision-SR1 Environment\n\n**Blindfolded Test** for Visual Math Reasoning\n\n## Overview\n\nThis environment implements the Vision-SR1 \"Blindfolded Test\" methodology for training Vision-Language Models (VLMs) on visual math reasoning. The key insight is:\n\n> If a model truly understands an image, it should describe it well enough that it can solve the problem without looking at the image again.\n\n## Dataset\n\n**Geometry3K** - A visual geometry problem dataset from Hugging Face:\n- 3,000+ geometry problems with diagrams\n- Multiple choice and numeric answers\n- Covers angles, triangles, circles, and more\n\n## Reward Structure\n\n### Dual-Pass System\n\n1. **Pass 1 (with image)**: Model generates:\n   - `<scan>` - Visual description of the diagram\n   - `<think>` - Step-by-step reasoning\n   - `<answer>` - Final answer\n\n2. **Pass 2 (blind)**: Model solves using only the question + `<scan>` content (no image)\n\n### Rewards\n\n| Component | Reward | Description |\n|-----------|--------|-------------|\n| Format | +0.1 | Proper `<scan>`, `<think>`, `<answer>` tags |\n| Accuracy | +1.0 | Correct answer in Pass 1 |\n| Blind Correct | +1.0 | Pass 2 correct → scan was sufficient |\n| Blind Wrong | -0.5 | Pass 2 wrong → scan was insufficient |\n\n## Usage\n\n```python\nfrom vision_sr1 import load_environment\n\n# Load with default settings\nenv = load_environment()\n\n# Or customize\nenv = load_environment(\n    split=\"train\",\n    blind_verification=True,\n    max_samples=1000\n)\n```\n\n## Configuration\n\n| Parameter | Default | Description |\n|-----------|---------|-------------|\n| `split` | `\"train\"` | Dataset split (train/test) |\n| `blind_verification` | `True` | Enable Pass 2 verification |\n| `max_samples` | `None` | Limit number of samples |\n\n## References\n\n- [vision_grpo_tinker](https://github.com/chatrdh/vision_grpo_tinker) - Original GRPO training code\n- [Geometry3K Dataset](https://huggingface.co/datasets/hiyouga/geometry3k)\n- [Vision-SR1 Paper](https://www.alphaxiv.org/abs/2508.19652)\n","encoding":"utf-8","truncated":false,"total_bytes":1978},"status":null}