{"data":{"kind":"file","path":"README.md","version_id":"slw59fylcm509vv5v1d0tga3","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2912,"modified_at":"2026-05-15T18:51:43.354000","content_hash":"9f5801fd11ed2a433b3af3e05f5bc555db77cedb6d895b1ca27e4bcb16ee885f"},"entries":[],"content":"# synthetic-grounding\n\nRL environment for training multimodal models on visual grounding tasks using point localization rewards.\n\n---\n\n## Overview\n\n- **Environment ID**: `synthetic-grounding`\n- **Short description**: Visual grounding environment where models predict 2D points corresponding to objects referenced in natural language questions.\n- **Tags**: vision, grounding, multimodal, GRPO, localization, synthetic\n\n---\n\n## Datasets\n\n- **Primary dataset(s)**:\n  - `UlrickBL/grounding-dataset-synthetic`\n  - Synthetic visual grounding dataset containing:\n    - image\n    - natural language grounding question\n    - target point coordinates\n\n- **Source links**:\n  - https://huggingface.co/datasets/UlrickBL/grounding-dataset-synthetic\n\n- **Split sizes**:\n  - Train/eval splits provided directly by the dataset.\n\n---\n\n## Task\n\n- **Type**: single-turn multimodal grounding\n\n- **Input**:\n  - image\n  - grounding instruction/question\n\n- **Expected output format**:\n\n```json\n{\"point_2d\":[x,y],\"label\":\"OBJECT_NAME\"}\n```\n\nThe environment also accepts list-form outputs:\n\n```json\n[\n  {\"point_2d\":[x,y],\"label\":\"OBJECT_NAME\"}\n]\n```\n\n---\n\n## Rubric Overview\n\nThe environment uses a weighted reward rubric:\n\n| Reward Function | Weight | Description |\n| --- | --- | --- |\n| `reward_format` | 0.2 | Rewards valid JSON parsing and correct schema |\n| `reward_distance_gaussian` | 0.8 | Gaussian distance reward based on predicted point proximity to ground truth |\n\n### Distance Reward\n\nThe localization reward is:\n\n```text\nexp(-d² / (2σ²))\n```\n\nWhere:\n\n- `d` = Euclidean distance between prediction and target\n- `σ` = smoothing parameter (default: 100)\n\nThis provides:\n\n- smooth gradients\n- dense RL signal\n- stable optimization for grounding tasks\n\n---\n\n## Quickstart\n\nRun an evaluation with default settings:\n\n```bash\nprime eval run synthetic-grounding\n```\n\nConfigure model and sampling:\n\n```bash\nprime eval run synthetic-grounding \\\n  -m openai/gpt-4.1-mini \\\n  -n 20 \\\n  -r 3 \\\n  -t 1024 \\\n  -T 0.7\n```\n\nPass custom environment arguments:\n\n```bash\nprime eval run synthetic-grounding \\\n  -a '{\"max_size\":512}'\n```\n\n---\n\n## Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | --- | --- | --- |\n| `max_size` | int | `640` | Maximum image dimension after resizing |\n| `split` | str | `\"train\"` | Dataset split to load |\n| `dataset_id` | str | `\"UlrickBL/grounding-dataset-synthetic\"` | Hugging Face dataset identifier |\n\n---\n\n## Metrics\n\n| Metric | Meaning |\n| --- | --- |\n| `reward` | Final weighted reward |\n| `reward_format` | Valid output formatting score |\n| `reward_distance_gaussian` | Spatial localization quality |\n| `completion_length` | Length of generated answer |\n\n---\n\n## Example Prompt\n\n```text\nWhat is the location of the red cup?\n\nReturn ONLY valid JSON.\n\nFormat:\n{\"point_2d\":[x,y],\"label\":\"OBJECT_NAME\"}\n```\n\n---\n\n## Example Output\n\n```json\n{\"point_2d\":[421,198],\"label\":\"red cup\"}\n```\n","encoding":"utf-8","truncated":false,"total_bytes":2912},"status":null}