# rebus-vl-thinking

### Overview
- **Environment ID**: `rebus-vl-thinking`
- **Short description**: Trains a vision-language model (VLM) to solve rebus puzzles: images or sets of images that must be interpreted and combined to guess a single target word.
- **Tags**: reasoning, vision, semantic, images, rebus

Unlike traditional single-step image classification, this environment requires the model to reason in several layers:

1. **Image understanding**: the model first interprets each image, extracting the semantic meanings, objects, symbols, or words it may represent.
2. **Subtask reasoning**: each interpreted element becomes a subtask in which the model weighs phonetic hints, partial words, logical cues, or multilingual components.
3. **Integration**: the model combines all subtasks into a coherent whole and produces one final guess for the target word.

The task therefore calls for multimodal, multilingual, and multilayered reasoning: rather than only matching patterns, the model must perform symbolic reasoning and draw cross-domain associations.

Another key strength is that the dataset is easy to extend but hard to master:

- **Easy to extend**: new rebus puzzles can be created simply by combining new images, words, or symbols.
- **Hard to master**: solving a puzzle requires deeper semantic connections, logical inference, and creative interpretation that go beyond surface-level recognition.

In short, the environment encourages the model to build a structured reasoning chain over multiple perceptual and semantic inputs, training it to solve complex multimodal puzzles that blend vision, language, and logic.

### Datasets
- **Primary dataset(s)**: `UlrickBL/rebus_french_rl` (rebus puzzles)
- **Source links**: https://huggingface.co/datasets/UlrickBL/rebus_french_rl
- **Split sizes**: 10,000

### Task
- **Type**: single-turn, vision
- **Parser**: `XMLParser`
- **Rubric overview**: exact match against the target word

### Quickstart
Run an evaluation with default settings:

```bash
uv run vf-eval rebus-vl-thinking
```

Configure model and sampling:

```bash
uv run vf-eval rebus-vl-thinking -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7
```

Notes:
- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.

### Metrics

| Metric | Meaning |
| ------ | ------- |
| `reward` | Main scalar reward (weighted sum of rubric criteria) |
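The Task section pairs an `XMLParser` with an exact-match rubric. As a rough illustration of what that scoring involves, here is a minimal, standalone Python sketch. It does not use the verifiers library, and the `<answer>` tag name, the helper names (`extract_answer`, `normalize`, `exact_match_reward`), and the normalization rules (whitespace collapsing and case-folding) are all hypothetical assumptions, not this environment's actual implementation.

```python
import re
from typing import Optional

# Hypothetical tag name: assumes the model wraps its final guess in <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL | re.IGNORECASE)


def extract_answer(completion: str) -> Optional[str]:
    """Return the content of the last <answer> block, or None if there is none."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def normalize(text: str) -> str:
    """Assumed normalization: collapse internal whitespace, trim, and casefold."""
    return " ".join(text.split()).casefold()


def exact_match_reward(completion: str, target: str) -> float:
    """Return 1.0 if the extracted guess equals the target after normalization, else 0.0."""
    guess = extract_answer(completion)
    if guess is None:
        # No parsable answer: treat as a wrong guess.
        return 0.0
    return 1.0 if normalize(guess) == normalize(target) else 0.0


if __name__ == "__main__":
    # Illustrative completion only; the real prompt/response format may differ.
    sample = "Image 1: a cat (chat). Image 2: skin (peau).\n<answer> Chapeau </answer>"
    print(exact_match_reward(sample, "chapeau"))  # 1.0
```

Case-folding and whitespace collapsing are a lenient choice made for this sketch; a strict exact match would compare the raw strings, and a French dataset may also require deciding whether accents must match.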