{"data":{"kind":"file","path":"README.md","version_id":"grnxdj804jtzbr03xcoq5ull","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1309,"modified_at":"2025-09-07T17:12:20.868000","content_hash":"d4b9fc329805cb4dd2cbd1a9f2883c1806b761096ce32a74693ebd4f52e33a71"},"entries":[],"content":"# emoji-rebus\r\n\r\n### Overview\r\n- **Environment ID**: `emoji-rebus`\r\n- **Short description**: Given an emoji clue, predict the exact target phrase.\r\n- **Tags**: emoji, puzzles, single-turn, text-only\r\n\r\n### Datasets\r\n- **Primary dataset**: Local JSON at `/dataset.json` (emoji ➜ phrase pairs)\r\n- **Split**: Single evaluation set (shuffled internally by the runner if desired)\r\n\r\n### Task\r\n- **Type**: single-turn\r\n- **Parser**: none (model must output the final phrase directly)\r\n- **Rubric overview**: exact-match reward (case-insensitive, trimmed). Reward is 1.0 on exact match, else 0.0.\r\n\r\n### Quickstart\r\nRun an evaluation with default settings:\r\n\r\n```bash\r\nuv run vf-eval emoji-rebus\r\n```\r\n\r\nConfigure model and sampling:\r\n\r\n```bash\r\nuv run vf-eval emoji-rebus \\\r\n  -m gpt-4.1-mini \\\r\n  -n 20 -r 3 -t 512 -T 0.0\r\n```\r\n\r\nNotes:\r\n- Use `-n` (num examples) and `-r` (rollouts per example) to control evaluation size.\r\n- Set a deterministic seed via the CLI if you want stable shuffling.\r\n\r\n### Environment Arguments\r\nThis environment has no custom arguments.\r\n\r\n### Metrics\r\n\r\n| Metric | Meaning |\r\n| ------ | ------- |\r\n| `reward` | 1.0 if exact match; 0.0 otherwise |\r\n| `accuracy` | Equivalent to exact-match rate |\r\n\r\n### Evaluation Reports\r\nReports generated by `vf-eval` will render here after runs.","encoding":"utf-8","truncated":false,"total_bytes":1309},"status":null}