{"data":{"kind":"file","path":"README.md","version_id":"n7w9kfb72x52mod9b9jehuqy","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1298,"modified_at":"2026-05-13T06:19:28.791000","content_hash":"abb1eab1208c7e1ef7da2145026f78de1af4cbc4baa706d5c9aaab2f26520d69"},"entries":[],"content":"# goblin-questions\n\nA minimal v1 `Taskset`/`Harness` environment for measuring whether answers include\n`goblin` on a fixed set of open-ended preference questions.\n\n## Reward\n\nFinal reward is the weighted sum of two rewards:\n\n```\nreward = 0.5 * goblin_reward + 0.5 * judge_reward\n```\n\n- `goblin_reward`: 1.0 if the model response contains `goblin`, otherwise 0.0.\n- `judge_reward`: 1.0 if `gpt-5.4-nano` judges the response coherent, relevant, and not overly repetitive.\n\nThere is no ground-truth answer. The judge sees the original prompt and the\nmodel response. The judge does not see the goblin reward.\n\n## Prompts\n\nThe taskset contains 58 plain questions. Parenthetical answer hints from the\nprompt brainstorming list are intentionally omitted from the model prompts.\n\n## Requirements\n\n`OPENAI_API_KEY` is required for the judge.\n\n## Config\n\nJudge and reward settings can be configured in TOML. Use the same keys under\n`[eval.taskset]` for eval configs or `[env.taskset]` for RL configs:\n\n```toml\n[env.taskset]\nhidden_word = \"goblin\"\njudge_model = \"gpt-5.4-nano\"\njudge_max_completion_tokens = 512\n\n[env.taskset.scoring.goblin_reward]\nweight = 0.5\n\n[env.taskset.scoring.judge_reward]\nweight = 0.5\n```\n\n## Quickstart\n\n```bash\nprime env install goblin-questions\nprime eval run goblin-questions\n```\n","encoding":"utf-8","truncated":false,"total_bytes":1298},"status":null}