{"data":{"kind":"file","path":"README.md","version_id":"a56ygeb4hlif5lugo3fgxdgh","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1691,"modified_at":"2026-06-15T19:35:17.304000","content_hash":"11c277491784703eb127bb7c5c6507663373610712871bc99fb922f5ef49b11e"},"entries":[],"content":"# meta-tool-coherence\n\n`meta-tool-coherence` is a deterministic Verifiers environment for studying\nwhen a cheap tool should help and when tool output should be ignored.\n\nEach prompt asks the model to compute a small normalized score:\n\n```text\nanswer = (signal * multiplier + offset) mod 97\n```\n\nThe model must return exactly one result tag:\n\n```text\n<result>{\"answer\": 42, \"source\": \"tool\"}</result>\n```\n\nThe environment exposes one deterministic tool:\n\n```python\nlookup_signal(record_id: str) -> dict\n```\n\nTask families:\n\n- `lookup_required`: the signal is hidden; the model should call the tool once\n  and use the authoritative tool signal.\n- `prompt_sufficient`: the trusted signal is already in the prompt; the model\n  should avoid the tool.\n- `tool_conflict`: the prompt gives a trusted signal and the tool may return a\n  stale conflicting signal; the model should ignore the tool if it calls it.\n\nMetrics separate correctness from tool routing and final synthesis:\n\n- `answer_exact`\n- `source_exact`\n- `source_evidence_match`\n- `unsupported_tool_source`\n- `used_tool`\n- `tool_policy_match`\n- `missed_recommended_tool`\n- `unnecessary_tool_call`\n- `repeated_tool_call`\n- `stale_tool_answer_match`\n- `raw_tool_dump`\n- `schema_valid`\n\nThe first intended probe is a small Qwen 2B tool run with deterministic scoring,\nno sandbox, and no LLM judge.\n\nVersion notes:\n\n- `0.1.1` fixed repeated-tool-call accounting by counting attempted assistant\n  tool calls as well as tool response messages.\n- `0.1.2` adds source-evidence consistency: a final answer that claims\n  `\"source\": \"tool\"` without an observed tool call loses source credit and gets\n  an explicit `unsupported_tool_source` penalty.\n","encoding":"utf-8","truncated":false,"total_bytes":1691},"status":null}