{"data":{"kind":"file","path":"README.md","version_id":"nhc1xr00fri0nnw22q8ha6rd","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3557,"modified_at":"2025-09-08T21:05:51.861000","content_hash":"5af5c7c7864a745ceaa0514f6445d59c8eaea78f4a9655656b0b6104a9301b05"},"entries":[],"content":"dedeuce\n\n### Overview\n- Environment ID: `dedeuce`\n- Short description: Interactive Mealy‑machine identification/control under a strict query budget. The agent must actively probe a hidden finite‑state transducer via a tool API (`act`, `submit_table`, `submit_macro`) and submit an exact hypothesis to receive reward.\n- Tags: multi-turn, tool-use, active-learning, system-identification, control, <300-LOC\n\n### Mechanics\n- Hidden system: seeded Mealy machine with `S∈{4..6}` states, input alphabet `{A,B,C}`, output alphabet `{0,1,2}`. Start state is `0`.\n- Budget: each `act(symbol)` consumes 1 query (even invalid actions). Optional trap pairs flip an irreversible `trap_hit` flag.\n- Goals (modes):\n  - `basic` (identification): use `submit_table(table)` to submit the exact transition/output table.\n  - `control` (default): use `submit_macro(seq, repeat)` to produce a specific target output sequence (given only by its hash), verified by simulation.\n- Reward: `1.0` only if hypothesis is exactly correct and no trap is hit; else `0`. Small per‑query penalty is subtracted and clamped to `[0,1]`.\n\n### Tools\n- `act(symbol: \"A\"|\"B\"|\"C\") -> {out:int, budget:int, t:int, trap_hit:bool}`\n  - Executes one input; returns emitted output and canonical telemetry: `{out, budget_left, t (elapsed), trap_hit, queries_used}`. Invalid symbols still consume a query.\n- `submit_table(table_json: string) -> {ok:bool, budget_left, queries_used, trap_hit}`\n  - Full table JSON: `{ \"n\": S, \"start\": 0, \"trans\": { \"0\": {\"A\":[ns, out], \"B\":[ns, out], \"C\":[ns, out]}, ... } }`.\n- `submit_macro(seq: string, repeat:int) -> {ok:bool, budget_left, queries_used, trap_hit}`\n  - Simulates `seq` repeated `repeat` times from start; success iff output hash matches `target_hash`. Trap traversal on the macro path also fails (end‑to‑end safety).\n\n### Prompt/Observation\n- `reset(seed)` produces a chat prompt with: `{alphabet:[\"A\",\"B\",\"C\"], budget:<int>, goal:\"basic|control\", target_hash:<hex|string or \"\">, target_len:<int>}`.\n- Use only tool calls; end the episode by calling `submit_table` or `submit_macro` and then stop.\n\n### Quickstart\nRun a quick evaluation (control mode):\n\n```bash\nuv run vf-eval dedeuce -m gpt-4.1-mini -n 8 -r 2 -s -v \\\n  -a '{\"n\":8, \"seed\":0, \"budget\":25, \"mode\":\"control\", \"trap\": true}'\n```\n\nIdentification mode:\n\n```bash\nuv run vf-eval dedeuce -m gpt-4.1-mini -n 8 -r 2 -s -v \\\n  -a '{\"n\":8, \"seed\":0, \"budget\":25, \"mode\":\"basic\", \"trap\": false}'\n```\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `n` | int | `8` | Number of instances |\n| `seed` | int | `0` | Base RNG seed |\n| `budget` | int | `25` | Query budget per episode |\n| `n_states` | int | `5` | Hidden states (`4..6` if `None`) |\n| `mode` | str | `\"control\"` | `\"control\"` or `\"basic\"` |\n| `trap` | bool | `true` | Enable trap pairs |\n| `target_len` | int | `20` | Length of target output in control mode |\n\nNote: Per‑instance variety is enabled by default for wider difficulty. Set `n_states`, `trap=false`, or adjust `budget`/`target_len` explicitly if you need fixed settings.\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `reward_dedeuce` | Main reward (higher is better) |\n| `queries_used` | Number of `act` calls made |\n| `correct` | 1 if exact success, else 0 |\n| `trap_hit` | 1 if any trap hit, else 0 |\n| `budget_left` | Remaining budget at termination |\n\n## Evaluation Reports\n\n<!-- vf:begin:reports -->\n<!-- Reports auto-generated by vf-eval will be embedded here. -->\n<!-- vf:end:reports -->\n","encoding":"utf-8","truncated":false,"total_bytes":3557},"status":null}