{"data":{"kind":"file","path":"README.md","version_id":"vw9qimty1ntghmkupzjceho9","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":9646,"modified_at":"2026-05-30T01:24:36.513000","content_hash":"866b2563fbde8b8893759c6f62f3c5d7c51de6c30b4934d681c56b9f20eb2c9d"},"entries":[],"content":"---\ntags:\n  - zerolang\n  - reinforcement-learning\n  - verifiers\n  - code-editing\n  - tool-use\n  - graph-editing\n  - laguna-xs2\nlicense: apache-2.0\n---\n\n# Zerolang Editing\n\n`zerolang-editing` is a Verifiers/Prime RL environment for training coding agents\nto edit [Zerolang](https://github.com/vercel-labs/zerolang) programs through\nchecked graph edits instead of loose text replacement.\n\nThe RL harness is built around [Roder](https://roder.sh), using a custom\nzero-coder plugin/distribution that exposes a Zerolang-only graph toolset to the\nmodel. In training, generic source editing tools are disabled; the agent is\nexpected to use the `zero_*` tools below, especially `zero_graph_summary` and\n`zero_graph_patch`, against the rollout file on disk.\n\nThe core task is intentionally narrow: each rollout starts with a `.0` source\nfile already written to disk, asks the model for a semantic code edit, and\nscores the edited file after the model uses Zerolang tooling. The intended\nsuccessful behavior is:\n\n1. Inspect the file with Zerolang graph/check tools.\n2. Identify the relevant graph hash and semantic node.\n3. Apply a checked `zero graph patch` operation to the on-disk file.\n4. Finish with a compact JSON response pointing at the edited path.\n\nThis repository contains the environment source package, synthetic task\nbuilders, tool wrappers, and documentation. The trained checkpoint from hosted\nRL runs is published separately by the training service when a run is finalized.\n\n## Why This Exists\n\nMost code-editing agents learn to patch source through line-oriented text\noperations. Zerolang exposes a graph-level editing surface where a patch is\nguarded by the expected graph hash and the expected field value. That makes\nedits auditable and harder to apply to stale or mismatched code.\n\nThis environment is designed to train that behavior directly. It rewards\nsuccessful checked graph patches, while still checking that the resulting file\ncompiles and matches the hidden target source.\n\n## Environment Summary\n\n- **Package name:** `zerolang-editing`\n- **Prime environment ID:** `pandelis/zerolang-editing`\n- **Version in this repo:** `0.1.11`\n- **Task type:** multi-turn tool-use code editing\n- **Agent harness:** Roder with a custom Zero graph-only plugin/tool allowlist\n- **Language under edit:** Zerolang `.0`\n- **Train split:** 209 deterministic synthetic tasks\n- **Eval split:** 67 held-out deterministic synthetic tasks\n- **Primary reward target:** successful `zero_graph_patch` on the rollout file\n\n## Roder Harness\n\nThe intended RL setup runs the model inside Roder rather than a generic chat\nloop. Roder provides the coding-agent harness, while a custom zero-coder plugin\nconfigures the available tool surface for this environment.\n\nThat plugin is deliberately restrictive:\n\n- It exposes only Zerolang graph/check/fix/skills tools.\n- It removes generic text edit tools from the training harness.\n- It routes tool calls to on-disk `.0` files using `path` arguments.\n- It keeps checked graph edits as the primary affordance for code changes.\n\nThis matters because the behavior we want to train is not \"rewrite this source\nstring\". The target behavior is \"inspect the Zerolang graph and apply a checked\nsemantic graph patch to the file Roder is managing\". The Verifiers environment\nthen grades the resulting file from disk.\n\n## Rollout Contract\n\nEach task row includes an initial Zerolang source program and a hidden target\nprogram. At rollout setup time, the environment writes the initial source to:\n\n```text\n<temporary rollout workspace>/program.0\n```\n\nThe model receives that path in the user prompt. Tools must operate on `path`\narguments that point to this `.0` file. Pasting the full source into tool calls\nis rejected because the training target is disk-backed graph editing, not\nsource-string rewriting.\n\nThe environment canonicalizes recoverable path mistakes, such as missing paths\nor paths outside the rollout workspace, back to the rollout file and records\nthose corrections. The `path_argument_valid` metric rewards clean tool calls\nthat did not require correction.\n\n## Tools\n\nThe environment exposes only Zerolang-specific tools:\n\n| Tool | Purpose |\n| --- | --- |\n| `zero_check(path)` | Run `zero check --json` against a `.0` file. |\n| `zero_graph_summary(path)` | Return compact graph hash and patchable node facts. |\n| `zero_graph_dump(path)` | Run `zero graph dump` for detailed graph inspection. |\n| `zero_graph_json(path)` | Run `zero graph --json`. |\n| `zero_fix_plan(path)` | Run `zero fix --plan --json`. |\n| `zero_graph_patch(path, expect_graph_hash, op)` | Apply one checked graph patch operation to the file. `op` must be a Zero patch operation string, not a JSON object. |\n| `zero_skills_get(skill)` | Load version-matched Zerolang guidance such as `language`, `diagnostics`, or `stdlib`. |\n\nExample checked patch shape:\n\n```bash\nzero graph patch program.0 \\\n  --expect-graph-hash graph:49dd208f8361c221 \\\n  --op 'set node=\"#78ac4364\" field=\"value\" expect=\"66\" value=\"65\"'\n```\n\nThe `op` tool argument is the string after `--op`. Do not call\n`zero_graph_patch` with `op` as `{\"op\":\"set\",\"id\":\"#...\",\"value\":66}`.\n\n## Reward Metrics\n\nThe main rubric is weighted toward actually patching the graph and producing\nthe hidden target program.\n\n| Metric | Weight | Meaning |\n| --- | ---: | --- |\n| `graph_patch_success` | 0.50 | A successful `zero_graph_patch` call edited the file to the hidden target. |\n| `target_source_match` | 0.20 | The final on-disk source matches the target after whitespace normalization. |\n| `zero_check_pass` | 0.15 | The edited file passes `zero check --json`. |\n| `zerolang_surface_used` | 0.10 | The rollout used graph hashes, node IDs, `expect`, or graph-patch semantics. |\n| `path_argument_valid` | 0.05 | Tool calls used the rollout `.0` path without harness-side correction. |\n\nThe reward is intentionally not fully binary. A model can get partial credit for\nproducing compilable code and using the right interface, but the highest reward\nrequires the checked graph patch to land correctly.\n\n## Dataset Construction\n\nThe synthetic tasks are generated from canonical Zerolang snippets:\n\n1. Build an initial `.0` program.\n2. Select a patchable semantic node, usually a literal, function value, call\n   target, or printed diagnostic string.\n3. Mutate the semantic value to produce the target program.\n4. Store the target source and task metadata.\n5. During rollout, require the model to recover the target through graph tools.\n\nThe environment currently focuses on deterministic editing families where\n`zero graph patch` support is reliable. The task builders live in:\n\n- `zerolang_editing/tasks.py`\n- `zerolang_editing/train_tasks.py`\n- `zerolang_editing/task_builders.py`\n\n## Installation\n\nInstall from Prime Hub:\n\n```bash\nprime env install pandelis/zerolang-editing@0.1.11\n```\n\nInstall from this repository:\n\n```bash\nuv sync\nuv run python -m compileall zerolang_editing\n```\n\nZerolang is required at runtime. If `zero` is not already on `PATH`, the tool\nwrapper checks `$HOME/.zero/bin/zero` and can download a release binary into a\ntemporary install directory.\n\n## Local Eval\n\n```bash\nprime eval run ./environments/zerolang_editing \\\n  -m poolside/laguna-xs.2 \\\n  -n 3 -r 1 -t 2048 -T 0.4 \\\n  -a '{\"split\":\"eval\",\"max_turns\":10}' \\\n  -s -d -A\n```\n\nFor quick package-level validation:\n\n```bash\ncd environments/zerolang_editing\nuv run python -m compileall zerolang_editing\nuv run python - <<'PY'\nfrom zerolang_editing.zerolang_editing import load_environment\nenv = load_environment(split=\"eval\", max_examples=1, max_turns=2)\nprint(type(env).__name__, len(env.dataset))\nPY\n```\n\n## Hosted RL Configuration\n\nThe overnight Laguna XS.2 run uses:\n\n```toml\nmodel = \"poolside/Laguna-XS.2\"\nmax_steps = 200\nbatch_size = 64\nrollouts_per_example = 8\nlearning_rate = 1e-4\n\n[sampling]\nmax_tokens = 2048\ntemperature = 0.4\nenable_thinking = true\n```\n\nThe config is stored in:\n\n```text\nconfigs/rl/zerolang-editing-laguna-xs2-overnight.toml\n```\n\n## Previous Training Signal\n\nA 20-step stress run on `poolside/Laguna-XS.2` completed successfully before\nthe overnight scale-up:\n\n- Baseline eval Avg@1: `0.1500`\n- Step 15 eval Avg@1: `0.2357`\n- Final eval Avg@1: `0.2250`\n- First 10 train-step reward average: `0.1606`\n- Last 10 train-step reward average: `0.2056`\n- No fatal orchestrator errors, no eval truncation, no no-response.\n\nThe main failure signatures were invalid tool paths: missing `path` arguments\nand paths outside the rollout workspace. Version `0.1.11` keeps the path sandbox,\nconverts recoverable path mistakes into canonicalized calls against the rollout\nfile, and removes no-op reward paths: unchanged source no longer earns\n`zero_check_pass`, and text-only mentions of graph tools no longer earn tool\nsurface reward.\n\n## Repository Contents\n\n```text\nREADME.md\npyproject.toml\nuv.lock\nconfigs/\n  rl/\n    zerolang-editing-laguna-xs2-20step.toml\n    zerolang-editing-laguna-xs2-overnight.toml\nzerolang_editing/\n  __init__.py\n  task_builders.py\n  tasks.py\n  train_tasks.py\n  zero_tools.py\n  zerolang_editing.py\n```\n\nBuild artifacts, local virtualenvs, Zerolang caches, rollout outputs, and\ncompiled Python caches are intentionally excluded from the Hugging Face repo.\n\n## Limitations\n\n- The task distribution is synthetic and should be expanded before treating the\n  trained behavior as general Zerolang editing competence.\n- Current graph-edit families focus on reliable literal/value style patches.\n- The environment is designed for RL tool-use behavior, not as a standalone\n  benchmark of general coding ability.\n- This repo contains the environment source, not final model weights.\n","encoding":"utf-8","truncated":false,"total_bytes":9646},"status":null}