{"data":{"kind":"file","path":"README.md","version_id":"vikh37e1vb4gpoa5agvri2ud","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4691,"modified_at":"2026-06-11T07:31:32.969000","content_hash":"f7801b63179fb2b4a8e4152ce0a2600dd43f76f309ac933841eaff3326133ce9"},"entries":[],"content":"# cad-env\n\nText-to-OpenSCAD with fully deterministic geometric rewards. The model reads a\nnatural-language description of a 3D object (exact dimensions, millimeters) and\nmust emit an OpenSCAD program. The compiled mesh is graded against a reference\nmesh — no LLM judges anywhere in the reward path.\n\n**Gallery & leaderboard**: https://www.ashantanu.com/cad-env-gallery/ —\nrendered dataset showcase plus 10-model eval leaderboard with per-rollout\nreference-vs-completion tiles.\n\n### Overview\n- **Environment ID**: `cad-env`\n- **Short description**: Text-to-OpenSCAD code generation graded by compile success, watertightness, connected components, and chamfer distance against procedurally generated references\n- **Tags**: cad, 3d, code-generation, procedural, train, eval\n\n### Reward\n\n| Term | Weight | Check |\n|---|---|---|\n| compiles | 0.2 | `openscad` produces a valid non-empty mesh |\n| watertight | 0.1 | closed manifold surface (`trimesh.is_watertight`) |\n| single component | 0.1 | no disconnected floating parts |\n| shape similarity | 0.6 | `exp(-k * chamfer)` vs. reference, size-sensitive |\n\nThe ladder is graded, not binary: syntax error (0.0) < compiles-but-wrong\n(≈0.4) < missing/floating part (≈0.5) < wrong size (≈0.85) < exact (1.0) —\na within-group gradient suited to GRPO. The watertight and connected-component\nterms are printability rewards: chamfer alone scores a cloud of disconnected\nfragments decently; these terms force structural coherence.\n\n### Procedural dataset\n\nProblems are generated, not scraped. Each example's prompt and reference\nprogram are emitted from the same parameter draw, so labels are faithful by\nconstruction and every dataset is contamination-free. 16 parametric shape\nfamilies across 4 difficulty levels:\n\n- **L1** single primitive (box, cylinder, sphere, cone)\n- **L2** one boolean op (plate with hole, tube, notched box, open container)\n- **L3** 3-5 ops / repetition (hole-grid plate, L-bracket, stepped shaft, flanged washer)\n- **L4** rotation, radial symmetry, derived dimensions, relational placement\n  (hex nut from across-flats width, toothed disc, radially perforated tube, domed monument)\n\n### Task\n- **Type**: single-turn\n- **Output format**: one fenced code block containing a complete OpenSCAD program\n- **Rubric overview**: weighted reward ladder above; `compiled`, `watertight`, `single_component`, `similarity` also logged as zero-weight metrics in every eval\n\n### Quickstart\n\nRequires the `openscad` binary on PATH (https://openscad.org — or the\n`openscad/openscad` Docker image).\n\n```bash\nprime env install cad-env\nprime eval run cad-env -m <model> -n 20 -r 4 -t 2048\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_examples` | int | `200` | train dataset size |\n| `num_eval_examples` | int | `60` | eval dataset size (different seed) |\n| `seed` | int | `0` | dataset RNG seed |\n| `levels` | tuple | `(1,2,3,4)` | difficulty levels to cycle through |\n| `chamfer_k` | float | `10.0` | similarity sharpness `exp(-k*cd)` |\n\nCurriculum: `-a '{\"levels\": [3, 4]}'` to train only on hard tiers.\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | weighted ladder (0.2 compile + 0.1 watertight + 0.1 single-component + 0.6 similarity) |\n| `compiled` | 1 if the program produced a valid mesh |\n| `watertight` | 1 if the mesh is a closed manifold |\n| `single_component` | 1 if the mesh has no disconnected parts |\n| `similarity` | `exp(-k * chamfer)` vs. reference in [0, 1] |\n\n### Reference results (21 examples x 8 rollouts, levels 1-4)\n\n| model | mean | L1 | L2 | L3 | L4 | pass@1 | pass@8 |\n|---|---|---|---|---|---|---|---|\n| qwen/qwen3-8b | 0.85 | 1.00 | 0.96 | 0.72 | 0.68 | 0.89 | 0.95 |\n| Qwen/Qwen3.5-2B | 0.28 | 0.68 | 0.19 | 0.22 | 0.11 | 0.12 | 0.38 |\n\n(pass@k = fraction of examples with any rollout reward ≥ 0.9.)\nThe 8B sits in the trainable band on L3/L4; the 2B from L2 up, with large\ngroup mean → max gaps (e.g. 0.10 → 0.82) — the profile GRPO exploits.\n\n### Verifier hygiene notes\n\n- OpenSCAD/CGAL emits faces in nondeterministic order; faces are canonically\n  sorted by centroid before seeded surface sampling, making chamfer exactly\n  reproducible.\n- Candidate and reference are centered and scaled by the *reference's* extent,\n  so position is forgiven but absolute size errors are penalized.\n- Voxel IoU was deliberately rejected as a reward term: it misranks hollow\n  shapes (scores a solid cube above a handle-less mug).\n- `import`/`include`/`use`/`surface` are rejected (no filesystem access from\n  generated code).\n- Compiled meshes are cached by content hash; scoring adds ~0.6s per unique\n  program.\n","encoding":"utf-8","truncated":false,"total_bytes":4691},"status":null}