# dbt-trivial

### Overview
- **Environment ID**: `dbt-trivial`
- **Short description**: Stateful dbt Core environment for training agents to build reusable mart and intermediate dbt models.
- **Tags**: multi-turn, tool-use, sql, dbt, train, eval

### Datasets
- **Primary dataset**: Hugging Face dataset repo `diicell/duckdb-dbt-qa`.
- **Default splits**: `train` and `eval`.
- **Bundled project artifact**: `dbt_project.tar.gz`, stored in the same dataset repo.
- **Task metadata**: stored in `info` as JSON, including the task spec fields and compact verification metadata.

### Task
- **Type**: multi-turn tool use (`StatefulToolEnv`)
- **Output format**: the final answer must be returned inside a `<dbt_model_sql>` block:

```xml
<dbt_model_sql>
...
</dbt_model_sql>
```

- **Tool contract**:
  - `list_staging_models`
  - `get_model_details`
  - `sample_staging_model`
  - `profile_staging_model`
  - `compile_candidate`
  - `run_candidate`
  - `query_candidate_output`

### Quickstart
Install the local environment:

```bash
prime env install dbt-trivial -p ./environments
```

Smoke eval:

```bash
prime eval run dbt-trivial -m qwen3-30b-i -n 5
```

### Environment Arguments

| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `corpus_dataset` | `str` | `diicell/duckdb-dbt-qa` | Hugging Face dataset repo containing the parquet splits and the dbt tarball. |
| `corpus_split` | `str` | `train` | Train split name to load from the dataset repo. |
| `eval_split` | `str` | `eval` | Eval split name to load from the dataset repo. |
| `dbt_bundle_path` | `str \| None` | `None` | Optional local path to `dbt_project.tar.gz`. |
| `dbt_bundle_filename` | `str` | `dbt_project.tar.gz` | Artifact filename to download from the dataset repo. |
| `dbt_path` | `str` | `"dbt"` | dbt executable to use for compile/run. |
| `max_turns` | `int` | `12` | Maximum number of assistant turns per rollout. |
| `max_output_rows` | `int` | `35000` | Hard row cap for exact output verification. |
| `verification_sample_rows` | `int` | `3` | Number of sample rows retained in verification payloads. |
| `max_active_dbt_jobs` | `int` | `4` | Per-env-server concurrency cap for expensive dbt workspace and compile/run operations. |
| `allowed_complexities` | `list[str] \| None` | `None` | Optional curriculum filter applied to the train and eval splits by `info["complexity"]`. Use it to stage training from simpler examples toward harder ones. |

For local evals, keep `max_active_dbt_jobs` around `2` to `4` unless the machine has plenty of spare CPU and file I/O. For hosted RL, prefer `1` to `2` per env server so that rollout concurrency does not multiply into too many simultaneous dbt processes.

### Reward Overview
- deterministic exact match against the packaged verification metadata (`row_count`, `columns`, `output_hash`, `hash_mode`)
- contract-aware fallback: a soft execution comparison against the gold dbt model, which is run in the same isolated workspace
- explicit contract checks for required output columns, full output column agreement, and ordered-task exactness, with partial rescue when only the column order is wrong
- grain-aware penalties based on the required dimensions, so dropped or duplicated entity keys materially lower the reward
- syntax/validity checks for `config(materialized='table')`, `ref()`, and use of allowed staging models only
- train-strict three-band outcome scoring: ordered exact tables receive full reward, tables whose rowset is exact but whose order is wrong fall into a lower middle band, and fuzzy partial matches stay capped below `0.60`
- row-dominant semantic fallback: when the table is plausible but not exact, row F1 and grain heavily outweigh loose value overlap
- solutions that score as exact on the hidden verification are capped unless the rollout successfully ran the final `<dbt_model_sql>` SQL through `run_candidate`
- full exact reward also requires compile-first behavior (compiling before running), visible reasoning before tool calls, and reasoned recovery after any failed compile or run

### Notes
- each rollout gets a fresh, private dbt project workspace copied from a cached template.
- the template project is downloaded and extracted once per env server process.
- read-only staging exploration and profiling use cached template metadata; candidate execution stays local to each rollout.
- env caches and rollout workspaces live under `./.cache/dbt_trivial_<pid>/`.
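The `<dbt_model_sql>` output format and the syntax/validity checks for `config(materialized='table')` and `ref()` can be illustrated with a small Python sketch. It assumes the grader takes the last `<dbt_model_sql>` block in the completion and validates it with simple regular expressions. `extract_model_sql`, `check_model_sql`, and the regex patterns are hypothetical and are not taken from this environment's code.

```python
import re
from typing import List, Optional, Set

# Hypothetical patterns; the environment's real grader may parse answers differently.
ANSWER_RE = re.compile(r"<dbt_model_sql>(.*?)</dbt_model_sql>", re.DOTALL)
CONFIG_RE = re.compile(r"config\(\s*materialized\s*=\s*['\"]table['\"]\s*\)")
REF_RE = re.compile(r"\bref\(\s*['\"]([A-Za-z0-9_]+)['\"]\s*\)")


def extract_model_sql(completion: str) -> Optional[str]:
    """Return the SQL inside the last <dbt_model_sql> block, or None if there is none."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].strip() if matches else None


def check_model_sql(sql: str, allowed_staging: Set[str]) -> List[str]:
    """Collect validity problems; an empty list means every check passed."""
    problems = []
    if not CONFIG_RE.search(sql):
        problems.append("missing config(materialized='table')")
    refs = REF_RE.findall(sql)
    if not refs:
        problems.append("no ref() calls")
    for name in refs:
        if name not in allowed_staging:
            problems.append(f"ref to disallowed model: {name}")
    return problems
```

For example, a completion whose block contains `{{ config(materialized='table') }}` and selects from `{{ ref('stg_orders') }}` passes when `stg_orders` is in the allowed set (the staging model name here is made up for illustration). A regex check like this is only a cheap pre-filter; compiling with dbt remains the authoritative syntax check.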
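The `allowed_complexities` argument filters both splits by `info["complexity"]`, where `info` holds JSON task metadata. A minimal sketch of such a curriculum filter follows, assuming each row carries `info` either as a JSON string or as an already-decoded dict. The function name and the complexity labels used in practice are assumptions, not this environment's API.

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def filter_by_complexity(
    rows: Iterable[Dict[str, Any]], allowed: Optional[List[str]]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep rows whose info["complexity"] is in `allowed`.

    `allowed=None` disables the filter, mirroring the argument's default.
    """
    rows = list(rows)
    if allowed is None:
        return rows
    allowed_set = set(allowed)
    kept = []
    for row in rows:
        info = row["info"]
        # `info` is stored as JSON; decode it if it is still a string.
        if isinstance(info, str):
            info = json.loads(info)
        if info.get("complexity") in allowed_set:
            kept.append(row)
    return kept
```

The same per-row predicate could be passed to a Hugging Face `datasets.Dataset.filter` call, so the filtering happens once at load time rather than per rollout.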
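The `max_active_dbt_jobs` cap describes a per-server limit on concurrent dbt work. One common way to enforce such a limit is an `asyncio.Semaphore` around each subprocess launch. The sketch below is an assumption about how such a cap could work, not this environment's implementation. `DbtJobLimiter` is a hypothetical name, and the `active`/`peak` counters exist only to make the cap observable.

```python
import asyncio


class DbtJobLimiter:
    """Hypothetical per-env-server cap on concurrent dbt subprocesses."""

    def __init__(self, max_active_dbt_jobs: int = 4) -> None:
        # Construct this inside a running event loop (e.g. within an async function).
        self._sem = asyncio.Semaphore(max_active_dbt_jobs)
        self.active = 0  # jobs currently holding a slot
        self.peak = 0    # highest concurrency observed so far

    async def run(self, *cmd: str) -> int:
        """Wait for a free slot, run `cmd` as a subprocess, and return its exit code."""
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                proc = await asyncio.create_subprocess_exec(*cmd)
                return await proc.wait()
            finally:
                self.active -= 1
```

Inside an async function, `await asyncio.gather(*(limiter.run("dbt", "compile", "--select", name) for name in models))` would start every job but allow at most `max_active_dbt_jobs` dbt processes at once. This is why a per-server value of `1` to `2` keeps hosted RL from multiplying into many simultaneous dbt processes.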
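The deterministic exact match compares a candidate table against packaged `row_count`, `columns`, `output_hash`, and `hash_mode` metadata. The sketch below shows one way such a fingerprint could be computed. It assumes two hypothetical `hash_mode` values, `"ordered"` and `"unordered"`, and a SHA-256 over JSON-encoded rows. The real hashing scheme and mode names are not documented here, so this only illustrates the idea of order-sensitive versus rowset hashing.

```python
import hashlib
import json
from typing import Any, Dict, Sequence


def output_fingerprint(
    columns: Sequence[str], rows: Sequence[Sequence[Any]], hash_mode: str
) -> Dict[str, Any]:
    """Summarize a result table (hypothetical scheme, not the packaged one)."""
    # Encode each row canonically; default=str covers dates, decimals, etc.
    encoded = [json.dumps(list(r), default=str, separators=(",", ":")) for r in rows]
    if hash_mode == "unordered":
        encoded.sort()  # rowset comparison: row order does not affect the hash
    elif hash_mode != "ordered":
        raise ValueError(f"unknown hash_mode: {hash_mode}")
    digest = hashlib.sha256(json.dumps(list(columns)).encode())
    for line in encoded:
        digest.update(line.encode() + b"\n")  # newline keeps row boundaries unambiguous
    return {
        "row_count": len(rows),
        "columns": list(columns),
        "output_hash": digest.hexdigest(),
        "hash_mode": hash_mode,
    }


def is_exact_match(expected: Dict[str, Any], columns: Sequence[str], rows) -> bool:
    """True when row count, column list, and hash all agree with `expected`."""
    actual = output_fingerprint(columns, rows, expected["hash_mode"])
    return all(actual[k] == expected[k] for k in ("row_count", "columns", "output_hash"))
```

Storing only a hash plus a few sample rows (`verification_sample_rows`) keeps the verification payload small. The trade-off is that value formatting (for example, float versus decimal rendering) must be canonicalized identically on both sides.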
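The three-band outcome scoring and the row-dominant fallback can be sketched as follows. The README fixes only the structure: full reward for ordered exact tables, a lower middle band for rowset-exact tables in the wrong order, and fuzzy partial matches capped below `0.60`. The constants `ORDER_MISS_REWARD = 0.75` and `FUZZY_CAP = 0.59` are assumed values. Treating a wrong order as fully correct when the task is unordered is also an assumption, and the fuzzy band here uses multiset row F1 alone, leaving out the grain term.

```python
from collections import Counter
from typing import Any, Sequence

FULL_REWARD = 1.0
ORDER_MISS_REWARD = 0.75  # assumed value for the "lower middle band"
FUZZY_CAP = 0.59          # keeps fuzzy partial matches strictly below 0.60


def row_f1(pred_rows: Sequence[Sequence[Any]], gold_rows: Sequence[Sequence[Any]]) -> float:
    """Multiset F1 over whole rows: dropped rows hurt recall, duplicated rows hurt precision."""
    pred, gold = Counter(map(tuple, pred_rows)), Counter(map(tuple, gold_rows))
    if not pred and not gold:
        return 1.0
    overlap = sum((pred & gold).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)


def band_score(pred_rows, gold_rows, order_matters: bool) -> float:
    """Hypothetical three-band scorer: exact, rowset-exact/order-miss, fuzzy."""
    pred = [tuple(r) for r in pred_rows]
    gold = [tuple(r) for r in gold_rows]
    if pred == gold:
        return FULL_REWARD
    if Counter(pred) == Counter(gold):
        # Same rowset, different order: only penalized when the task specifies an order.
        return ORDER_MISS_REWARD if order_matters else FULL_REWARD
    # Multisets differ, so row F1 < 1 and the fuzzy band stays below FUZZY_CAP.
    return FUZZY_CAP * row_f1(pred, gold)
```

Scoring whole rows as a multiset is what makes the fallback "row-dominant". A table with the right values scattered across wrong rows earns little, and both dropped and duplicated entity rows lower the score.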
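The Notes describe each rollout receiving a fresh, private copy of a cached template project under `./.cache/dbt_trivial_<pid>/`. A minimal sketch of that isolation pattern, using `shutil.copytree` into a unique directory per rollout, is shown below. The `rollout_` prefix, the `project` subdirectory, and both function names are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def cache_root() -> Path:
    """Per-process cache directory following the ./.cache/dbt_trivial_<pid>/ layout."""
    return Path(".cache") / f"dbt_trivial_{os.getpid()}"


def make_rollout_workspace(template_dir: Path, root: Path) -> Path:
    """Hypothetical helper: copy the cached template into a fresh, private directory."""
    root.mkdir(parents=True, exist_ok=True)
    # mkdtemp guarantees a unique directory even under concurrent rollouts.
    workspace = Path(tempfile.mkdtemp(prefix="rollout_", dir=root))
    # copytree requires a target that does not exist yet, so copy into a subdirectory.
    project = workspace / "project"
    shutil.copytree(template_dir, project)
    return project
```

Because candidate models are written and run only inside the returned `project` directory, one rollout's files never leak into another rollout or into the shared template. Read-only exploration can keep using metadata cached from the template.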