{"data":{"kind":"file","path":"README.md","version_id":"ax57j1030hweqyi40i3bwcp3","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2634,"modified_at":"2026-03-22T19:54:51.118000","content_hash":"bba7ad00882fed772dd0a17e48515ab118d94b47b937f45f061007c9c89a6ac7"},"entries":[],"content":"# pydantic-adherence\n\n### Overview\n- **Environment ID**: `pydantic-adherence`\n- **Short description**: Multi-turn JSON-structured output validated against per-sample Pydantic models; parse or validation errors are returned to the model for retry until `max_turns` is reached.\n- **Tags**: json, structure, multi-turn, pydantic, parsing\n\n### Datasets\n- **Primary dataset(s)**: `justus27/pydantic-adherance-test` (HF) prompts with per-sample `verification_info`\n- **Source links**: [justus27/pydantic-adherance-test](https://huggingface.co/datasets/justus27/pydantic-adherance-test)\n- **Split sizes**: Uses `train` split\n\n### Task\n- **Type**: multi-turn\n- **Parser**: Custom `PydanticParser` requiring the final assistant message to be a standalone JSON object, with optional outer ```json fences\n- **Rubric overview**: Validates the final assistant JSON against a Pydantic model constructed from `verification_info` code, and applies a `-0.05` penalty for each assistant turn after the first; intermediate prose with embedded JSON does not count as success\n- **Turn loop**: After each failed attempt, the environment returns the parse or Pydantic validation error as a new user message so the model can retry until `max_turns`\n- **Dataset hygiene**: By default, rows with incompatible Pydantic schemas are filtered out at load time so they do not hard-cap reward during training or eval\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval pydantic-adherence\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval pydantic-adherence \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `dataset_name` | str | `\"justus27/pydantic-adherance-test\"` | Name of the dataset to use |\n| `dataset_split` | str | `\"train\"` | Split of the dataset to use |\n| `max_turns` | int | `3` | Maximum number of model attempts before the rollout stops |\n| `drop_incompatible_examples` | bool | `true` | Skip dataset rows whose Pydantic schemas are incompatible with the runtime |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Total reward: `pydantic_adherence_reward_func - 0.05 * max(num_turns - 1, 0)` |\n| `pydantic_adherence_reward_func` | `1.0` if the final assistant JSON parses and validates against the Pydantic model; else `0.0` |\n| `turn_penalty_reward_func` | `-0.05` times the number of assistant turns after the first |\n| `num_turns` | Number of model turns used before success or stop |\n","encoding":"utf-8","truncated":false,"total_bytes":2634},"status":null}