{"data":{"kind":"file","path":"README.md","version_id":"cq9hvaejfeevwyapm8vteqlf","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3652,"modified_at":"2026-03-31T22:50:35.444000","content_hash":"5b013a5161fae75772aa562f2532e5b3ae880dd7e35ae10a1cb311fc7220eb9a"},"entries":[],"content":"# Advanced IF (`advanced_if`)\n\nVerifiers **single-turn** environment over [facebook/AdvancedIF](https://huggingface.co/datasets/facebook/AdvancedIF): the model reads a full conversation trajectory and emits a JSON rubric list; an LLM judge scores alignment with the gold rubrics.\n\n**Environment ID (Prime / verifiers / Hub):** `advanced_if` — matches the `env_id` in `advanced_if.py`. The installable package name in `pyproject.toml` is `advanced-if` (hyphens); import module is `advanced_if`.\n\n## What this environment does\n\n1. Builds rollouts from the Hub row: `conversation_history` is rendered as role-tagged text; gold rubrics come from `prompt_metadata.rubrics` (stored as the rollout `answer`).\n2. The policy completes in one assistant turn with JSON of the form `{\"rubrics\": [\"…\", …]}`.\n3. `AdvancedIFJudgeRubric` (`vf.JudgeRubric`) calls a judge model; reward is the mean of three booleans the judge must return as JSON: `coverage`, `faithful`, `non_redundant` (no synonyms or loose coercion).\n4. Optional `attach_dataset_stats` attaches split-level histograms to the rubric as `dataset_stats` for custom metrics.\n\n## Repository layout\n\n| Path | Purpose |\n|------|---------|\n| `advanced_if.py` | Entrypoint: `AdvancedIFEnv`, `load_environment()` for Prime / verifiers |\n| `core/config.py` | `EnvironmentConfig` |\n| `core/dataset.py` | Hub → rollout rows, `analyze_dataset`, `build_dataset` |\n| `core/rubrics.py` | `AdvancedIFJudgeRubric`, judge client wiring |\n| `core/prompts.py` | System / user / judge prompt strings |\n| `configs/debug.toml` | Local smoke eval preset |\n| `configs/endpoints.toml` | Optional Prime Inference endpoint registry |\n\n## Setup\n\nRequires **Python 3.11+**. Use **`uv`** from this directory.\n\n```bash\ncd advanced_if\nuv sync\nuv run python -c \"import advanced_if; print(advanced_if.load_environment)\"\n```\n\n**Dev:** `uv sync --group dev` (Ruff).\n\n## Running evaluations\n\nRun commands from **`advanced_if/`** (the directory that contains this package’s `pyproject.toml`).\n\n### Local\n\n```bash\nuv run prime eval run configs/debug.toml\n```\n\nSet the API key for the judge client (default `judge_client_config.api_key_var` is `PRIME_API_KEY`). Point TOML at `configs/endpoints.toml` if you use a shared registry (`endpoints_path` in the eval config).\n\nAdd your own `configs/eval.toml` by copying a sibling env’s eval TOML shape: `[[eval]]`, `env_id = \"advanced_if\"`, `env_dir_path = \".\"`, and `[eval.env_args]` for `EnvironmentConfig` fields.\n\n## Environment config\n\n`load_environment(config, **kwargs)` builds an **`EnvironmentConfig`** (`core/config.py`). Dict kwargs are merged the same way as a single mapping.\n\n| Field | Role |\n|------|------|\n| `dataset_name` / `dataset_split` | Hub dataset (default `facebook/AdvancedIF` / `train`) |\n| `max_examples` / `seed` | Subset and shuffle for `build_dataset` |\n| `judge_model` / `judge_sampling_args` | Judge LLM id and sampling kwargs |\n| `judge_client_config` | `verifiers.types.ClientConfig` (defaults: Prime Inference URL + `PRIME_API_KEY`) |\n| `max_turns` | Fixed at `1` in practice (`SingleTurnEnv`) |\n| `attach_dataset_stats` | If true, rubric receives `dataset_stats` from `analyze_dataset` |\n\nThe judge client’s `api_key_var` must be present in the process environment (`verifiers` `ensure_keys`); there is no file-based key fallback.\n\n## Dataset snapshot (`train`)\n\n| | |\n|--:|--:|\n| Rows | 1645 |\n| `carried_context_multi_turn_eval_v5` | 736 |\n| `system_steerability_v2` | 507 |\n| `complex_if_single_turn_v5` | 402 |\n\nRubrics per example range roughly 1–20; full histograms are produced by `analyze_dataset` when `attach_dataset_stats` is enabled.\n","encoding":"utf-8","truncated":false,"total_bytes":3652},"status":null}