{"data":{"kind":"file","path":"README.md","version_id":"x5q0ecl2f6aqu3wcqvd4d5bt","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1521,"modified_at":"2026-04-01T22:45:03","content_hash":"abe32126dd8a4f3ff0706360ecf8fa1a59bfaa2917cfb8b85656c737156fb46c"},"entries":[],"content":"# if_summarize_judge\n\nVerifiers environment for evaluating instruction-following on Wikipedia article summarization.\n\n## What it does\n\nGiven a Wikipedia article, the model must produce a constrained summary (e.g. \"exactly 5 words\", \"3 decreasing-length sentences\", \"newspaper headline in ALL CAPS\"). A judge model scores whether the structural constraint was met.\n\n## Constraints\n\n17 held-out constraint types covering exact word/sentence counts, punctuation rules, format requirements, and structural patterns. See `EVAL_CONSTRAINTS` in the source.\n\n## Judge\n\nDefaults to `gpt-4.1-mini` via Prime Inference. Pass `judge_url` and `judge_model` to use a local vLLM instance instead.\n\n## Usage\n\n```bash\n# Install\nprime env install kalomaze/if_summarize_judge\n\n# Eval with remote judge\nvf-eval if_summarize_judge \\\n  --num-examples 16 --rollouts-per-example 4 \\\n  -b http://localhost:8000/v1 --model your-model\n\n# Eval with local judge\nvf-eval if_summarize_judge \\\n  --num-examples 16 --rollouts-per-example 4 \\\n  -b http://localhost:8000/v1 --model your-model \\\n  --env-args '{\"judge_url\": \"http://localhost:8067/v1\", \"judge_model\": \"your-judge-model\"}'\n\n# Save rollout logs\nvf-eval if_summarize_judge \\\n  --env-args '{\"save_rollouts\": true, \"save_rollouts_path\": \"rollouts.jsonl\"}'\n```\n\n## Data source\n\nArticles from [kalomaze/glm-wikisummary-if-it4-think](https://huggingface.co/datasets/kalomaze/glm-wikisummary-if-it4-think). The environment strips the original training constraint and replaces it with a held-out one.\n","encoding":"utf-8","truncated":false,"total_bytes":1521},"status":null}