{"data":{"kind":"file","path":"README.md","version_id":"e2hw2gwcxue4axerbh7fql1m","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1117,"modified_at":"2025-08-28T22:16:22.165000","content_hash":"ff9b4427ffc4b564de7b8ccc01851127d1adfe4300c48ee8944e388f467a48de"},"entries":[],"content":"# non-refusal\n\n### Overview\n- **Environment ID**: `non-refusal`\n- **Short description**: A simple benchmark for determining how much a model refuses to answer. Useful for white-hatting safety + applying GRPO to iteratively remove refusals. Meant to answer the question: \"How much does it cost to remove safety from an OSS model?\"\n- **Tags**: refusal, safety, eval\n\n### Datasets\n- **Primary dataset(s)**: `NousResearch/RefusalDataset`\n- **Source links**: [Link](https://huggingface.co/datasets/NousResearch/RefusalDataset)\n- **Split sizes**: 166/0 (no test set)\n\n### Task\n- **Type**: single-turn\n- **Parser**: `vf.Parser`\n- **Rubric overview**: Uses Nous Research's [Minos model](https://huggingface.co/NousResearch/Minos-v1) to judge the non-refusal probability of the model's response.\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval non-refusal\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval non-refusal   -m gpt-4.1-mini   -n 20 -r 3 -t 1024 -T 0.7   -a '{\"key\": \"value\"}'  # env-specific args as JSON\n```\n\nWill add more details here after some eval and training runs.\n","encoding":"utf-8","truncated":false,"total_bytes":1117},"status":null}