{"data":{"kind":"file","path":"README.md","version_id":"j5i14glf5n2krc495v2jvu3d","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3013,"modified_at":"2025-10-31T21:07:35.225000","content_hash":"0623429b4147aa1a802dba6e86d0d05e7d948d2519222a56f13233e947ba274f"},"entries":[],"content":"# PII Masking RL Environment\n\nA PII masking RL environment made with [verifiers](https://github.com/PrimeIntellect-ai/verifiers).\n\nSee this environment on the Prime Intellect Environments Hub here: https://app.primeintellect.ai/dashboard/environments/adamlucek/pii-masking\n\nThis environment has been used to train:\n- [AdamLucek/Qwen3-4B-Instruct-2507-PII-RL](https://huggingface.co/AdamLucek/Qwen3-4B-Instruct-2507-PII-RL)\n\n### Overview\n- **Environment ID**: `pii-masking`\n- **Short description**: Evaluates models' ability to identify and mask personally identifiable information (PII) in text by replacing it with `[PII]` tags.\n- **Tags**: `pii`, `privacy`, `redaction`, `single-turn`\n\n### Datasets\n- **Primary dataset**: [AdamLucek/open-pii-masking-en-us-30k](https://huggingface.co/datasets/AdamLucek/open-pii-masking-en-us-30k)\n- **Source links**: A filtered and transformed subset of [ai4privacy/open-pii-masking-500k-ai4privacy](https://huggingface.co/datasets/ai4privacy/open-pii-masking-500k-ai4privacy), restricted to US English examples, with every distinct PII label replaced by a single `[PII]` mask.\n- **Split sizes**: Default 80% train, 20% eval (configurable via `num_eval_examples`)\n\n### Task\n- **Type**: `single-turn`\n- **Parser**: `XMLParser` with `masked_output` field\n- **Rubric overview**: Weighted rewards: exact match (weight 100%), PII count (50%), format compliance (10%)\n\n### Quickstart\n\nInstall the prime CLI and verifiers by following [this Quick Start guide](https://github.com/PrimeIntellect-ai/verifiers?tab=readme-ov-file#quick-start).\n\nRun an evaluation with default settings:\n```bash\nuv run vf-eval pii-masking\n```\n\nConfigure 
model and sampling:\n```bash\nuv run vf-eval pii-masking \\\n  -m gpt-4o-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"num_train_examples\": 1000, \"num_eval_examples\": 200}'  # env-specific args as JSON\n```\n\n**Notes**:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `-1` | Number of training examples to use. Use `-1` for all available examples. |\n| `num_eval_examples` | int | `-1` | Number of evaluation examples. If `-1`, uses 20% of the training dataset. Otherwise, uses the specified count. |\n| `random_seed` | int | `42` | Random seed for dataset splitting. Ensures reproducible train/eval splits. |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward (weighted sum of exact match, PII count, and format compliance) |\n| `exact_match_reward` | Binary reward (1.0 if perfect match, 0.0 otherwise). Compares the parsed masked output character-by-character with the expected answer. |\n| `pii_count_reward` | Binary reward (1.0 if correct, 0.0 otherwise). Checks whether the number of `[PII]` tags matches the expected count. |\n| `format_reward` | Parser-generated format reward. Ensures the output is properly formatted with valid XML tags (`<masked_output>...</masked_output>`). |\n","encoding":"utf-8","truncated":false,"total_bytes":3013},"status":null}