{"data":{"kind":"file","path":"README.md","version_id":"rhtjl3yeoub6zk96jrm5whde","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":12062,"modified_at":"2026-01-16T20:31:52.627000","content_hash":"ec4a4a08881a0bbd36af9f9700f9ce51acb57dc38c7a0c595fc4c71b389ca9f8"},"entries":[],"content":"# PII Detection RL Environment\n\nTraining models to identify and extract Personally Identifiable Information (PII) in structured JSON format.\n\n## Overview\n\nThis environment trains language models to:\n1. **Identify all PII entities** in unstructured text\n2. **Extract exact text spans** for each entity\n3. **Classify entities** with appropriate PII type labels\n4. **Output valid JSON** with consistent structure\n\nBased on the [nvidia/Nemotron-PII](https://huggingface.co/datasets/nvidia/Nemotron-PII) dataset (100k synthetic PII examples across 55+ entity types).\n\n## Expected Output Format\n\n```json\n[\n  {\"text\": \"Jason\", \"label\": \"first_name\"},\n  {\"text\": \"1987-05-22\", \"label\": \"date_of_birth\"},\n  {\"text\": \"87 Avenida De La Estrella\", \"label\": \"street_address\"}\n]\n```\n\n**Key requirements:**\n- Must be a valid JSON array\n- Each object must have `text` and `label` fields\n- Labels should be lowercase with underscores (e.g., `phone_number`)\n- Extract the exact text span as it appears in the source\n\n## PII Entity Types\n\nThe environment covers 55+ PII/PHI categories from the Nemotron-PII dataset:\n\n**Personal Identifiers:**\n- `first_name`, `last_name`, `full_name`\n- `date_of_birth`, `age`\n- `gender`, `race_ethnicity`\n\n**Contact Information:**\n- `email`, `phone_number`, `fax_number`\n- `street_address`, `city`, `state`, `postcode`, `country`\n\n**Financial/Government IDs:**\n- `ssn` (Social Security Number)\n- `credit_debit_card`, `cvv`, `pin`\n- `account_number`, `customer_id`\n- `bank_routing_number`, `swift_bic`\n\n**Healthcare (PHI):**\n- `medical_record_number`\n- `blood_type`\n- 
`health_plan_beneficiary_number`\n\n**Professional:**\n- `employee_id`, `occupation`\n- `education_level`\n- `certificate_license_number`\n\n**Technical:**\n- `user_name`, `password`\n- `ipv4`, `mac_address`\n- `device_identifier`\n- `url`, `http_cookie`\n\n**Sensitive Attributes:**\n- `political_view`, `religious_belief`, `sexuality`\n- `biometric_identifier`\n\n**Temporal:**\n- `date`, `time`, `date_time`\n\n**Transportation:**\n- `vehicle_identifier`, `license_plate`\n\n## Reward Function Design\n\n### Multi-Component Rubric\n\nThe reward function uses 5 components to provide strong learning signals:\n\n| Component | Weight | Purpose |\n|-----------|--------|---------|\n| **JSON Format** | 30% | Ensures valid JSON structure (CRITICAL) |\n| **Entity Recall** | 25% | Finds all PII entities (completeness) |\n| **Entity Precision** | 25% | Avoids false positives (accuracy) |\n| **Label Accuracy** | 15% | Correct PII type classification |\n| **Output Consistency** | 5% | Quality checks (no duplicates, clean format) |\n\n### 1. JSON Format Validation (30% weight) 🎯\n\n**This is the most critical component** - the model MUST learn valid JSON output.\n\nScoring:\n- `1.0`: Perfect JSON array with all entities correctly structured\n- `0.9`: Valid JSON, 80%+ entities have correct structure\n- `0.7`: Valid JSON, 50-80% entities correct\n- `0.5`: Parseable JSON but wrong structure or <50% valid\n- `0.3`: JSON found but malformed\n- `0.0`: No valid JSON found\n\n**Robust parsing strategies:**\n1. Direct JSON parsing\n2. Extract from markdown code blocks (`` ```json ... ``` ``)\n3. Find JSON array in mixed text\n4. Auto-fix common errors (trailing commas)\n\n### 2. 
Entity Recall (25% weight)\n\nMeasures completeness: `found_entities / total_ground_truth_entities`\n\n- Exact text + label match: 1.0 credit\n- Fuzzy text match (contains/contained): 0.7 credit\n- Handles case differences and minor text variations\n\n**Example:**\n```\nGround truth: 3 entities\nModel finds: 2 exact + 1 fuzzy match\nRecall = (2 + 0.7) / 3 = 0.90\n```\n\n### 3. Entity Precision (25% weight)\n\nPenalizes hallucinations: `correct_predictions / total_predictions`\n\n- Prevents the model from over-predicting\n- Uses same fuzzy matching as recall\n- Returns 1.0 if no predictions and no ground truth (correct empty case)\n\n**Example:**\n```\nModel predicts: 4 entities\nCorrect: 3 entities\nPrecision = 3 / 4 = 0.75\n```\n\n### 4. Label Accuracy (15% weight)\n\nFor correctly identified entities (text match), checks label correctness.\n\n- Only evaluates entities that were successfully located\n- Focuses on classification accuracy separate from detection\n\n**Example:**\n```\nModel finds \"Jason\" → labels as \"full_name\" (ground truth: \"first_name\")\nText match ✓, Label match ✗\n```\n\n### 5. Output Consistency (5% weight)\n\nQuality checks:\n- ✓ No duplicate entities (+0.3)\n- ✓ Consistent label format - lowercase with underscores (+0.3)\n- ✓ Reasonable entity count - within 50% of expected (+0.2)\n- ✓ Clean text extraction - no extra whitespace/special chars (+0.2)\n\n## Dataset Examples\n\nThe environment includes 10 diverse examples covering:\n\n1. **Financial Services** - Account application with name, DOB, address\n2. **Healthcare** - Patient record with MRN, doctor, blood type, contacts\n3. **HR Document** - Employee file with ID, SSN, salary, occupation\n4. **Payment Info** - Credit card details with billing address\n5. **Technical** - Login credentials, IP, MAC address\n6. **Vehicle Registration** - License, VIN, license plate\n7. **Banking** - Account numbers, routing, SWIFT codes\n8. 
**Demographics** - Age, gender, ethnicity, political/religious views\n9. **Customer Service** - Customer ID, phone, fax, address\n10. **Scheduling** - Meeting with emails, phone numbers, PIN\n\n80/20 train/eval split = **8 training examples, 2 eval examples**\n\n## Design Rationale\n\n### Why 30% weight on JSON format?\n\n**Problem:** Language models often fail at strict structured output, especially JSON.\n\n**Solution:** Heavy emphasis on format ensures the model learns this critical skill first. Without valid JSON, the output is unusable regardless of content quality.\n\n**Learning progression:**\n1. Steps 1-30: Model learns to output `[]` (basic JSON structure)\n2. Steps 31-60: Model learns `[{\"text\": \"...\", \"label\": \"...\"}]` format\n3. Steps 61+: Model learns to populate with correct entities\n\n### Why balanced recall + precision (25% each)?\n\n**Problem:** Optimizing only recall → model over-predicts (many false positives)\n**Problem:** Optimizing only precision → model under-predicts (misses entities)\n\n**Solution:** Equal weighting creates balanced F1-like optimization.\n\n### Why fuzzy text matching?\n\n**Problem:** Exact string matching is too brittle.\n\n**Example:**\n```\nGround truth: \"87 Avenida De La Estrella\"\nModel outputs: \"87 Avenida De La Estrella, Apt 2B\"\n\nExact match: ✗ (0.0 reward)\nFuzzy match: ✓ (0.7 reward)  ← Better learning signal!\n```\n\nModels learn the core task (identifying PII location and type) even with minor text span differences.\n\n### Why separate label accuracy component?\n\n**Problem:** Conflating detection and classification makes debugging hard.\n\n**Solution:** Separate metrics allow us to see:\n- Is the model finding the right entities? (Recall/Precision)\n- Is the model classifying them correctly? (Label Accuracy)\n\nThis enables targeted improvements if one skill lags behind.\n\n## Usage\n\n### Local Testing\n\nThe local test file is architecture-dependent. 
If you have Python environment issues, you can deploy and test on Prime Hub directly.\n\n### Training on Prime Hub\n\n1. **Build and push environment:**\n```bash\ncd PII_Detection\nprime env build\nprime env push maziyar/PII_Detection\n```\n\n2. **Start training:**\n```bash\nprime rl run configs/PII/v1-4b-thinking-150steps.toml\n```\n\n3. **Monitor progress:**\n```bash\n# List runs\nprime rl list\n\n# Watch logs\nprime rl logs <run_id> --follow\n```\n\n### Expected Training Dynamics\n\n**Early steps (1-50):**\n- Format score jumps to ~0.7-0.9 (model learns JSON structure)\n- Recall/Precision near 0 (random entities)\n- Total reward: ~0.2-0.3\n\n**Mid training (51-100):**\n- Format score stabilizes at ~0.95-1.0\n- Recall starts climbing (0.3 → 0.6)\n- Precision starts climbing (0.3 → 0.6)\n- Label accuracy improves (0.2 → 0.5)\n- Total reward: ~0.5-0.7\n\n**Late training (101-150):**\n- All components improve together\n- Model learns entity-label associations\n- Total reward: ~0.7-0.85+\n\n**Target final performance:**\n- Format: 0.95-1.0 (near perfect JSON)\n- Recall: 0.70-0.85 (finds most entities)\n- Precision: 0.75-0.90 (few false positives)\n- Labels: 0.60-0.80 (mostly correct classifications)\n- Total: 0.75-0.85\n\n## Extending the Environment\n\n### Adding More Examples\n\nTo expand beyond 10 examples, add to `PII_EXAMPLES` in `PII_Detection.py`:\n\n```python\n{\n    \"text\": \"Your text with PII here...\",\n    \"entities\": [\n        {\"text\": \"entity1\", \"label\": \"pii_type1\"},\n        {\"text\": \"entity2\", \"label\": \"pii_type2\"}\n    ],\n    \"category\": \"domain_name\"\n}\n```\n\n**Recommendation:** Start with 50-100 examples for production use. 
The Nemotron-PII dataset has 100k examples you can sample from.\n\n### Adjusting Reward Weights\n\nModify weights in `PII_Detection.__init__()`:\n\n```python\nrubric = vf.Rubric(\n    parser=parser,\n    funcs=[...],\n    weights=[0.30, 0.25, 0.25, 0.15, 0.05]  # Adjust these\n    #        ^^^^  ^^^^  ^^^^  ^^^^  ^^^^\n    #      format recall prec labels consist\n)\n```\n\n**Scenarios:**\n- Model struggles with JSON → increase format weight to 0.40-0.50\n- Model has good format but poor recall → increase recall to 0.30\n- Too many false positives → increase precision to 0.30\n- Wrong label classifications → increase labels to 0.20\n\n### Adding Custom Reward Components\n\nYou can add domain-specific scoring:\n\n```python\ndef _evaluate_sensitive_pii(completion, answer, **kwargs):\n    \"\"\"Higher weight for detecting sensitive PII (SSN, passwords, etc.)\"\"\"\n    sensitive_types = {'ssn', 'password', 'credit_debit_card', 'medical_record_number'}\n    # Implementation...\n    return score\n```\n\nThen add to rubric with appropriate weight.\n\n## Troubleshooting\n\n### Issue: Format score stuck at 0\n\n**Cause:** Model not learning JSON structure\n\n**Fix:**\n1. Increase format weight to 0.40-0.50\n2. Reduce max_tokens (model might be generating too much text)\n3. Check if base model has JSON generation capability\n\n### Issue: Recall/Precision imbalance\n\n**Cause:** Uneven learning\n\n**Fix:**\n1. Adjust weights (increase the lagging metric)\n2. Check examples - ensure they're not too easy/hard\n3. Increase rollouts_per_example for more exploration\n\n### Issue: Low label accuracy despite good detection\n\n**Cause:** Ambiguous or confusing label categories\n\n**Fix:**\n1. Review label definitions - are they clear?\n2. Add more examples for confusing label pairs\n3. Consider merging very similar labels\n\n## Comparison with Other Approaches\n\n### Why RL instead of Supervised Fine-Tuning?\n\n**RL advantages:**\n1. 
**Learns from rewards** - can optimize for exact metrics you care about\n2. **Handles imperfect data** - fuzzy matching allows learning from slight variations\n3. **Iterative improvement** - explores different strategies through rollouts\n4. **No need for perfect labels** - reward function handles fuzzy boundaries\n\n**SFT disadvantages:**\n1. Requires perfectly formatted training data\n2. Learns to mimic, not optimize specific objectives\n3. Can't easily handle trade-offs (recall vs precision)\n\n### Why Not Use Token-Level Loss?\n\nTraditional NER uses BIO tagging with token-level cross-entropy loss.\n\n**This approach is better for RL because:**\n1. **End-to-end optimization** - optimizes final JSON output quality\n2. **Structured output** - JSON is more useful than BIO tags\n3. **Multi-objective** - balances format, recall, precision simultaneously\n4. **Flexible matching** - fuzzy text matching is hard with token-level labels\n\n## License\n\nThis environment is designed for training on the nvidia/Nemotron-PII dataset, which is licensed under CC BY 4.0 (commercial use allowed).\n\n## Citation\n\nIf using this environment, please cite:\n\n```bibtex\n@software{pii_detection_rl_env,\n  title = {PII Detection RL Environment},\n  author = {Maziyar},\n  year = {2026},\n  description = {Reinforcement learning environment for training models on PII detection with structured JSON output}\n}\n\n@dataset{nemotron_pii,\n  title = {Nemotron-PII: A Synthetic Dataset for Entity Recognition},\n  author = {NVIDIA},\n  year = {2024},\n  url = {https://huggingface.co/datasets/nvidia/Nemotron-PII}\n}\n```\n","encoding":"utf-8","truncated":false,"total_bytes":12062},"status":null}