{"data":{"kind":"file","path":"README.md","version_id":"mtyy326cm3bqiewumsfr55lx","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":7194,"modified_at":"2025-10-23T05:01:05.938000","content_hash":"bc06c21dd1de3533ad2cf3c5a6d3277f96c5fde80f83b253058251b586a37d52"},"entries":[],"content":"# Phishing Detection with Evidence\n\nA tool-using RL environment for training and evaluating models on phishing detection with evidence-based reasoning. Models analyze emails for phishing indicators, search for corroborating evidence, and provide justified classifications.\n\n## Overview\n\nThis environment implements evidence-seeking phishing detection, combining content analysis with external validation tools to identify and explain phishing attempts.\n\n**Environment Type**: `ToolEnv` - Multi-turn environment with tool access\n**Task**: Classify emails as phishing/legitimate with evidence-based justification\n**Tools**: URL reputation checker, domain WHOIS lookup, content similarity search\n**Reward Structure**: Classification accuracy + evidence quality + explanation coherence\n\n## Installation\n\nInstall the environment using the Prime CLI:\n\n```bash\nprime env install intertwine/sv-env-phishing-detection\n```\n\nOr using pip directly:\n\n```bash\npip install sv-env-phishing-detection\n```\n\n## Setup\n\n### API Keys Configuration\n\nSet your API keys as environment variables:\n\n```bash\n# OpenAI API Key (required for OpenAI models)\nexport OPENAI_API_KEY=\"your-openai-api-key\"\n\n# For persistent configuration\necho 'export OPENAI_API_KEY=\"your-key\"' >> ~/.bashrc\nsource ~/.bashrc\n```\n\n## Usage\n\n### With Verifiers Library\n\n```python\nimport verifiers as vf\n\n# Load the environment with tools enabled\nenv = vf.load_environment(\"intertwine/sv-env-phishing-detection\", include_tools=True)\n\n# Evaluate a model\nresults = env.evaluate(\n    client=vf.OpenAIClient(),\n    model=\"gpt-5-mini\",\n    num_examples=10\n)\n\nprint(f\"Average reward: {results.stats['mean_reward']:.2%}\")\nprint(f\"Detection accuracy: {results.stats.get('accuracy', 0):.2%}\")\n```\n\n### Quick Evaluation\n\nUse the verifiers CLI:\n\n```bash\n# Basic evaluation with tools\nvf-eval intertwine/sv-env-phishing-detection \\\n  --model gpt-5-mini \\\n  --num-examples 10\n\n# Without tools (content analysis only)\nvf-eval intertwine/sv-env-phishing-detection \\\n  --model gpt-5-mini \\\n  --num-examples 10 \\\n  --include-tools false\n```\n\n### Training with Prime RL\n\n```toml\n[environment]\nid = \"intertwine/sv-env-phishing-detection\"\nkwargs = {include_tools = true}\n```\n\n## Task Details\n\n### Input Format\n\nEmail content with headers and body:\n\n```text\nFrom: security@amaz0n-support.com\nSubject: Urgent: Account Security Alert\nBody: Your Amazon account has been compromised. Click here to secure it: http://bit.ly/secure-amz\n```\n\n### Expected Output\n\nJSON object with classification and evidence:\n\n```json\n{\n  \"label\": \"Phishing\",\n  \"confidence\": 0.95,\n  \"evidence\": [\n    \"Spoofed sender domain (amaz0n-support.com vs amazon.com)\",\n    \"Suspicious URL shortener (bit.ly)\",\n    \"Urgency tactics in subject line\"\n  ],\n  \"explanation\": \"Email exhibits multiple phishing indicators including domain spoofing and social engineering tactics\"\n}\n```\n\n### Available Tools\n\nWhen `include_tools=True`, the model has access to:\n\n1. **check_url_reputation**: Analyze URL safety and reputation\n2. **lookup_domain_whois**: Get domain registration details\n3. **search_similar_campaigns**: Find similar phishing patterns\n4. **verify_sender_authenticity**: Check SPF/DKIM records\n\n### Scoring\n\nThe environment uses a weighted rubric:\n\n- **Classification Accuracy** (50%): Correct phishing/legitimate determination\n- **Evidence Quality** (30%): Relevant and verifiable indicators\n- **Explanation Coherence** (10%): Clear reasoning from evidence\n- **Tool Utilization** (10%): Effective use of verification tools\n\n## Weights & Biases Logging\n\nThis environment supports automatic Weave tracing:\n\n```python\nimport weave\nimport verifiers as vf\n\n# Initialize Weave\nweave.init(project=\"phishing-detection\")\n\n# Load and evaluate\nenv = vf.load_environment(\"intertwine/sv-env-phishing-detection\", include_tools=True)\nresults = env.evaluate(\n    client=vf.OpenAIClient(),\n    model=\"gpt-5-mini\",\n    num_examples=50\n)\n\n# Results automatically traced to W&B\n```\n\nConfigure via environment variables:\n- `WEAVE_PROJECT`: Set project name\n- `WEAVE_DISABLED`: Set to 'true' to disable logging\n- `WANDB_API_KEY`: Your W&B API key\n\n## Evaluation Approach\n\n### Metrics Tracked\n- **Detection Accuracy**: Phishing vs legitimate classification\n- **False Positive Rate**: Legitimate emails marked as phishing\n- **False Negative Rate**: Phishing emails missed (critical metric)\n- **Evidence Precision**: Validity of cited indicators\n- **Response Time**: Tool usage efficiency\n\n### Example Evaluation Script\n\n```python\nimport verifiers as vf\nimport weave\n\nweave.init(project=\"phishing-eval\")\n\nenv = vf.load_environment(\"intertwine/sv-env-phishing-detection\", include_tools=True)\n\n# Evaluate with focus on reducing false negatives\nresults = env.evaluate(\n    client=vf.OpenAIClient(),\n    model=\"gpt-5-mini\",\n    num_examples=200,\n    seed=42\n)\n\nprint(f\"Mean Reward: {results.stats['mean_reward']:.2%}\")\nprint(f\"Accuracy: {results.stats.get('accuracy', 0):.2%}\")\nprint(f\"False Positives: {results.stats.get('false_positive_rate', 0):.2%}\")\nprint(f\"False Negatives: {results.stats.get('false_negative_rate', 0):.2%}\")\n```\n\n## Performance Benchmarks\n\n| Model       | Accuracy | False Positives | False Negatives | Overall |\n|-------------|----------|-----------------|-----------------|---------|\n| GPT-4o-mini | 87%      | 8%              | 5%              | 82%     |\n| GPT-4o      | 93%      | 4%              | 3%              | 89%     |\n\n## Phishing Tactics Covered\n\nThe environment includes diverse phishing techniques:\n\n- **Domain Spoofing**: Lookalike domains, homoglyphs\n- **URL Obfuscation**: Shorteners, redirects, embedded links\n- **Social Engineering**: Urgency, authority, scarcity tactics\n- **Credential Harvesting**: Fake login pages, form requests\n- **Attachment Threats**: Malicious documents, executables\n- **Business Email Compromise**: CEO fraud, invoice scams\n\n## Dataset\n\n- **Phishing Samples**: Real-world inspired phishing emails\n- **Legitimate Emails**: Business, personal, and marketing emails\n- **Evidence Database**: Known phishing domains, campaigns\n- **Validation Data**: SPF/DKIM records, WHOIS information\n\n## Future Improvements\n\n- **Attachment Analysis**: Scan documents and executables for threats\n- **Multi-language Support**: Detect phishing in non-English emails\n- **Real-time Threat Intelligence**: Integration with threat feeds\n- **User Context**: Personalized detection based on user patterns\n- **Campaign Tracking**: Link related phishing attempts\n- **Automated Response**: Generate warning messages and remediation steps\n\n## Requirements\n\n- Python 3.12+\n- `verifiers>=0.1.4`\n- API key for model inference\n\n## About\n\nThis environment is part of the Open Security Verifiers suite - a collection of security and alignment RL environments using Prime Intellect's Verifiers framework. Each environment provides executable, programmatic rewards for training robust security-aware AI systems.\n\n## Support\n\nFor issues or questions:\n- Report issues on the [Prime Intellect Environments Hub](https://app.primeintellect.ai/dashboard/environments)\n- Check the [Security Verifiers GitHub repository](https://github.com/intertwine/security-verifiers)\n- Contact the Intertwine team\n","encoding":"utf-8","truncated":false,"total_bytes":7194},"status":null}