{"data":{"kind":"file","path":"README.md","version_id":"bbp3hy9x83uqd5ntc0c5zjtc","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3517,"modified_at":"2026-02-12T00:36:03.901000","content_hash":"2d8a13662a093ceb3a91c1451e8c30fd228a35d09ef911bbc634b17b4a171bca"},"entries":[],"content":"# phishing-detector\n\n### Overview\n- **Environment ID**: `wambosec/phishing-detector`\n- **Short description**: LLM-generated phishing and legitimate emails scored by agent classification accuracy.\n- **Tags**: single-turn, security, phishing, train, eval\n\n### How It Works\n\n1. At load time, an LLM (default: `qwen/qwen3-235b-a22b-instruct-2507` via Prime API) generates a mix of phishing and legitimate emails targeting Prime Intellect employees.\n2. Each email is presented to the model under evaluation along with a system prompt containing company context and a team directory.\n3. The model must respond with a summary, suggested reaction, and a `<phishing>TRUE</phishing>` or `<phishing>FALSE</phishing>` tag.\n4. The reward function checks whether the classification matches ground truth.\n\nPhishing emails use varied tactics:\n- Domain spoofing (wrong TLD, homoglyphs, mixed alphabets)\n- Generic corporate senders (`support@`, `hr@`, `security@`, `noreply@`)\n- External service impersonation (GitHub, Google, AWS, Slack, DocuSign)\n- BEC, credential harvesting, fake invoices, malicious calendar invites\n- Urgency is optional — many phishing emails are calm and routine-sounding\n\nLegitimate emails are normal internal business communications between real team members (project updates, code reviews, meetings, etc.). Both phishing and legit emails reference other team members naturally.\n\nEmails are kept short (5-8 sentences max).\n\n### Datasets\n- **Primary dataset(s)**: Dynamically generated at environment load time via LLM API call. No static dataset.\n- **Split sizes**: Controlled by `num_examples` (default 20), split by `phishing_ratio` (default 50/50).\n\n### Task\n- **Type**: single-turn\n- **Output format**: `<phishing>TRUE</phishing>` or `<phishing>FALSE</phishing>` XML tag\n- **Rubric overview**: Classification reward (`phishing_correct`, weight 0.8) + format reward (`correct_format`, weight 0.2) + TP/FP/TN/FN metrics\n\n### Quickstart\n\n```bash\nprime env install phishing-detector\nprime eval run wambosec/phishing-detector -m qwen/qwen3-vl-30b-a3b-instruct -n 5 -r 10\n```\n\n### Required Environment Variables\n\n| Variable | Description |\n| -------- | ----------- |\n| `PRIME_API_KEY` | API key for the Prime Intellect inference API (used for email generation) |\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_examples` | int | `20` | Number of emails to generate |\n| `generator_model` | str | `\"qwen/qwen3-235b-a22b-instruct-2507\"` | Model used to generate emails |\n| `generator_base_url` | str | `\"https://api.pinference.ai/api/v1\"` | API endpoint for email generation |\n| `generator_api_key_var` | str | `\"PRIME_API_KEY\"` | Environment variable holding the API key |\n| `phishing_ratio` | float | `0.5` | Fraction of emails that are phishing |\n| `seed` | int | `42` | RNG seed for reproducibility |\n| `max_workers` | int | `10` | Concurrent threads for email generation |\n\n### Reward\n\n| Component | Weight | Description |\n| --------- | ------ | ----------- |\n| `phishing_correct` | 0.8 | +1 if classification matches ground truth, +0 otherwise |\n| `correct_format` | 0.2 | +1 if valid `<phishing>TRUE/FALSE</phishing>` tag present, +0 otherwise |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `true_positive` | Phishing correctly flagged as phishing |\n| `false_positive` | Legit incorrectly flagged as phishing |\n| `true_negative` | Legit correctly identified as safe |\n| `false_negative` | Phishing incorrectly marked as safe |\n","encoding":"utf-8","truncated":false,"total_bytes":3517},"status":null}