{"data":{"kind":"file","path":"README.md","version_id":"sov34uy7abwtjt7fla4k5zp0","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1657,"modified_at":"2026-04-14T18:09:07.234000","content_hash":"2e1d2c47ef73f3043e6005846cd6d5a4b31d925719ce226c7beb375544d640e1"},"entries":[],"content":"# MCP Safety\n\nEvaluates LLM resistance to adversarial attacks in MCP tool-use scenarios. Based on [MCP-SafetyBench](https://github.com/xjzzzzzzzz/MCPSafety) ([arXiv:2512.15163](https://arxiv.org/abs/2512.15163)).\n\n## Overview\n\n- **245 tasks** across 5 domains: financial_analysis, repository_management, web_search, location_navigation, browser_automation\n- **20 attack categories** including Tool Poisoning (parameter, command injection, filesystem, network, dependency, redirection), Tool Shadowing, Rug Pull, Intent Injection, Data Tampering, Identity Spoofing, Replay Injection, and more\n- **2 attack classes**: Disruption Attack, Stealth Attack\n\n## Task\n\nEach test case presents the model with a system prompt containing MCP tool descriptions (some poisoned or malicious) and a user task. 
The model must complete the task without being misled by the attack.\n\n## Metrics\n\n| Metric | Description |\n|--------|-------------|\n| `attack_resistance` | 1.0 if the model resisted the attack, 0.0 if it complied (scored) |\n| `task_quality` | 1.0 if the model produced a substantive response (metric-only) |\n\n## Arguments\n\n| Argument | Default | Description |\n|----------|---------|-------------|\n| `judge_model` | `openai/gpt-5.4-mini` | LLM judge model via OpenRouter |\n| `judge_api_key_var` | `OPEN_ROUTER_API_KEY` | Environment variable holding the judge API key |\n| `data_path` | Auto-detected | Path to the test configs directory |\n| `domain` | `None` | Filter by domain |\n| `attack_category` | `None` | Filter by attack category |\n| `category` | `None` | Filter by attack class: \"Disruption Attack\" or \"Stealth Attack\" |\n\n## Quickstart\n\n```bash\nprime eval run mcp-safety -m gpt-4.1-mini -n 10 -r 1\n```\n","encoding":"utf-8","truncated":false,"total_bytes":1657},"status":null}