{"data":{"kind":"file","path":"README.md","version_id":"qn9xw59csi9yp2o97rig2ko6","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":6545,"modified_at":"2026-02-07T06:31:23.921000","content_hash":"adcabe15889af422a2411acb8aaf3432c44f5bb98ddf12bbc5b25001ef8ca697"},"entries":[],"content":"# strix-xss\r\n\r\nXSS Specialist Training Environment with Native Strix Integration\r\n\r\n### Overview\r\n- **Environment ID**: `strix-xss`\r\n- **Short description**: Trains XSS specialist agents using real strix modules for maximum integration\r\n- **Tags**: xss, security, tool-use, multi-turn, juice-shop, strix-native\r\n\r\n### Integration with Strix\r\n\r\nThis environment achieves tight integration with the Strix penetration testing framework by:\r\n\r\n1. **Native Tool Execution**: Uses `strix.tools.execute_tool_with_validation` to execute tools exactly as production strix does\r\n2. **Native XML Parsing**: Uses `strix.interface.streaming_parser.StreamingXMLParser` for XML tool call parsing\r\n3. **Native Tool Schemas**: Uses `strix.tools.get_tools_prompt()` to get tool documentation directly from strix\r\n4. **Sandbox Runtime Support**: Integrates with `strix.runtime` for Docker-based sandbox execution\r\n5. **Agent State Compatibility**: Uses strix-compatible agent state structure for tool execution\r\n\r\nModels trained in this environment will work seamlessly with production strix deployments.\r\n\r\n### Datasets\n- **Primary dataset**: OWASP Juice Shop XSS Challenges\n- **Optional dataset**: Local XBEN XSS challenges from `../../xbow`\n- **Source**: Real-world vulnerable web application challenges\n- **Split sizes**:\r\n  - Easy: 2 tasks\r\n  - Medium: 1 task\r\n  - Hard: 5 tasks\r\n  - Bonus: 1 task\r\n  - Total: 9 XSS challenges\r\n\r\n### Task\r\n- **Type**: multi-turn tool use (XSS vulnerability testing)\r\n- **Parser**: strix.interface.streaming_parser.StreamingXMLParser (native strix parser)\r\n- **Tools**: browser_action, terminal_execute, create_note, think, finish_scan (all native strix tools)\r\n- **Rubric overview**:\r\n  - XSS Confirmation (35%): Did agent successfully execute XSS payload?\r\n  - Tool Format (15%): Correct XML tool usage and valid tool calls?\r\n  - Reflection Detection (15%): Found input reflection points?\r\n  - Methodology (15%): Proper testing approach with canaries?\r\n  - Documentation (10%): Documented findings appropriately?\r\n  - Efficiency (10%): Avoided excessive steps?\r\n\r\n### Quickstart\r\n\r\nRun an evaluation with default settings:\r\n\r\n```bash\r\nprime eval run strix-xss\r\n```\r\n\r\nEvaluate on easy difficulty:\r\n\r\n```bash\r\nprime eval run strix-xss -m gpt-4.1 -n 2 -a '{\"difficulty\": \"easy\"}'\r\n```\r\n\r\nWith Docker sandbox (requires strix runtime):\n\r\n```bash\r\nprime eval run strix-xss -m gpt-4.1 -n 2 -a '{\"difficulty\": \"easy\", \"sandbox_mode\": true}'\n```\n\nEvaluate on XBEN XSS challenges:\n\n```bash\nprime eval run strix-xss -m gpt-4.1 -n 2 -a '{\"mode\":\"eval\",\"challenge_set\":\"xbow_xss\"}'\n```\n\r\n### Environment Arguments\r\n\r\n| Arg | Type | Default | Description |\r\n| --- | ---- | ------- | ----------- |\r\n| `difficulty` | str | None | Filter tasks by difficulty: \"easy\", \"medium\", \"hard\", \"bonus\" |\r\n| `task_ids` | list[str] | None | Specific task IDs to include |\r\n| `max_turns` | int | 30 | Maximum turns per rollout (auto-adjusted by difficulty) |\r\n| `base_url` | str | `\"http://localhost:3000\"` | Juice Shop instance URL |\n| `mode` | str | `\"eval\"` | \"train\" for simulated, \"eval\" for real Juice Shop |\n| `sandbox_mode` | bool | False | Run tools in Docker sandbox (requires strix runtime) |\n| `challenge_set` | str | `\"juice_shop\"` | Task source: `\"juice_shop\"` or `\"xbow_xss\"` |\n| `xbow_root` | str | `\"../../xbow\"` | Root path containing XBEN challenge folders |\n| `xbow_tags` | list[str] | `[\"xss\"]` | Benchmark tags to include for XBEN loading |\n| `xbow_levels` | list[int] | None | Optional XBEN level filter (for example `[1,2]`) |\n| `xbow_auto_build` | bool | `true` | Build XBEN images on demand if not found locally |\n| `xbow_image_overrides` | dict[str,str] | None | Optional mapping of `XBEN-*` task IDs to prebuilt image tags |\n| `xbow_preflight` | bool | `true` | Start/health-check selected XBEN containers before eval starts |\n| `xbow_start_timeout` | int | `90` | Startup timeout in seconds for XBEN container checks |\n\r\n### Metrics\r\n\r\n| Metric | Meaning |\r\n| ------ | ------- |\r\n| `reward` | Weighted sum of all criteria (0-1) |\r\n| `xss_confirmed_reward` | XSS payload successfully executed (0-1) |\r\n| `tool_format_reward` | Correct XML tool format and valid tool calls (0-1) |\r\n| `reflection_found_reward` | Input reflection points detected (0-1) |\r\n| `methodology_reward` | Proper testing methodology used (0-1) |\r\n| `documentation_reward` | Findings documented properly (0-1) |\r\n| `efficiency_reward` | Efficient testing without excessive steps (0-1) |\r\n\r\n### Prerequisites\r\n\r\nFor local evaluation mode (default):\r\n- OWASP Juice Shop running at http://localhost:3000\r\n- Strix tools available (browser, terminal, etc.)\r\n\r\nFor sandbox mode:\r\n- Docker runtime configured\r\n- Strix Docker image available (`STRIX_IMAGE` env var)\r\n- Strix runtime initialized\r\n\r\n### Example Usage\r\n\r\n```python\r\nfrom strix_xss import load_environment\r\n\r\n# Load environment with easy tasks\r\nenv = load_environment(difficulty=\"easy\", mode=\"eval\")\r\n\r\n# Load with sandbox support\r\nenv = load_environment(\r\n    difficulty=\"medium\",\r\n    mode=\"eval\",\r\n    sandbox_mode=True\r\n)\r\n\r\n# Filter specific tasks\nenv = load_environment(\n    task_ids=[\"juice_dom_xss\", \"juice_reflected_xss\"],\n    mode=\"eval\"\n)\n\n# Run local XBEN XSS challenges\nenv = load_environment(\n    mode=\"eval\",\n    challenge_set=\"xbow_xss\",\n    task_ids=[\"XBEN-004-24\"]\n)\n\n# Use prebuilt XBEN images only (no runtime docker builds)\nenv = load_environment(\n    mode=\"eval\",\n    challenge_set=\"xbow_xss\",\n    xbow_auto_build=False,\n    xbow_image_overrides={\n        \"XBEN-004-24\": \"myrepo/xbow-xben-004-24:latest\",\n        \"XBEN-008-24\": \"myrepo/xbow-xben-008-24:latest\",\n    },\n)\n```\n\n### Prebuild Script (XBEN)\n\nBuild all local XBEN XSS images with expected tags:\n\n```powershell\npowershell -ExecutionPolicy Bypass -File .\\environments\\strix_xss\\scripts\\prebuild_xbow_images.ps1\n```\n\nBuild selected tasks only:\n\n```powershell\npowershell -ExecutionPolicy Bypass -File .\\environments\\strix_xss\\scripts\\prebuild_xbow_images.ps1 -TaskIds XBEN-004-24,XBEN-008-24\n```\n\r\n### Integration Notes\r\n\r\nThis environment is designed for training models that will be deployed in production strix:\r\n\r\n- All tool calls use strix's native XML format\r\n- Tool execution goes through strix's validation and execution pipeline\r\n- Parser matches strix's production streaming parser\r\n- Agent state structure matches strix's expected interface\r\n- Sandbox runtime uses strix's Docker container management\r\n\r\nModels trained here can be directly used as XSS specialist sub-agents in production strix deployments.\r\n","encoding":"utf-8","truncated":false,"total_bytes":6545},"status":null}