{"data":{"kind":"file","path":"README.md","version_id":"l195s7v5z27xsfu8ed8lywvw","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1727,"modified_at":"2025-10-25T21:01:37.720000","content_hash":"0e4fced8ee3fb64bb9cae4fe7bfce51ee6ed9457e9bba4824e8ded6bc4760d81"},"entries":[],"content":"# CyberSOCEval\n\n### Overview\n- **Environment ID**: `cybersoceval`\n- **Short description**: Verifier for the CrowdStrike + Meta CyberSOCEval benchmarks: malware analysis and threat intelligence reasoning.\n- **Tags**: cybersecurity, malware-analysis, single-turn, eval\n\n### Datasets\n- **Primary dataset(s)**: [cybersoceval-questions](https://huggingface.co/datasets/kyleavery/cybersoceval-questions), [cybersoceval-reports](https://huggingface.co/datasets/kyleavery/cybersoceval-reports)\n- **Source links**: Questions and some reports from [CyberSecEval 4](https://github.com/meta-llama/PurpleLlama/tree/main/CybersecurityBenchmarks/datasets/crwd_meta), other reports from [CyberSOCEval_data](https://github.com/CrowdStrike/CyberSOCEval_data)\n- **Split sizes**: 609 malware analysis questions, 588 threat intelligence reasoning questions\n\n### Task\n- **Type**: single-turn\n- **Parser**: ThinkParser when `use_think=True`, else a basic `Parser` extracting the final boxed answer (`extract_boxed_answer`)\n- **Rubric overview**: Jaccard set similarity.\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval cybersoceval\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval cybersoceval -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7 -a '{\"benchmark\": \"malware_analysis\"}'\n```\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `benchmark` | str | `all` | Which benchmark to use: `malware_analysis`, `threat_intel_reasoning`, or `all` (runs both in `EnvGroup`) |\n| `use_think` | bool | `true` | Whether to parse `<think>` tags |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Jaccard set similarity over MC letter sets (0–1) |\n","encoding":"utf-8","truncated":false,"total_bytes":1727},"status":null}