{"data":{"kind":"file","path":"README.md","version_id":"bqjxjjp886rnvameqf047awp","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4205,"modified_at":"2026-04-09T17:23:14.900000","content_hash":"c2f2ab2a9c731192f4b1cde57526f534d88b4778de631c59466941f73134fdf3"},"entries":[],"content":"# kernelbench-kguard\n\nKernelBench environment with [KernelGuard](https://github.com/SinatrasC/kernelguard) integrated as a reward hacking detector.\n\nFork of [siro/kernelbench-env](https://app.primeintellect.ai/dashboard/environments/siro/kernelbench-env) with a precheck gate and reward penalty for benchmark exploitation attempts.\n\n## What It Does\n\nModels receive a PyTorch architecture and must write an optimized replacement. The evaluation runs on H100 GPUs via Modal. KernelGuard scans the generated code before GPU evaluation:\n\n- **Precheck gate**: If KernelGuard detects an exploit pattern (result replay, timer patching, harness manipulation, obfuscated exec), the submission is blocked before GPU eval. Saves compute.\n- **Reward penalty**: Blocked submissions receive a -2.0 reward penalty. Suspicious patterns (below auto-filter threshold) receive -0.5.\n- **State tracking**: All KernelGuard results are stored in `state[\"info\"][\"kernelguard_flags\"]` for analysis.\n\n## Install\n\n```bash\nuv sync\n```\n\n## Usage\n\n### With vf-eval (recommended)\n\n```bash\nuv run vf-eval kernelbench_kguard \\\n    --env-dir-path . \\\n    --env-args '{\"num_turns\": 1, \"gpu\": \"H100\", \"kernelguard_enabled\": true}' \\\n    --model \"openai/gpt-4o\" \\\n    --num-examples 10 \\\n    --rollouts-per-example 1 \\\n    --save-results \\\n    --state-columns \"problem_name\" \\\n    --debug\n```\n\n### Programmatic\n\n```python\nimport asyncio\nfrom kernelbench_kguard import load_environment\nfrom openai import AsyncOpenAI\n\nenv = load_environment(\n    gpu=\"H100\",\n    num_turns=1,\n    kernelguard_enabled=True,\n)\n\nclient = AsyncOpenAI(\n    api_key=\"...\",\n    base_url=\"https://api.pinference.ai/api/v1/\",\n)\n\nresults = asyncio.run(env.evaluate(\n    client=client,\n    model=\"openai/gpt-4o\",\n    num_examples=10,\n    rollouts_per_example=1,\n    save_results=True,\n    state_columns=[\"problem_name\"],\n))\n```\n\n### Disable KernelGuard\n\n```python\nenv = load_environment(gpu=\"H100\", kernelguard_enabled=False)\n```\n\n## Configuration\n\n| Parameter | Default | Description |\n|-----------|---------|-------------|\n| `gpu` | `\"T4\"` | GPU type for Modal evaluation (`T4`, `L4`, `A100`, `H100`, `H200`, `B200`) |\n| `num_turns` | `1` | Feedback loop turns |\n| `feedback_loop` | `\"until_max_turns\"` | `\"until_correct\"`, `\"until_max_turns\"`, or `\"none\"` |\n| `levels` | `[1]` | KernelBench problem levels |\n| `num_correct_trials` | `3` | Correctness check trials |\n| `num_perf_trials` | `100` | Performance measurement trials |\n| `kernelguard_enabled` | `True` | Enable/disable KernelGuard precheck + reward penalty |\n\n## Prerequisites\n\nDeploy the Modal evaluation runner before first use:\n\n```bash\nuv run modal deploy kernelbench_src/modal_runner.py\n```\n\n## How KernelGuard Integrates\n\n1. Model generates code\n2. `kernelguard.analyze_code(code)` runs in <1ms (in-process, no network)\n3. If `should_filter=True`: skip Modal GPU eval, return synthetic failed result, apply -2.0 penalty\n4. If clean: proceed to Modal GPU eval normally\n5. After rollout: `kernelguard_penalty` reward function reads flags from state\n\nKernelGuard is configured with `entrypoints=[\"custom_kernel\", \"forward\"]` to match both KernelBot and KernelBench submission formats.\n\n## Detected Exploit Patterns\n\n| Pattern | Description |\n|---------|-------------|\n| `LAST_CALL_REPLAY` | Global identity/pointer replay — returns cached output on repeated inputs |\n| `CONFIG_CACHE_EXPLOIT` | Dict/config-keyed result cache with early return |\n| `POINTER_REPLAY` | `data_ptr()`-based output replay |\n| `TIMER_MONKEYPATCH` | Patches `torch.cuda.Event` or timing functions |\n| `OBFUSCATED_EXEC` | `exec(decode(...))` hidden payloads |\n| `HARNESS_RUNTIME_PATCHING` | Patches benchmark harness functions |\n| `EVALUATOR_EXPLOIT` | Imports/patches `__main__` evaluator internals |\n\nSee [KernelGuard](https://github.com/SinatrasC/kernelguard) for the full list of 35 detection rules.\n\n## Credits\n\n- Original environment: [siro/kernelbench-env](https://app.primeintellect.ai/dashboard/environments/siro/kernelbench-env)\n- KernelBench dataset: [ScalingIntelligence/KernelBench](https://github.com/ScalingIntelligence/KernelBench)\n- KernelGuard: [SinatrasC/kernelguard](https://github.com/SinatrasC/kernelguard)\n","encoding":"utf-8","truncated":false,"total_bytes":4205},"status":null}