{"data":{"kind":"file","path":"README.md","version_id":"rrsex7fe106sg87t3mv9djqr","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2455,"modified_at":"2025-08-29T20:08:16.222000","content_hash":"df25adfef2f806f082de1ab75ebe0fe4f2265a4108a070387713eefc0e41858f"},"entries":[],"content":"# mnist-adversarial\n\n### Overview\n- **Environment ID**: `mnist-adversarial`\n- **Short description**: Evaluation environment for testing AI models' ability to distinguish adversarial examples from normal MNIST digits while correctly identifying the digit class.\n- **Tags**: single-turn, test, eval, mnist, adversarial-example\n\n### Datasets\n- **Primary dataset(s)**: `wambosec/adversarial-mnist` - A dataset containing both normal and adversarial MNIST digit examples\n- **Source links**: [Hugging Face Dataset](https://huggingface.co/datasets/wambosec/adversarial-mnist)\n- **Split sizes**: Uses test split by default, with configurable sample size (default: 50 normal + 50 adversarial = 100 total examples)\n\n### Task\n- **Type**: single-turn\n- **Parser**: Custom regex parser that extracts responses in `\\boxed{adversarial_X}` or `\\boxed{normal_X}` format\n- **Rubric overview**: Dual scoring system with +0.5 points for correct adversarial/normal classification and +0.5 points for correct digit identification (0-9)\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval mnist-adversarial\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval mnist-adversarial -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7 -a '{\"size\": 30}'\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n- Models receive flattened 784-element arrays representing 28×28 grayscale MNIST images\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `dataset_split` | str | `\"test\"` | Dataset split to use (train/test/validation) |\n| `size` | int | `50` | Number of normal and adversarial examples each (total = 2×size) |\n\n### Input Format\nThe model receives a flattened array of 784 grayscale values (0-255) representing a 28×28 MNIST digit image in row-major order. The system prompt instructs the model to classify the image as either adversarial or normal and identify the digit class.\n\n### Expected Output Format\nModels must respond with exactly one line in the format:\n- `\\boxed{adversarial_X}` for adversarial examples (where X is the digit 0-9)\n- `\\boxed{normal_X}` for normal examples (where X is the digit 0-9)\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward: 0.5 for correct adversarial/normal classification + 0.5 for correct digit identification (max: 1.0) |\n| `accuracy` | Exact match on target answer |","encoding":"utf-8","truncated":false,"total_bytes":2455},"status":null}