{"data":{"kind":"file","path":"README.md","version_id":"bm4hg0ujpe9iv0j7ddbhi87b","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2540,"modified_at":"2026-03-01T00:10:35","content_hash":"8342313f984c25dc370ede7294ec045e999b0f75d831f81a7fde23f03769b4a1"},"entries":[],"content":"# rg-mix-env\n\nWeighted mix of challenging reasoning-gym tasks for RL training.\n\n## Overview\n\nThis environment combines 5 challenging reasoning tasks, weighted **inversely proportional** to their Qwen3-4B pass@1 scores so that harder tasks get more representation during training.\n\n## Tasks\n\n| Task | Type | pass@1 | Weight (1/pass@1) | Config |\n|------|------|--------|-------------------|--------|\n| arc_1d | Pattern recognition | 0.40 | 2.49 | default |\n| sokoban_hard | Planning/search | 0.31 | 3.23 | 3-4 boxes, 9x9 |\n| countdown_7 | Arithmetic search | 0.30 | 3.33 | 7 numbers |\n| zebra_puzzles_7 | Constraint satisfaction | 0.25 | 3.98 | 7 people, 5 chars |\n| cryptarithm | Cryptarithmetic | 0.19 | 5.31 | default |\n\n## Quickstart\n\n```python\nfrom verifiers import load_environment\n\nenv = load_environment(\"rg-mix-env\")\n\n# or with custom args:\nenv = load_environment(\"rg-mix-env\", num_train_examples=10000, num_eval_examples=2048, seed=42)\n```\n\n## Environment Arguments\n\n| Argument | Type | Default | Description |\n|----------|------|---------|-------------|\n| `num_train_examples` | int | 10000 | Number of training examples |\n| `num_eval_examples` | int | 2048 | Number of evaluation examples |\n| `seed` | int | 42 | Random seed for reproducibility |\n| `dataset_path` | str \\| None | None | Path to pre-generated dataset directory. If provided, loads from disk (~10s) instead of generating (~23 min). |\n\n## Pre-generating Datasets\n\nDataset generation is slow (~23 min) due to puzzle verification (zebra puzzles, sokoban BFS). 
Pre-generate once and reuse:\n\n```bash\n# Generate the dataset (uses multiprocessing, ~11 min with 32 workers)\npython generate_rg_mix_dataset.py \\\n    --output /pscratch/sd/s/siddart2/datasets/rg_mix_7500 \\\n    --num-train 7500 --num-test 100 --seed 42\n\n# Or submit as a batch job\nsbatch generate_dataset.sbatch\n```\n\nThen reference it in the TOML config:\n\n```toml\n[[orchestrator.env]]\nid = \"rg-mix-env\"\nargs = { num_train_examples = 7500, num_eval_examples = 100, seed = 42, dataset_path = \"/pscratch/sd/s/siddart2/datasets/rg_mix_7500\" }\n```\n\n### What gets saved\n\nThe `--output` directory contains:\n- `dataset/` — HF Dataset with `question`, `answer`, `task` columns\n- `metadata.json` — Entry map + full entry dicts for scoring (~9 MB for 7600 examples: 7500 train + 100 test)\n\n### Pre-generated datasets on cluster\n\n| Path | Train | Test | Seed |\n|------|-------|------|------|\n| `$SCRATCH/datasets/rg_mix_7500` | 7500 | 100 | 42 |\n\n## Requirements\n\nRequires `verifiers[rg]` and `reasoning-gym` to be installed in the runtime environment.\n","encoding":"utf-8","truncated":false,"total_bytes":2540},"status":null}