{"data":{"kind":"file","path":"README.md","version_id":"fdw23fvgp0iydm376fij7xqn","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4552,"modified_at":"2025-08-25T02:19:27.349000","content_hash":"e48d4cff4d828eb74817b83e150b5a8e97919450fbde9629507176e8d0e9390a"},"entries":[],"content":"# synlogic\n\n### Overview\n- **Environment ID**: `synlogic`\n- **Short description**: Single-turn evaluation over SynLogic (HF) tasks with XML formatting and official task verifiers.\n- **Tags**: reasoning, logic, single-turn, xml, synthetic\n\n### Datasets\n- **Primary dataset(s)**: `MiniMaxAI/SynLogic` on Hugging Face\n- **Source links**: GitHub (MiniMax-AI/SynLogic), HF Dataset card\n- **Configs**: `easy`, `hard` (each contains many task types via `data_source`)\n- **Split sizes**: Uses HF splits; train/eval counts configurable via loader args\n\n### Task\n- **Type**: single-turn\n- **Parser**: `XMLParser([\"think\",\"answer\"])`\n- **Rubric overview**: Task correctness via official per-task verifier when available (fallback to normalized equality) + optional format component\n\n### Quickstart\nRun an evaluation with defaults (easy/validation, all tasks):\n\n```bash\nuv run vf-eval synlogic\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval synlogic \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\n        \"subset\": \"easy\",\n        \"split\": \"validation\",\n        \"num_train_examples\": 200,\n        \"num_eval_examples\": 200\n      }'\n```\n\nRun via OpenRouter using Qwen (programmatic):\n\n```bash\nexport OPENROUTER_API_KEY=... 
\nuv run python environments/synlogic/run_openrouter_synlogic.py --model \"qwen/qwen3-1.7b-instruct\"\n```\n\nRun locally with a Hugging Face model (no API key):\n\n```bash\n# Install deps if needed (CUDA example)\nuv pip install --python 3.12 --index-url https://download.pytorch.org/whl/cu121 \\\n  torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1\nuv pip install --python 3.12 'transformers>=4.43,<5' 'accelerate>=0.31,<1' 'safetensors>=0.4,<1' 'sentencepiece>=0.1'\n\n# Run with HF provider (uses Transformers locally)\nuv run python environments/synlogic/run_openrouter_synlogic.py \\\n  --provider hf \\\n  --model \"Qwen/Qwen3-1.7B\" \\\n  --device cuda --device-map auto \\\n  --num-eval 20 --num-train 200 --max-tokens 1024 --temperature 0.7\n```\n\nFilter to a specific task (a `data_source` value). Official verifiers are discovered and set up automatically:\n\n```bash\nuv run vf-eval synlogic \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\n        \"subset\": \"easy\",\n        \"split\": \"validation\",\n        \"tasks\": [\"arrow_maze\"]\n      }'\n```\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `subset` | str | `\"easy\"` | HF config name (`easy` or `hard`) |\n| `split` | str | `\"validation\"` | Eval split override (if set, overrides `eval_split`) |\n| `train_split` | str | `\"train\"` | HF split for training dataset |\n| `eval_split` | str\\|None | `\"validation\"` | HF split for eval if `split` not provided |\n| `tasks` | str\\|list[str]\\|None | `None` | Filter by dataset `data_source` values; if `None`, include all |\n| `num_train_examples` | int | `200` | Number of training examples (use `-1` for full split) |\n| `num_eval_examples` | int | `200` | Number of evaluation examples (use `-1` for full split) |\n| `seed` | int | `0` | Random seed for sampling |\n| `question_key` | str\\|None | auto | Override dataset question field |\n| `answer_key` | str\\|None | auto | Override dataset 
answer field |\n| `task_key` | str\\|None | auto (`data_source`) | Override dataset task field |\n| `system_prompt` | str\\|None | built-in | Override System message (XML Think/Answer) |\n| `verifier` | str\\|callable\\|None | `None` | Optional override: single verifier (all tasks) as dotted path or callable |\n| `verifier_map` | dict\\|None | `None` | Optional per-task override: `{task_name: dotted_path}` |\n| `task_alias_map` | dict\\|None | `None` | Map dataset task names to repo folder names when they differ |\n| `auto_setup_repo` | bool | `True` | Auto-download official SynLogic repo into a cache and add to `sys.path` for verifiers |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Weighted sum of task correctness + format components |\n| `check_answer_reward_func` | 1.0/0.0 per example from official verifier or equality fallback |\n| `format_reward` | Adherence to `<think>`/`<answer>` XML format |\n\n### Notes\n- Requires network on first run to download the dataset from HF.\n- Official verifiers are automatically set up: if not already importable, the environment downloads a zip of `MiniMax-AI/SynLogic` into a user cache (default `~/.cache/synlogic_repo`) and adds it to `sys.path`. No `repo_root` is needed. You can still provide `verifier`/`verifier_map` to override defaults.\n- Set `SYNLOGIC_CACHE_DIR` to customize the cache location. Set `auto_setup_repo=False` to opt out.\n","encoding":"utf-8","truncated":false,"total_bytes":4552},"status":null}