{"data":{"kind":"file","path":"README.md","version_id":"qo48n2gt2jrm4dkxqzz2np6w","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3193,"modified_at":"2026-01-17T01:18:27.567000","content_hash":"3cbb1c06f5a3147adb431f7af6e821bf6f99921bf5df50bf6977ba8d9af6b139"},"entries":[],"content":"# SERE (Verifiers Environment)\n\nSERE is a symbolic-first embodied environment for controllable RL with LLMs.\nIt exposes typed objects, predicates, numeric fluents, and actions with explicit\npreconditions/effects. Instead of pixels, the agent interacts through text\nobservations and PDDL-style actions. This makes runs fully interpretable and\nreproducible while letting you control horizon length, feedback frequency, and\nobservability.\n\nMore background and full docs live in the SERE repo:\nhttps://github.com/hallerite/SERE\n\n## Quick Start\n\n```python\nfrom vf_sere import load_environment\n\n# All tasks across all domains\nenv = load_environment()\n\n# Or limit to specific domains\nenv = load_environment(domains=[\"kitchen\", \"assembly\"])\n```\n\nPrime CLI (uses the env name, no imports needed):\n\n```bash\nprime eval run vf-sere -m gpt-4o-mini -n 10\n```\n\nSuggested default eval scope (first 5 kitchen + first 5 assembly tasks):\n\n```bash\nprime eval run vf-sere -m gpt-4o-mini -n 20 \\\n  -a '{\"domains\":[\"kitchen\",\"assembly\"],\"num_tasks_per_domain\":5}'\n```\n\n## Key Knobs (Controllability)\n\nUse these to shape difficulty, horizon, and feedback:\n\n- `run_mode`: `INTERACTIVE` (step-by-step), `BATCH` (short plan segments),\n  `OPEN_LOOP` (full plan, no intermediate feedback). 
This directly controls\n  feedback frequency and effective horizon.\n- `max_episode_steps`: override per-task step limits.\n- `time_limit`: cap total time when durations are enabled.\n- `enable_durations`: actions consume time; enables time-based horizons.\n- `enable_numeric`: numeric fluents (energy, temperature, etc.).\n- `enable_stochastic`: stochastic outcomes for robustness testing.\n- `enable_reward_shaping`: apply task-defined reward shaping milestones (default: False).\n- `reward_shaping`: milestone or potential-based shaping override (task-level control).\n- `step_penalty` / `invalid_penalty`: trade off exploration vs. efficiency.\n\nObservation control (partial observability):\n\n- `display_nl`: include natural-language glosses for facts/actions.\n- `show_affordances`: list currently valid actions.\n- `formatter_config`: set `visibility` (e.g., `room`) to restrict what the\n  agent can see.\n\n## Task Selection\n\n```python\nenv = load_environment(\n    domains=[\"kitchen\"],\n    num_tasks_per_domain=5,\n    include_multi_agent=True,\n    episodes_per_task=2,\n)\n```\n\nTasks are YAML-defined (domain + task). SERE ships multiple domains (e.g.,\n`kitchen`, `assembly`) with reference plans for solvability checks.\n\n## Multi-Agent Tasks\n\nMulti-agent tasks require one action per robot per step. 
The parser supports\nmultiple S-expressions in a single response:\n\n```\n(move r1 kitchen pantry)(idle r2)\n```\n\n## Example: More Control\n\n```python\nfrom vf_sere import load_environment\nfrom sere.core.pddl_env.run_mode import RunMode\n\nenv = load_environment(\n    domains=[\"kitchen\"],\n    run_mode=RunMode.BATCH,\n    max_episode_steps=40,\n    enable_durations=True,\n    time_limit=30.0,\n    enable_stochastic=False,\n    display_nl=False,\n    show_affordances=False,\n)\n```\n\n## Troubleshooting\n\n- Import error: install SERE with the `verifiers` extra:\n  `uv sync --extra verifiers`\n- No tasks found: check available domains via `get_available_domains()`.\n","encoding":"utf-8","truncated":false,"total_bytes":3193},"status":null}