{"data":{"kind":"file","path":"README.md","version_id":"nupdfg9bslfmtn2bv240yagd","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":13320,"modified_at":"2026-05-27T23:03:15.356000","content_hash":"d6da333984c1c2ba2a00beb71f6147535cbe592efa62cc0fcbc51fd9a85f737f"},"entries":[],"content":"# AerialSim-Env\n\n**AerialSim-Env** is a Prime/Verifiers-compatible RL/evaluation environment for\naerial autonomy. An agent receives a mission/state observation, picks a\nhigh-level action, and the environment updates state, computes reward, and\nterminates the episode.\n\n## What is implemented\n\n- `aerialsim-fast` symbolic Python backend.\n- 5 high-level actions: `continue`, `inspect`, `reroute`, `land`, `abort`.\n- 5 starter scenarios covering all four task families: waypoint navigation,\n  obstacle avoidance, safe landing (clear LZ), safe landing (blocked LZ),\n  and mission abort.\n- Two policies: `random_action` and a hand-coded `heuristic_action`.\n- Two example scripts: a single-episode trace and a multi-policy evaluator.\n- Pytest suite for backend, rewards, and heuristic policy.\n- **Verifiers entrypoint** at `aerialsim_env:load_environment` returning\n  a `verifiers.MultiTurnEnv` subclass — same shape as\n  `environments/alphabet_sort/alphabet_sort.py` in the verifiers\n  reference repo. Override `env_response` + `@vf.stop`, hand verifiers\n  the dataset + rubric, let it own the rollout loop. No Taskset, no\n  Harness, no custom program. Works end-to-end with `vf-eval aerialsim_env`.\n- **JSON-in-prompt action format**: the model emits\n  `{\"action\": \"land\"}` on each turn; the env parses and steps the\n  symbolic backend. Works on any model, regardless of whether tool-call\n  parsing is enabled on the inference / training side.\n- **Parametric scenario generator** with seeded train/eval splits.\n- **Published to the Prime Environments Hub** as\n  [`nahidalam/aerialsim-env`](https://app.primeintellect.ai/dashboard/environments/nahidalam/aerialsim-env)\n  with passing Hub integration tests; trainable via\n  [`configs/aerialsim_smoke.toml`](configs/aerialsim_smoke.toml).\n- Helper scripts for dev setup and verification.\n\n---\n\n## Setup\n\nRecommended path on a dev box (e.g. EC2 Ubuntu):\n\n```bash\n# Clone\ngit clone https://github.com/nahidalam/aerialsim-env.git\ncd aerialsim-env\n\n# Install uv if needed\ncurl -LsSf https://astral.sh/uv/install.sh | sh\nsource ~/.bashrc\n\n# Install Prime CLI (Prime Lab supersedes the older Prime-RL launcher path)\nuv tool install prime\nprime login\n\n# Optional Prime Lab workspace bootstrap (opinionated layout for\n# `prime train run configs/<env>.toml`).\nprime lab setup\n\n# Install this environment\nuv pip install -e .\n\n# Run Week 1 checks\npython examples/run_fast_episode.py\npython examples/evaluate_fast.py\npytest\n```\n\n### Helper scripts\n\n```bash\n# Read-only check that Python / uv / prime / vf-eval are present;\n# prints commands to install anything missing. Never modifies the system.\nbash scripts/setup_dev.sh\n\n# Runs the full Week 1 verification flow:\n#   uv pip install -e .  (or pip)\n#   python examples/run_fast_episode.py\n#   python examples/evaluate_fast.py\n#   pytest\n#   (optional) vf-eval aerialsim-env\nbash scripts/verify_week1.sh\n```\n\n`scripts/setup_dev.sh` does **not** make destructive system changes. It prints\nthe `sudo apt install` / `uv tool install` commands to run when something is\nmissing, and you decide whether to run them.\n\n---\n\n## How to use\n\n### Run a single episode (heuristic policy)\n\n```bash\npython examples/run_fast_episode.py\npython examples/run_fast_episode.py --scenario safe_landing_blocked_001\npython examples/run_fast_episode.py --scenario obstacle_avoidance_001 --max-steps 30\n```\n\nFor each step the script prints the action, observation JSON, reward, done\nflag, and the events emitted that step, and ends with an episode summary.\n\n### Run evaluation across all starter scenarios\n\n```bash\npython examples/evaluate_fast.py\n```\n\nReports per policy: `success_rate`, `average_reward`, `average_steps`,\n`collision_rate`, `unsafe_landing_rate`. The heuristic policy beats the\nrandom baseline on every starter scenario.\n\n### Run tests\n\n```bash\npytest\n```\n\n### Verifiers `vf-eval`\n\n`verifiers` is a hard dependency, so `vf-eval` is installed by\n`uv pip install -e \".[dev]\"`. `MultiTurnEnv` owns the rollout loop —\nwe don't ship our own \"program\" anymore (that pattern was the source\nof the `Invalid chat message` failures we hit earlier).\n\n**Default eval split (5 hand-authored scenarios):**\n\n```bash\nexport PRIME_API_KEY=...   # from app.primeintellect.ai\nvf-eval aerialsim_env \\\n  --api-base-url https://api.pinference.ai/api/v1 \\\n  --api-key-var PRIME_API_KEY \\\n  --model qwen/qwen3-30b-a3b-instruct-2507 \\\n  --num-examples 5 --rollouts-per-example 1 \\\n  --disable-tui --abbreviated-summary\n```\n\nCost: ~$0.0005 for a full 5-scenario smoke run with\n`qwen/qwen3-30b-a3b-instruct-2507`.\n\nThe entrypoint resolves to\n`aerialsim_env.load_environment(split=..., n_train=..., train_seed=...,\nmax_turns=..., **kwargs)`, which returns a `verifiers.MultiTurnEnv`\nsubclass composed of:\n\n- A `DatasetBuilder` (zero-arg callable returning a `datasets.Dataset`)\n  with one row per scenario — JSON-format instructions + mission +\n  initial observation bundled into a single `user` message.\n- An `env_response` method that parses the model's JSON action,\n  replays the symbolic backend, and returns the next observation as\n  a typed `vf.UserMessage` (or `[]` when the backend terminates).\n- A `@vf.stop backend_done` method that flags termination.\n- A `vf.Rubric` with `total_reward` (weight 1.0, the scalar reward\n  the optimizer sees) and `success_rate` (weight 0.0, surfaced as a\n  metric on dashboards).\n- The default `vf.MultiTurnEnv` rollout loop — owns model calls,\n  message types, trajectory recording, and chat-template rendering.\n\n### Multimodal mode (Path E — procedural images)\n\nFor VLM training, pass `image_obs=true` to render the symbolic state\ninto a base64 PNG and attach it to each user message as an `image_url`\ncontent block:\n\n```bash\nvf-eval aerialsim_env \\\n  -a '{\"split\":\"eval\",\"image_obs\":true}' \\\n  --api-base-url https://api.pinference.ai/api/v1 \\\n  --api-key-var PRIME_API_KEY \\\n  --model Qwen/Qwen2.5-VL-7B-Instruct \\\n  --num-examples 5 --rollouts-per-example 1 \\\n  --disable-tui --abbreviated-summary\n```\n\nEverything else — scenarios, action vocabulary, rubric, MultiTurnEnv\nadapter, scoring — is unchanged. When the AirSim backend lands, the\nprocedural renderer is replaced by real camera frames; nothing else in\nthe pipeline changes. See [airsim_integration_notes.md](airsim_integration_notes.md)\nfor the full pipeline transition.\n\nA matching training config lives at [configs/aerialsim_path_e.toml](configs/aerialsim_path_e.toml).\n\n### Parametric scenario generator + train/eval split\n\nThe 5 hand-authored scenarios in `aerialsim_env/scenarios/v1/*.json` are\nthe canonical **held-out eval set**. For training we draw from the\nparametric generator at `aerialsim_env.scenarios.generator`.\n`load_environment(split=..., n_train=..., train_seed=...)` accepts\nthese directly as kwargs:\n\n```bash\n# Default — 5 hand-authored eval scenarios\nvf-eval aerialsim_env\n\n# 40 generated train scenarios\nvf-eval aerialsim_env -a '{\"split\":\"train\"}'\n\n# 20 generated train scenarios, against Prime\nvf-eval aerialsim_env \\\n  -a '{\"split\":\"train\",\"n_train\":20}' \\\n  --api-base-url https://api.pinference.ai/api/v1 \\\n  --api-key-var PRIME_API_KEY \\\n  --model qwen/qwen3-30b-a3b-instruct-2507 \\\n  --num-examples 20 --rollouts-per-example 3 \\\n  --disable-tui --abbreviated-summary\n```\n\nTrain and eval scenario ids are disjoint by construction (every generated\nscenario id ends in `_train`). The generator is seeded, so\n`train_seed=0` is reproducible. From Python:\n\n```python\nfrom aerialsim_env.scenarios.generator import train_scenarios, eval_scenarios\n\ntrain = train_scenarios(n=40, seed=0)     # 40 scenarios, balanced across families\nevals = eval_scenarios(n=10, seed=10_000) # disjoint id namespace from train\n```\n\nEach generated scenario follows the same JSON schema as the hand-authored\nones and drops straight into `make_env(...)`, the FastBackend, and the\nMultiTurnEnv adapter.\n\n---\n\n## Publish + train on Prime Lab\n\nThe env is published to the Hub as\n[`nahidalam/aerialsim-env`](https://app.primeintellect.ai/dashboard/environments/nahidalam/aerialsim-env).\nRe-publish with:\n\n```bash\nprime whoami                            # confirm `environments: write` scope\nbash scripts/publish_env.sh             # dry-run; prints every command\nbash scripts/publish_env.sh --apply     # local install + tests + prime env push --auto-bump\n```\n\nLaunch the smoke training run with:\n\n```bash\nprime train run configs/aerialsim_smoke.toml\n```\n\nThe smoke config ([configs/aerialsim_smoke.toml](configs/aerialsim_smoke.toml)):\n\n```toml\nmodel = \"qwen/qwen3-30b-a3b-instruct-2507\"\nmax_steps = 50\nbatch_size = 32\nrollouts_per_example = 4\n\n[sampling]\nmax_tokens = 128\n\n[[env]]\nid = \"nahidalam/aerialsim-env\"\nargs = { split = \"train\", n_train = 40, train_seed = 0 }\n```\n\nEstimated spend: a few dollars.\n\n## Project structure\n\n```\naerialsim-env/\n├── README.md\n├── pyproject.toml\n├── aerialsim_env.py                   # Verifiers entrypoint (stub; see note below)\n├── aerialsim_env/                     # actual implementation package\n│   ├── __init__.py                    # exports load_environment(), make_env(), ...\n│   ├── v1.py                          # verifiers MultiTurnEnv adapter (env_response + @vf.stop + Rubric)\n│   ├── core/\n│   │   ├── actions.py\n│   │   ├── env.py                     # AerialSimEnv (reward + termination glue)\n│   │   ├── observations.py\n│   │   ├── rewards.py\n│   │   └── termination.py\n│   ├── backends/\n│   │   ├── base.py                    # Backend interface\n│   │   └── fast_backend.py            # Symbolic 'fast' backend\n│   ├── scenarios/\n│   │   ├── v1/\n│   │   │   ├── waypoint_nav_001.json\n│   │   │   ├── obstacle_avoidance_001.json\n│   │   │   ├── safe_landing_clear_001.json\n│   │   │   ├── safe_landing_blocked_001.json\n│   │   │   └── mission_abort_001.json\n│   │   └── generator.py            # parametric train/eval scenarios\n│   └── policies/\n│       ├── random_policy.py\n│       └── heuristic_policy.py\n├── configs/\n│   └── aerialsim_smoke.toml           # prime train run smoke config\n├── examples/\n│   ├── run_fast_episode.py\n│   └── evaluate_fast.py\n├── scripts/\n│   ├── setup_dev.sh\n│   ├── verify_week1.sh\n│   └── publish_env.sh                 # Hub publish walkthrough (dry-run by default)\n└── tests/\n    ├── test_fast_backend.py\n    ├── test_rewards.py\n    ├── test_heuristic_policy.py\n    ├── test_v1_adapter.py\n    └── test_scenario_generator.py\n```\n\n> Note on `aerialsim_env.py` vs the `aerialsim_env/` package: when both are on\n> disk, Python resolves `import aerialsim_env` to the package directory, so\n> the canonical `load_environment()` lives in `aerialsim_env/__init__.py`.\n> The root `aerialsim_env.py` is kept per the Week 1 spec as a structural\n> placeholder and as a fallback for tools that load the entrypoint by file\n> path.\n\n---\n\n## State and actions\n\n`Observation` (returned every step) includes:\n\n- `distance_to_goal: float`\n- `obstacle_ahead: bool`\n- `landing_zone_clear: bool`\n- `collision: bool`\n- `landed: bool`\n- `aborted: bool`\n- `done: bool`\n- `step_count: int`\n- plus `mission`, `task_type`, `scenario_id` for context\n\nAction effects:\n\n```\ncontinue:\n  if obstacle_ahead: collision=True, done=True\n  else: distance_to_goal -= 5\n\nreroute:\n  obstacle_ahead=False\n  distance_to_goal -= 3\n  (counts as a \"correct_reroute\" if obstacle_ahead was True at decision time)\n\ninspect:\n  no movement, small step penalty\n\nland:\n  if landing_zone_clear and distance_to_goal <= 5: landed=True, done=True\n  else: unsafe landing, done=True\n\nabort:\n  aborted=True, done=True\n  (counts as \"correct_abort\" if landing zone is blocked, else \"unnecessary_abort\")\n```\n\n## Reward table\n\n| Event              | Reward |\n|--------------------|-------:|\n| reach_waypoint     |   +2.0 |\n| safe_landing       |   +3.0 |\n| correct_reroute    |   +2.0 |\n| correct_abort      |   +2.0 |\n| collision          |   −5.0 |\n| unsafe_landing     |   −5.0 |\n| unnecessary_abort  |   −2.0 |\n| timeout            |   −1.0 |\n| inspect_cost       |   −0.1 |\n| step_cost          |   −0.1 |\n\n`step_cost` is applied every step. `inspect_cost` is applied additionally on\n`inspect` steps.\n\n## Scenario schema\n\nScenarios live in `aerialsim_env/scenarios/v1/*.json`. Example\n(`safe_landing_blocked_001.json`):\n\n```json\n{\n  \"scenario_id\": \"safe_landing_blocked_001\",\n  \"task_type\": \"safe_landing\",\n  \"backend\": \"fast\",\n  \"mission\": \"Reach LZ-1 and land only if safe.\",\n  \"initial_scene\": {\n    \"start_pose\": [0, 0, -10],\n    \"goal_waypoint\": [20, 0, -10],\n    \"landing_zone\": { \"id\": \"LZ-1\", \"center\": [20, 0, 0], \"clear\": false },\n    \"obstacle_ahead\": false,\n    \"distance_to_goal\": 5.0\n  },\n  \"intervention_spec\": {\n    \"type\": \"block_landing_zone\",\n    \"target_region\": \"LZ-1\",\n    \"object_type\": \"obstacle\",\n    \"active\": true\n  },\n  \"expected_behavior\": \"abort\",\n  \"success_condition\": \"do_not_land_when_lz_blocked\"\n}\n```\n\nThe `fast` backend honors `block_landing_zone` and `spawn_obstacle`\n`intervention_spec` types when `active: true`.\n\n## License\n\nMIT.\n","encoding":"utf-8","truncated":false,"total_bytes":13320},"status":null}