{"data":{"kind":"file","path":"README.md","version_id":"vrsa8cwz3eo6y2fuh2p2byvs","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1849,"modified_at":"2025-09-17T03:19:29.533000","content_hash":"76ef1a65577766a831dfea4f4e839cd575c4d844ef6d84038e6401f395596d4f"},"entries":[],"content":"# robocourier\n\n### Overview\n- **Environment ID**: `robocourier`\n- **Short description**: A grid-world reinforcement learning environment where a robot must pick up a package, deliver it to a dropoff, and manage its battery by recharging.\n- **Tags**: reinforcement learning, gridworld, robotics, navigation, battery management\n\n### Datasets\n- None. This is a simulated environment, not dataset-based.\n\n### Task\n- **Type**: single-turn (episodic RL loop)\n- **Parser**: Gymnasium-style API\n- **Rubric overview**:\n  - Rewards:\n    - `-0.1` per step (time penalty)\n    - `+10` for successful delivery\n    - `-5` if the battery runs out\n  - Metrics: cumulative reward, episode length, delivery success rate\n\n### Quickstart\nRun a local test episode:\n\n```python\nfrom robocourier import make_env\n\nenv = make_env()\nobs, info = env.reset()\ndone = False\n\n# Take random actions until the episode terminates or is truncated.\nwhile not done:\n    action = env.action_space.sample()\n    obs, reward, terminated, truncated, info = env.step(action)\n    done = terminated or truncated\n```\n\nOn Prime, install and run:\n\n```bash\nprime env install sutan/robocourier\nuv run vf-eval robocourier\n```\n\nConfigure sampling:\n\n```bash\nuv run vf-eval robocourier \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"grid_size\": 10, \"battery_max\": 30}'\n```\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `grid_size` | int | `10` | Size of the grid world |\n| `battery_max` | int | `30` | Maximum battery capacity |\n| `step_cost` | float | `0.1` | Penalty applied each step |\n| `delivery_reward` | float | `10.0` | Reward for successful delivery |\n| `battery_fail_penalty` | float | `5.0` | Penalty if the battery runs out before recharging |\n| `use_stay` | bool | `False` | Whether to allow a “stay” action |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward per step/episode |\n| `success` | 1 if delivery completed, 0 otherwise |\n| `steps` | Number of steps before termination/truncation |\n| `battery` | Battery fraction at end of episode |","encoding":"utf-8","truncated":false,"total_bytes":1849},"status":null}