{"data":{"kind":"file","path":"README.md","version_id":"v0ck5zrwd1vyon3kc68ujpfd","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3914,"modified_at":"2026-02-10T17:25:54.173000","content_hash":"dd979116f1b646da661089e41414529f0ff6603d6eacd38387ca2b78d059bba1"},"entries":[],"content":"# among-us\n\n### Overview\n**Environment ID**: `fenil/among-us`\n\nA reinforcement learning environment for training and evaluating LLMs on social deduction gameplay. Models learn strategic deception, coordination, and social reasoning by playing 6-player Among Us games against other AI opponents.\n\n**Tags**: social-deduction, multi-agent, deception, games, train, eval\n\n### Task\n**Type**: Multi-turn social deduction game (6-player Among Us on The Skeld map)\n\n**Mechanics**: Room-based movement, fog-of-war, task completion, kills, body reports, emergency meetings, discussion, and voting\n\n**Modes**:\n- **Single-agent (bots)**: One model plays against 5 heuristic bots — fast iteration for evaluation\n- **Multi-agent (models)**: One model plays against 5 different LLMs via API — realistic opponents for RL training\n\n### Scoring (Antim Labs weighting)\n| Reward | Points | Condition |\n| ------ | ------ | --------- |\n| Impostor win | 50 | Model wins as impostor (requires deception) |\n| Crewmate win | 10 | Model wins as crewmate (requires coordination) |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `player_win` | 1.0 if the model's team won |\n| `game_completed` | 1.0 if the game reached a conclusion |\n| `task_progress` | Fraction of total tasks completed (0.0–1.0) |\n| `turn_efficiency` | Higher for faster wins (0.0–1.0) |\n\n### Quickstart\n\n**Install:**\n```bash\nprime env install fenil/among-us\n```\n\n**Evaluation:**\nRun single-model evaluations against heuristic bots using `prime eval run` with the environment ID.\n\n**RL Training:**\nConfigure multi-model opponents in your training TOML and launch with `prime rl run`.\n\n### 
Environment Arguments\n\n**Mode selection:**\n- `mode`: `\"bots\"` (default) or `\"models\"` (for RL training with API opponents)\n\n**Game configuration:**\n- `num_games`: Training dataset size (default: 60)\n- `num_eval_games`: Evaluation dataset size (default: 20)\n- `num_players`: Players per game, max 6 (default: 6)\n- `num_impostors`: Impostors per game (default: 1)\n- `tasks_per_player`: Tasks assigned to each crewmate (default: 3)\n- `kill_cooldown_turns`: Turns between impostor kills (default: 5)\n- `max_turns`: Maximum turns before the game ends in a draw (default: 80)\n- `impostor_ratio`: Fraction of games in which the model plays the impostor role (default: 0.5)\n\n**Opponent configuration (when `mode=\"models\"`):**\n- `opponent_models`: Comma-separated list or array of 5 model IDs\n- `opponent_base_url`: API endpoint (auto-detects Prime Inference)\n- `opponent_api_key_env`: Environment variable for API key (default: auto-detected)\n- `opponent_timeout`: API timeout in seconds (default: 30)\n\n### RL Training\n\nThe environment supports hosted RL training on Prime Intellect with multi-model opponents.\n\n**Configuration:**\nSet `mode = \"models\"` in your training TOML and specify 5 opponent models via `opponent_models`. 
The trained model (player 0) will play against these API-controlled opponents, learning strategic deception and social reasoning through reinforcement learning.\n\n**Results:**\nModels trained with RL show significant improvement in impostor win rate: without explicit instruction, they learn to time kills, avoid suspicion, and lie convincingly in discussions.\n\n### Architecture\n```\namong_us/\n├── map_config.py       # The Skeld map (rooms, tasks, adjacency graph)\n├── game_state.py       # Core game mechanics (movement, kills, voting, win conditions)\n├── parser.py           # Natural language → structured action parsing\n├── bots.py             # Heuristic AI opponents (BFS pathfinding, basic strategy)\n├── among_us.py         # Single-agent Verifiers environment (vs bots)\n├── among_us_multi.py   # Multi-agent RL environment (vs API models)\n├── dataset.py          # Procedural game generation (seeds, roles)\n├── rubric.py           # Reward functions (Antim Labs weighting)\n└── among_us_main.py    # load_environment() entrypoint\n```\n","encoding":"utf-8","truncated":false,"total_bytes":3914},"status":null}