{"data":{"kind":"file","path":"README.md","version_id":"bkh3qif0cdu85hj5w3k4m92q","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1983,"modified_at":"2026-03-15T23:05:06.105000","content_hash":"30964bb666238209ef0ef825c9b4aeba84e7ef563783c1e599bd38dabdbd3569"},"entries":[],"content":"# BarterBench — Prime Intellect Environment\n\nRL training environment based on [BarterBench](https://github.com/JamesEBall/BarterBench): a competitive multi-agent marketplace where agents trade scarce resources to meet inventory targets.\n\n## Task\n\nThe model controls a single trader (Agent 0) in a marketplace of N agents. The remaining agents are RandomAgents (zero-cost baselines). On each turn the model receives the current marketplace state and must output a valid JSON trade action. The episode ends after all rounds complete or all agents meet their goals.\n\n**Reward:** Goal completion — fraction of target inventory acquired (0.0–1.0).\n\n## Scenarios\n\n| Scenario | Agents | Items | Rounds | Scarcity |\n|---|---|---|---|---|\n| `spice_wars` (default) | 10 | 5 | 12 | Gold + Gems (dual) |\n| `gold_rush` | 6 | 3 | 8 | Gold |\n| `water_crisis` | 8 | 4 | 10 | Water (extreme) |\n| `grand_bazaar` | 12 | 7 | 12 | Silk + Diamonds |\n\n## Usage\n\n```python\nimport verifiers as vf\n\nenv = vf.load_environment(\"barterbench\", scenario=\"spice_wars\", num_examples=50)\n```\n\n## Parameters\n\n| Parameter | Default | Description |\n|---|---|---|\n| `scenario` | `\"spice_wars\"` | Scenario name |\n| `num_examples` | `50` | Rollouts per training batch |\n| `seed` | `42` | Random seed |\n\n## Action Space\n\n```json\n{\"action\": \"post_offer\",    \"give\": {\"gold\": 1}, \"want\": {\"silk\": 2}, \"message\": \"...\"}\n{\"action\": \"private_offer\", \"give\": {\"gold\": 1}, \"want\": {\"silk\": 2}, \"target_agent\": 3, \"message\": \"...\"}\n{\"action\": \"accept_offer\",  \"offer_id\": 5, \"message\": \"...\"}\n{\"action\": \"pass_turn\",     \"message\": 
\"...\"}\n```\n\n## Key Findings\n\nAll tested frontier models (Claude Sonnet/Opus, GPT-4o, Llama-70B) achieve a near-zero Information Security Score: they disclose their target items in round 1, even though doing so is strategically dominated. This **cooperative norm transfer** is the primary training signal: RL should teach models to conceal their targets while still executing efficient trades.\n","encoding":"utf-8","truncated":false,"total_bytes":1983},"status":null}