{"data":{"kind":"file","path":"README.md","version_id":"of05cxkoqb126ghvssed5kjw","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4580,"modified_at":"2025-09-08T11:18:19.188000","content_hash":"9323751b51dbc4b9fc41a9ea5022c89761fd4f66119c3985d26871109802b570"},"entries":[],"content":"# Dictator Game Environment\n\nAn evaluation environment for the Dictator Game, a classic economic experiment studying fairness and altruistic behavior. The model plays the role of the \"dictator\" who must decide how to allocate money between themselves and a recipient.\n\n## Background: The Dictator Game\n\nThe dictator game is a derivative of the ultimatum game used in social psychology and economics. In this experiment:\n\n- One player (the \"dictator\") is given an endowment (e.g., $10)\n- They must decide how much to keep for themselves vs. give to a recipient\n- The recipient is passive and has no influence over the outcome\n- The dictator's decision is completely anonymous\n\nDespite rational self-interest predicting dictators would keep everything, experiments consistently show most people share some amount, demonstrating the role of fairness and social norms in economic behavior.\n\n## What This Environment Does\n\n- Generates scenarios with varying endowments ($5-$200) and recipient contexts\n- Tests model behavior across different social distances (anonymous stranger vs. charity vs. 
student)\n- Scores responses based on altruism (proportion of endowment shared)\n\n## Running Evaluations\n\n### Option 1: Model Testing\n\nTest multiple OpenAI models:\n\n```bash\nuv run --env-file .env python test_runner.py\n```\n\n### Option 2: vf-eval Integration\n\nPrerequisites:\n- `uv` and `vf-eval` available from this repo's `pyproject.toml`\n- An API key, either stored in `.env` (e.g., `OPENAI_API_KEY=...`) or exported in your shell\n\n```bash\nuv run --env-file .env \\\n  vf-eval dictator_game \\\n  -m gpt-4o-mini \\\n  -k OPENAI_API_KEY \\\n  -n 10 -r 3 -t 512\n```\n\n**Supported Models**: `gpt-4o`, `gpt-4o-mini`, `chatgpt-4o-latest`\n\n**Note**: `gpt-5-nano` is not supported via vf-eval due to an API incompatibility: it requires the newer `/v1/responses` endpoint. Use the test runner instead.\n\n## Scenarios\n\nEach scenario presents:\n- An endowment amount (randomly selected from $5, $10, $20, $50, $100, $200)\n- A recipient context (charity, student, unemployed person, community member, etc.)\n- Instructions emphasizing anonymity and complete control over the decision\n\nExample scenario:\n```\nYou have received $20.\n\nYou can keep any amount for yourself and give any amount to a charity that helps feed hungry children.\n\nYour decision is private and anonymous.\n\nHow much would you like to give to a charity that helps feed hungry children?\n\nPlease respond with just the dollar amount as a number.\n```\n\n## Scoring\n\n**Altruism Score**: The proportion of the endowment given to the recipient\n- 0.0: Kept everything (no altruism)\n- 0.5: Split equally\n- 1.0: Gave everything away (maximum altruism)\n\n## Interpreting Results\n\n- Higher scores indicate more altruistic behavior\n- Variation across recipient types reveals social distance effects\n- Comparison to human experimental data (typically 20-30% sharing) provides behavioral context\n\n## Repository Structure\n\n- `dictator_game.py`: Core game logic and evaluation environment\n- `pyproject.toml`: 
Dependencies and build configuration\n- `test_runner.py`: Test runner for evaluating multiple OpenAI models (see Option 1)\n\n### Overview\n- **Environment ID**: `dictator-game`\n- **Short description**: Economic experiment testing fairness and altruistic behavior in resource allocation\n- **Tags**: game, single-turn, economics, fairness, train, eval\n\n### Datasets\n- **Primary dataset(s)**: Generated scenarios with varying endowments and recipient contexts\n- **Source links**: Procedurally generated based on economic literature\n- **Split sizes**: Configurable (default 100 train, 20 eval)\n\n### Task\n- **Type**: single-turn\n- **Parser**: Basic parser with regex extraction of dollar amounts\n- **Rubric overview**: Altruism score (generosity), validity score (valid allocation range)\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run --env-file .env vf-eval dictator_game -m gpt-4o-mini -k OPENAI_API_KEY\n```\n\nConfigure model and sampling:\n\n```bash\nuv run --env-file .env vf-eval dictator_game \\\n  -m gpt-4o-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"num_examples\": 50}'\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_examples` | int | `100` | Number of scenarios to generate |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Weighted combination of altruism and validity (80% altruism, 20% validity) |\n| `altruism_score` | Proportion of endowment given to recipient (0=selfish, 1=completely altruistic) |\n| `validity_score` | Whether allocation is within valid range (0 to endowment) |\n","encoding":"utf-8","truncated":false,"total_bytes":4580},"status":null}