# buyer-seller

Multi-turn buyer/seller negotiation environment for `verifiers`.

- Buyer: the policy model selected by `vf-eval -m ...`
- Seller: a fixed opponent called inside the environment via LiteLLM

## Repository Layout

- `buyer_seller.py`: `NegotiationEnv` and the `load_environment()` entrypoint
- `utils.py`: `.env` loading, env validation, dataset loading, role-flip helper
- `rewards.py`: action parser and 7 reward functions
- `generators/generate_dataset.py`: synthetic dataset generator (template, OpenAI-style LLM, and HF chat LLM modes)
- `dataset.json`: sample dataset (10 episodes)
- `test_seller_model_smoke.py`: seller smoke test against the real API (no mocks)
- `generators/generators.md`: concise dataset generation guide

## Configuration

The environment auto-loads `.env` from the repo root and validates settings with `utils._validate_env()`.

Required:

- `OPENAI_API_KEY`
- `SELLER_MODEL`
- `OPENAI_API_BASE`

Optional:

- `MAX_TURNS` (default `10`)
- `HF_DATASET_REPO` (default `ViditOstwal/price-negotiation-datasets`)
- `HF_DATASET_SPLIT` (default `train`)
- `HF_TOKEN` (or `HUGGINGFACE_HUB_TOKEN`): needed only for private HF datasets
- `DATASET_PATH`: local fallback path (default `dataset.json`)

Example `.env`:

```bash
OPENAI_API_KEY=sk-...
SELLER_MODEL=openai/gpt-4.1-mini
OPENAI_API_BASE=https://api.openai.com/v1
HF_DATASET_REPO=ViditOstwal/price-negotiation-datasets
HF_DATASET_SPLIT=train
DATASET_PATH=dataset.json
MAX_TURNS=10
```

## Runtime Flow

1. `load_environment()` validates the env vars and loads the dataset, trying the HF split (`HF_DATASET_SPLIT`, default `train`) first and falling back to the local file at `DATASET_PATH` (default `dataset.json`).
2. The buyer sends an action (`<action>OFFER $X</action>`, `ACCEPT`, or `WALK`).
3. The environment parses the buyer action and updates the episode state.
4. The environment calls the seller model via `litellm.acompletion(...)`.
5. The seller's action is parsed and applied.
6. The episode ends on a deal, a walk-away, or when `MAX_TURNS` is reached.

Seller safety rules enforced in code:

- The seller cannot accept an offer below `seller_reserve_price`.
- The seller cannot offer below `seller_reserve_price` (such offers are clamped to the reserve).
- Seller API failures trigger a fallback response and are recorded in `state["seller_errors"]`.

## Rewards

The rubric combines 7 reward functions; `surplus_reward` is weighted 3x and the rest 1x. What each one targets:

- `surplus_reward`: maximize the buyer's value capture on deals.
- `walkaway_penalty`: reward correct outcome decisions (close when a deal is feasible, walk away when it is not).
- `format_reward`: keep buyer action tags consistently valid.
- `efficiency_bonus`: close successful deals in fewer turns.
- `anchoring_reward`: encourage a strong but realistic opening anchor near the ideal point.
- `concession_rate_reward`: discourage large per-turn upward concessions.
- `decreasing_concessions_reward`: encourage concessions that shrink over time.

See `rewards.py` for the exact formulas.

## Run Commands

### 1) Seller Smoke Test (real API call)

```bash
uv run python -m unittest -q test_seller_model_smoke.py
```

Verbose output:

```bash
uv run python -m unittest -v test_seller_model_smoke.py
```

If `uv` fails on cache permissions, point it at a local cache directory:

```bash
UV_CACHE_DIR=.uv-cache uv run python -m unittest -q test_seller_model_smoke.py
```

### 2) Evaluate Buyer with Verifiers

```bash
uv run vf-eval buyer_seller -m openai/gpt-4.1-mini -n 5 -r 1
```

With Prime Inference:

```bash
uv run vf-eval buyer_seller \
  -m openai/gpt-4o \
  -k PRIME_API_KEY \
  -b https://api.pinference.ai/api/v1 \
  -n 1 -r 1 -s
```

### 3) Generate Dataset

Template mode (default):

```bash
uv run python generators/generate_dataset.py --mode template --n 100 --output dataset.json --seed 42
```

LLM mode (OpenAI-style API):

```bash
uv run python generators/generate_dataset.py --mode llm --n 100 --output dataset.json --seed 42
```

HF LLM mode (Hugging Face chat):

```bash
HF_LLM_MODEL=Qwen/Qwen2.5-72B-Instruct:novita \
HF_TOKEN=hf_... \
uv run python generators/generate_dataset.py --mode hf-llm --n 100 --output dataset.json --seed 42
```

LLM mode with a push to the Hugging Face Hub:

```bash
HF_TOKEN=hf_... \
HF_DATASET_REPO=your-hf-username/price-negotiation-dataset \
uv run python generators/generate_dataset.py \
  --mode llm --n 100 --output dataset.json --seed 42 \
  --push-to-hf --hf-split train
```

Environment variables for the LLM modes and the Hub push:

- Required for `--mode llm`: `OPENAI_API_KEY`
- Optional for `--mode llm`: `OPENAI_API_BASE` (default `https://api.openai.com/v1`)
- Optional for `--mode llm`: `GENERATOR_MODEL` (default `gpt-4o-mini`)
- Required for `--mode hf-llm`: `HF_LLM_MODEL`
- Required for `--mode hf-llm`: `HF_TOKEN` or `HUGGINGFACE_HUB_TOKEN`
- Optional for `--mode hf-llm`: `HF_LLM_API_BASE`
- Optional for HF push: `HF_TOKEN` (or `HUGGINGFACE_HUB_TOKEN`)
- Optional for HF push: `HF_DATASET_REPO` (or `HF_REPO_ID`)

HF write modes:

- `append` (default): load the existing split and append the newly generated rows before pushing
- `overwrite`: replace the target split with the newly generated rows

With `--push-to-hf` in `--mode llm` or `--mode hf-llm`, the generator checkpoints every `100` rows by default (set with `--hf-push-every`), so partial progress is preserved if a later step fails.

`generators/generate_dataset.py` auto-loads missing values from the repo-root `.env` before validation.

Note: category balancing is enabled by default in the current script.

## Dataset Categories

The generator currently samples from 10 categories:

- `antiques`
- `electronics`
- `collectibles`
- `vehicles`
- `art`
- `furniture`
- `jewelry`
- `musical_instruments`
- `sports_outdoors`
- `luxury_fashion`

In template mode, every category has a curated product bank. In the LLM modes, the category is passed to the model and the product is generated dynamically.

## Current Sample Dataset (`dataset.json`)

- Episodes: `10`
- Categories: depend on the file contents (balanced generation spreads rows across all 10 categories)
- Difficulties: a mix of `easy`, `medium`, `hard`, and `no_deal`
- Generator version:
  - `1.1-template` for template generation
  - `2.0-llm` for LiteLLM/OpenAI-style generation
  - `2.1-hf-llm` for Hugging Face chat generation