{"data":{"kind":"file","path":"README.md","version_id":"iblkftxvezf5c5ondyz4srn7","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4581,"modified_at":"2026-06-06T18:17:05.982000","content_hash":"7228006d36ee5e93791b057a3e83cf9805e8c79bec0a24f789c8c15270a67956"},"entries":[],"content":"# Futuresim Verifiers Environment\n\nFuturesim is a Prime Intellect Verifiers environment for evaluating forecasting\nagents in a date-gated news simulation. The environment advances a simulated\nforecasting market, exposes only information available at each date, records\nforecasts, and scores them after resolution.\n\nUseful links:\n\n- Hub install: `prime env install shash42/futuresim`\n- Blogpost: [openforecaster.github.io/futuresim](https://openforecaster.github.io/futuresim/)\n- Paper: [arxiv.org/abs/2605.15188](https://arxiv.org/abs/2605.15188)\n- Questions: [nikhilchandak/OpenForesight](https://huggingface.co/datasets/nikhilchandak/OpenForesight)\n- Article corpus: [shash42/forecast-news](https://huggingface.co/datasets/shash42/forecast-news)\n- LanceDB hybrid index: [shash42/forecast-news-embeddings](https://huggingface.co/datasets/shash42/forecast-news-embeddings)\n- Embedding model: [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B)\n\n## What It Provides\n\n- A Verifiers `load_environment(\"futuresim\")` entrypoint.\n- A date-gated filesystem article workspace for CLI agents.\n- Optional MinimalHarness-compatible Codex and Claude Code runner support.\n- Optional LanceDB hybrid MCP search, with raw search artifacts kept outside the\n  agent shell when `agent_filesystem_sandbox` is enabled.\n- Exact matching for quick smoke tests, and OpenRouter/vLLM answer matching for\n  faithful reproduction.\n\nThe OpenReward/ORS integration is also available in the GitHub repository, but\nthis package page focuses on the Verifiers environment.\n\n## Quick Start\n\nInstall from Prime:\n\n```bash\nprime env install shash42/futuresim\n```\n\nLoad locally:\n\n```python\nfrom verifiers import load_environment\n\nenv = load_environment(\n    \"futuresim\",\n    env_args={\n        \"futuresim\": {\n            \"articles_base\": \"/path/to/articles\",\n            \"matching\": \"exact\"\n        },\n        \"minimal_harness\": {\n            \"harness_backend\": \"codex\",\n            \"model\": \"gpt-5.5\",\n            \"reasoning_effort\": \"xhigh\",\n            \"codex_resume\": True,\n            \"agent_filesystem_sandbox\": True,\n            \"network_isolation\": True\n        },\n        \"sandbox\": {\n            \"docker_image\": \"your-registry/futuresim-sandbox:latest\",\n            \"network_access\": True,\n            \"timeout_minutes\": 720,\n            \"timeout_per_command_seconds\": 86400\n        }\n    }\n)\n```\n\nThis exact/no-hybrid configuration is only a smoke test. To reproduce the paper\nexperiments, use answer matching and the LanceDB hybrid search tool.\n\n## Required Artifacts\n\nDownload the filesystem article corpus:\n\n```bash\nhf download shash42/forecast-news \\\n  --repo-type dataset \\\n  --local-dir /path/to/articles \\\n  --include '2025/12/**' \\\n  --include '2026/**'\n```\n\nThe article tree must be shaped as:\n\n```text\narticles_base/\n  YYYY/\n    MM/\n      DD/\n        articles.jsonl\n```\n\nFor hybrid search, also provide:\n\n```bash\nhf download shash42/forecast-news-embeddings \\\n  --repo-type dataset \\\n  --local-dir /path/to/lancedb\n\nhf download Qwen/Qwen3-Embedding-8B \\\n  --local-dir /path/to/embedding-model\n```\n\nHybrid search can also use an already-running OpenAI-compatible embedding\nserver via `embedding_server_url`.\n\n## Faithful Reproduction\n\nRecommended Futuresim settings:\n\n```json\n{\n  \"futuresim\": {\n    \"dataset_path\": \"nikhilchandak/OpenForesight\",\n    \"split\": \"aljazeeraQ12026v37\",\n    \"start_date\": \"2025-12-31\",\n    \"end_date\": \"2026-03-28\",\n    \"resolution_start\": \"2025-12-31\",\n    \"resolution_end\": \"2026-03-28\",\n    \"articles_base\": \"/path/to/articles\",\n    \"matching\": \"openrouter\",\n    \"matcher\": \"deepseek/deepseek-v3.2\",\n    \"matcher_api_key_env\": \"OPENROUTER_API_KEY\",\n    \"enable_hybrid_search\": true,\n    \"hybrid_search\": {\n      \"search_db\": \"/path/to/lancedb\",\n      \"embedding_model\": \"/path/to/embedding-model\",\n      \"search_type\": \"hybrid\",\n      \"max_results\": 10\n    }\n  }\n}\n```\n\nSet `OPENROUTER_API_KEY` as a Prime secret or environment variable. CLI-agent\nreproductions also require each user to provide their own Codex or Claude Code\ncredentials; Futuresim does not ship maintainer keys.\n\n## Sandbox Notes\n\nUse `agent_filesystem_sandbox: true` and `network_isolation: true` for public\nor reproducibility runs. The outer Prime sandbox may need `network_access: true`\nso the CLI can reach its model provider, while Futuresim's inner runner blocks\ngeneral agent egress and routes allowed provider traffic through its proxy.\n\nThe agent may submit zero forecasts on a day. Forecasts count only after the\nagent calls MCP `submit_forecasts` and then `next_day`.\n","encoding":"utf-8","truncated":false,"total_bytes":4581},"status":null}