{"data":{"kind":"file","path":"README.md","version_id":"b8b29o7qzw97rcqmflvejubl","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3486,"modified_at":"2026-03-25T22:08:07.783000","content_hash":"fb2cb75e491a172945bb2b74b95ebc158628905b66822a64283ec44f748bcb7f"},"entries":[],"content":"# Art Blocks Collector\n\nRL training and evaluation environment for [Art Blocks](https://artblocks.io), the leading generative art platform.\n\nTests AI model knowledge of Art Blocks projects, artists, editions, on-chain data, trait distributions, and platform-specific concepts.\n\n## Benchmark Results (March 2026)\n\n| Model | Score |\n|-------|-------|\n| Opus 4.6 | 56.8% |\n| o3 | 56.0% |\n| GPT-4.1 | 45.6% |\n| Sonnet 4.6 | 41.6% |\n| DeepSeek V3 | 39.6% |\n| Haiku 4.5 | 36.4% |\n| Llama 4 Scout | 24.0% |\n\nAll models evaluated zero-shot with no tools or context provided. 50 questions per model.\n\n## Dataset\n\n207 Q&A pairs across 12 categories:\n\n- **Artist lookups** — project-to-artist mapping\n- **Edition sizes** — maxInvocations for projects across eras\n- **Script types** — p5, three.js, regl, paper, tone, twemoji, custom\n- **Verticals** — curated, playground, factory, studio, explorations, flex\n- **Chains** — Ethereum, Arbitrum, Base\n- **Licenses** — CC BY, NFT License, CC0, etc.\n- **Contract addresses** — V1, V2, V3, Engine contracts\n- **Project indices and slugs**\n- **Trait/rarity data** — Ringers backgrounds, Fidenza colors, Meridian styles\n- **Platform knowledge** — tokenData, PostParams, MCP tools\n- **Engine partners** — Bright Moments, Doodle Labs, AOI\n- **Multi-project artists** — Snowfro, Kjetil Golid, Jeff Davis, etc.\n\n## Task\n\n- **Type**: single-turn\n- **Output format**: plain text (concise answer, no explanation)\n- **Rubric**: fuzzy-match with normalization (case insensitive, prefix stripping, containment matching)\n\n### Metrics\n\n| Metric | Meaning |\n|--------|---------|\n| `reward` | Main scalar reward (1.0 exact, 
0.8 contained, 0.6 partial, 0.0 miss) |\n| `check_answer` | Same value as `reward` (there is a single rubric function) |\n\n## Quick Start\n\n```bash\nprime env install jordanlyall/artblocks-collector\n\n# Evaluate with Anthropic (note: base URL without /v1)\nexport ANTHROPIC_API_KEY=\"...\"\nprime eval run artblocks-collector \\\n  --api-base-url \"https://api.anthropic.com\" \\\n  --api-key-var ANTHROPIC_API_KEY \\\n  --api-client-type anthropic_messages \\\n  -m claude-haiku-4-5-20251001 \\\n  --num-examples 50\n\n# Evaluate with OpenRouter (GPT-4o, Llama, DeepSeek, etc.)\nexport OPENROUTER_API_KEY=\"...\"\nprime eval run artblocks-collector \\\n  --api-base-url \"https://openrouter.ai/api/v1\" \\\n  --api-key-var OPENROUTER_API_KEY \\\n  --api-client-type openai_chat_completions \\\n  -m meta-llama/llama-4-scout \\\n  --num-examples 50\n```\n\n## What This Measures\n\nKnowledge baked in during pre-training: no tools, no MCP, no retrieval. A high score indicates strong recall of Art Blocks facts; a low score suggests the model needs domain-specific training.\n\nThe best score so far (~57%) shows that even frontier models have significant gaps in generative-art domain knowledge.\n\n## Roadmap\n\n- **v0.4**: Expand dataset to 300+ (more Engine projects, historical pricing, governance)\n- **Phase 2**: Multi-turn `RLMEnv` with Art Blocks MCP tools as the harness\n- **Phase 3**: Transaction validation rubrics against testnet\n- **Phase 4**: Aesthetic/recommendation rubrics with human-labeled ground truth\n\n## About Art Blocks\n\nArt Blocks is the leading platform for generative art on Ethereum. Artists upload algorithms that produce unique artworks at mint time, with each token's hash, generated at mint, as the seed for randomness. The platform has hosted 500+ projects across its curated, playground, and engine verticals since 2020.\n\n- [artblocks.io](https://artblocks.io)\n- [Art Blocks MCP](https://github.com/ArtBlocks/artblocks-mcp)\n","encoding":"utf-8","truncated":false,"total_bytes":3486},"status":null}