{"data":{"kind":"file","path":"README.md","version_id":"tk2dr0ne1h8bw0i49tk58y5m","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":7133,"modified_at":"2025-09-10T00:18:10.423000","content_hash":"c6751500a30caf2c4919b0093f52c8b82f2596d8794d4a393fe422c0085eaa0f"},"entries":[],"content":"# SEO Judge Environment 🔍\n\nA comprehensive multi-turn SEO evaluation environment that challenges AI models to analyze web pages, identify SEO issues, and provide actionable optimization recommendations following modern search engine best practices.\n\n## Overview\n- **Environment ID**: `vf-seo-judge`\n- **Type**: Multi-turn tool use environment\n- **Description**: Evaluates AI agents on their ability to perform comprehensive SEO audits and provide optimization recommendations\n- **Tags**: `seo`, `web-analysis`, `multi-turn`, `tool-use`, `evaluation`\n\n## How It Works\n\nThe SEO Judge environment presents AI models with web pages to analyze and challenges them to:\n\n1. **Fetch and Parse Content**: Use web scraping tools to retrieve page content\n2. **Analyze SERP Competition**: Research competing pages in search results\n3. **Audit SEO Elements**: Check technical SEO, metadata, content structure, and performance\n4. **Generate Recommendations**: Provide specific, actionable optimization suggestions\n5. **Verify Solutions**: Validate proposed fixes against best practices\n\n### Task Flow\n\nThe environment follows a structured multi-turn interaction:\n\n```\nUser Query → Tool Selection → Analysis → Recommendations → Scoring\n     ↑                                                        ↓\n     └──────────← Iterative Refinement ←──────────────────────┘\n```\n\n## Available Tools\n\nModels have access to a comprehensive SEO toolkit:\n\n### 📄 Content Analysis\n- **`fetch_page`**: Retrieve HTML content from URLs\n- **`extract_readable`**: Parse main content, headings, and structure\n- **`extract_meta`**: Extract title, meta description, canonical tags\n- **`schema_parse`**: Analyze structured data (JSON-LD, microdata)\n\n### 🔍 SERP Research\n- **`serp_topn`**: Get top search results for target keywords\n- **`serp_gap_matrix`**: Compare content coverage vs competitors\n\n### ⚡ Performance Analysis\n- **`pagespeed_insights`**: Core Web Vitals and performance metrics\n\n### 🔧 Technical Audits\n- **`image_alt_audit`**: Check image accessibility and optimization\n- **`links_audit`**: Analyze internal/external link structure\n- **`robots_audit`**: Validate robots.txt and meta robots directives\n\n### ✏️ Content Optimization\n- **`rewrite_suggestions`**: Generate optimized content recommendations\n- **`lint_rewrites`**: Validate proposed content changes\n- **`verify_fixes`**: Check if recommendations follow best practices\n\n## Scoring Rubric\n\nThe environment uses a weighted 7-dimensional scoring system (0-100 per dimension):\n\n| Dimension | Weight | Description |\n|-----------|--------|-------------|\n| **People-First Intent** | 25% | Content quality, structure, and user value |\n| **SERP Coverage** | 20% | Competitive analysis and content gaps |\n| **Metadata & SERP** | 15% | Title tags, meta descriptions, snippets |\n| **Crawlability & Anchors** | 10% | Link structure and crawl accessibility |\n| **Structured Data** | 10% | Schema markup and rich snippets |\n| **Indexability & Canonicals** | 10% | Technical indexing optimization |\n| **Core Web Vitals** | 10% | Performance and user experience metrics |\n\n### Scoring Details\n\n- **People-First Intent**: Rewards H1 presence, content depth (600+ words), and outline structure (3+ sections)\n- **SERP Coverage**: Compares content headings against competitor analysis\n- **Metadata**: Validates title length (30-65 chars) and meta description (80-170 chars)\n- **Crawlability**: Assesses internal link quality and anchor text\n- **Structured Data**: Credits JSON-LD implementation, penalizes validation errors\n- **Indexability**: Checks canonical tags, penalizes noindex directives\n- **Core Web Vitals**: Evaluates LCP (<2.5s), CLS (<0.1), INP (<200ms)\n\n## Output Contract\n\nModels must return structured JSON with:\n\n```json\n{\n  \"summary\": \"Brief assessment overview\",\n  \"scorecard\": {\n    \"overall\": 85,\n    \"people_first_intent\": 90,\n    \"serp_coverage\": 75,\n    \"metadata_serp\": 85,\n    \"crawlability_anchors\": 80,\n    \"structured_data\": 60,\n    \"indexability_canonicals\": 95,\n    \"core_web_vitals\": 70\n  },\n  \"rewrites\": {\n    \"title\": \"Optimized title tag\",\n    \"meta_description\": \"Compelling meta description\",\n    \"h1\": \"Primary heading\",\n    \"outline\": [\"Section 1\", \"Section 2\"],\n    \"jsonld\": {\"@type\": \"Article\"},\n    \"internal_links\": [{\"anchor\": \"text\", \"url\": \"/page\"}],\n    \"image_alt_fixes\": [{\"src\": \"image.jpg\", \"alt\": \"description\"}]\n  },\n  \"fixlist\": {\n    \"now\": [\"Critical fixes\"],\n    \"next\": [\"Future improvements\"]\n  },\n  \"evidence\": {\n    \"serp_sample\": [],\n    \"psi\": {},\n    \"meta\": {}\n  }\n}\n```\n\n## Quickstart\n\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval vf-seo-judge\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval vf-seo-judge \\\n  -m gpt-4o-mini \\\n  -n 20 -r 3 -t 2048 -T 0.3 \\\n  -a '{\"max_turns\": 8, \"timeout\": 300}'\n```\n\n## Environment Arguments\n\n| Arg | Type | Default | Description |\n|-----|------|---------|-------------|\n| `max_turns` | int | `6` | Maximum tool use turns per evaluation |\n| `timeout` | int | `180` | Timeout per tool call (seconds) |\n| `allow_external` | bool | `true` | Allow fetching external URLs |\n| `serp_limit` | int | `10` | Maximum SERP results to return |\n\n## Key Metrics\n\n| Metric | Range | Interpretation |\n|--------|-------|----------------|\n| `reward` | 0-100 | Weighted average of all scoring dimensions |\n| `people_first_intent` | 0-100 | Content quality and user-focused optimization |\n| `serp_coverage` | 0-100 | Competitive analysis thoroughness |\n| `metadata_serp` | 0-100 | Title and meta description optimization |\n| `crawlability_anchors` | 0-100 | Link structure and crawl accessibility |\n| `structured_data` | 0-100 | Schema markup implementation |\n| `indexability_canonicals` | 0-100 | Technical SEO fundamentals |\n| `core_web_vitals` | 0-100 | Performance optimization score |\n\n## Example Usage\n\n### Basic SEO Analysis\n```python\nimport verifiers as vf\nfrom vf_seo_judge import load_environment\n\n# Load environment\nenv = load_environment(max_turns=8)\n\n# Example query\nquery = \"Analyze the SEO of https://example.com/blog-post and provide optimization recommendations\"\n\n# Run evaluation\nresult = env.evaluate_single(query, model=\"gpt-4o\")\nprint(f\"SEO Score: {result['reward']}/100\")\n```\n\n## Training Recommendations\n\n1. **Start with Technical Audits**: Begin with tools like `extract_meta` and `robots_audit`\n2. **Progress to Content Analysis**: Move to `extract_readable` and content structure\n3. **Add Competitive Intelligence**: Incorporate `serp_topn` and gap analysis\n4. **Focus on Actionability**: Emphasize specific, implementable recommendations\n5. **Balance Dimensions**: Don't over-optimize single scoring aspects\n\n## Common Pitfalls\n\n- **Tool Selection**: Using too many tools without clear purpose\n- **Generic Advice**: Providing boilerplate SEO recommendations\n- **Ignoring Context**: Not adapting advice to specific page type/industry\n- **Missing Evidence**: Making claims without supporting tool data\n- **Poor Prioritization**: Not distinguishing critical vs. nice-to-have fixes\n\n","encoding":"utf-8","truncated":false,"total_bytes":7133},"status":null}