{"data":{"kind":"file","path":"README.md","version_id":"hpgq3w4tbx00g9movsp29mid","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5295,"modified_at":"2025-09-08T02:13:36.610000","content_hash":"13182d43ae2b227baa7ea8df18ebe40a86e16b42119723042a7845e25f2b0641"},"entries":[],"content":"# dynamic_programming\n\n### Overview\n- **Environment ID**: `dynamic_programming`\n- **Short description**: Multiple dynamic programming problems including 0-1 Knapsack, LCS, Coin Change, Path Counting, and Edit Distance\n- **Tags**: single-turn, reasoning, math, math-python, eval, train, synthetic, logic, qa, think, constraints\n\n### Datasets\n- **Primary dataset(s)**: Randomly generated dynamic programming problems\n- **Source links**: Programmatically generated using classic dynamic programming algorithms\n- **Split sizes**: Configurable via `max_examples` parameter (default: 100 problems)\n\n### Task\n- **Type**: single-turn\n- **Parser**: `ThinkParser` (with thinking) or `Parser` (direct) with custom solution extraction\n- **Rubric overview**: Problem-specific evaluation with optimal solution verification\n\n### Problem Types\n\n#### 1. 0-1 Knapsack\n- **Items**: Each with weight and value\n- **Constraint**: Maximum knapsack capacity\n- **Goal**: Maximize value while staying within weight limit\n- **Format**: `<solution>[1, 3, 5], 42</solution>`\n\n#### 2. Longest Common Subsequence (LCS)\n- **Input**: Two strings\n- **Goal**: Find longest sequence appearing in both strings\n- **Format**: `<solution>[ABC], 3</solution>`\n\n#### 3. Coin Change\n- **Input**: Coin denominations and target amount\n- **Goal**: Minimize coins needed to make target amount\n- **Format**: `<solution>[1, 5, 10], 3</solution>`\n\n#### 4. Path Counting\n- **Input**: Grid with obstacles\n- **Goal**: Count paths from top-left to bottom-right (right/down only)\n- **Format**: `<solution>[], 6</solution>`\n\n#### 5. Edit Distance\n- **Input**: Two strings\n- **Goal**: Minimum operations to transform one string to another\n- **Format**: `<solution>[], 3</solution>`\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval dynamic-programming\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval dynamic-programming \\\n  -m gpt-4o-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"max_examples\": 500, \"use_think\": true}'\n```\n\nTest different problem types:\n\n```bash\n# Single problem type\nuv run vf-eval dynamic-programming -a '{\"problem_types\": [\"lcs\"]}'\n\n# Multiple problem types\nuv run vf-eval dynamic-programming -a '{\"problem_types\": [\"knapsack\", \"lcs\", \"coin_change\"]}'\n\n# All problem types\nuv run vf-eval dynamic-programming -a '{\"problem_types\": [\"knapsack\", \"lcs\", \"coin_change\", \"path_counting\", \"edit_distance\"]}'\n\n# Different random seed\nuv run vf-eval dynamic-programming -a '{\"seed\": 123}'\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `max_examples` | int | `100` | Number of problems to generate |\n| `use_think` | bool | `True` | Use thinking mode with `<think>` tags (`ThinkParser`) or direct reasoning (`Parser`) |\n| `seed` | int | `42` | Random seed for reproducible problem generation |\n| `difficulty_range` | tuple | `(5, 15)` | Range affecting problem complexity |\n| `problem_types` | str or list | `\"knapsack\"` | Problem type(s) to include: `\"knapsack\"`, `\"lcs\"`, `\"coin_change\"`, `\"path_counting\"`, `\"edit_distance\"` |\n\n### Scoring System\n\n**Quality-Based Scoring (Range: -0.5 - 1.0)**\n\nScoring varies by problem type but generally follows:\n\n- **+1.0 points**: Perfect optimal solution\n- **+0.8 points**: Near-optimal solution (≥90% of optimal)\n- **+0.5 points**: Good solution (≥70% of optimal)\n- **+0.2 points**: Valid solution (≥50% of optimal)\n- **+0.1 points**: Poor but valid solution\n- **-0.5 points**: Invalid solution with constraint violations\n- **-0.2 points**: Format errors (wrong claimed values)\n- **0.0 points**: No solution provided or parsing errors\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `dp_reward` | Problem-specific scoring with constraint validation (-0.5 to 1.0) |\n\n### Example Interactions\n\n**Knapsack Problem:**\n```\n0-1 Knapsack Problem:\n\nItems available:\nItem 1: weight=2, value=12\nItem 2: weight=1, value=10\nItem 3: weight=3, value=20\n\nKnapsack capacity: 5\n\nFind the combination of items that maximizes value while staying within the weight limit.\nEach item can be taken at most once.\n\nProvide your answer in the required format: <solution>[item numbers], total value</solution>\nExample: <solution>[1, 3, 5], 42</solution>\n```\n\n**LCS Problem:**\n```\nLongest Common Subsequence (LCS) Problem:\n\nString 1: ABCDGH\nString 2: AEDFHR\n\nFind the longest subsequence that appears in both strings in the same order (but not necessarily consecutive).\n\nProvide your answer in the required format: <solution>[LCS string], length</solution>\nExample: <solution>[ABC], 3</solution>\n```\n\n**Expected Response Format:**\n```\n<think>\nFor the knapsack problem, let me analyze value-to-weight ratios:\n- Item 1: 12/2 = 6.0\n- Item 2: 10/1 = 10.0  \n- Item 3: 20/3 = 6.67\n\nTesting combinations within capacity 5:\n- Items [1,2]: weight=3, value=22\n- Items [1,3]: weight=5, value=32 ✓\n- Items [2,3]: weight=4, value=30\n\nBest is [1,3] with value 32.\n</think>\n\n<solution>[1, 3], 32</solution>\n```\n\n**Scoring Example:**\n```\nLLM Response: <solution>[1, 3], 32</solution>\nOptimal Solution: Items [1, 3] with value 32\n\nAnalysis:\n- Selected items: [1, 3] (weight=5, value=32)\n- Capacity constraint: 5/5 ✓ (exactly at limit)\n- Optimal value: 32/32 = 100% ✓\n\nFinal Score: 1.0 (perfect optimal solution)\n```","encoding":"utf-8","truncated":false,"total_bytes":5295},"status":null}