{"data":{"kind":"file","path":"README.md","version_id":"mywhsrsd1r4pee5o3e52s1xd","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5760,"modified_at":"2025-09-18T10:54:26.843000","content_hash":"442b17b0174cef29e66f7a376ce288307264f7c24076a135c740caa9908a264e"},"entries":[],"content":"# Code Debugger Environment\n\n### Overview\n- **Environment ID**: `code_debugger`\n- **Short description**: Advanced tool-based environment for debugging Python code with execution, analysis, testing, and optimization capabilities.\n- **Tags**: code-debugging, python, tools, testing, performance, optimization\n\n### Features\nThis environment provides sophisticated debugging tools for Python code:\n\n- **🔧 Code Execution**: Safely execute Python code with timeout protection\n- **🔍 Syntax Analysis**: Analyze code for syntax errors and provide suggestions\n- **🧪 Automated Testing**: Test code with multiple input cases and validate outputs\n- **⚡ Performance Profiling**: Profile code execution time and identify bottlenecks\n- **💡 Optimization Suggestions**: Analyze code and suggest performance improvements\n\n### Tools Available\n\n#### 1. `execute_python_code(code: str, timeout: int = 5)`\nExecutes Python code safely with output capture and error handling.\n\n**Parameters:**\n- `code`: Python code to execute\n- `timeout`: Maximum execution time in seconds\n\n**Returns:**\n- Execution output, errors, success status, and execution time\n\n#### 2. `analyze_code_syntax(code: str)`\nAnalyzes Python code for syntax errors and code quality issues.\n\n**Returns:**\n- Syntax validity, errors, warnings, and suggestions\n- AST analysis with function/class counts\n\n#### 3. 
`test_code_with_inputs(code: str, test_cases: List[Dict])`\nTests code with various input cases and validates outputs.\n\n**Parameters:**\n- `code`: Python code to test\n- `test_cases`: List of test cases with inputs and expected outputs\n\n**Returns:**\n- Test results with pass/fail counts and detailed results\n\n#### 4. `profile_code_performance(code: str, iterations: int = 1000)`\nProfiles code performance and provides timing metrics.\n\n**Returns:**\n- Execution times, performance rating, and timing statistics\n\n#### 5. `suggest_code_optimizations(code: str)`\nAnalyzes code and suggests optimizations and best practices.\n\n**Returns:**\n- Optimization suggestions, complexity issues, and best practice recommendations\n\n### Quickstart\n\n#### Basic Evaluation\n```bash\nuv run vf-eval code_debugger -m gpt-4o-mini -n 5 -r 2\n```\n\n#### With Custom Parameters\n```bash\nuv run vf-eval code_debugger \\\n  -m gpt-4o-mini \\\n  -n 10 -r 3 \\\n  -a '{\"max_turns\": 8}'\n```\n\n### Example Problems\n\nThe environment includes various debugging challenges:\n\n#### 1. Factorial Function Bug\n```python\ndef factorial(n):\n    if n == 0:\n        return 1\n    else:\n        return n * factorial(n - 1)\n\n# Problem: Doesn't handle negative numbers\nprint(factorial(-1))  # Should handle gracefully\n```\n\n#### 2. List Processing Bug\n```python\ndef process_list(items):\n    result = []\n    for item in items:\n        if item > 0:\n            result.append(item * 2)\n    return result\n\n# Problem: May have logic issues\nnumbers = [1, -2, 3, -4, 5]\nprint(process_list(numbers))\n```\n\n#### 3. 
String Manipulation Bug\n```python\ndef reverse_words(sentence):\n    words = sentence.split()\n    reversed_words = []\n    for word in words:\n        reversed_words.append(word[::-1])\n    return ' '.join(reversed_words)\n\n# Problem: May not handle edge cases\nprint(reverse_words(\"\"))  # Empty string\n```\n\n### Evaluation Criteria\n\nThe environment evaluates debugging quality based on:\n\n- **Code Execution** (20%): Successfully executing and testing code\n- **Syntax Analysis** (20%): Analyzing code for errors and issues\n- **Test Coverage** (30%): Running comprehensive tests\n- **Performance Analysis** (10%): Considering code performance\n- **Optimization** (20%): Suggesting improvements and best practices\n\n### Use Cases\n\n- **🧑‍💻 Code Review Training**: Train models to review and debug code\n- **🔧 Debugging Skills**: Evaluate debugging and problem-solving abilities\n- **📊 Code Quality Assessment**: Test understanding of code quality metrics\n- **⚡ Performance Analysis**: Evaluate performance optimization knowledge\n- **🧪 Testing Practices**: Assess testing and validation skills\n\n### Advanced Features\n\n#### Safe Execution Environment\n- Restricted built-ins for security\n- Timeout protection against infinite loops\n- Output capture and error handling\n\n#### Comprehensive Analysis\n- AST-based code analysis\n- Complexity detection (nested loops, etc.)\n- Best practice recommendations\n\n#### Performance Profiling\n- Execution time measurement\n- Performance rating system\n- Bottleneck identification\n\n#### Test Automation\n- Multiple test case support\n- Input/output validation\n- Error handling verification\n\n### Performance Expectations\n\n- **GPT-4o**: ~0.8-0.9 average reward (excellent debugging skills)\n- **GPT-4**: ~0.7-0.8 average reward (good debugging capabilities)\n- **Claude-3.5**: ~0.6-0.7 average reward (solid debugging skills)\n- **Smaller Models**: ~0.3-0.5 average reward (basic debugging)\n\n### Technical Details\n\n- **Max 
Turns**: 10 (allows for a thorough debugging process)\n- **Tool Integration**: Native function calling support\n- **Safety**: Sandboxed execution environment\n- **Timeout**: 5-second default execution limit\n- **Memory**: Efficient AST parsing and analysis\n\n### Installation\n\n```bash\n# Install the environment\nprime env install code_debugger\n\n# Or install locally\ncd environments/code_debugger\npip install -e .\n```\n\n### Contributing\n\nThis environment is designed to be extensible. You can:\n\n- Add new debugging tools\n- Create more complex test cases\n- Implement additional analysis features\n- Add support for other programming languages\n\n### Notes\n\n- Uses a safe execution environment to guard against malicious code\n- Supports both simple and complex debugging scenarios\n- Designed for educational and evaluation purposes\n- Compatible with all major language models\n- Extensible architecture for custom tools and analysis\n","encoding":"utf-8","truncated":false,"total_bytes":5760},"status":null}