{"data":{"kind":"file","path":"README.md","version_id":"rel7nucfn7n92ed3e7tf9p4l","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":9196,"modified_at":"2025-09-18T10:54:47.913000","content_hash":"360ee3664f74330dd0251d6a1e63e79042b1e2f8493a4286cf5a0fd2a775038b"},"entries":[],"content":"# Mathematical Reasoning Environment\n\n### Overview\n- **Environment ID**: `math_reasoning`\n- **Short description**: Advanced mathematical reasoning environment with symbolic computation, statistical analysis, graph theory, and proof verification capabilities.\n- **Tags**: mathematics, symbolic-computation, statistics, graph-theory, proof-verification, calculus, algebra\n\n### Features\nThis environment provides comprehensive mathematical reasoning tools:\n\n- **🧮 Symbolic Mathematics**: Solve equations, calculate derivatives and integrals with SymPy\n- **📊 Statistical Analysis**: Descriptive statistics, correlation analysis, and data interpretation\n- **🔢 Linear Algebra**: Matrix operations, determinants, inverses, and transformations\n- **📈 Graph Theory**: Connectivity analysis, shortest paths, and graph properties\n- **🔢 Combinatorics**: Permutations, combinations, and counting problems\n- **✅ Proof Verification**: Mathematical proof step validation and logical reasoning\n\n### Tools Available\n\n#### 1. `solve_equation(equation: str, variable: str = \"x\")`\nSolves mathematical equations symbolically.\n\n**Parameters:**\n- `equation`: Mathematical equation as string (e.g., \"x**2 - 4 = 0\")\n- `variable`: Variable to solve for\n\n**Returns:**\n- Solutions, solution type, and equation details\n\n#### 2. 
`calculate_derivative(expression: str, variable: str = \"x\", order: int = 1)`\nCalculates derivatives of mathematical expressions.\n\n**Parameters:**\n- `expression`: Mathematical expression as string\n- `variable`: Variable to differentiate with respect to\n- `order`: Order of derivative (1st, 2nd, etc.)\n\n**Returns:**\n- Derivative expression and simplified form\n\n#### 3. `calculate_integral(expression: str, variable: str = \"x\", limits: Optional[List[float]] = None)`\nCalculates integrals of mathematical expressions.\n\n**Parameters:**\n- `expression`: Mathematical expression as string\n- `variable`: Variable to integrate with respect to\n- `limits`: Integration limits for definite integral\n\n**Returns:**\n- Integral expression and numerical value (if definite)\n\n#### 4. `matrix_operations(matrix_a: List[List[float]], matrix_b: Optional[List[List[float]]] = None, operation: str = \"determinant\")`\nPerforms matrix operations.\n\n**Parameters:**\n- `matrix_a`: First matrix as list of lists\n- `matrix_b`: Second matrix for binary operations\n- `operation`: Operation type (determinant, inverse, transpose, multiply, add, subtract)\n\n**Returns:**\n- Operation result and result matrix\n\n#### 5. `statistical_analysis(data: List[float], analysis_type: str = \"descriptive\")`\nPerforms statistical analysis on data.\n\n**Parameters:**\n- `data`: List of numerical data points\n- `analysis_type`: Type of analysis (descriptive, correlation)\n\n**Returns:**\n- Statistical measures and analysis results\n\n#### 6. `graph_theory_analysis(edges: List[List[int]], analysis_type: str = \"connectivity\")`\nPerforms graph theory analysis.\n\n**Parameters:**\n- `edges`: List of edges as [node1, node2] pairs\n- `analysis_type`: Type of analysis (connectivity, shortest_path)\n\n**Returns:**\n- Graph properties and analysis results\n\n#### 7. 
`combinatorial_analysis(n: int, k: int, analysis_type: str = \"permutation\")`\nPerforms combinatorial analysis.\n\n**Parameters:**\n- `n`: Total number of items\n- `k`: Number of items to choose/arrange\n- `analysis_type`: Type of analysis (permutation, combination, factorial)\n\n**Returns:**\n- Combinatorial result and formula\n\n#### 8. `verify_proof(statement: str, proof_steps: List[str])`\nVerifies mathematical proof steps.\n\n**Parameters:**\n- `statement`: Mathematical statement to prove\n- `proof_steps`: List of proof steps\n\n**Returns:**\n- Proof verification results and validity assessment\n\n### Quickstart\n\n#### Basic Evaluation\n```bash\nuv run vf-eval math_reasoning -m gpt-4o-mini -n 5 -r 2\n```\n\n#### With Custom Parameters\n```bash\nuv run vf-eval math_reasoning \\\n  -m gpt-4o-mini \\\n  -n 10 -r 3 \\\n  -a '{\"max_turns\": 12}'\n```\n\n### Example Problems\n\nThe environment includes various mathematical challenges:\n\n#### 1. Quadratic Equation Solving\n```python\n# Problem: Solve x² - 5x + 6 = 0\n# Expected: x = 2 or x = 3\n# Tools: solve_equation\n```\n\n#### 2. Calculus - Derivatives\n```python\n# Problem: Find the derivative of f(x) = x³ + 2x² - 5x + 1\n# Expected: f'(x) = 3x² + 4x - 5\n# Tools: calculate_derivative\n```\n\n#### 3. Linear Algebra - Matrix Operations\n```python\n# Problem: Find the determinant of [[2, 3], [1, 4]]\n# Expected: Determinant = 5\n# Tools: matrix_operations\n```\n\n#### 4. Statistics - Data Analysis\n```python\n# Problem: Analyze [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n# Expected: Mean = 5.5, Sample Std Dev ≈ 3.03\n# Tools: statistical_analysis\n```\n\n#### 5. Graph Theory - Connectivity\n```python\n# Problem: Check connectivity of the graph with edges [[1, 2], [2, 3], [3, 4], [4, 1]]\n# Expected: Graph is connected\n# Tools: graph_theory_analysis\n```\n\n#### 6. Combinatorics - Counting\n```python\n# Problem: Calculate the number of ways to arrange 5 books on a shelf\n# Expected: 5! 
= 120 arrangements\n# Tools: combinatorial_analysis\n```\n\n### Evaluation Criteria\n\nThe environment evaluates mathematical reasoning based on:\n\n- **Equation Solving** (15%): Successfully solving mathematical equations\n- **Calculus Operations** (15%): Calculating derivatives and integrals\n- **Matrix Operations** (15%): Performing linear algebra operations\n- **Statistical Analysis** (15%): Analyzing data and calculating statistics\n- **Graph Theory** (10%): Analyzing graph properties and connectivity\n- **Combinatorics** (10%): Solving counting and arrangement problems\n- **Proof Verification** (10%): Validating mathematical proofs\n- **Integral Calculus** (10%): Advanced calculus operations\n\n### Use Cases\n\n- **🎓 Educational Assessment**: Evaluate mathematical problem-solving skills\n- **🔬 Research Applications**: Test mathematical reasoning in AI systems\n- **📚 Curriculum Development**: Create mathematical learning materials\n- **🧮 Problem Solving**: Train models on mathematical problem-solving\n- **📊 Data Analysis**: Evaluate statistical reasoning capabilities\n- **🔍 Proof Verification**: Test logical reasoning and proof validation\n\n### Advanced Features\n\n#### Symbolic Computation\n- Exact symbolic solutions using SymPy\n- Support for complex mathematical expressions\n- Automatic simplification and formatting\n- Support for multiple variables and functions\n\n#### Numerical Analysis\n- High-precision numerical computations\n- Matrix operations with NumPy\n- Statistical calculations with SciPy\n- Error handling and validation\n\n#### Graph Theory\n- Connectivity analysis using DFS/BFS\n- Shortest path algorithms\n- Graph property detection\n- Cycle detection and analysis\n\n#### Proof Verification\n- Step-by-step proof validation\n- Logical reasoning assessment\n- Mathematical argument evaluation\n- Proof structure analysis\n\n### Performance Expectations\n\n- **GPT-4o**: ~0.8-0.9 average reward (excellent mathematical reasoning)\n- **GPT-4**: 
~0.7-0.8 average reward (good mathematical capabilities)\n- **Claude-3.5**: ~0.6-0.7 average reward (solid mathematical skills)\n- **Specialized Math Models**: ~0.7-0.9 average reward (domain-specific strength)\n- **Smaller Models**: ~0.3-0.5 average reward (basic mathematical operations)\n\n### Technical Details\n\n- **Max Turns**: 15 (allows for complex multi-step problems)\n- **Tool Integration**: Native function calling support\n- **Libraries**: SymPy, NumPy, SciPy for mathematical computations\n- **Precision**: High-precision symbolic and numerical calculations\n- **Error Handling**: Comprehensive error detection and reporting\n\n### Dependencies\n\n- `sympy`: Symbolic mathematics library\n- `numpy`: Numerical computing library\n- `scipy`: Scientific computing library\n- `verifiers`: Core environment framework\n\n### Installation\n\n```bash\n# Install the environment\nprime env install math_reasoning\n\n# Or install locally\ncd environments/math_reasoning\npip install -e .\n```\n\n### Mathematical Domains Covered\n\n#### Algebra\n- Linear equations and systems\n- Quadratic equations\n- Polynomial operations\n- Rational expressions\n\n#### Calculus\n- Derivatives and differentiation\n- Integrals and integration\n- Limits and continuity\n- Optimization problems\n\n#### Linear Algebra\n- Matrix operations\n- Determinants and inverses\n- Eigenvalues and eigenvectors\n- Vector spaces\n\n#### Statistics\n- Descriptive statistics\n- Probability distributions\n- Correlation and regression\n- Hypothesis testing\n\n#### Graph Theory\n- Graph connectivity\n- Shortest path algorithms\n- Graph coloring\n- Network analysis\n\n#### Combinatorics\n- Permutations and combinations\n- Counting principles\n- Probability calculations\n- Arrangement problems\n\n### Contributing\n\nThis environment is designed to be extensible. 
You can:\n\n- Add new mathematical domains (geometry, number theory, etc.)\n- Implement additional proof verification methods\n- Create more complex problem types\n- Add support for additional mathematical libraries\n\n### Notes\n\n- Uses SymPy for exact symbolic computation\n- Supports both symbolic and numerical solutions\n- Comprehensive error handling for edge cases\n- Designed for educational and research purposes\n- Compatible with language models that support function calling\n- Extensible architecture for custom mathematical tools\n","encoding":"utf-8","truncated":false,"total_bytes":9196},"status":null}