{"data":{"kind":"file","path":"README.md","version_id":"d7e3b0g6lv20d0ogkl02mgde","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":6370,"modified_at":"2025-08-30T23:14:52.720000","content_hash":"dc6ece0b37cbcea4ed1ccb4d2179715a84978b12d767bc9aeedcb6e5ac76948b"},"entries":[],"content":"# hud-vf-gym\r\n\r\n### Overview\r\n- **Environment ID**: `hud-vf-gym`\r\n- **Short description**: Generic adapter that bridges HUD's MCP infrastructure with the Verifiers framework\r\n- **Tags**: `adapter`, `mcp`, `CUA`\r\n\r\n### Datasets\r\n- **Primary dataset(s)**: Any HuggingFace dataset following HUD's task format\r\n- **Source links**: Configurable via `taskset` parameter (e.g., `hud-evals/2048-taskset`)\r\n- **Split sizes**: Depends on the loaded dataset\r\n\r\n### Task\r\n- **Type**: Configurable (single-turn or multi-turn, tool use)\r\n- **Parser**: ToolXMLParser with configurable thinking mode\r\n- **Rubric overview**: Weighted combination of task completion (from MCP evaluation), tool execution success, and format compliance\r\n\r\n### Quickstart\r\nRun an evaluation with any HUD-compatible taskset:\r\n\r\n```bash\r\nvf-eval hud-vf-gym \\\r\n  --env-args '{\"taskset\": \"your-org/your-taskset\", \"config_path\": \"./configs/your-env.yaml\"}' \\\r\n  --model gpt-4.1-mini\r\n```\r\n\r\nTrain an agent with GRPO (for text-based environments only):\r\n\r\n```python\r\nimport verifiers as vf\r\n\r\nenv = vf.load_environment(\r\n    env_id=\"hud-vf-gym\",\r\n    taskset=\"your-org/your-taskset\",\r\n    config_path=\"./configs/your-env.yaml\"\r\n)\r\n\r\nmodel, tokenizer = vf.get_model_and_tokenizer(\"Qwen/Qwen2.5-3B-Instruct\")\r\ntrainer = vf.GRPOTrainer(\r\n    model=model,\r\n    env=env,\r\n    args=vf.grpo_defaults()\r\n)\r\ntrainer.train()\r\n```\r\n\r\n### Environment Arguments\r\n| Arg | Type | Default | Description |\r\n| --- | ---- | ------- | ----------- |\r\n| `taskset` | str | (required) | HuggingFace dataset identifier |\r\n| 
`config_path` | str | (required) | Path to environment configuration YAML |\r\n| `num_tasks` | int | None | Optional limit on number of tasks to load |\r\n| `split` | str | `\"train\"` | Dataset split to use |\r\n| `max_turns` | int | (from config) | Override maximum turns per rollout |\r\n| `system_prompt` | str | (from config) | Override agent instructions |\r\n\r\n### Metrics\r\n| Metric | Meaning |\r\n| ------ | ------- |\r\n| `reward` | Weighted combination of all rubric components |\r\n| `task_completion` | Score from MCP evaluation tool (environment-specific) |\r\n| `tool_execution` | Ratio of successful tool calls to total attempts |\r\n| `format_compliance` | XML format correctness and action syntax validation |\r\n\r\n### How It Works\r\n\r\nHUD VF Gym is a generic adapter that enables any MCP-compatible environment to work with the Verifiers RL framework. It provides the bridge between HUD's infrastructure and Verifiers' training/evaluation capabilities.\r\n\r\n#### Components\r\n\r\n1. **Main Module** (`hud_vf_gym.py`):\r\n   - `load_environment()`: Loads tasks from HuggingFace datasets and creates HUDGym instance\r\n   - `HUDGym`: Extends Verifiers' `MultiTurnEnv` base class\r\n   - Manages Docker container lifecycle via MCP\r\n   - Handles multi-turn agent-environment interactions\r\n   - Integrates with HUD's telemetry and job tracking\r\n\r\n2. **MCP Integration** (`utils/mcp_utils.py`):\r\n   - `execute_tool()`: Universal tool execution through MCP protocol\r\n   - `create_action_args()`: Maps agent actions to MCP tool arguments\r\n   - Supports both direct MCP calls and action mapping transformations\r\n\r\n3. **Parsing System** (`utils/parsers.py`):\r\n   - `ToolXMLParser`: Validates XML-wrapped tool calls\r\n   - Extracts actions from `<tool>action(args)</tool>` format\r\n   - Configurable thinking mode with `<think>` tags\r\n   - Combines XML validation with action syntax checking\r\n\r\n4. 
**Reward System** (`utils/rubrics.py`):\r\n   - `HUDBaseRubric`: Configurable weighted reward function\r\n   - Components: task completion, tool execution, format compliance\r\n   - Task completion comes from the MCP evaluation tool\r\n   - Tool execution tracks success rate\r\n   - Format compliance validates XML and action syntax\r\n\r\n#### Configuration System\r\n\r\nEnvironments are configured through YAML files that define:\r\n\r\n```yaml\r\n# Job tracking\r\njob:\r\n  name: \"Environment Run\"\r\n  metadata: {...}\r\n\r\n# Agent instructions\r\nsystem_prompt: |\r\n  Instructions and available tools...\r\n\r\n# Parser settings\r\nparser:\r\n  use_thinking: true        # Enable <think> tags (set false to disable)\r\n  xml_weight: 0.6           # XML format importance\r\n  action_weight: 0.4        # Action syntax importance\r\n\r\n# Action mappings - the core of configuration\r\naction_mappings:\r\n  agent_action:             # What the agent calls\r\n    _tool: \"mcp_tool\"      # Underlying MCP tool\r\n    _parser:\r\n      positional: [\"arg1\"] # Expected arguments\r\n    param1:\r\n      from_arg: \"arg1\"     # Map from agent arg\r\n      transform: \"...\"     # Optional transform\r\n    param2:\r\n      static: \"value\"      # Static value\r\n\r\n# Rubric weights\r\nrubric:\r\n  weights:\r\n    task_completion: 0.8\r\n    tool_execution: 0.1\r\n    format_compliance: 0.1\r\n```\r\n\r\n#### Rollout Process\r\n\r\n1. **Initialization**:\r\n   - Create MCP client with Docker container\r\n   - Execute setup tools to prepare environment\r\n   - Append setup results to initial prompt\r\n\r\n2. **Multi-turn Loop**:\r\n   - Agent generates XML-wrapped tool call\r\n   - Parser extracts and validates action\r\n   - Action mappings transform it into an MCP tool call\r\n   - MCP executes tool, returns results\r\n   - Results sent back to agent\r\n   - Continue until the task is done or max turns is reached\r\n\r\n3. 
**Evaluation**:\r\n   - Execute evaluation tools\r\n   - Compute rewards based on rubric\r\n   - Clean up MCP resources\r\n\r\n#### Key Features\r\n\r\n- **Config-Driven**: No code changes needed for new environments\r\n- **Action Mapping**: Declarative transformation from agent to MCP tools\r\n- **Multimodal Support**: Handles text and image observations\r\n- **Job Tracking**: [Automatic HUD telemetry integration](https://app.hud.so)\r\n\r\n### Creating Custom Environments\r\n\r\nTo use hud-vf-gym with your own environment:\r\n\r\n1. **Create a Docker image** with an MCP server that implements your environment through the HUD SDK\r\n2. **Define tasks** as a HuggingFace dataset in HUD's task format\r\n3. **Write a config YAML** with action mappings for your tools\r\n4. **Load and run**:\r\n   ```python\r\n   import verifiers as vf\r\n\r\n   env = vf.load_environment(\r\n       env_id=\"hud-vf-gym\",\r\n       taskset=\"your-org/your-taskset\",\r\n       config_path=\"your-config.yaml\"\r\n   )\r\n   ```\r\n\r\n### Also See\r\n\r\n- [HUD Documentation](https://docs.hud.so)\r\n- [Verifiers Framework](https://github.com/willccbb/verifiers)\r\n- [HUD Python SDK](https://github.com/hud-evals/hud-python)\r\n- [Example Configs](https://github.com/hud-evals/hud-python/tree/main/rl/configs)","encoding":"utf-8","truncated":false,"total_bytes":6370},"status":null}