{"data":{"kind":"file","path":"README.md","version_id":"hpscc7plwbawua0k01t48nng","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2011,"modified_at":"2025-09-10T10:13:34.992000","content_hash":"5b27007b7f15ea349595edafc94ea6ab9fd4d444ed1392c20b6e395e6ead5947"},"entries":[],"content":"# code-suggestion\r\n\r\n### Overview\r\n- **Environment ID**: `code-suggestion`\r\n- **Short description**: TypeScript code completion environment that trains models to generate code snippets based on context and recent edits\r\n- **Tags**: code-completion, typescript, instruct-tuning\r\n\r\n### Datasets\r\n- **Primary dataset(s)**: continuedev/instinct-data - TypeScript code completion dataset with context and target completions\r\n- **Source links**: https://huggingface.co/datasets/continuedev/instinct-data\r\n- **Split sizes**: train_typescript and test_typescript splits (exact counts loaded dynamically)\r\n\r\n### Task\r\n- **Type**: single-turn\r\n- **Parser**: XMLParser with \"thinking\" and \"answer\" fields\r\n- **Rubric overview**: Exact match (50%), token accuracy (20%), length penalty (10%), format validation (20%)\r\n\r\n### Quickstart\r\nRun an evaluation with default settings:\r\n\r\n```bash\r\nuv run vf-eval code-suggestion\r\n```\r\n\r\nConfigure model and sampling:\r\n\r\n```bash\r\nuv run vf-eval code-suggestion   -m gpt-4.1-mini   -n 20 -r 3 -t 1024 -T 0.7    # env-specific args as JSON\r\n```\r\n\r\nNotes:\r\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\r\n\r\n### Environment Arguments\r\nThe environment accepts kwargs but currently doesn't use any specific arguments:\r\n\r\n| Arg | Type | Default | Description |\r\n| --- | ---- | ------- | ----------- |\r\n| `**kwargs` | dict | `{}` | Additional configuration (currently unused) |\r\n\r\n### Metrics\r\nKey metrics emitted by the rubric and their interpretation:\r\n\r\n| Metric | Meaning |\r\n| ------ | ------- |\r\n| `reward` | Main scalar reward (weighted sum of all criteria) |\r\n| `exact_match_reward` | 1.0 for perfect match, 0.0 otherwise |\r\n| `token_accuracy_reward` | Partial credit based on token-level similarity using SequenceMatcher |\r\n| `length_penalty_reward` | Penalizes responses that are too long/short (0.0 for extreme ratios, 1.0 for reasonable length) |\r\n| `format_reward_func` | Validates proper XML formatting with thinking and answer tags |\r\n\r\n","encoding":"utf-8","truncated":false,"total_bytes":2011},"status":null}