{"data":{"kind":"file","path":"README.md","version_id":"d6najy1kp9jrhaw2lhlk8z1p","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2619,"modified_at":"2025-09-08T15:00:21.596000","content_hash":"82d26c6cb9eedbdce282bbc85b3a5e037b880b7c9dea3b458b245a85eb05fd89"},"entries":[],"content":"# semantic\n\n### Overview\n- **Environment ID**: `semantic`\n- **Short description**: Semantic is a multi-turn reinforcement learning environment inspired by the popular Cemantix game. At the start of each episode, the environment samples a target word in a given language. The agent must iteratively guess the word, receiving feedback in the form of semantic similarity scores computed via an embedding model. This setup encourages discovery through exploration, supports multilingual evaluation, and promotes synonym and semantic grounding. Unlike classic next-token prediction, the environment provides a masked semantic objective, making it suitable for distillation-style training of large language models (akin to a BERT-like signal) within an RL framework. 
The environment is highly configurable (number of turns, similarity model, input words, ...) and leaves little room for reward hacking.\n- **Tags**: semantic, cemantix, multiturn, BERT, xml\n\n### Datasets\n- **Primary dataset(s)**: wordfreq\n- **Source links**: https://github.com/rspeer/wordfreq/\n- **Split sizes**: None; target words are sampled at runtime\n\n### Task\n- **Type**: multi-turn (game interaction)\n- **Parser**: `XMLParser` with a `guess` field\n- **Rubric overview**: At each turn, the agent receives a similarity-based reward: 0 for a format error, a value in [0, 1] otherwise, and 2 for an exact match. The reward is scaled by an exponential decay in the turn number and normalized by the maximum achievable reward.\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval semantix\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval semantix -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7 -a '{\"model_name\": \"paraphrase-multilingual-MiniLM-L12-v2\", \"lang\": \"en\", \"dataset_len\": 10000}'\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\nSupported environment arguments:\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `model_name` | str | `\"paraphrase-multilingual-MiniLM-L12-v2\"` | Embedding model used to compute semantic similarity |\n| `lang` | str | `\"en\"` | Language used for word sampling and guessing |\n| `dataset_len` | int | `10000` | Number of sampled words in the dataset |\n\n### Metrics\nKey metrics emitted by the rubric:\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Per-turn similarity-based reward (0 for a format error, a value in [0, 1] otherwise, and 2 for an exact match), scaled by an exponential decay in the turn number and normalized by the maximum achievable reward |\n\n","encoding":"utf-8","truncated":false,"total_bytes":2619},"status":null}