# reference-similarity

### Overview
- **Environment ID**: `reference-similarity`
- **Short description**: Scores single-turn generations by similarity to reference texts, with anti-echo and repetition penalties. CSV-schema agnostic, with simple prompt templating.
- **Tags**: single-turn, textgen, similarity, cosine, rouge, jaccard, eval

### Datasets
- **Primary dataset(s)**: CSV files of articles (or prompts) and per-article reference texts, plus optional per-reference score weights.
- **Default file names**: `articles_rows.csv`, `refs_rows.csv`, `scores_rows.csv`
- **Schema mapping**: configurable via the `schema` argument to map your column names.

Schema keys (the file and column names they map to can be overridden):
- articles: `articles_path`, `article_id` (group key), `article_title`, optional `article_prompt`
- references: `refs_path`, `ref_group_key`, `ref_text`, optional `ref_id`
- scores: `scores_path`, `score_join_key`, `score_value`

### Task
- **Type**: single-turn
- **Parser**: `vf.Parser()` (default verifiers parser)
- **Rubric overview**:
  - `similarity_to_refs`: cosine (SBERT, optional), Jaccard, or ROUGE-L; aggregated across references by `max`, `topk`, or `softmax`, optionally weighted by the scores CSV
  - `anti_echo_badness`: penalizes copying the prompt (Jaccard overlap or containment)
  - `repetition_badness`: penalizes n-gram repetition and simple repeated patterns
  - Final reward is a weighted sum of the channels (see `channel_weights`)

### Quickstart
Run an evaluation with default settings:

```bash
uv run vf-eval reference-similarity
```

Configure model and sampling:

```bash
uv run vf-eval reference-similarity \
  -m gpt-4.1-mini \
  -n 20 -r 3 -t 1024 -T 0.7 \
  -a '{
        "data_dir": "training/data",
        "metric": "cosine",
        "agg": "max",
        "min_refs_per_group": 3
      }'
```

Pass a custom schema and prompt template (inside the single-quoted `-a` argument, quote the title with JSON-escaped double quotes; a bare single quote would end the shell string):

```bash
uv run vf-eval reference-similarity \
  -a '{
        "data_dir": "./my_data",
        "schema": {
          "articles_path": "articles.csv",
          "article_id": "url",
          "article_title": "title",
          "refs_path": "references.csv",
          "ref_group_key": "article_url",
          "ref_text": "text",
          "ref_id": "uri",
          "scores_path": "scores.csv",
          "score_join_key": "post_uri",
          "score_value": "score"
        },
        "user_template": "Write a post under 300 chars for \"{title}\". URL: {url}",
        "metric": "rouge_l",
        "agg": "topk",
        "topk": 3
      }'
```

Notes:
- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.

### Environment Arguments

| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `data_dir` | str | `"training/data"` | Directory containing your CSV files |
| `schema` | mapping | see below | Maps file and column names for articles/refs/scores |
| `system_prompt` | str | concise system prompt | System instruction shown to the model |
| `user_template` | str? | template string | User-prompt template, rendered with Jinja2 if installed, otherwise with `str.format` |
| `sample_size` | int? | `null` | Number of articles to sample, biased toward articles with more references (`null` = use all) |
| `min_refs_per_group` | int | `3` | Minimum reference count for an article to fall in the “high” sampling bucket |
| `metric` | str | `"cosine"` | `cosine` (SBERT), `jaccard`, or `rouge_l` |
| `agg` | str | `"max"` | Aggregation: `max`, `topk`, or `softmax` |
| `topk` | int | `3` | K when `agg = topk` |
| `tau` | float | `0.5` | Temperature when `agg = softmax` |
| `length_budget` | int? | `300` | Length (in `length_unit`) beyond which the reward is scaled down |
| `length_unit` | str | `"chars"` | `chars` or `words` |
| `anti_echo_mode` | str | `"jaccard"` | `jaccard` or `containment` |
| `normalizer_cfg` | mapping | `{lower, strip_urls, collapse_ws}` | Normalization flags used by all channels |
| `use_sentence_transformers` | bool | `true` | Use SBERT embeddings for the `cosine` metric |
| `st_model_name` | str | `"sentence-transformers/all-MiniLM-L6-v2"` | SBERT model id |
| `device` | str | `"cpu"` | Device for embeddings |
| `preembed_refs` | bool | `false` | Precompute reference embeddings and store them in the dataset rows |
| `channel_weights` | mapping | `{similarity:1.0, anti_echo:-0.2, repetition:-0.1}` | Weights for rubric channels |

Schema keys (defaults shown):
- Articles: `articles_path`=`"articles_rows.csv"`, `article_id`=`"url"`, `article_title`=`"title"`, `article_prompt`=`null`
- Refs: `refs_path`=`"refs_rows.csv"` (compatible with the historical `skeets_rows.csv`), `ref_group_key`=`"article_url"`, `ref_text`=`"text"`, `ref_id`=`"uri"`
- Scores: `scores_path`=`"scores_rows.csv"`, `score_join_key`=`"post_uri"`, `score_value`=`"score"`

CSV format examples (minimal):

- articles_rows.csv
  ```csv
  url,title
  https://example.com/a,First article
  https://example.com/b,Second article
  ```

- refs_rows.csv
  ```csv
  article_url,text,uri
  https://example.com/a,"A helpful reference snippet.",ref-1
  https://example.com/a,"Another relevant snippet.",ref-2
  https://example.com/b,"Context for B.",ref-3
  ```

- scores_rows.csv (optional)
  ```csv
  post_uri,score
  ref-1,0.9
  ref-2,0.7
  ref-3,0.8
  ```

### Metrics

| Metric | Meaning |
| ------ | ------- |
| `reward` | Weighted sum of channels (clipped to [0, 1]) |
| `similarity` | Aggregated similarity to references in [0, 1] |
| `anti_echo` | Prompt-copy penalty in [0, 1] (negatively weighted) |
| `repetition` | Repetition penalty in [0, 1] (negatively weighted) |

### Evaluation Reports
Evaluation reports generated by `vf-eval` will appear here when published.