diff --git a/.nojekyll b/.nojekyll
index 3d627ddfd..a14f9e4e3 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-e905cd86
\ No newline at end of file
+5da99456
\ No newline at end of file
diff --git a/docs/dataset-formats/conversation.html b/docs/dataset-formats/conversation.html
index f335a0565..55953191f 100644
--- a/docs/dataset-formats/conversation.html
+++ b/docs/dataset-formats/conversation.html
@@ -552,6 +552,19 @@ Important
 type: chat_template
 roles_to_train:
 train_on_eos:
+If you receive an error like “chat_template choice is tokenizer_default but tokenizer’s chat_template is null.”, it means the tokenizer does not have a default chat_template. Follow the examples below instead to set a custom chat_template.
 gemma chat template to override the tokenizer_config.json’s chat template on OpenAI messages format, training on all assistant messages.
+Q: “chat_template choice is tokenizer_default but tokenizer’s chat_template is null. Please add a chat_template in tokenizer config”
+A: This is because the tokenizer does not have a chat template. Please add a chat template in the tokenizer config. See chat_template for more details.
+A: This is because of the mismatch between `tokenizer.eos_token` and the EOS/EOT token in the template. Please make sure to set `eos_token` under `special_tokens` to the same EOS/EOT token as in the template.
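For concreteness, a minimal config sketch combining both fixes, assuming the gemma template (whose end-of-turn token is `<end_of_turn>`); the dataset path and the `roles_to_train`/`train_on_eos` values are illustrative placeholders:

```yaml
# Override the tokenizer's (possibly null) chat template with a built-in one.
chat_template: gemma

datasets:
  - path: your-org/your-openai-format-dataset  # placeholder
    type: chat_template
    roles_to_train: ["assistant"]  # train on all assistant messages
    train_on_eos: all

# Keep tokenizer.eos_token aligned with the template's EOS/EOT token.
special_tokens:
  eos_token: <end_of_turn>
```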
diff --git a/docs/reward_modelling.html b/docs/reward_modelling.html
index f21cc0021..20248e1ca 100644
--- a/docs/reward_modelling.html
+++ b/docs/reward_modelling.html
@@ -491,22 +491,30 @@
 val_set_size: 0.1
 eval_steps: 100
+Bradley-Terry chat templates expect single-turn conversations in the following format:
+{
+ "system": "...", // optional
+ "input": "...",
+ "chosen": "...",
+ "rejected": "..."
+}
 Process reward models are trained using data which contains preference annotations for each step in a series of interactions. Typically, PRMs are trained to provide reward signals over each step of a reasoning trace and are used for downstream reinforcement learning.
-base_model: Qwen/Qwen2.5-3B
-model_type: AutoModelForTokenClassification
-num_labels: 2
-
-process_reward_model: true
-datasets:
- - path: trl-lib/math_shepherd
- type: stepwise_supervised
- split: train
-
-val_set_size: 0.1
-eval_steps: 100
+base_model: Qwen/Qwen2.5-3B
+model_type: AutoModelForTokenClassification
+num_labels: 2
+
+process_reward_model: true
+datasets:
+ - path: trl-lib/math_shepherd
+ type: stepwise_supervised
+ split: train
+
+val_set_size: 0.1
+eval_steps: 100
 Please see stepwise_supervised for more details on the dataset format.
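For orientation, a single stepwise_supervised record might look roughly like the sketch below. The field names (`prompt`, `completions`, `labels`) mirror the trl-lib/math_shepherd columns but are an assumption here, with one boolean label marking whether each reasoning step is correct:

```json
{
  "prompt": "Janet has 3 apples and buys 2 more. How many does she have?",
  "completions": [
    "Step 1: Janet starts with 3 apples.",
    "Step 2: She buys 2 more, so 3 + 2 = 6."
  ],
  "labels": [true, false]
}
```

The second step is labelled `false` because its arithmetic is wrong (3 + 2 = 5, not 6), which is exactly the kind of per-step signal a PRM is trained to produce.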