From dc051b861d4d0f20c673ad55ac93b2a43fa56fc4 Mon Sep 17 00:00:00 2001
From: Aleksey Korshuk <48794610+AlekseyKorshuk@users.noreply.github.com>
Date: Tue, 23 Jan 2024 23:54:51 +0300
Subject: [PATCH] Update rlhf.md (#1178) [skip ci]

---
 docs/rlhf.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/rlhf.md b/docs/rlhf.md
index 9957eb3a6..774c3992f 100644
--- a/docs/rlhf.md
+++ b/docs/rlhf.md
@@ -19,14 +19,14 @@ The various RL training methods are implemented in trl and wrapped via axolotl.
 
 #### DPO
 ```yaml
-rl: true
+rl: dpo
 datasets:
   - path: Intel/orca_dpo_pairs
     split: train
-    type: intel_apply_chatml
+    type: chatml.intel
   - path: argilla/ultrafeedback-binarized-preferences
     split: train
-    type: argilla_apply_chatml
+    type: chatml.argilla
 ```
 
 #### IPO