From 52c83d30bf5df79763b1acca26d11073c79c9e9b Mon Sep 17 00:00:00 2001
From: Hamel Husain
Date: Wed, 31 Jan 2024 17:27:35 -0500
Subject: [PATCH] Update rlhf.md (#1237) [skip ci]

---
 docs/rlhf.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/rlhf.md b/docs/rlhf.md
index 3283880bd..9f5ba05fd 100644
--- a/docs/rlhf.md
+++ b/docs/rlhf.md
@@ -12,8 +12,8 @@ feedback. Various methods include, but not limited to:
 
 ### RLHF using Axolotl
 
-[!IMPORTANT]
-This is a BETA feature and many features are not fully implemented. You are encouraged to open new PRs to improve the integration and functionality.
+>[!IMPORTANT]
+>This is a BETA feature and many features are not fully implemented. You are encouraged to open new PRs to improve the integration and functionality.
 
 The various RL training methods are implemented in trl and wrapped via axolotl. Below are various examples with how you can use various preference datasets to train models that use ChatML
 