From 17f1e93c5f04c5884e149b2564dfd71fd473ac8c Mon Sep 17 00:00:00 2001
From: Quarto GHA Workflow Runner
Date: Sat, 27 Apr 2024 16:08:00 +0000
Subject: [PATCH] Built site for gh-pages

---
 .nojekyll                       |  2 +-
 docs/dataset-formats/index.html | 10 ++++----
 docs/rlhf.html                  |  2 +-
 search.json                     |  2 +-
 sitemap.xml                     | 44 ++++++++++++++++----------------
 5 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/.nojekyll b/.nojekyll
index c3becc077..7e2ca9540 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-86a4e5fc
\ No newline at end of file
+5c3c9e0d
\ No newline at end of file
diff --git a/docs/dataset-formats/index.html b/docs/dataset-formats/index.html
index b0788714d..8053ad4c6 100644
--- a/docs/dataset-formats/index.html
+++ b/docs/dataset-formats/index.html
@@ -351,7 +351,7 @@ Description
-
+
 Pre-training
@@ -359,7 +359,7 @@ Description
 Data format for a pre-training completion task.
-
+
 Instruction Tuning
@@ -367,7 +367,7 @@ Description
 Instruction tuning formats for supervised fine-tuning.
-
+
 Conversation
@@ -375,7 +375,7 @@ Description
 Conversation format for supervised fine-tuning.
-
+
 Template-Free
@@ -383,7 +383,7 @@ Description
 Construct prompts without a template.
-
+
 Custom Pre-Tokenized Dataset
diff --git a/docs/rlhf.html b/docs/rlhf.html
index b9e235697..95b18af6f 100644
--- a/docs/rlhf.html
+++ b/docs/rlhf.html
@@ -353,7 +353,7 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
 chat_template: chatml
 datasets:
   - path: argilla/ultrafeedback-binarized-preferences-cleaned
-    type: orpo.chat_template
+    type: chat_template.argilla
 
 Using local dataset files
 
diff --git a/search.json b/search.json
index 1e78f639f..35c92dffb 100644
--- a/search.json
+++ b/search.json
@@ -150,7 +150,7 @@
     "href": "docs/rlhf.html",
     "title": "RLHF (Beta)",
     "section": "",
-    "text": "Overview\nReinforcement Learning from Human Feedback is a method whereby a language model is optimized from data using human feedback. Various methods include, but not limited to:\n\nProximal Policy Optimization (PPO) (not yet supported in axolotl)\nDirect Preference Optimization (DPO)\nIdentity Preference Optimization (IPO)\n\n\n\nRLHF using Axolotl\n\n[!IMPORTANT] This is a BETA feature and many features are not fully implemented. You are encouraged to open new PRs to improve the integration and functionality.\n\nThe various RL training methods are implemented in trl and wrapped via axolotl. Below are various examples with how you can use various preference datasets to train models that use ChatML\n\nDPO\nrl: dpo\ndatasets:\n - path: Intel/orca_dpo_pairs\n split: train\n type: chatml.intel\n - path: argilla/ultrafeedback-binarized-preferences\n split: train\n type: chatml.argilla\n\n\nIPO\nrl: ipo\n\n\nORPO\nPaper: https://arxiv.org/abs/2403.07691\nrl: orpo\norpo_alpha: 0.1\nremove_unused_columns: false\n\nchat_template: chatml\ndatasets:\n - path: argilla/ultrafeedback-binarized-preferences-cleaned\n type: orpo.chat_template\n\n\nUsing local dataset files\ndatasets:\n - ds_type: json\n data_files:\n - orca_rlhf.jsonl\n split: train\n type: chatml.intel\n\n\nTrl autounwrap for peft\nTrl supports autounwrapping peft models, so that a ref model does not need to be additionally loaded, leading to less VRAM needed. This is on by default. To turn it off, pass the following config.\n# load ref model when adapter training.\nrl_adapter_ref_model: true",
+    "text": "Overview\nReinforcement Learning from Human Feedback is a method whereby a language model is optimized from data using human feedback. Various methods include, but not limited to:\n\nProximal Policy Optimization (PPO) (not yet supported in axolotl)\nDirect Preference Optimization (DPO)\nIdentity Preference Optimization (IPO)\n\n\n\nRLHF using Axolotl\n\n[!IMPORTANT] This is a BETA feature and many features are not fully implemented. You are encouraged to open new PRs to improve the integration and functionality.\n\nThe various RL training methods are implemented in trl and wrapped via axolotl. Below are various examples with how you can use various preference datasets to train models that use ChatML\n\nDPO\nrl: dpo\ndatasets:\n - path: Intel/orca_dpo_pairs\n split: train\n type: chatml.intel\n - path: argilla/ultrafeedback-binarized-preferences\n split: train\n type: chatml.argilla\n\n\nIPO\nrl: ipo\n\n\nORPO\nPaper: https://arxiv.org/abs/2403.07691\nrl: orpo\norpo_alpha: 0.1\nremove_unused_columns: false\n\nchat_template: chatml\ndatasets:\n - path: argilla/ultrafeedback-binarized-preferences-cleaned\n type: chat_template.argilla\n\n\nUsing local dataset files\ndatasets:\n - ds_type: json\n data_files:\n - orca_rlhf.jsonl\n split: train\n type: chatml.intel\n\n\nTrl autounwrap for peft\nTrl supports autounwrapping peft models, so that a ref model does not need to be additionally loaded, leading to less VRAM needed. This is on by default. To turn it off, pass the following config.\n# load ref model when adapter training.\nrl_adapter_ref_model: true",
     "crumbs": [
       "How-To Guides",
       "RLHF (Beta)"
diff --git a/sitemap.xml b/sitemap.xml
index 391638071..1fb683179 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,90 +2,90 @@
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/index.html</loc>
-  <lastmod>2024-04-22T20:00:16.953Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.312Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/TODO.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.296Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/multi-node.html</loc>
-  <lastmod>2024-04-22T20:00:16.941Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/rlhf.html</loc>
-  <lastmod>2024-04-22T20:00:16.941Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/nccl.html</loc>
-  <lastmod>2024-04-22T20:00:16.941Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/multipack.html</loc>
-  <lastmod>2024-04-22T20:00:16.941Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/dataset-formats/tokenized.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/dataset-formats/inst_tune.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/dataset-formats/conversation.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/batch_vs_grad.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/input_output.html</loc>
-  <lastmod>2024-04-22T20:00:16.941Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/faq.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/dataset_preprocessing.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/dataset-formats/template_free.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/dataset-formats/pretraining.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/dataset-formats/index.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/mac.html</loc>
-  <lastmod>2024-04-22T20:00:16.941Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/config.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/debugging.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/docs/fsdp_qlora.html</loc>
-  <lastmod>2024-04-22T20:00:16.941Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/examples/colab-notebooks/colab-axolotl-example.html</loc>
-  <lastmod>2024-04-22T20:00:16.941Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.300Z</lastmod>
   <loc>https://OpenAccess-AI-Collective.github.io/axolotl/FAQS.html</loc>
-  <lastmod>2024-04-22T20:00:16.937Z</lastmod>
+  <lastmod>2024-04-27T16:07:19.296Z</lastmod>