diff --git a/.nojekyll b/.nojekyll
index d0b3740b2..483cc7e42 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-644e3464
\ No newline at end of file
+e6983b1a
\ No newline at end of file
diff --git a/docs/rlhf.html b/docs/rlhf.html
index 292c2ed30..bd8143cfa 100644
--- a/docs/rlhf.html
+++ b/docs/rlhf.html
@@ -994,7 +994,7 @@ Important
vllm_server_host: 0.0.0.0
vllm_server_port: 8000
vllm_server_timeout: 300
-CUDA_VISIBLE_DEVICES=2,3 axolotl vllm_serve grpo.yaml
+CUDA_VISIBLE_DEVICES=2,3 axolotl vllm-serve grpo.yaml
Your vLLM instance will now attempt to spin up, and it’s time to kick off training utilizing our remaining two GPUs. In another terminal, execute:
CUDA_VISIBLE_DEVICES=0,1 axolotl train grpo.yaml --num-processes 2
diff --git a/search.json b/search.json
index a149b0691..b8d463506 100644
--- a/search.json
+++ b/search.json
@@ -2857,7 +2857,7 @@
"href": "docs/rlhf.html#rlhf-using-axolotl",
"title": "RLHF (Beta)",
"section": "RLHF using Axolotl",
- "text": "RLHF using Axolotl\n\n\n\n\n\n\nImportant\n\n\n\nThis is a BETA feature and many features are not fully implemented. You are encouraged to open new PRs to improve the integration and functionality.\n\n\nWe rely on the TRL library for implementations of various RL training methods, which we wrap around to expose in axolotl. Each method has their own supported ways of loading datasets and prompt formats.\n\n\n\n\n\n\nTip\n\n\n\nYou can find what each method supports by going into src/axolotl/prompt_strategies/{method} where {method} is one of our supported methods. The type: can be retrieved from {method}.{function_name}.\n\n\n\nDPO\nExample config:\nrl: dpo\ndatasets:\n - path: Intel/orca_dpo_pairs\n split: train\n type: chatml.intel\n - path: argilla/ultrafeedback-binarized-preferences\n split: train\n type: chatml\nDPO supports the following types with the following dataset format:\n\nchatml.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"chosen_response\": \"...\",\n \"rejected_response\": \"...\"\n}\n\n\nchatml.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nchatml.icr\n{\n \"system\": \"...\", // optional\n \"input\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"chosen_response\": \"...\",\n \"rejected_response\": \"...\"\n}\n\n\nllama3.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.icr\n{\n \"system\": \"...\", // optional\n \"input\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nzephyr.nectar\n{\n \"prompt\": \"...\",\n \"answers\": [\n {\n \"answer\": \"...\",\n \"rank\": 1\n },\n {\n \"answer\": \"...\",\n \"rank\": 2\n }\n // ... 
more answers with ranks\n ]\n}\n\n\nchat_template.default\nrl: dpo\ndatasets:\n - path: ...\n split: train\n type: chat_template.default\n field_messages: \"messages\"\n field_chosen: \"chosen\"\n field_rejected: \"rejected\"\n message_property_mappings:\n role: role\n content: content\n roles:\n user: [\"user\"]\n assistant: [\"assistant\"]\n system: [\"system\"]\nSample input format:\n{\n \"messages\": [\n {\n \"role\": \"system\",\n \"content\": \"...\"\n },\n {\n \"role\": \"user\",\n \"content\": \"...\"\n },\n // ... more messages\n ],\n \"chosen\": {\n \"role\": \"assistant\",\n \"content\": \"...\"\n },\n \"rejected\": {\n \"role\": \"assistant\",\n \"content\": \"...\"\n }\n}\n\n\nuser_defined.default\nFor custom behaviors,\nrl: dpo\ndatasets:\n - path: ...\n split: train\n type: user_defined.default\n\n field_prompt: \"prompt\"\n field_system: \"system\"\n field_chosen: \"chosen\"\n field_rejected: \"rejected\"\n prompt_format: \"{prompt}\"\n chosen_format: \"{chosen}\"\n rejected_format: \"{rejected}\"\nThe input format is a simple JSON input with customizable fields based on the above config.\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\n\nIPO\nAs IPO is just DPO with a different loss function, all supported dataset formats for DPO are also supported for IPO.\nrl: ipo\n\n\nORPO\nPaper: https://arxiv.org/abs/2403.07691\nrl: orpo\norpo_alpha: 0.1\nremove_unused_columns: false\n\nchat_template: chatml\ndatasets:\n - path: argilla/ultrafeedback-binarized-preferences-cleaned\n type: chat_template.argilla\nORPO supports the following types with the following dataset format:\n\nchat_template.argilla\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\", // if available, will be taken as user message for single-turn instead of from list below\n\n // chosen/rejected should be same till last content and only even-number of alternating user/assistant turns\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\n\nKTO\nrl: kto\nrl_beta: 0.1 # default\nkto_desirable_weight: 1.0 # default\nkto_undesirable_weight: 1.0 # default\n\nremove_unused_columns: false\n\ndatasets:\n - path: argilla/ultrafeedback-binarized-preferences-cleaned-kto\n type: llama3.ultra\n split: train\n\ngradient_checkpointing: true\ngradient_checkpointing_kwargs:\n use_reentrant: true\nKTO supports the following types with the following dataset format:\n\nchatml.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"}\n ],\n \"completion\": [\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nchatml.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.argilla_chat\n{\n \"completion\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.intel\n{\n \"system\": \"...\", 
// optional\n \"question\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nuser_defined.default\nFor custom behaviors,\nrl: kto\ndatasets:\n - path: ...\n split: train\n type: user_defined.default\n\n field_prompt: \"prompt\"\n field_system: \"system\"\n field_completion: \"completion\"\n field_label: \"label\"\n prompt_format: \"{prompt}\"\n completion_format: \"{completion}\"\nThe input format is a simple JSON input with customizable fields based on the above config.\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\",\n \"label\": \"...\"\n}\n\n\n\nGRPO\n\n\n\n\n\n\nTip\n\n\n\nCheck out our GRPO cookbook.\n\n\nIf you have multiple GPUs available, we reccomend using vLLM with the GRPOTrainer to significantly speedup trajectory generation during training.\nFirst, launch a vLLM server using trl vllm-serve - you may use a config file or CLI overrides to configure your vLLM server. In this example, we’re\nusing 4 GPUs - 2 for training, and 2 for vLLM:\n\n\n\n\n\n\nImportant\n\n\n\nMake sure you’ve installed the correct version of vLLM by including it as an extra when installing axolotl, e.g. pip install axolotl[vllm].\n\n\nbase_model: Qwen/Qwen2.5-1.5B-Instruct\n\nvllm:\n host: 0.0.0.0\n port: 8000\n tensor_parallel_size: 2\n gpu_memory_utilization: 0.85\n dtype: auto\n # max_model_len: # you may find it useful to set the vLLM model context length if you know this beforehand\n\nrl: grpo\ntrl:\n use_vllm: true\n vllm_server_host: 0.0.0.0\n vllm_server_port: 8000\n vllm_server_timeout: 300\nCUDA_VISIBLE_DEVICES=2,3 axolotl vllm_serve grpo.yaml\nYour vLLM instance will now attempt to spin up, and it’s time to kick off training utilizing our remaining two GPUs. In another terminal, execute:\nCUDA_VISIBLE_DEVICES=0,1 axolotl train grpo.yaml --num-processes 2\n\nReward functions\nGRPO uses custom reward functions and transformations. 
Please have them ready locally.\nFor example, to load OpenAI’s GSM8K and use a random reward for completions:\n# rewards.py\nimport random\n\ndef rand_reward_func(completions, **kwargs) -> list[float]:\n return [random.uniform(0, 1) for _ in completions]\n\ndef oai_gsm8k_transform(cfg, *args, **kwargs):\n def transform_fn(example, tokenizer=None):\n label = example[\"answer\"].split(\"####\")[-1].strip().replace(\",\", \"\")\n return {\n \"prompt\": [{\"role\": \"user\", \"content\": example[\"question\"]},],\n \"answer\": label,\n }\n return transform_fn, {\"remove_columns\": [\"question\"]}\nrl: grpo\n\ntrl:\n beta: 0.001\n max_completion_length: 256\n use_vllm: True\n num_generations: 4\n reward_funcs: [\"rewards.rand_reward_func\"] # format: '{file_name}.{fn_name}'\n reward_weights: [1.0]\ndatasets:\n - path: openai/gsm8k\n name: main\n type: rewards.oai_gsm8k_transform # format: '{file_name}.{fn_name}'\nTo see other examples of custom reward functions, please see TRL GRPO Docs.\nTo see description of the configs, please see TRLConfig.\n\n\n\nSimPO\nSimPO uses CPOTrainer but with alternative loss function.\nrl: simpo\nrl_beta: 0.1 # default in CPOTrainer\ncpo_alpha: 1.0 # default in CPOTrainer\nsimpo_gamma: 0.5 # default in CPOTrainer\nThis method uses the same dataset format as DPO.\n\n\nUsing local dataset files\ndatasets:\n - ds_type: json\n data_files:\n - orca_rlhf.jsonl\n split: train\n type: chatml.intel\n\n\nTRL auto-unwrapping for PEFT\nTRL supports auto-unwrapping PEFT models for RL training paradigms which rely on a reference model. This significantly reduces memory pressure as an additional refreference model does not need to be loaded, and reference model log-probabilities can be obtained by disabling PEFT adapters. This is enabled by default. To turn it off, pass the following config:\n# load ref model when adapter training.\nrl_adapter_ref_model: true",
+ "text": "RLHF using Axolotl\n\n\n\n\n\n\nImportant\n\n\n\nThis is a BETA feature and many features are not fully implemented. You are encouraged to open new PRs to improve the integration and functionality.\n\n\nWe rely on the TRL library for implementations of various RL training methods, which we wrap around to expose in axolotl. Each method has their own supported ways of loading datasets and prompt formats.\n\n\n\n\n\n\nTip\n\n\n\nYou can find what each method supports by going into src/axolotl/prompt_strategies/{method} where {method} is one of our supported methods. The type: can be retrieved from {method}.{function_name}.\n\n\n\nDPO\nExample config:\nrl: dpo\ndatasets:\n - path: Intel/orca_dpo_pairs\n split: train\n type: chatml.intel\n - path: argilla/ultrafeedback-binarized-preferences\n split: train\n type: chatml\nDPO supports the following types with the following dataset format:\n\nchatml.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"chosen_response\": \"...\",\n \"rejected_response\": \"...\"\n}\n\n\nchatml.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nchatml.icr\n{\n \"system\": \"...\", // optional\n \"input\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"chosen_response\": \"...\",\n \"rejected_response\": \"...\"\n}\n\n\nllama3.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.icr\n{\n \"system\": \"...\", // optional\n \"input\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nzephyr.nectar\n{\n \"prompt\": \"...\",\n \"answers\": [\n {\n \"answer\": \"...\",\n \"rank\": 1\n },\n {\n \"answer\": \"...\",\n \"rank\": 2\n }\n // ... 
more answers with ranks\n ]\n}\n\n\nchat_template.default\nrl: dpo\ndatasets:\n - path: ...\n split: train\n type: chat_template.default\n field_messages: \"messages\"\n field_chosen: \"chosen\"\n field_rejected: \"rejected\"\n message_property_mappings:\n role: role\n content: content\n roles:\n user: [\"user\"]\n assistant: [\"assistant\"]\n system: [\"system\"]\nSample input format:\n{\n \"messages\": [\n {\n \"role\": \"system\",\n \"content\": \"...\"\n },\n {\n \"role\": \"user\",\n \"content\": \"...\"\n },\n // ... more messages\n ],\n \"chosen\": {\n \"role\": \"assistant\",\n \"content\": \"...\"\n },\n \"rejected\": {\n \"role\": \"assistant\",\n \"content\": \"...\"\n }\n}\n\n\nuser_defined.default\nFor custom behaviors,\nrl: dpo\ndatasets:\n - path: ...\n split: train\n type: user_defined.default\n\n field_prompt: \"prompt\"\n field_system: \"system\"\n field_chosen: \"chosen\"\n field_rejected: \"rejected\"\n prompt_format: \"{prompt}\"\n chosen_format: \"{chosen}\"\n rejected_format: \"{rejected}\"\nThe input format is a simple JSON input with customizable fields based on the above config.\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\n\nIPO\nAs IPO is just DPO with a different loss function, all supported dataset formats for DPO are also supported for IPO.\nrl: ipo\n\n\nORPO\nPaper: https://arxiv.org/abs/2403.07691\nrl: orpo\norpo_alpha: 0.1\nremove_unused_columns: false\n\nchat_template: chatml\ndatasets:\n - path: argilla/ultrafeedback-binarized-preferences-cleaned\n type: chat_template.argilla\nORPO supports the following types with the following dataset format:\n\nchat_template.argilla\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\", // if available, will be taken as user message for single-turn instead of from list below\n\n // chosen/rejected should be same till last content and only even-number of alternating user/assistant turns\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\n\nKTO\nrl: kto\nrl_beta: 0.1 # default\nkto_desirable_weight: 1.0 # default\nkto_undesirable_weight: 1.0 # default\n\nremove_unused_columns: false\n\ndatasets:\n - path: argilla/ultrafeedback-binarized-preferences-cleaned-kto\n type: llama3.ultra\n split: train\n\ngradient_checkpointing: true\ngradient_checkpointing_kwargs:\n use_reentrant: true\nKTO supports the following types with the following dataset format:\n\nchatml.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"}\n ],\n \"completion\": [\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nchatml.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.argilla_chat\n{\n \"completion\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.intel\n{\n \"system\": \"...\", 
// optional\n \"question\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nuser_defined.default\nFor custom behaviors,\nrl: kto\ndatasets:\n - path: ...\n split: train\n type: user_defined.default\n\n field_prompt: \"prompt\"\n field_system: \"system\"\n field_completion: \"completion\"\n field_label: \"label\"\n prompt_format: \"{prompt}\"\n completion_format: \"{completion}\"\nThe input format is a simple JSON input with customizable fields based on the above config.\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\",\n \"label\": \"...\"\n}\n\n\n\nGRPO\n\n\n\n\n\n\nTip\n\n\n\nCheck out our GRPO cookbook.\n\n\nIf you have multiple GPUs available, we reccomend using vLLM with the GRPOTrainer to significantly speedup trajectory generation during training.\nFirst, launch a vLLM server using trl vllm-serve - you may use a config file or CLI overrides to configure your vLLM server. In this example, we’re\nusing 4 GPUs - 2 for training, and 2 for vLLM:\n\n\n\n\n\n\nImportant\n\n\n\nMake sure you’ve installed the correct version of vLLM by including it as an extra when installing axolotl, e.g. pip install axolotl[vllm].\n\n\nbase_model: Qwen/Qwen2.5-1.5B-Instruct\n\nvllm:\n host: 0.0.0.0\n port: 8000\n tensor_parallel_size: 2\n gpu_memory_utilization: 0.85\n dtype: auto\n # max_model_len: # you may find it useful to set the vLLM model context length if you know this beforehand\n\nrl: grpo\ntrl:\n use_vllm: true\n vllm_server_host: 0.0.0.0\n vllm_server_port: 8000\n vllm_server_timeout: 300\nCUDA_VISIBLE_DEVICES=2,3 axolotl vllm-serve grpo.yaml\nYour vLLM instance will now attempt to spin up, and it’s time to kick off training utilizing our remaining two GPUs. In another terminal, execute:\nCUDA_VISIBLE_DEVICES=0,1 axolotl train grpo.yaml --num-processes 2\n\nReward functions\nGRPO uses custom reward functions and transformations. 
Please have them ready locally.\nFor example, to load OpenAI’s GSM8K and use a random reward for completions:\n# rewards.py\nimport random\n\ndef rand_reward_func(completions, **kwargs) -> list[float]:\n return [random.uniform(0, 1) for _ in completions]\n\ndef oai_gsm8k_transform(cfg, *args, **kwargs):\n def transform_fn(example, tokenizer=None):\n label = example[\"answer\"].split(\"####\")[-1].strip().replace(\",\", \"\")\n return {\n \"prompt\": [{\"role\": \"user\", \"content\": example[\"question\"]},],\n \"answer\": label,\n }\n return transform_fn, {\"remove_columns\": [\"question\"]}\nrl: grpo\n\ntrl:\n beta: 0.001\n max_completion_length: 256\n use_vllm: True\n num_generations: 4\n reward_funcs: [\"rewards.rand_reward_func\"] # format: '{file_name}.{fn_name}'\n reward_weights: [1.0]\ndatasets:\n - path: openai/gsm8k\n name: main\n type: rewards.oai_gsm8k_transform # format: '{file_name}.{fn_name}'\nTo see other examples of custom reward functions, please see TRL GRPO Docs.\nTo see description of the configs, please see TRLConfig.\n\n\n\nSimPO\nSimPO uses CPOTrainer but with alternative loss function.\nrl: simpo\nrl_beta: 0.1 # default in CPOTrainer\ncpo_alpha: 1.0 # default in CPOTrainer\nsimpo_gamma: 0.5 # default in CPOTrainer\nThis method uses the same dataset format as DPO.\n\n\nUsing local dataset files\ndatasets:\n - ds_type: json\n data_files:\n - orca_rlhf.jsonl\n split: train\n type: chatml.intel\n\n\nTRL auto-unwrapping for PEFT\nTRL supports auto-unwrapping PEFT models for RL training paradigms which rely on a reference model. This significantly reduces memory pressure as an additional refreference model does not need to be loaded, and reference model log-probabilities can be obtained by disabling PEFT adapters. This is enabled by default. To turn it off, pass the following config:\n# load ref model when adapter training.\nrl_adapter_ref_model: true",
"crumbs": [
"How To Guides",
"RLHF (Beta)"
diff --git a/sitemap.xml b/sitemap.xml
index fa1e7103d..c0d089edf 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,682 +2,682 @@
https://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html
- 2025-04-10T05:34:37.181Z
+ 2025-04-10T15:33:22.548Z
https://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/dataset-formats/template_free.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/dataset-formats/tokenized.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/nccl.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/amd_hpc.html
- 2025-04-10T05:34:37.176Z
+ 2025-04-10T15:33:22.543Z
https://docs.axolotl.ai/docs/config.html
- 2025-04-10T05:34:37.176Z
+ 2025-04-10T15:33:22.543Z
https://docs.axolotl.ai/docs/multi-gpu.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/installation.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/torchao.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/reward_modelling.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/input_output.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/multimodal.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html
- 2025-04-10T05:35:05.591Z
+ 2025-04-10T15:33:52.397Z
https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html
- 2025-04-10T05:35:05.184Z
+ 2025-04-10T15:33:51.991Z
https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html
- 2025-04-10T05:35:05.200Z
+ 2025-04-10T15:33:52.008Z
https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html
- 2025-04-10T05:35:04.886Z
+ 2025-04-10T15:33:51.697Z
https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html
- 2025-04-10T05:35:05.130Z
+ 2025-04-10T15:33:51.937Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html
- 2025-04-10T05:35:04.933Z
+ 2025-04-10T15:33:51.743Z
https://docs.axolotl.ai/docs/api/integrations.liger.args.html
- 2025-04-10T05:35:05.507Z
+ 2025-04-10T15:33:52.315Z
https://docs.axolotl.ai/docs/api/utils.schemas.training.html
- 2025-04-10T05:35:05.371Z
+ 2025-04-10T15:33:52.178Z
https://docs.axolotl.ai/docs/api/datasets.html
- 2025-04-10T05:35:04.376Z
+ 2025-04-10T15:33:51.197Z
https://docs.axolotl.ai/docs/api/kernels.geglu.html
- 2025-04-10T05:35:05.069Z
+ 2025-04-10T15:33:51.877Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html
- 2025-04-10T05:35:05.114Z
+ 2025-04-10T15:33:51.921Z
https://docs.axolotl.ai/docs/api/cli.sweeps.html
- 2025-04-10T05:35:04.718Z
+ 2025-04-10T15:33:51.531Z
https://docs.axolotl.ai/docs/api/utils.freeze.html
- 2025-04-10T05:35:05.272Z
+ 2025-04-10T15:33:52.080Z
https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html
- 2025-04-10T05:35:05.131Z
+ 2025-04-10T15:33:51.939Z
https://docs.axolotl.ai/docs/api/cli.main.html
- 2025-04-10T05:35:04.611Z
+ 2025-04-10T15:33:51.427Z
https://docs.axolotl.ai/docs/api/core.trainers.trl.html
- 2025-04-10T05:35:04.795Z
+ 2025-04-10T15:33:51.607Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html
- 2025-04-10T05:35:04.935Z
+ 2025-04-10T15:33:51.744Z
https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html
- 2025-04-10T05:35:04.565Z
+ 2025-04-10T15:33:51.382Z
https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html
- 2025-04-10T05:35:04.580Z
+ 2025-04-10T15:33:51.396Z
https://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html
- 2025-04-10T05:35:04.952Z
+ 2025-04-10T15:33:51.762Z
https://docs.axolotl.ai/docs/api/utils.collators.mamba.html
- 2025-04-10T05:35:05.563Z
+ 2025-04-10T15:33:52.370Z
https://docs.axolotl.ai/docs/api/integrations.base.html
- 2025-04-10T05:35:05.492Z
+ 2025-04-10T15:33:52.300Z
https://docs.axolotl.ai/docs/api/utils.bench.html
- 2025-04-10T05:35:05.264Z
+ 2025-04-10T15:33:52.072Z
https://docs.axolotl.ai/docs/api/kernels.swiglu.html
- 2025-04-10T05:35:05.079Z
+ 2025-04-10T15:33:51.887Z
https://docs.axolotl.ai/docs/api/core.chat.format.shared.html
- 2025-04-10T05:35:04.567Z
+ 2025-04-10T15:33:51.384Z
https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html
- 2025-04-10T05:35:05.496Z
+ 2025-04-10T15:33:52.303Z
https://docs.axolotl.ai/docs/api/core.datasets.chat.html
- 2025-04-10T05:35:04.572Z
+ 2025-04-10T15:33:51.389Z
https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html
- 2025-04-10T05:35:05.587Z
+ 2025-04-10T15:33:52.394Z
https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html
- 2025-04-10T05:35:05.497Z
+ 2025-04-10T15:33:52.304Z
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html
- 2025-04-10T05:35:04.835Z
+ 2025-04-10T15:33:51.647Z
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html
- 2025-04-10T05:35:04.837Z
+ 2025-04-10T15:33:51.648Z
https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html
- 2025-04-10T05:35:04.951Z
+ 2025-04-10T15:33:51.760Z
https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html
- 2025-04-10T05:35:05.418Z
+ 2025-04-10T15:33:52.224Z
https://docs.axolotl.ai/docs/api/utils.schemas.trl.html
- 2025-04-10T05:35:05.400Z
+ 2025-04-10T15:33:52.207Z
https://docs.axolotl.ai/docs/api/prompt_tokenizers.html
- 2025-04-10T05:35:04.432Z
+ 2025-04-10T15:33:51.252Z
https://docs.axolotl.ai/docs/api/utils.data.sft.html
- 2025-04-10T05:35:05.348Z
+ 2025-04-10T15:33:52.155Z
https://docs.axolotl.ai/docs/api/utils.schedulers.html
- 2025-04-10T05:35:05.314Z
+ 2025-04-10T15:33:52.121Z
https://docs.axolotl.ai/docs/api/utils.chat_templates.html
- 2025-04-10T05:35:05.247Z
+ 2025-04-10T15:33:52.054Z
https://docs.axolotl.ai/docs/api/utils.models.html
- 2025-04-10T05:35:05.231Z
+ 2025-04-10T15:33:52.038Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html
- 2025-04-10T05:35:04.930Z
+ 2025-04-10T15:33:51.740Z
https://docs.axolotl.ai/docs/api/utils.distributed.html
- 2025-04-10T05:35:05.334Z
+ 2025-04-10T15:33:52.141Z
https://docs.axolotl.ai/docs/api/monkeypatch.utils.html
- 2025-04-10T05:35:05.172Z
+ 2025-04-10T15:33:51.980Z
https://docs.axolotl.ai/docs/api/utils.schemas.utils.html
- 2025-04-10T05:35:05.430Z
+ 2025-04-10T15:33:52.236Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html
- 2025-04-10T05:35:05.139Z
+ 2025-04-10T15:33:51.947Z
https://docs.axolotl.ai/docs/api/common.datasets.html
- 2025-04-10T05:35:05.533Z
+ 2025-04-10T15:33:52.340Z
https://docs.axolotl.ai/docs/api/logging_config.html
- 2025-04-10T05:35:04.437Z
+ 2025-04-10T15:33:51.257Z
https://docs.axolotl.ai/docs/api/kernels.quantize.html
- 2025-04-10T05:35:05.086Z
+ 2025-04-10T15:33:51.894Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html
- 2025-04-10T05:35:05.175Z
+ 2025-04-10T15:33:51.983Z
https://docs.axolotl.ai/docs/api/utils.schemas.model.html
- 2025-04-10T05:35:05.366Z
+ 2025-04-10T15:33:52.173Z
https://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html
- 2025-04-10T05:35:05.181Z
+ 2025-04-10T15:33:51.988Z
https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html
- 2025-04-10T05:35:05.202Z
+ 2025-04-10T15:33:52.009Z
https://docs.axolotl.ai/docs/api/utils.tokenization.html
- 2025-04-10T05:35:05.237Z
+ 2025-04-10T15:33:52.044Z
https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html
- 2025-04-10T05:35:05.504Z
+ 2025-04-10T15:33:52.311Z
https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html
- 2025-04-10T05:35:05.388Z
+ 2025-04-10T15:33:52.195Z
https://docs.axolotl.ai/docs/api/utils.collators.core.html
- 2025-04-10T05:35:05.535Z
+ 2025-04-10T15:33:52.343Z
https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html
- 2025-04-10T05:35:05.174Z
+ 2025-04-10T15:33:51.981Z
https://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html
- 2025-04-10T05:35:05.345Z
+ 2025-04-10T15:33:52.152Z
https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html
- 2025-04-10T05:35:04.882Z
+ 2025-04-10T15:33:51.692Z
https://docs.axolotl.ai/docs/api/index.html
- 2025-04-10T05:35:04.297Z
+ 2025-04-10T15:33:51.119Z
https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html
- 2025-04-10T05:35:04.764Z
+ 2025-04-10T15:33:51.576Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html
- 2025-04-10T05:35:04.920Z
+ 2025-04-10T15:33:51.730Z
https://docs.axolotl.ai/docs/api/cli.train.html
- 2025-04-10T05:35:04.620Z
+ 2025-04-10T15:33:51.436Z
https://docs.axolotl.ai/docs/api/core.trainer_builder.html
- 2025-04-10T05:35:04.452Z
+ 2025-04-10T15:33:51.272Z
https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html
- 2025-04-10T05:35:05.582Z
+ 2025-04-10T15:33:52.389Z
https://docs.axolotl.ai/docs/getting-started.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/dataset_loading.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/batch_vs_grad.html
- 2025-04-10T05:34:37.176Z
+ 2025-04-10T15:33:22.543Z
https://docs.axolotl.ai/docs/faq.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/debugging.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/lr_groups.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/TODO.html
- 2025-04-10T05:34:37.175Z
+ 2025-04-10T15:33:22.542Z
https://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html
- 2025-04-10T05:34:37.196Z
+ 2025-04-10T15:33:22.563Z
https://docs.axolotl.ai/index.html
- 2025-04-10T05:34:37.193Z
+ 2025-04-10T15:33:22.560Z
https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html
- 2025-04-10T05:34:37.196Z
+ 2025-04-10T15:33:22.563Z
https://docs.axolotl.ai/FAQS.html
- 2025-04-10T05:34:37.175Z
+ 2025-04-10T15:33:22.542Z
https://docs.axolotl.ai/docs/multi-node.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/sequence_parallelism.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/multipack.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/inference.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/lora_optims.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/api/utils.lora_embeddings.html
- 2025-04-10T05:35:05.255Z
+ 2025-04-10T15:33:52.063Z
https://docs.axolotl.ai/docs/api/kernels.utils.html
- 2025-04-10T05:35:05.088Z
+ 2025-04-10T15:33:51.896Z
https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html
- 2025-04-10T05:35:04.822Z
+ 2025-04-10T15:33:51.633Z
https://docs.axolotl.ai/docs/api/convert.html
- 2025-04-10T05:35:04.390Z
+ 2025-04-10T15:33:51.211Z
https://docs.axolotl.ai/docs/api/common.const.html
- 2025-04-10T05:35:05.517Z
+ 2025-04-10T15:33:52.324Z
https://docs.axolotl.ai/docs/api/cli.cloud.base.html
- 2025-04-10T05:35:04.757Z
+ 2025-04-10T15:33:51.570Z
https://docs.axolotl.ai/docs/api/monkeypatch.relora.html
- 2025-04-10T05:35:05.138Z
+ 2025-04-10T15:33:51.946Z
https://docs.axolotl.ai/docs/api/utils.lora.html
- 2025-04-10T05:35:05.252Z
+ 2025-04-10T15:33:52.059Z
https://docs.axolotl.ai/docs/api/cli.merge_lora.html
- 2025-04-10T05:35:04.692Z
+ 2025-04-10T15:33:51.506Z
https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html
- 2025-04-10T05:35:04.976Z
+ 2025-04-10T15:33:51.785Z
https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html
- 2025-04-10T05:35:04.704Z
+ 2025-04-10T15:33:51.517Z
https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html
- 2025-04-10T05:35:05.514Z
+ 2025-04-10T15:33:52.321Z
https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html
- 2025-04-10T05:35:05.534Z
+ 2025-04-10T15:33:52.341Z
https://docs.axolotl.ai/docs/api/common.architectures.html
- 2025-04-10T05:35:05.515Z
+ 2025-04-10T15:33:52.323Z
https://docs.axolotl.ai/docs/api/utils.trainer.html
- 2025-04-10T05:35:05.289Z
+ 2025-04-10T15:33:52.096Z
https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html
- 2025-04-10T05:35:05.594Z
+ 2025-04-10T15:33:52.401Z
https://docs.axolotl.ai/docs/api/cli.vllm_serve.html
- 2025-04-10T05:35:04.754Z
+ 2025-04-10T15:33:51.567Z
https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html
- 2025-04-10T05:35:05.406Z
+ 2025-04-10T15:33:52.212Z
https://docs.axolotl.ai/docs/api/utils.gradient_checkpointing.unsloth.html
- 2025-04-10T05:35:05.351Z
+ 2025-04-10T15:33:52.158Z
https://docs.axolotl.ai/docs/api/core.trainers.base.html
- 2025-04-10T05:35:04.778Z
+ 2025-04-10T15:33:51.590Z
https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html
- 2025-04-10T05:35:05.192Z
+ 2025-04-10T15:33:51.999Z
https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html
- 2025-04-10T05:35:05.576Z
+ 2025-04-10T15:33:52.382Z
https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html
- 2025-04-10T05:35:05.586Z
+ 2025-04-10T15:33:52.392Z
https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html
- 2025-04-10T05:35:05.510Z
+ 2025-04-10T15:33:52.318Z
https://docs.axolotl.ai/docs/api/utils.data.pretraining.html
- 2025-04-10T05:35:05.346Z
+ 2025-04-10T15:33:52.153Z
https://docs.axolotl.ai/docs/api/evaluate.html
- 2025-04-10T05:35:04.369Z
+ 2025-04-10T15:33:51.190Z
https://docs.axolotl.ai/docs/api/utils.dict.html
- 2025-04-10T05:35:05.337Z
+ 2025-04-10T15:33:52.145Z
https://docs.axolotl.ai/docs/api/cli.utils.html
- 2025-04-10T05:35:04.749Z
+ 2025-04-10T15:33:51.562Z
https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html
- 2025-04-10T05:35:04.904Z
+ 2025-04-10T15:33:51.714Z
https://docs.axolotl.ai/docs/api/core.training_args.html
- 2025-04-10T05:35:04.539Z
+ 2025-04-10T15:33:51.357Z
https://docs.axolotl.ai/docs/api/cli.inference.html
- 2025-04-10T05:35:04.684Z
+ 2025-04-10T15:33:51.498Z
https://docs.axolotl.ai/docs/api/kernels.lora.html
- 2025-04-10T05:35:05.058Z
+ 2025-04-10T15:33:51.867Z
https://docs.axolotl.ai/docs/api/cli.evaluate.html
- 2025-04-10T05:35:04.628Z
+ 2025-04-10T15:33:51.444Z
https://docs.axolotl.ai/docs/api/utils.collators.batching.html
- 2025-04-10T05:35:05.559Z
+ 2025-04-10T15:33:52.366Z
https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html
- 2025-04-10T05:35:04.876Z
+ 2025-04-10T15:33:51.686Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html
- 2025-04-10T05:35:04.932Z
+ 2025-04-10T15:33:51.741Z
https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html
- 2025-04-10T05:35:04.893Z
+ 2025-04-10T15:33:51.704Z
https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html
- 2025-04-10T05:35:04.973Z
+ 2025-04-10T15:33:51.782Z
https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html
- 2025-04-10T05:35:04.849Z
+ 2025-04-10T15:33:51.660Z
https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html
- 2025-04-10T05:35:05.261Z
+ 2025-04-10T15:33:52.068Z
https://docs.axolotl.ai/docs/api/cli.config.html
- 2025-04-10T05:35:04.669Z
+ 2025-04-10T15:33:51.484Z
https://docs.axolotl.ai/docs/api/utils.schemas.enums.html
- 2025-04-10T05:35:05.425Z
+ 2025-04-10T15:33:52.230Z
https://docs.axolotl.ai/docs/api/cli.preprocess.html
- 2025-04-10T05:35:04.712Z
+ 2025-04-10T15:33:51.525Z
https://docs.axolotl.ai/docs/api/core.chat.messages.html
- 2025-04-10T05:35:04.562Z
+ 2025-04-10T15:33:51.379Z
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html
- 2025-04-10T05:35:04.910Z
+ 2025-04-10T15:33:51.720Z
https://docs.axolotl.ai/docs/api/utils.schemas.peft.html
- 2025-04-10T05:35:05.397Z
+ 2025-04-10T15:33:52.204Z
https://docs.axolotl.ai/docs/api/train.html
- 2025-04-10T05:35:04.358Z
+ 2025-04-10T15:33:51.180Z
https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html
- 2025-04-10T05:35:04.908Z
+ 2025-04-10T15:33:51.718Z
https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html
- 2025-04-10T05:35:04.897Z
+ 2025-04-10T15:33:51.707Z
https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html
- 2025-04-10T05:35:05.568Z
+ 2025-04-10T15:33:52.375Z
https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html
- 2025-04-10T05:35:04.943Z
+ 2025-04-10T15:33:51.752Z
https://docs.axolotl.ai/docs/api/monkeypatch.attention.mllama.html
- 2025-04-10T05:35:05.199Z
+ 2025-04-10T15:33:52.006Z
https://docs.axolotl.ai/docs/api/cli.checks.html
- 2025-04-10T05:35:04.652Z
+ 2025-04-10T15:33:51.467Z
https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html
- 2025-04-10T05:35:05.191Z
+ 2025-04-10T15:33:51.998Z
https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html
- 2025-04-10T05:35:05.115Z
+ 2025-04-10T15:33:51.923Z
https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html
- 2025-04-10T05:35:04.802Z
+ 2025-04-10T15:33:51.614Z
https://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html
- 2025-04-10T05:35:04.857Z
+ 2025-04-10T15:33:51.668Z
https://docs.axolotl.ai/docs/api/cli.args.html
- 2025-04-10T05:35:04.645Z
+ 2025-04-10T15:33:51.460Z
https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html
- 2025-04-10T05:35:04.870Z
+ 2025-04-10T15:33:51.681Z
https://docs.axolotl.ai/docs/api/utils.schemas.config.html
- 2025-04-10T05:35:05.359Z
+ 2025-04-10T15:33:52.166Z
https://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html
- 2025-04-10T05:35:04.805Z
+ 2025-04-10T15:33:51.617Z
https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html
- 2025-04-10T05:35:04.564Z
+ 2025-04-10T15:33:51.381Z
https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html
- 2025-04-10T05:35:05.164Z
+ 2025-04-10T15:33:51.972Z
https://docs.axolotl.ai/docs/api/prompt_strategies.base.html
- 2025-04-10T05:35:04.807Z
+ 2025-04-10T15:33:51.618Z
https://docs.axolotl.ai/docs/rlhf.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/cli.html
- 2025-04-10T05:34:37.176Z
+ 2025-04-10T15:33:22.543Z
https://docs.axolotl.ai/docs/unsloth.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/fsdp_qlora.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/dataset_preprocessing.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/custom_integrations.html
- 2025-04-10T05:34:37.176Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/mac.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/docker.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/ray-integration.html
- 2025-04-10T05:34:37.180Z
+ 2025-04-10T15:33:22.547Z
https://docs.axolotl.ai/docs/dataset-formats/index.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/dataset-formats/conversation.html
- 2025-04-10T05:34:37.176Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/dataset-formats/pretraining.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z
https://docs.axolotl.ai/docs/dataset-formats/inst_tune.html
- 2025-04-10T05:34:37.177Z
+ 2025-04-10T15:33:22.544Z