diff --git a/.nojekyll b/.nojekyll index d0b3740b2..483cc7e42 100644 --- a/.nojekyll +++ b/.nojekyll @@ -1 +1 @@ -644e3464 \ No newline at end of file +e6983b1a \ No newline at end of file diff --git a/docs/rlhf.html b/docs/rlhf.html index 292c2ed30..bd8143cfa 100644 --- a/docs/rlhf.html +++ b/docs/rlhf.html @@ -994,7 +994,7 @@ Important vllm_server_host: 0.0.0.0 vllm_server_port: 8000 vllm_server_timeout: 300 -
CUDA_VISIBLE_DEVICES=2,3 axolotl vllm_serve grpo.yaml
+
CUDA_VISIBLE_DEVICES=2,3 axolotl vllm-serve grpo.yaml

Your vLLM instance will now attempt to spin up, and it’s time to kick off training utilizing our remaining two GPUs. In another terminal, execute:

CUDA_VISIBLE_DEVICES=0,1 axolotl train grpo.yaml --num-processes 2
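If you would rather drive both steps from a single script than juggle two terminals, you can background the server and block until it responds before launching training. A minimal sketch follows; the /health/ readiness route is an assumption about the underlying TRL/vLLM server rather than something this page documents, so verify it against your installed versions:
#!/usr/bin/env bash
# Start the vLLM server on GPUs 2,3 in the background.
CUDA_VISIBLE_DEVICES=2,3 axolotl vllm-serve grpo.yaml &
SERVER_PID=$!
trap 'kill "$SERVER_PID"' EXIT  # stop the server once training exits

# Block until the server answers on the configured port
# (the /health/ route is an assumption; adjust for your setup).
until curl -sf http://localhost:8000/health/ >/dev/null; do
  sleep 5
done

# Train on the remaining two GPUs.
CUDA_VISIBLE_DEVICES=0,1 axolotl train grpo.yaml --num-processes 2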
diff --git a/search.json b/search.json index a149b0691..b8d463506 100644 --- a/search.json +++ b/search.json @@ -2857,7 +2857,7 @@ "href": "docs/rlhf.html#rlhf-using-axolotl", "title": "RLHF (Beta)", "section": "RLHF using Axolotl", - "text": "RLHF using Axolotl\n\n\n\n\n\n\nImportant\n\n\n\nThis is a BETA feature and many features are not fully implemented. You are encouraged to open new PRs to improve the integration and functionality.\n\n\nWe rely on the TRL library for implementations of various RL training methods, which we wrap around to expose in axolotl. Each method has their own supported ways of loading datasets and prompt formats.\n\n\n\n\n\n\nTip\n\n\n\nYou can find what each method supports by going into src/axolotl/prompt_strategies/{method} where {method} is one of our supported methods. The type: can be retrieved from {method}.{function_name}.\n\n\n\nDPO\nExample config:\nrl: dpo\ndatasets:\n - path: Intel/orca_dpo_pairs\n split: train\n type: chatml.intel\n - path: argilla/ultrafeedback-binarized-preferences\n split: train\n type: chatml\nDPO supports the following types with the following dataset format:\n\nchatml.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"chosen_response\": \"...\",\n \"rejected_response\": \"...\"\n}\n\n\nchatml.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nchatml.icr\n{\n \"system\": \"...\", // optional\n \"input\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"chosen_response\": \"...\",\n \"rejected_response\": \"...\"\n}\n\n\nllama3.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.icr\n{\n \"system\": \"...\", // optional\n \"input\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nzephyr.nectar\n{\n \"prompt\": \"...\",\n \"answers\": [\n {\n \"answer\": \"...\",\n \"rank\": 1\n },\n {\n \"answer\": \"...\",\n \"rank\": 2\n }\n 
// ... more answers with ranks\n ]\n}\n\n\nchat_template.default\nrl: dpo\ndatasets:\n - path: ...\n split: train\n type: chat_template.default\n field_messages: \"messages\"\n field_chosen: \"chosen\"\n field_rejected: \"rejected\"\n message_property_mappings:\n role: role\n content: content\n roles:\n user: [\"user\"]\n assistant: [\"assistant\"]\n system: [\"system\"]\nSample input format:\n{\n \"messages\": [\n {\n \"role\": \"system\",\n \"content\": \"...\"\n },\n {\n \"role\": \"user\",\n \"content\": \"...\"\n },\n // ... more messages\n ],\n \"chosen\": {\n \"role\": \"assistant\",\n \"content\": \"...\"\n },\n \"rejected\": {\n \"role\": \"assistant\",\n \"content\": \"...\"\n }\n}\n\n\nuser_defined.default\nFor custom behaviors,\nrl: dpo\ndatasets:\n - path: ...\n split: train\n type: user_defined.default\n\n field_prompt: \"prompt\"\n field_system: \"system\"\n field_chosen: \"chosen\"\n field_rejected: \"rejected\"\n prompt_format: \"{prompt}\"\n chosen_format: \"{chosen}\"\n rejected_format: \"{rejected}\"\nThe input format is a simple JSON input with customizable fields based on the above config.\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\n\nIPO\nAs IPO is just DPO with a different loss function, all supported dataset formats for DPO are also supported for IPO.\nrl: ipo\n\n\nORPO\nPaper: https://arxiv.org/abs/2403.07691\nrl: orpo\norpo_alpha: 0.1\nremove_unused_columns: false\n\nchat_template: chatml\ndatasets:\n - path: argilla/ultrafeedback-binarized-preferences-cleaned\n type: chat_template.argilla\nORPO supports the following types with the following dataset format:\n\nchat_template.argilla\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\", // if available, will be taken as user message for single-turn instead of from list below\n\n // chosen/rejected should be same till last content and only even-number of alternating user/assistant turns\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\n\nKTO\nrl: kto\nrl_beta: 0.1 # default\nkto_desirable_weight: 1.0 # default\nkto_undesirable_weight: 1.0 # default\n\nremove_unused_columns: false\n\ndatasets:\n - path: argilla/ultrafeedback-binarized-preferences-cleaned-kto\n type: llama3.ultra\n split: train\n\ngradient_checkpointing: true\ngradient_checkpointing_kwargs:\n use_reentrant: true\nKTO supports the following types with the following dataset format:\n\nchatml.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"}\n ],\n \"completion\": [\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nchatml.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.argilla_chat\n{\n \"completion\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.intel\n{\n \"system\": 
\"...\", // optional\n \"question\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nuser_defined.default\nFor custom behaviors,\nrl: kto\ndatasets:\n - path: ...\n split: train\n type: user_defined.default\n\n field_prompt: \"prompt\"\n field_system: \"system\"\n field_completion: \"completion\"\n field_label: \"label\"\n prompt_format: \"{prompt}\"\n completion_format: \"{completion}\"\nThe input format is a simple JSON input with customizable fields based on the above config.\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\",\n \"label\": \"...\"\n}\n\n\n\nGRPO\n\n\n\n\n\n\nTip\n\n\n\nCheck out our GRPO cookbook.\n\n\nIf you have multiple GPUs available, we reccomend using vLLM with the GRPOTrainer to significantly speedup trajectory generation during training.\nFirst, launch a vLLM server using trl vllm-serve - you may use a config file or CLI overrides to configure your vLLM server. In this example, we’re\nusing 4 GPUs - 2 for training, and 2 for vLLM:\n\n\n\n\n\n\nImportant\n\n\n\nMake sure you’ve installed the correct version of vLLM by including it as an extra when installing axolotl, e.g. pip install axolotl[vllm].\n\n\nbase_model: Qwen/Qwen2.5-1.5B-Instruct\n\nvllm:\n host: 0.0.0.0\n port: 8000\n tensor_parallel_size: 2\n gpu_memory_utilization: 0.85\n dtype: auto\n # max_model_len: # you may find it useful to set the vLLM model context length if you know this beforehand\n\nrl: grpo\ntrl:\n use_vllm: true\n vllm_server_host: 0.0.0.0\n vllm_server_port: 8000\n vllm_server_timeout: 300\nCUDA_VISIBLE_DEVICES=2,3 axolotl vllm_serve grpo.yaml\nYour vLLM instance will now attempt to spin up, and it’s time to kick off training utilizing our remaining two GPUs. In another terminal, execute:\nCUDA_VISIBLE_DEVICES=0,1 axolotl train grpo.yaml --num-processes 2\n\nReward functions\nGRPO uses custom reward functions and transformations. 
Please have them ready locally.\nFor example, to load OpenAI’s GSM8K and use a random reward for completions:\n# rewards.py\nimport random\n\ndef rand_reward_func(completions, **kwargs) -> list[float]:\n return [random.uniform(0, 1) for _ in completions]\n\ndef oai_gsm8k_transform(cfg, *args, **kwargs):\n def transform_fn(example, tokenizer=None):\n label = example[\"answer\"].split(\"####\")[-1].strip().replace(\",\", \"\")\n return {\n \"prompt\": [{\"role\": \"user\", \"content\": example[\"question\"]},],\n \"answer\": label,\n }\n return transform_fn, {\"remove_columns\": [\"question\"]}\nrl: grpo\n\ntrl:\n beta: 0.001\n max_completion_length: 256\n use_vllm: True\n num_generations: 4\n reward_funcs: [\"rewards.rand_reward_func\"] # format: '{file_name}.{fn_name}'\n reward_weights: [1.0]\ndatasets:\n - path: openai/gsm8k\n name: main\n type: rewards.oai_gsm8k_transform # format: '{file_name}.{fn_name}'\nTo see other examples of custom reward functions, please see TRL GRPO Docs.\nTo see description of the configs, please see TRLConfig.\n\n\n\nSimPO\nSimPO uses CPOTrainer but with alternative loss function.\nrl: simpo\nrl_beta: 0.1 # default in CPOTrainer\ncpo_alpha: 1.0 # default in CPOTrainer\nsimpo_gamma: 0.5 # default in CPOTrainer\nThis method uses the same dataset format as DPO.\n\n\nUsing local dataset files\ndatasets:\n - ds_type: json\n data_files:\n - orca_rlhf.jsonl\n split: train\n type: chatml.intel\n\n\nTRL auto-unwrapping for PEFT\nTRL supports auto-unwrapping PEFT models for RL training paradigms which rely on a reference model. This significantly reduces memory pressure as an additional refreference model does not need to be loaded, and reference model log-probabilities can be obtained by disabling PEFT adapters. This is enabled by default. To turn it off, pass the following config:\n# load ref model when adapter training.\nrl_adapter_ref_model: true", + "text": "RLHF using Axolotl\n\n\n\n\n\n\nImportant\n\n\n\nThis is a BETA feature and many features are not fully implemented. You are encouraged to open new PRs to improve the integration and functionality.\n\n\nWe rely on the TRL library for implementations of various RL training methods, which we wrap around to expose in axolotl. Each method has its own supported ways of loading datasets and prompt formats.\n\n\n\n\n\n\nTip\n\n\n\nYou can find what each method supports by going into src/axolotl/prompt_strategies/{method} where {method} is one of our supported methods. 
The type: can be retrieved from {method}.{function_name}.\n\n\n\nDPO\nExample config:\nrl: dpo\ndatasets:\n - path: Intel/orca_dpo_pairs\n split: train\n type: chatml.intel\n - path: argilla/ultrafeedback-binarized-preferences\n split: train\n type: chatml\nDPO supports the following types with the following dataset format:\n\nchatml.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"chosen_response\": \"...\",\n \"rejected_response\": \"...\"\n}\n\n\nchatml.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nchatml.icr\n{\n \"system\": \"...\", // optional\n \"input\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nchatml.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"chosen_response\": \"...\",\n \"rejected_response\": \"...\"\n}\n\n\nllama3.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.icr\n{\n \"system\": \"...\", // optional\n \"input\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\nllama3.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nzephyr.nectar\n{\n \"prompt\": \"...\",\n \"answers\": [\n {\n \"answer\": \"...\",\n \"rank\": 1\n },\n {\n \"answer\": \"...\",\n \"rank\": 2\n }\n // ... more answers with ranks\n ]\n}\n\n\nchat_template.default\nrl: dpo\ndatasets:\n - path: ...\n split: train\n type: chat_template.default\n field_messages: \"messages\"\n field_chosen: \"chosen\"\n field_rejected: \"rejected\"\n message_property_mappings:\n role: role\n content: content\n roles:\n user: [\"user\"]\n assistant: [\"assistant\"]\n system: [\"system\"]\nSample input format:\n{\n \"messages\": [\n {\n \"role\": \"system\",\n \"content\": \"...\"\n },\n {\n \"role\": \"user\",\n \"content\": \"...\"\n },\n // ... 
more messages\n ],\n \"chosen\": {\n \"role\": \"assistant\",\n \"content\": \"...\"\n },\n \"rejected\": {\n \"role\": \"assistant\",\n \"content\": \"...\"\n }\n}\n\n\nuser_defined.default\nFor custom behaviors,\nrl: dpo\ndatasets:\n - path: ...\n split: train\n type: user_defined.default\n\n field_prompt: \"prompt\"\n field_system: \"system\"\n field_chosen: \"chosen\"\n field_rejected: \"rejected\"\n prompt_format: \"{prompt}\"\n chosen_format: \"{chosen}\"\n rejected_format: \"{rejected}\"\nThe input format is a simple JSON input with customizable fields based on the above config.\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"chosen\": \"...\",\n \"rejected\": \"...\"\n}\n\n\n\nIPO\nAs IPO is just DPO with a different loss function, all supported dataset formats for DPO are also supported for IPO.\nrl: ipo\n\n\nORPO\nPaper: https://arxiv.org/abs/2403.07691\nrl: orpo\norpo_alpha: 0.1\nremove_unused_columns: false\n\nchat_template: chatml\ndatasets:\n - path: argilla/ultrafeedback-binarized-preferences-cleaned\n type: chat_template.argilla\nORPO supports the following types with the following dataset format:\n\nchat_template.argilla\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\", // if available, will be taken as user message for single-turn instead of from list below\n\n // chosen/rejected should be identical up to the last content, with an even number of alternating user/assistant turns\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ],\n \"rejected\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\n\nKTO\nrl: kto\nrl_beta: 0.1 # default\nkto_desirable_weight: 1.0 # default\nkto_undesirable_weight: 1.0 # default\n\nremove_unused_columns: false\n\ndatasets:\n - path: argilla/ultrafeedback-binarized-preferences-cleaned-kto\n type: llama3.ultra\n split: train\n\ngradient_checkpointing: true\ngradient_checkpointing_kwargs:\n use_reentrant: true\nKTO supports the following types with the following dataset format:\n\nchatml.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.argilla_chat\n{\n \"chosen\": [\n {\"role\": \"user\", \"content\": \"...\"}\n ],\n \"completion\": [\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nchatml.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nchatml.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.argilla\n{\n \"system\": \"...\", // optional\n \"instruction\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.argilla_chat\n{\n \"completion\": [\n {\"role\": \"user\", \"content\": \"...\"},\n {\"role\": \"assistant\", \"content\": \"...\"}\n ]\n}\n\n\nllama3.intel\n{\n \"system\": \"...\", // optional\n \"question\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.prompt_pairs\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nllama3.ultra\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\"\n}\n\n\nuser_defined.default\nFor custom behaviors,\nrl: kto\ndatasets:\n - path: ...\n split: train\n type: user_defined.default\n\n field_prompt: \"prompt\"\n field_system: \"system\"\n field_completion: \"completion\"\n field_label: 
\"label\"\n prompt_format: \"{prompt}\"\n completion_format: \"{completion}\"\nThe input format is a simple JSON input with customizable fields based on the above config.\n{\n \"system\": \"...\", // optional\n \"prompt\": \"...\",\n \"completion\": \"...\",\n \"label\": \"...\"\n}\n\n\n\nGRPO\n\n\n\n\n\n\nTip\n\n\n\nCheck out our GRPO cookbook.\n\n\nIf you have multiple GPUs available, we reccomend using vLLM with the GRPOTrainer to significantly speedup trajectory generation during training.\nFirst, launch a vLLM server using trl vllm-serve - you may use a config file or CLI overrides to configure your vLLM server. In this example, we’re\nusing 4 GPUs - 2 for training, and 2 for vLLM:\n\n\n\n\n\n\nImportant\n\n\n\nMake sure you’ve installed the correct version of vLLM by including it as an extra when installing axolotl, e.g. pip install axolotl[vllm].\n\n\nbase_model: Qwen/Qwen2.5-1.5B-Instruct\n\nvllm:\n host: 0.0.0.0\n port: 8000\n tensor_parallel_size: 2\n gpu_memory_utilization: 0.85\n dtype: auto\n # max_model_len: # you may find it useful to set the vLLM model context length if you know this beforehand\n\nrl: grpo\ntrl:\n use_vllm: true\n vllm_server_host: 0.0.0.0\n vllm_server_port: 8000\n vllm_server_timeout: 300\nCUDA_VISIBLE_DEVICES=2,3 axolotl vllm-serve grpo.yaml\nYour vLLM instance will now attempt to spin up, and it’s time to kick off training utilizing our remaining two GPUs. In another terminal, execute:\nCUDA_VISIBLE_DEVICES=0,1 axolotl train grpo.yaml --num-processes 2\n\nReward functions\nGRPO uses custom reward functions and transformations. Please have them ready locally.\nFor example, to load OpenAI’s GSM8K and use a random reward for completions:\n# rewards.py\nimport random\n\ndef rand_reward_func(completions, **kwargs) -> list[float]:\n return [random.uniform(0, 1) for _ in completions]\n\ndef oai_gsm8k_transform(cfg, *args, **kwargs):\n def transform_fn(example, tokenizer=None):\n label = example[\"answer\"].split(\"####\")[-1].strip().replace(\",\", \"\")\n return {\n \"prompt\": [{\"role\": \"user\", \"content\": example[\"question\"]},],\n \"answer\": label,\n }\n return transform_fn, {\"remove_columns\": [\"question\"]}\nrl: grpo\n\ntrl:\n beta: 0.001\n max_completion_length: 256\n use_vllm: True\n num_generations: 4\n reward_funcs: [\"rewards.rand_reward_func\"] # format: '{file_name}.{fn_name}'\n reward_weights: [1.0]\ndatasets:\n - path: openai/gsm8k\n name: main\n type: rewards.oai_gsm8k_transform # format: '{file_name}.{fn_name}'\nTo see other examples of custom reward functions, please see TRL GRPO Docs.\nTo see description of the configs, please see TRLConfig.\n\n\n\nSimPO\nSimPO uses CPOTrainer but with alternative loss function.\nrl: simpo\nrl_beta: 0.1 # default in CPOTrainer\ncpo_alpha: 1.0 # default in CPOTrainer\nsimpo_gamma: 0.5 # default in CPOTrainer\nThis method uses the same dataset format as DPO.\n\n\nUsing local dataset files\ndatasets:\n - ds_type: json\n data_files:\n - orca_rlhf.jsonl\n split: train\n type: chatml.intel\n\n\nTRL auto-unwrapping for PEFT\nTRL supports auto-unwrapping PEFT models for RL training paradigms which rely on a reference model. This significantly reduces memory pressure as an additional refreference model does not need to be loaded, and reference model log-probabilities can be obtained by disabling PEFT adapters. This is enabled by default. 
To turn it off, pass the following config:\n# load ref model when adapter training.\nrl_adapter_ref_model: true", "crumbs": [ "How To Guides", "RLHF (Beta)" diff --git a/sitemap.xml b/sitemap.xml index fa1e7103d..c0d089edf 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -2,682 +2,682 @@ https://docs.axolotl.ai/examples/colab-notebooks/colab-axolotl-example.html - 2025-04-10T05:34:37.181Z + 2025-04-10T15:33:22.548Z https://docs.axolotl.ai/docs/dataset-formats/stepwise_supervised.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/dataset-formats/template_free.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/dataset-formats/tokenized.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/nccl.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/amd_hpc.html - 2025-04-10T05:34:37.176Z + 2025-04-10T15:33:22.543Z https://docs.axolotl.ai/docs/config.html - 2025-04-10T05:34:37.176Z + 2025-04-10T15:33:22.543Z https://docs.axolotl.ai/docs/multi-gpu.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/installation.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/torchao.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/reward_modelling.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/input_output.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/multimodal.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/api/utils.callbacks.mlflow_.html - 2025-04-10T05:35:05.591Z + 2025-04-10T15:33:52.397Z https://docs.axolotl.ai/docs/api/monkeypatch.trainer_fsdp_optim.html - 2025-04-10T05:35:05.184Z + 2025-04-10T15:33:51.991Z https://docs.axolotl.ai/docs/api/monkeypatch.data.batch_dataset_fetcher.html - 2025-04-10T05:35:05.200Z + 2025-04-10T15:33:52.008Z https://docs.axolotl.ai/docs/api/prompt_strategies.stepwise_supervised.html - 2025-04-10T05:35:04.886Z + 2025-04-10T15:33:51.697Z https://docs.axolotl.ai/docs/api/monkeypatch.mistral_attn_hijack_flash.html - 2025-04-10T05:35:05.130Z + 2025-04-10T15:33:51.937Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.user_defined.html - 2025-04-10T05:35:04.933Z + 2025-04-10T15:33:51.743Z https://docs.axolotl.ai/docs/api/integrations.liger.args.html - 2025-04-10T05:35:05.507Z + 2025-04-10T15:33:52.315Z https://docs.axolotl.ai/docs/api/utils.schemas.training.html - 2025-04-10T05:35:05.371Z + 2025-04-10T15:33:52.178Z https://docs.axolotl.ai/docs/api/datasets.html - 2025-04-10T05:35:04.376Z + 2025-04-10T15:33:51.197Z https://docs.axolotl.ai/docs/api/kernels.geglu.html - 2025-04-10T05:35:05.069Z + 2025-04-10T15:33:51.877Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_flash.html - 2025-04-10T05:35:05.114Z + 2025-04-10T15:33:51.921Z https://docs.axolotl.ai/docs/api/cli.sweeps.html - 2025-04-10T05:35:04.718Z + 2025-04-10T15:33:51.531Z https://docs.axolotl.ai/docs/api/utils.freeze.html - 2025-04-10T05:35:05.272Z + 2025-04-10T15:33:52.080Z https://docs.axolotl.ai/docs/api/monkeypatch.multipack.html - 2025-04-10T05:35:05.131Z + 2025-04-10T15:33:51.939Z https://docs.axolotl.ai/docs/api/cli.main.html - 2025-04-10T05:35:04.611Z + 2025-04-10T15:33:51.427Z https://docs.axolotl.ai/docs/api/core.trainers.trl.html - 2025-04-10T05:35:04.795Z + 2025-04-10T15:33:51.607Z 
https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.passthrough.html - 2025-04-10T05:35:04.935Z + 2025-04-10T15:33:51.744Z https://docs.axolotl.ai/docs/api/core.chat.format.llama3x.html - 2025-04-10T05:35:04.565Z + 2025-04-10T15:33:51.382Z https://docs.axolotl.ai/docs/api/core.datasets.transforms.chat_builder.html - 2025-04-10T05:35:04.580Z + 2025-04-10T15:33:51.396Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.user_defined.html - 2025-04-10T05:35:04.952Z + 2025-04-10T15:33:51.762Z https://docs.axolotl.ai/docs/api/utils.collators.mamba.html - 2025-04-10T05:35:05.563Z + 2025-04-10T15:33:52.370Z https://docs.axolotl.ai/docs/api/integrations.base.html - 2025-04-10T05:35:05.492Z + 2025-04-10T15:33:52.300Z https://docs.axolotl.ai/docs/api/utils.bench.html - 2025-04-10T05:35:05.264Z + 2025-04-10T15:33:52.072Z https://docs.axolotl.ai/docs/api/kernels.swiglu.html - 2025-04-10T05:35:05.079Z + 2025-04-10T15:33:51.887Z https://docs.axolotl.ai/docs/api/core.chat.format.shared.html - 2025-04-10T05:35:04.567Z + 2025-04-10T15:33:51.384Z https://docs.axolotl.ai/docs/api/integrations.cut_cross_entropy.args.html - 2025-04-10T05:35:05.496Z + 2025-04-10T15:33:52.303Z https://docs.axolotl.ai/docs/api/core.datasets.chat.html - 2025-04-10T05:35:04.572Z + 2025-04-10T15:33:51.389Z https://docs.axolotl.ai/docs/api/utils.callbacks.lisa.html - 2025-04-10T05:35:05.587Z + 2025-04-10T15:33:52.394Z https://docs.axolotl.ai/docs/api/integrations.grokfast.optimizer.html - 2025-04-10T05:35:05.497Z + 2025-04-10T15:33:52.304Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_chat.html - 2025-04-10T05:35:04.835Z + 2025-04-10T15:33:51.647Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_instruct.html - 2025-04-10T05:35:04.837Z + 2025-04-10T15:33:51.648Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.chatml.html - 2025-04-10T05:35:04.951Z + 2025-04-10T15:33:51.760Z https://docs.axolotl.ai/docs/api/utils.schemas.integrations.html - 2025-04-10T05:35:05.418Z + 2025-04-10T15:33:52.224Z https://docs.axolotl.ai/docs/api/utils.schemas.trl.html - 2025-04-10T05:35:05.400Z + 2025-04-10T15:33:52.207Z https://docs.axolotl.ai/docs/api/prompt_tokenizers.html - 2025-04-10T05:35:04.432Z + 2025-04-10T15:33:51.252Z https://docs.axolotl.ai/docs/api/utils.data.sft.html - 2025-04-10T05:35:05.348Z + 2025-04-10T15:33:52.155Z https://docs.axolotl.ai/docs/api/utils.schedulers.html - 2025-04-10T05:35:05.314Z + 2025-04-10T15:33:52.121Z https://docs.axolotl.ai/docs/api/utils.chat_templates.html - 2025-04-10T05:35:05.247Z + 2025-04-10T15:33:52.054Z https://docs.axolotl.ai/docs/api/utils.models.html - 2025-04-10T05:35:05.231Z + 2025-04-10T15:33:52.038Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chatml.html - 2025-04-10T05:35:04.930Z + 2025-04-10T15:33:51.740Z https://docs.axolotl.ai/docs/api/utils.distributed.html - 2025-04-10T05:35:05.334Z + 2025-04-10T15:33:52.141Z https://docs.axolotl.ai/docs/api/monkeypatch.utils.html - 2025-04-10T05:35:05.172Z + 2025-04-10T15:33:51.980Z https://docs.axolotl.ai/docs/api/utils.schemas.utils.html - 2025-04-10T05:35:05.430Z + 2025-04-10T15:33:52.236Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_expand_mask.html - 2025-04-10T05:35:05.139Z + 2025-04-10T15:33:51.947Z https://docs.axolotl.ai/docs/api/common.datasets.html - 2025-04-10T05:35:05.533Z + 2025-04-10T15:33:52.340Z https://docs.axolotl.ai/docs/api/logging_config.html - 2025-04-10T05:35:04.437Z + 2025-04-10T15:33:51.257Z https://docs.axolotl.ai/docs/api/kernels.quantize.html - 2025-04-10T05:35:05.086Z + 
2025-04-10T15:33:51.894Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_patch_multipack.html - 2025-04-10T05:35:05.175Z + 2025-04-10T15:33:51.983Z https://docs.axolotl.ai/docs/api/utils.schemas.model.html - 2025-04-10T05:35:05.366Z + 2025-04-10T15:33:52.173Z https://docs.axolotl.ai/docs/api/monkeypatch.stablelm_attn_hijack_flash.html - 2025-04-10T05:35:05.181Z + 2025-04-10T15:33:51.988Z https://docs.axolotl.ai/docs/api/monkeypatch.mixtral.html - 2025-04-10T05:35:05.202Z + 2025-04-10T15:33:52.009Z https://docs.axolotl.ai/docs/api/utils.tokenization.html - 2025-04-10T05:35:05.237Z + 2025-04-10T15:33:52.044Z https://docs.axolotl.ai/docs/api/integrations.kd.trainer.html - 2025-04-10T05:35:05.504Z + 2025-04-10T15:33:52.311Z https://docs.axolotl.ai/docs/api/utils.schemas.datasets.html - 2025-04-10T05:35:05.388Z + 2025-04-10T15:33:52.195Z https://docs.axolotl.ai/docs/api/utils.collators.core.html - 2025-04-10T05:35:05.535Z + 2025-04-10T15:33:52.343Z https://docs.axolotl.ai/docs/api/monkeypatch.btlm_attn_hijack_flash.html - 2025-04-10T05:35:05.174Z + 2025-04-10T15:33:51.981Z https://docs.axolotl.ai/docs/api/utils.optimizers.adopt.html - 2025-04-10T05:35:05.345Z + 2025-04-10T15:33:52.152Z https://docs.axolotl.ai/docs/api/prompt_strategies.input_output.html - 2025-04-10T05:35:04.882Z + 2025-04-10T15:33:51.692Z https://docs.axolotl.ai/docs/api/index.html - 2025-04-10T05:35:04.297Z + 2025-04-10T15:33:51.119Z https://docs.axolotl.ai/docs/api/cli.cloud.modal_.html - 2025-04-10T05:35:04.764Z + 2025-04-10T15:33:51.576Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.llama3.html - 2025-04-10T05:35:04.920Z + 2025-04-10T15:33:51.730Z https://docs.axolotl.ai/docs/api/cli.train.html - 2025-04-10T05:35:04.620Z + 2025-04-10T15:33:51.436Z https://docs.axolotl.ai/docs/api/core.trainer_builder.html - 2025-04-10T05:35:04.452Z + 2025-04-10T15:33:51.272Z https://docs.axolotl.ai/docs/api/utils.callbacks.perplexity.html - 2025-04-10T05:35:05.582Z + 2025-04-10T15:33:52.389Z https://docs.axolotl.ai/docs/getting-started.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/dataset_loading.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/batch_vs_grad.html - 2025-04-10T05:34:37.176Z + 2025-04-10T15:33:22.543Z https://docs.axolotl.ai/docs/faq.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/debugging.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/lr_groups.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/TODO.html - 2025-04-10T05:34:37.175Z + 2025-04-10T15:33:22.542Z https://docs.axolotl.ai/src/axolotl/integrations/LICENSE.html - 2025-04-10T05:34:37.196Z + 2025-04-10T15:33:22.563Z https://docs.axolotl.ai/index.html - 2025-04-10T05:34:37.193Z + 2025-04-10T15:33:22.560Z https://docs.axolotl.ai/src/axolotl/integrations/cut_cross_entropy/ACKNOWLEDGEMENTS.html - 2025-04-10T05:34:37.196Z + 2025-04-10T15:33:22.563Z https://docs.axolotl.ai/FAQS.html - 2025-04-10T05:34:37.175Z + 2025-04-10T15:33:22.542Z https://docs.axolotl.ai/docs/multi-node.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/sequence_parallelism.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/multipack.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/inference.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z 
https://docs.axolotl.ai/docs/lora_optims.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/api/utils.lora_embeddings.html - 2025-04-10T05:35:05.255Z + 2025-04-10T15:33:52.063Z https://docs.axolotl.ai/docs/api/kernels.utils.html - 2025-04-10T05:35:05.088Z + 2025-04-10T15:33:51.896Z https://docs.axolotl.ai/docs/api/prompt_strategies.chat_template.html - 2025-04-10T05:35:04.822Z + 2025-04-10T15:33:51.633Z https://docs.axolotl.ai/docs/api/convert.html - 2025-04-10T05:35:04.390Z + 2025-04-10T15:33:51.211Z https://docs.axolotl.ai/docs/api/common.const.html - 2025-04-10T05:35:05.517Z + 2025-04-10T15:33:52.324Z https://docs.axolotl.ai/docs/api/cli.cloud.base.html - 2025-04-10T05:35:04.757Z + 2025-04-10T15:33:51.570Z https://docs.axolotl.ai/docs/api/monkeypatch.relora.html - 2025-04-10T05:35:05.138Z + 2025-04-10T15:33:51.946Z https://docs.axolotl.ai/docs/api/utils.lora.html - 2025-04-10T05:35:05.252Z + 2025-04-10T15:33:52.059Z https://docs.axolotl.ai/docs/api/cli.merge_lora.html - 2025-04-10T05:35:04.692Z + 2025-04-10T15:33:51.506Z https://docs.axolotl.ai/docs/api/prompt_strategies.bradley_terry.llama3.html - 2025-04-10T05:35:04.976Z + 2025-04-10T15:33:51.785Z https://docs.axolotl.ai/docs/api/cli.merge_sharded_fsdp_weights.html - 2025-04-10T05:35:04.704Z + 2025-04-10T15:33:51.517Z https://docs.axolotl.ai/docs/api/integrations.spectrum.args.html - 2025-04-10T05:35:05.514Z + 2025-04-10T15:33:52.321Z https://docs.axolotl.ai/docs/api/models.mamba.modeling_mamba.html - 2025-04-10T05:35:05.534Z + 2025-04-10T15:33:52.341Z https://docs.axolotl.ai/docs/api/common.architectures.html - 2025-04-10T05:35:05.515Z + 2025-04-10T15:33:52.323Z https://docs.axolotl.ai/docs/api/utils.trainer.html - 2025-04-10T05:35:05.289Z + 2025-04-10T15:33:52.096Z https://docs.axolotl.ai/docs/api/utils.callbacks.comet_.html - 2025-04-10T05:35:05.594Z + 2025-04-10T15:33:52.401Z https://docs.axolotl.ai/docs/api/cli.vllm_serve.html - 2025-04-10T05:35:04.754Z + 2025-04-10T15:33:51.567Z https://docs.axolotl.ai/docs/api/utils.schemas.multimodal.html - 2025-04-10T05:35:05.406Z + 2025-04-10T15:33:52.212Z https://docs.axolotl.ai/docs/api/utils.gradient_checkpointing.unsloth.html - 2025-04-10T05:35:05.351Z + 2025-04-10T15:33:52.158Z https://docs.axolotl.ai/docs/api/core.trainers.base.html - 2025-04-10T05:35:04.778Z + 2025-04-10T15:33:51.590Z https://docs.axolotl.ai/docs/api/monkeypatch.unsloth_.html - 2025-04-10T05:35:05.192Z + 2025-04-10T15:33:51.999Z https://docs.axolotl.ai/docs/api/utils.samplers.multipack.html - 2025-04-10T05:35:05.576Z + 2025-04-10T15:33:52.382Z https://docs.axolotl.ai/docs/api/utils.callbacks.profiler.html - 2025-04-10T05:35:05.586Z + 2025-04-10T15:33:52.392Z https://docs.axolotl.ai/docs/api/integrations.lm_eval.args.html - 2025-04-10T05:35:05.510Z + 2025-04-10T15:33:52.318Z https://docs.axolotl.ai/docs/api/utils.data.pretraining.html - 2025-04-10T05:35:05.346Z + 2025-04-10T15:33:52.153Z https://docs.axolotl.ai/docs/api/evaluate.html - 2025-04-10T05:35:04.369Z + 2025-04-10T15:33:51.190Z https://docs.axolotl.ai/docs/api/utils.dict.html - 2025-04-10T05:35:05.337Z + 2025-04-10T15:33:52.145Z https://docs.axolotl.ai/docs/api/cli.utils.html - 2025-04-10T05:35:04.749Z + 2025-04-10T15:33:51.562Z https://docs.axolotl.ai/docs/api/prompt_strategies.pygmalion.html - 2025-04-10T05:35:04.904Z + 2025-04-10T15:33:51.714Z https://docs.axolotl.ai/docs/api/core.training_args.html - 2025-04-10T05:35:04.539Z + 2025-04-10T15:33:51.357Z https://docs.axolotl.ai/docs/api/cli.inference.html - 
2025-04-10T05:35:04.684Z + 2025-04-10T15:33:51.498Z https://docs.axolotl.ai/docs/api/kernels.lora.html - 2025-04-10T05:35:05.058Z + 2025-04-10T15:33:51.867Z https://docs.axolotl.ai/docs/api/cli.evaluate.html - 2025-04-10T05:35:04.628Z + 2025-04-10T15:33:51.444Z https://docs.axolotl.ai/docs/api/utils.collators.batching.html - 2025-04-10T05:35:05.559Z + 2025-04-10T15:33:52.366Z https://docs.axolotl.ai/docs/api/prompt_strategies.completion.html - 2025-04-10T05:35:04.876Z + 2025-04-10T15:33:51.686Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.zephyr.html - 2025-04-10T05:35:04.932Z + 2025-04-10T15:33:51.741Z https://docs.axolotl.ai/docs/api/prompt_strategies.metharme.html - 2025-04-10T05:35:04.893Z + 2025-04-10T15:33:51.704Z https://docs.axolotl.ai/docs/api/prompt_strategies.orpo.chat_template.html - 2025-04-10T05:35:04.973Z + 2025-04-10T15:33:51.782Z https://docs.axolotl.ai/docs/api/prompt_strategies.alpaca_w_system.html - 2025-04-10T05:35:04.849Z + 2025-04-10T15:33:51.660Z https://docs.axolotl.ai/docs/api/utils.model_shard_quant.html - 2025-04-10T05:35:05.261Z + 2025-04-10T15:33:52.068Z https://docs.axolotl.ai/docs/api/cli.config.html - 2025-04-10T05:35:04.669Z + 2025-04-10T15:33:51.484Z https://docs.axolotl.ai/docs/api/utils.schemas.enums.html - 2025-04-10T05:35:05.425Z + 2025-04-10T15:33:52.230Z https://docs.axolotl.ai/docs/api/cli.preprocess.html - 2025-04-10T05:35:04.712Z + 2025-04-10T15:33:51.525Z https://docs.axolotl.ai/docs/api/core.chat.messages.html - 2025-04-10T05:35:04.562Z + 2025-04-10T15:33:51.379Z https://docs.axolotl.ai/docs/api/prompt_strategies.dpo.chat_template.html - 2025-04-10T05:35:04.910Z + 2025-04-10T15:33:51.720Z https://docs.axolotl.ai/docs/api/utils.schemas.peft.html - 2025-04-10T05:35:05.397Z + 2025-04-10T15:33:52.204Z https://docs.axolotl.ai/docs/api/train.html - 2025-04-10T05:35:04.358Z + 2025-04-10T15:33:51.180Z https://docs.axolotl.ai/docs/api/prompt_strategies.messages.chat.html - 2025-04-10T05:35:04.908Z + 2025-04-10T15:33:51.718Z https://docs.axolotl.ai/docs/api/prompt_strategies.orcamini.html - 2025-04-10T05:35:04.897Z + 2025-04-10T15:33:51.707Z https://docs.axolotl.ai/docs/api/utils.collators.mm_chat.html - 2025-04-10T05:35:05.568Z + 2025-04-10T15:33:52.375Z https://docs.axolotl.ai/docs/api/prompt_strategies.kto.llama3.html - 2025-04-10T05:35:04.943Z + 2025-04-10T15:33:51.752Z https://docs.axolotl.ai/docs/api/monkeypatch.attention.mllama.html - 2025-04-10T05:35:05.199Z + 2025-04-10T15:33:52.006Z https://docs.axolotl.ai/docs/api/cli.checks.html - 2025-04-10T05:35:04.652Z + 2025-04-10T15:33:51.467Z https://docs.axolotl.ai/docs/api/monkeypatch.transformers_fa_utils.html - 2025-04-10T05:35:05.191Z + 2025-04-10T15:33:51.998Z https://docs.axolotl.ai/docs/api/monkeypatch.llama_attn_hijack_xformers.html - 2025-04-10T05:35:05.115Z + 2025-04-10T15:33:51.923Z https://docs.axolotl.ai/docs/api/core.trainers.dpo.trainer.html - 2025-04-10T05:35:04.802Z + 2025-04-10T15:33:51.614Z https://docs.axolotl.ai/docs/api/prompt_strategies.user_defined.html - 2025-04-10T05:35:04.857Z + 2025-04-10T15:33:51.668Z https://docs.axolotl.ai/docs/api/cli.args.html - 2025-04-10T05:35:04.645Z + 2025-04-10T15:33:51.460Z https://docs.axolotl.ai/docs/api/prompt_strategies.llama2_chat.html - 2025-04-10T05:35:04.870Z + 2025-04-10T15:33:51.681Z https://docs.axolotl.ai/docs/api/utils.schemas.config.html - 2025-04-10T05:35:05.359Z + 2025-04-10T15:33:52.166Z https://docs.axolotl.ai/docs/api/core.trainers.grpo.trainer.html - 2025-04-10T05:35:04.805Z + 2025-04-10T15:33:51.617Z 
https://docs.axolotl.ai/docs/api/core.chat.format.chatml.html - 2025-04-10T05:35:04.564Z + 2025-04-10T15:33:51.381Z https://docs.axolotl.ai/docs/api/monkeypatch.lora_kernels.html - 2025-04-10T05:35:05.164Z + 2025-04-10T15:33:51.972Z https://docs.axolotl.ai/docs/api/prompt_strategies.base.html - 2025-04-10T05:35:04.807Z + 2025-04-10T15:33:51.618Z https://docs.axolotl.ai/docs/rlhf.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/cli.html - 2025-04-10T05:34:37.176Z + 2025-04-10T15:33:22.543Z https://docs.axolotl.ai/docs/unsloth.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/fsdp_qlora.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/dataset_preprocessing.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/custom_integrations.html - 2025-04-10T05:34:37.176Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/mac.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/docker.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/ray-integration.html - 2025-04-10T05:34:37.180Z + 2025-04-10T15:33:22.547Z https://docs.axolotl.ai/docs/dataset-formats/index.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/dataset-formats/conversation.html - 2025-04-10T05:34:37.176Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/dataset-formats/pretraining.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z https://docs.axolotl.ai/docs/dataset-formats/inst_tune.html - 2025-04-10T05:34:37.177Z + 2025-04-10T15:33:22.544Z