diff --git a/_quarto.yml b/_quarto.yml
index c0536e730..943ed5293 100644
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -32,8 +32,9 @@ website:
       contents:
         - docs/getting-started.qmd
         - docs/installation.qmd
-        - docs/cli.qmd
         - docs/inference.qmd
+        - docs/cli.qmd
+        - docs/config.qmd
     - section: "Dataset Formats"
       contents: docs/dataset-formats/*
 
@@ -74,10 +75,6 @@ website:
         - docs/debugging.qmd
         - docs/nccl.qmd
 
-    - section: "Reference"
-      contents:
-        - docs/config.qmd
-
 format:
   html:
     theme: darkly
diff --git a/docs/config.qmd b/docs/config.qmd
index fb0c4b59b..38ec368a1 100644
--- a/docs/config.qmd
+++ b/docs/config.qmd
@@ -1,5 +1,5 @@
 ---
-title: Config options
+title: Config Reference
 description: A complete list of all configuration options.
 ---
 
@@ -30,6 +30,8 @@ tokenizer_legacy:
 # Resize the model embeddings when new tokens are added to multiples of 32
 # This is reported to improve training speed on some models
 resize_token_embeddings_to_32x:
+# Optional[bool]. Whether to shrink the embeddings to len(tokenizer). By default, we won't shrink.
+shrink_embeddings:
 
 # (Internal use only)
 # Used to identify which the model is based on
@@ -205,10 +207,46 @@ test_datasets:
     data_files:
       - /workspace/data/eval.jsonl
 
-# use RL training: 'dpo', 'ipo', 'kto'
+# use RL training: 'dpo', 'ipo', 'kto', 'simpo', 'orpo', 'grpo'
 rl:
-# whether to perform weighting if doing DPO training. Boolean.
-dpo_use_weighting:
+rl_beta: # Optional[float]. The beta parameter for RL training.
+
+# dpo
+dpo_use_weighting: # Optional[bool]. Whether to perform weighting.
+rpo_alpha: # Optional[float]. Weighting of NLL term in loss from RPO paper.
+
+# orpo
+orpo_alpha: 0.1 # Parameter controlling the relative ratio loss weight in the ORPO loss. Passed to `beta` in `ORPOConfig` due to trl mapping.
+
+# kto
+kto_desirable_weight: # Optional[float]. Factor for desirable loss term in KTO loss.
+kto_undesirable_weight: # Optional[float]. Factor for undesirable loss term in KTO loss.
+
+# simpo
+cpo_alpha: 1.0 # Weight of the BC regularizer
+simpo_gamma: 0.5 # Target reward margin for the SimPO loss
+
+# grpo
+trl:
+  use_vllm: # Optional[bool]. Whether to use VLLM for RL training.
+  vllm_device: # Optional[str]. Device to use for VLLM.
+  vllm_gpu_memory_utilization: # Optional[float]. GPU memory utilization for VLLM.
+  vllm_max_model_len: # Optional[int]. Maximum model context length for VLLM.
+  vllm_dtype: # Optional[str]. Data type for VLLM.
+
+  beta: # Optional[float]. Beta parameter for RL training. Same as `rl_beta`; if both are set they must match, so prefer setting only one.
+  max_completion_length: # Optional[int]. Maximum length of the completion for RL training.
+
+  reward_funcs: # Optional[list[str]]. List of reward functions to load. Paths must be importable from current dir.
+  reward_weights: # Optional[list[float]]. List of reward weights for the reward functions.
+
+  num_generations: # Optional[int]. Number of generations to sample.
+  log_completions: # Optional[bool]. Whether to log completions.
+
+  sync_ref_model: # Optional[bool]. Whether to sync the reference model.
+  ref_model_mixup_alpha: # Optional[float]. Mixup alpha for the reference model.
+  ref_model_sync_steps: # Optional[int]. Sync steps for the reference model.
+
 # reward modelling: `True` or `False`
 reward_model:
 
@@ -232,7 +270,7 @@ default_system_message: You are a helpful assistant. Please give a long and deta
 # subsequent training attempts load faster, relative path
 dataset_prepared_path: data/last_run_prepared
 # Push prepared dataset to hub
-push_dataset_to_hub: # repo path
+push_dataset_to_hub: # Optional[str]. repo_org/repo_name
 # The maximum number of processes to use while preprocessing your input dataset. This defaults to `os.cpu_count()`
 # if not set.
 dataset_processes: # defaults to os.cpu_count() if not set
diff --git a/docs/faq.qmd b/docs/faq.qmd
index ba7ac1265..acec1886e 100644
--- a/docs/faq.qmd
+++ b/docs/faq.qmd
@@ -27,6 +27,16 @@ description: Frequently asked questions
 
 > A: This is usually an issue with the GPU. This can be resolved through setting the os environment variable `CUDA_VISIBLE_DEVICES=0`. If you are on runpod, this is usually a pod issue. Starting a new pod should take care of it.
 
+**Q: I received a `torch.Size` mismatch error between the checkpoint and the model when merging or loading adapters.**
+
+> A: This is likely due to a vocab size mismatch. By default, Axolotl expands the model's embeddings if the tokenizer has more tokens than the model. Please use the `axolotl merge-lora` command to merge the adapters instead of using your own scripts.
+
+> On the other hand, if the model has more tokens than the tokenizer, Axolotl does not shrink the model's embeddings unless `shrink_embeddings: true` is set in the config.
+
+**Q: Can I call Axolotl from custom Python scripts?**
+
+> A: Yes. Since Axolotl is just Python, please see `src/axolotl/cli/main.py` for how each command is called.
+
 ### Chat templates
 
 **Q: `jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content' / 'role' / ____`**
diff --git a/docs/getting-started.qmd b/docs/getting-started.qmd
index 8e826b959..a0501ad21 100644
--- a/docs/getting-started.qmd
+++ b/docs/getting-started.qmd
@@ -36,7 +36,9 @@ The YAML configuration file controls everything about your training. Here's what
 
 ```yaml
 base_model: NousResearch/Llama-3.2-1B
-# hub_model_id: username/custom_model_name
+
+load_in_8bit: true
+adapter: lora
 
 datasets:
   - path: teknium/GPT4-LLM-Cleaned
@@ -44,11 +46,15 @@ datasets:
 dataset_prepared_path: last_run_prepared
 val_set_size: 0.1
 output_dir: ./outputs/lora-out
-
-adapter: lora
-lora_model_dir:
 ```
 
+::: {.callout-tip}
+`load_in_8bit: true` and `adapter: lora` enable LoRA adapter finetuning.
+
+- To perform full finetuning, remove these two lines.
+- To perform QLoRA finetuning, replace them with `load_in_4bit: true` and `adapter: qlora`.
+:::
+
 See our [Config options](config.qmd) for more details.
 
 ### Training {#sec-training}
@@ -56,7 +62,7 @@ See our [Config options](config.qmd) for more details.
 When you run `axolotl train`, Axolotl:
 
 1. Downloads the base model
-2. (If specified) applies LoRA adapter layers
+2. (If specified) applies QLoRA/LoRA adapter layers
 3. Loads and processes the dataset
 4. Runs the training loop
 5. Saves the trained model and / or LoRA weights
@@ -69,6 +75,8 @@ Let's modify the example for your own data:
 
 ```yaml
 base_model: NousResearch/Nous-Hermes-llama-1b-v1
+
+load_in_8bit: true
 adapter: lora
 
 # Training settings
@@ -104,8 +112,6 @@ format):
 {"instruction": "Classify this text", "input": "Not good at all", "output": "negative"}
 ```
 
-Please consult the supported [Dataset Formats](dataset-formats/) for more details.
-
 3. Run the training:
 
 ```bash
diff --git a/docs/inference.qmd b/docs/inference.qmd
index aded400d0..6917d3c33 100644
--- a/docs/inference.qmd
+++ b/docs/inference.qmd
@@ -1,5 +1,5 @@
 ---
-title: "Inference"
+title: "Inference and Merging"
 format:
   html:
     toc: true
@@ -9,10 +9,14 @@ execute:
   enabled: false
 ---
 
-This guide covers how to use your trained models for inference, including model loading, interactive testing, and common troubleshooting steps.
+This guide covers how to use your trained models for inference, including model loading, interactive testing, merging adapters, and common troubleshooting steps.
 
 ## Quick Start {#sec-quickstart}
 
+::: {.callout-tip}
+Use the same config file you used for training when running inference or merging.
+:::
+
 ### Basic Inference {#sec-basic}
 
 ::: {.panel-tabset}
diff --git a/docs/rlhf.qmd b/docs/rlhf.qmd
index ac1cf0393..6bef7c831 100644
--- a/docs/rlhf.qmd
+++ b/docs/rlhf.qmd
@@ -298,7 +298,7 @@ The input format is a simple JSON input with customizable fields based on the ab
 ### IPO
 
-As IPO is just DPO with a different loss function, all supported options for DPO works here.
+As IPO is just DPO with a different loss function, all supported dataset formats for [DPO](#dpo) are also supported for IPO.
 
 ```yaml
 rl: ipo
 ```
@@ -344,8 +344,9 @@ ORPO supports the following types with the following dataset format:
 
 ```yaml
 rl: kto
-rl_beta: 0.5
-kto_desirable_weight: 0.2
+rl_beta: 0.1 # default
+kto_desirable_weight: 1.0 # default
+kto_undesirable_weight: 1.0 # default
 
 remove_unused_columns: false
 
@@ -544,6 +545,19 @@ To see other examples of custom reward functions, please see [TRL GRPO Docs](htt
 
 To see description of the configs, please see [TRLConfig](https://github.com/axolotl-ai-cloud/axolotl/blob/main/src/axolotl/utils/config/models/input/v0_4_1/trl.py).
 
+### SimPO
+
+SimPO uses [CPOTrainer](https://huggingface.co/docs/trl/main/en/cpo_trainer) but with an alternative loss function.
+
+```yaml
+rl: simpo
+rl_beta: 0.1 # default in CPOTrainer
+cpo_alpha: 1.0 # default in CPOTrainer
+simpo_gamma: 0.5 # default in CPOTrainer
+```
+
+This method uses the same dataset format as [DPO](#dpo).
+
 ### Using local dataset files
 
 ```yaml
diff --git a/src/axolotl/utils/config/models/input/v0_4_1/__init__.py b/src/axolotl/utils/config/models/input/v0_4_1/__init__.py
index 921a015d3..b2584cfcd 100644
--- a/src/axolotl/utils/config/models/input/v0_4_1/__init__.py
+++ b/src/axolotl/utils/config/models/input/v0_4_1/__init__.py
@@ -1,4 +1,5 @@
 """Module with Pydantic models for configuration."""
+
 # pylint: disable=too-many-lines
 
 import logging
@@ -1827,6 +1828,14 @@ class AxolotlConfigWCapabilities(AxolotlInputConfig):
             data["torch_compile"] = False
         return data
 
+    @model_validator(mode="before")
+    @classmethod
+    def check_beta_and_trl_beta_match(cls, data):
+        if data.get("beta") and data.get("trl", {}).get("beta"):
+            if data["beta"] != data["trl"]["beta"]:
+                raise ValueError("beta and trl.beta must match or one must be removed")
+        return data
+
 
 def handle_legacy_message_fields_logic(data: dict) -> dict:
     """
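
One way to read the new `trl` block documented in `docs/config.qmd` above: with `rl: grpo`, the GRPO-specific options nest under `trl`, and `trl.beta` must agree with `rl_beta` when both are set (this is what the new `check_beta_and_trl_beta_match` validator enforces). A minimal sketch built only from the options added in this patch; the values and the reward-function path are illustrative placeholders, not documented defaults:

```yaml
rl: grpo
rl_beta: 0.04                        # must match trl.beta if both are given
trl:
  beta: 0.04                         # same value as rl_beta, or set only one of the two
  max_completion_length: 256         # placeholder
  num_generations: 4                 # placeholder
  log_completions: true
  reward_funcs:
    - rewards.my_reward_func         # hypothetical module path, importable from the current dir
  reward_weights:
    - 1.0
  use_vllm: true
  vllm_gpu_memory_utilization: 0.8   # placeholder
```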