quartodoc integration

Commit e4fd7aad0b (parent c907ac173e), committed by Dan Saunders.

docs/api/train.qmd — 199 lines, new file.

# train { #axolotl.train }

`train`

Prepare and train a model on a dataset. Can also run inference from a model or merge LoRA adapters.

## Functions

| Name | Description |
| --- | --- |
| [create_model_card](#axolotl.train.create_model_card) | Create a model card for the trained model if needed. |
| [determine_resume_checkpoint](#axolotl.train.determine_resume_checkpoint) | Determine the checkpoint to resume from based on configuration. |
| [execute_training](#axolotl.train.execute_training) | Execute the training process with appropriate backend configurations. |
| [handle_untrained_tokens_fix](#axolotl.train.handle_untrained_tokens_fix) | Apply fixes for untrained tokens if configured. |
| [save_initial_configs](#axolotl.train.save_initial_configs) | Save initial configurations before training. |
| [save_trained_model](#axolotl.train.save_trained_model) | Save the trained model according to configuration and training setup. |
| [setup_model_and_tokenizer](#axolotl.train.setup_model_and_tokenizer) | Load the tokenizer, processor (for multimodal models), and model based on configuration. |
| [setup_model_and_trainer](#axolotl.train.setup_model_and_trainer) | Load model, tokenizer, trainer, etc. Helper function to encapsulate the full trainer setup. |
| [setup_model_card](#axolotl.train.setup_model_card) | Set up the Axolotl badge and add the Axolotl config to the model card if available. |
| [setup_reference_model](#axolotl.train.setup_reference_model) | Set up the reference model for RL training if needed. |
| [setup_signal_handler](#axolotl.train.setup_signal_handler) | Set up signal handler for graceful termination. |
| [train](#axolotl.train.train) | Train a model on the given dataset. |

### create_model_card { #axolotl.train.create_model_card }

```python
train.create_model_card(cfg, trainer)
```

Create a model card for the trained model if needed.

Args:
    cfg: Dictionary mapping `axolotl` config keys to values.
    trainer: The trainer object with model card creation capabilities.

### determine_resume_checkpoint { #axolotl.train.determine_resume_checkpoint }

```python
train.determine_resume_checkpoint(cfg)
```

Determine the checkpoint to resume from based on configuration.

Args:
    cfg: Dictionary mapping `axolotl` config keys to values.

Returns:
    Path to the checkpoint to resume from, or `None` if not resuming.

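The resolution logic itself is not shown on this page. As an illustrative sketch only: the `resume_from_checkpoint` and `output_dir` key names and the `checkpoint-<step>` directory naming below are assumptions borrowed from common Trainer conventions, not taken from this reference.

```python
from pathlib import Path


def pick_resume_checkpoint(cfg: dict):
    """Sketch: prefer an explicit checkpoint, else the newest one in output_dir."""
    # An explicitly configured checkpoint path wins (assumed key name).
    explicit = cfg.get("resume_from_checkpoint")
    if explicit:
        return explicit
    # Otherwise fall back to the highest-numbered `checkpoint-<step>` directory.
    out = Path(cfg.get("output_dir", "."))
    candidates = [p for p in out.glob("checkpoint-*") if p.is_dir()]
    if not candidates:
        return None  # nothing to resume from
    return str(max(candidates, key=lambda p: int(p.name.split("-")[-1])))
```

Sorting numerically by step (rather than lexicographically by name) matters: `checkpoint-900` would otherwise sort after `checkpoint-1000`.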
### execute_training { #axolotl.train.execute_training }

```python
train.execute_training(cfg, trainer, resume_from_checkpoint)
```

Execute the training process with appropriate backend configurations.

Args:
    cfg: Dictionary mapping `axolotl` config keys to values.
    trainer: The configured trainer object.
    resume_from_checkpoint: Path to checkpoint to resume from, if applicable.

### handle_untrained_tokens_fix { #axolotl.train.handle_untrained_tokens_fix }

```python
train.handle_untrained_tokens_fix(
    cfg,
    model,
    tokenizer,
    train_dataset,
    safe_serialization,
)
```

Apply fixes for untrained tokens if configured.

Args:
    cfg: Dictionary mapping `axolotl` config keys to values.
    model: The model to apply fixes to.
    tokenizer: The tokenizer for token identification.
    train_dataset: The training dataset to use.
    safe_serialization: Whether to use safe serialization when saving.

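"Untrained tokens" here are typically vocabulary entries that never occur in the training data, so their embedding rows are never updated. The detection step can be sketched in plain Python (the function name and the mean-reinitialization remedy mentioned in the docstring are illustrative assumptions, not this module's actual implementation):

```python
def find_untrained_token_ids(vocab_size, train_token_ids):
    """Sketch: token ids in the vocabulary that never occur in the training data.

    A common fix for such tokens is to re-initialize their embedding rows
    (e.g. to the mean of the trained embeddings) so they do not carry
    arbitrary initial values into inference.
    """
    seen = set()
    for seq in train_token_ids:  # each sequence is an iterable of token ids
        seen.update(seq)
    return set(range(vocab_size)) - seen
```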
### save_initial_configs { #axolotl.train.save_initial_configs }

```python
train.save_initial_configs(cfg, tokenizer, model, peft_config)
```

Save initial configurations before training.

Args:
    cfg: Dictionary mapping `axolotl` config keys to values.
    tokenizer: The tokenizer to save.
    model: The model to save configuration for.
    peft_config: The PEFT configuration to save if applicable.

### save_trained_model { #axolotl.train.save_trained_model }

```python
train.save_trained_model(cfg, trainer, model, safe_serialization)
```

Save the trained model according to configuration and training setup.

Args:
    cfg: Dictionary mapping `axolotl` config keys to values.
    trainer: The trainer object.
    model: The trained model to save.
    safe_serialization: Whether to use safe serialization.

### setup_model_and_tokenizer { #axolotl.train.setup_model_and_tokenizer }

```python
train.setup_model_and_tokenizer(cfg)
```

Load the tokenizer, processor (for multimodal models), and model based on configuration.

Args:
    cfg: Dictionary mapping `axolotl` config keys to values.

Returns:
    Tuple containing model, tokenizer, `peft_config` (if LoRA / QLoRA, else `None`), and processor (if multimodal, else `None`).

### setup_model_and_trainer { #axolotl.train.setup_model_and_trainer }

```python
train.setup_model_and_trainer(cfg, dataset_meta)
```

Load model, tokenizer, trainer, etc. Helper function to encapsulate the full trainer setup.

Args:
    cfg: The configuration dictionary with training parameters.
    dataset_meta: Object with training, validation datasets and metadata.

Returns:
    Tuple of:

    - Trainer (Causal or RLHF)
    - Model
    - Tokenizer
    - PEFT config

### setup_model_card { #axolotl.train.setup_model_card }

```python
train.setup_model_card(cfg)
```

Set up the Axolotl badge and add the Axolotl config to the model card if available.

Args:
    cfg: Dictionary mapping `axolotl` config keys to values.

### setup_reference_model { #axolotl.train.setup_reference_model }

```python
train.setup_reference_model(cfg, tokenizer)
```

Set up the reference model for RL training if needed.

Args:
    cfg: Dictionary mapping `axolotl` config keys to values.
    tokenizer: The tokenizer to use for the reference model.

Returns:
    Reference model if needed for RL training, `None` otherwise.

### setup_signal_handler { #axolotl.train.setup_signal_handler }

```python
train.setup_signal_handler(cfg, model, safe_serialization)
```

Set up signal handler for graceful termination.

Args:
    cfg: Dictionary mapping `axolotl` config keys to values.
    model: The model to save on termination.
    safe_serialization: Whether to use safe serialization when saving.

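The general shape of such a handler can be sketched with the standard library alone. This is a minimal stand-in, not this module's implementation: `save_fn` is a hypothetical closure over `(cfg, model, safe_serialization)`.

```python
import signal


def install_save_on_sigterm(save_fn):
    """Sketch: save the model, then exit, when the process receives SIGTERM."""

    def handler(signum, frame):
        save_fn()                       # persist weights before terminating
        raise SystemExit(128 + signum)  # conventional exit code for signals

    signal.signal(signal.SIGTERM, handler)
    return handler
```

Raising `SystemExit` (rather than calling `os._exit`) lets `finally` blocks and context managers run, which matters when checkpoint files are being written.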
### train { #axolotl.train.train }

```python
train.train(cfg, dataset_meta)
```

Train a model on the given dataset.

Args:
    cfg: The configuration dictionary with training parameters.
    dataset_meta: Object with training, validation datasets and metadata.

Returns:
    Tuple of `(model, tokenizer)` after training.
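
Taken together, the helpers above suggest the overall shape of `train`. The stub pipeline below shows one plausible ordering inferred only from the function names and descriptions on this page; it is not the actual control flow of `axolotl.train.train`.

```python
def run_training_pipeline(log):
    """Stub pipeline mirroring the helper names documented above.

    Each step just records its name; in the real function each would do work
    and pass results (model, tokenizer, trainer, ...) to the next step.
    """
    step = log.append
    step("setup_model_and_trainer")      # model, tokenizer, trainer, peft_config
    step("setup_signal_handler")         # graceful-termination hook for the model
    step("save_initial_configs")         # persist configs before training starts
    step("determine_resume_checkpoint")  # find a checkpoint to resume from, if any
    step("execute_training")             # the actual training loop
    step("handle_untrained_tokens_fix")  # post-hoc fixes, if configured
    step("save_trained_model")           # final weights
    step("create_model_card")            # model card, badge, config attachment
    return log
```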