remove LICENSE and fix README

zeke
2025-04-14 18:33:27 -08:00
committed by Wing Lian
parent c2fc35f520
commit cb7185998b
2 changed files with 128 additions and 170 deletions


@@ -1,21 +0,0 @@
MIT License
Copyright (c) 2023 runpod-workers
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -1,15 +1,5 @@
<h1>LLM Training - Full fine-tune, LoRA, QLoRA, etc. Llama/Mistral/Gemma</h1>
## RunPod Worker Images
Below is a summary of the available RunPod Worker images, categorized by image stability and CUDA version compatibility.
| Preview Image Tag | Development Image Tag |
|-----------------------------------|-----------------------------------|
| `runpod/llm-finetuning:preview` | `runpod/llm-finetuning:dev` |
# Configuration Options
This document outlines all available configuration options for training models. The configuration can be provided as a JSON request.
@@ -19,6 +9,7 @@ This document outlines all available configuration options for training models.
You can use these configuration options:
1. As a JSON request body:
```json
{
  "input": {
@@ -41,187 +32,180 @@ You can use these configuration Options:
### Model Configuration
| Option | Description | Default |
| ------------------- | --------------------------------------------------------------------------------------------- | -------------------- |
| `base_model` | Path to the base model (local or HuggingFace) | Required |
| `base_model_config` | Configuration path for the base model | Same as base_model |
| `revision_of_model` | Specific model revision from HuggingFace hub | Latest |
| `tokenizer_config` | Custom tokenizer configuration path | Optional |
| `model_type` | Type of model to load | AutoModelForCausalLM |
| `tokenizer_type` | Type of tokenizer to use | AutoTokenizer |
| `hub_model_id` | Repository ID where the model will be pushed on Hugging Face Hub (format: username/repo-name) | Optional |
## Model Family Identification
| Option | Default | Description |
| -------------------------- | ------- | ------------------------------ |
| `is_falcon_derived_model` | `false` | Whether model is Falcon-based |
| `is_llama_derived_model` | `false` | Whether model is LLaMA-based |
| `is_qwen_derived_model` | `false` | Whether model is Qwen-based |
| `is_mistral_derived_model` | `false` | Whether model is Mistral-based |
## Model Configuration Overrides
| Option | Default | Description |
| ----------------------------------------------- | ---------- | ---------------------------------- |
| `overrides_of_model_config.rope_scaling.type` | `"linear"` | RoPE scaling type (linear/dynamic) |
| `overrides_of_model_config.rope_scaling.factor` | `1.0` | RoPE scaling factor |
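
To make the `linear` scaling type concrete, here is a minimal sketch of how linear RoPE scaling is commonly implemented: position indices are divided by the factor before the rotary angles are computed. The function name `rope_angles`, the `head_dim` argument, and the base of 10000 are illustrative assumptions, not values taken from this worker.

```python
def rope_angles(position: int, head_dim: int, factor: float = 1.0,
                base: float = 10000.0) -> list[float]:
    """Rotary angles for one token position under linear RoPE scaling.

    Hypothetical helper: `base` and `head_dim` are illustrative, and
    `factor` plays the role of `rope_scaling.factor`.
    """
    # Linear scaling compresses positions, so a model trained on N positions
    # sees indices in its familiar range for up to N * factor tokens.
    scaled_position = position / factor
    return [scaled_position / (base ** (2 * i / head_dim))
            for i in range(head_dim // 2)]


# With factor=2.0, position 4096 yields the same angles as position 2048 unscaled.
same = rope_angles(4096, 8, factor=2.0) == rope_angles(2048, 8, factor=1.0)
```

With the default factor of `1.0` the positions are unchanged, so the override has no effect.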
### Model Loading Options
| Option | Description | Default |
| -------------- | ----------------------------- | ------- |
| `load_in_8bit` | Load model in 8-bit precision | false |
| `load_in_4bit` | Load model in 4-bit precision | false |
| `bf16` | Use bfloat16 precision | false |
| `fp16` | Use float16 precision | false |
| `tf32` | Use tensor float 32 precision | false |
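
As a rough guide to what the precision options mean for memory, the sketch below estimates the size of the model weights alone. It is back-of-the-envelope arithmetic under stated assumptions: the helper `weight_memory_gib` is hypothetical, the 7-billion-parameter figure is only an example, and activations, optimizer state, and quantization overhead are ignored.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights only, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Example: a hypothetical 7B-parameter model at 16, 8, and 4 bits per weight.
estimates = {bits: round(weight_memory_gib(7e9, bits), 2) for bits in (16, 8, 4)}
```

Halving the bits per weight halves the weight footprint, which is why the 8-bit and 4-bit loading options are the usual first step when a model does not fit on the GPU.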
## Memory and Device Settings
| Option | Default | Description |
| ------------------ | --------- | ----------------------- |
| `gpu_memory_limit` | `"20GiB"` | GPU memory limit |
| `lora_on_cpu` | `false` | Load LoRA on CPU |
| `device_map` | `"auto"` | Device mapping strategy |
| `max_memory` | `null` | Max memory per device |
## Training Hyperparameters
| Option | Default | Description |
| ----------------------------- | --------- | --------------------------- |
| `gradient_accumulation_steps` | `1` | Gradient accumulation steps |
| `micro_batch_size` | `2` | Batch size per GPU |
| `eval_batch_size` | `null` | Evaluation batch size |
| `num_epochs` | `4` | Number of training epochs |
| `warmup_steps` | `100` | Warmup steps |
| `warmup_ratio` | `0.05` | Warmup ratio |
| `learning_rate` | `0.00003` | Learning rate |
| `lr_quadratic_warmup` | `false` | Quadratic warmup |
| `logging_steps` | `null` | Logging frequency |
| `eval_steps` | `null` | Evaluation frequency |
| `evals_per_epoch` | `null` | Evaluations per epoch |
| `save_strategy` | `"epoch"` | Checkpoint saving strategy |
| `save_steps` | `null` | Saving frequency |
| `saves_per_epoch` | `null` | Saves per epoch |
| `save_total_limit` | `null` | Maximum checkpoints to keep |
| `max_steps` | `null` | Maximum training steps |
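
The batch-related options interact, so a small sketch may help. It computes the effective batch size per optimizer step and a rough step count, using the defaults from the table. The helper names and the `num_gpus` and `num_examples` arguments are assumptions for illustration, and the estimate ignores sample packing and any dropped partial batches.

```python
import math


def effective_batch_size(micro_batch_size: int = 2,
                         gradient_accumulation_steps: int = 1,
                         num_gpus: int = 1) -> int:
    """Examples consumed per optimizer step across all GPUs."""
    return micro_batch_size * gradient_accumulation_steps * num_gpus


def total_optimizer_steps(num_examples: int, num_epochs: int = 4, **batch_kwargs) -> int:
    """Rough optimizer step count for a run (no packing, last partial batch kept)."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size(**batch_kwargs))
    return steps_per_epoch * num_epochs


default_batch = effective_batch_size()   # 2 * 1 * 1
steps = total_optimizer_steps(1000)      # ceil(1000 / 2) * 4
```

Comparing the step count against `warmup_steps` is a quick sanity check: a warmup of 100 steps covers a large share of a run that only lasts a few hundred steps.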
### Dataset Configuration
```yaml
datasets:
  - path: vicgalle/alpaca-gpt4 # HuggingFace dataset or TODO: You will be able to add the local path.
    type: alpaca # Format type (alpaca, gpteacher, oasst, etc.)
    ds_type: json # Dataset type
    data_files: path/to/data # Source data files
    train_on_split: train # Dataset split to use
```
## Chat Template Settings
| Option | Default | Description |
| ------------------------ | -------------------------------- | ---------------------- |
| `chat_template` | `"tokenizer_default"` | Chat template type |
| `chat_template_jinja` | `null` | Custom Jinja template |
| `default_system_message` | `"You are a helpful assistant."` | Default system message |
## Dataset Processing
| Option | Default | Description |
| ----------------------------- | -------------------------- | --------------------------------- |
| `dataset_prepared_path` | `"data/last_run_prepared"` | Path for prepared dataset |
| `push_dataset_to_hub` | `""` | Push dataset to HF hub |
| `dataset_processes` | `4` | Number of preprocessing processes |
| `dataset_keep_in_memory` | `false` | Keep dataset in memory |
| `shuffle_merged_datasets` | `true` | Shuffle merged datasets |
| `dataset_exact_deduplication` | `true` | Deduplicate datasets |
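
For readers unsure what exact deduplication means in practice, the following is a minimal sketch of the general technique: hash a canonical form of each record and keep the first occurrence. It is an illustration only, not the worker's actual implementation, and the function name `exact_deduplicate` is hypothetical.

```python
import hashlib
import json


def exact_deduplicate(records: list[dict]) -> list[dict]:
    """Drop records that are byte-for-byte identical after canonical JSON encoding."""
    seen = set()
    unique = []
    for record in records:
        # sort_keys makes the encoding independent of key order.
        canonical = json.dumps(record, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(canonical).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


rows = [
    {"instruction": "Add 2 and 3", "output": "5"},
    {"output": "5", "instruction": "Add 2 and 3"},   # same content, different key order
    {"instruction": "Add 2 and 4", "output": "6"},
]
deduped = exact_deduplicate(rows)
```

Exact matching only removes identical records; near-duplicates that differ by whitespace or punctuation would survive this kind of filter.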
## LoRA Configuration
| Option | Default | Description |
| -------------------------- | ---------------------- | ------------------------------ |
| `adapter` | `"lora"` | Adapter type (lora/qlora) |
| `lora_model_dir` | `""` | Directory with pretrained LoRA |
| `lora_r` | `8` | LoRA attention dimension |
| `lora_alpha` | `16` | LoRA alpha parameter |
| `lora_dropout` | `0.05` | LoRA dropout |
| `lora_target_modules` | `["q_proj", "v_proj"]` | Modules to apply LoRA |
| `lora_target_linear` | `false` | Target all linear modules |
| `peft_layers_to_transform` | `[]` | Layers to transform |
| `lora_modules_to_save` | `[]` | Modules to save |
| `lora_fan_in_fan_out` | `false` | Fan in/out structure |
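
The relationship between `lora_r` and `lora_alpha` is easier to see with numbers. The sketch below uses the standard LoRA formulation, where the low-rank update is scaled by `lora_alpha / lora_r` and each adapted weight matrix gains `r * (d_in + d_out)` trainable parameters. The helper names and the 4096 x 4096 projection size are illustrative assumptions.

```python
def lora_scaling(lora_alpha: int = 16, lora_r: int = 8) -> float:
    """Multiplier applied to the low-rank update in standard LoRA."""
    return lora_alpha / lora_r


def lora_trainable_params(d_in: int, d_out: int, lora_r: int = 8) -> int:
    """Trainable parameters added to one d_out x d_in weight matrix.

    LoRA adds A (r x d_in) and B (d_out x r), so the count is r * (d_in + d_out).
    """
    return lora_r * (d_in + d_out)


scale = lora_scaling()                           # 16 / 8
added = lora_trainable_params(4096, 4096)        # 8 * (4096 + 4096)
fraction = added / (4096 * 4096)                 # share of the full matrix
```

With the table defaults, the update is scaled by 2 and a hypothetical 4096 x 4096 projection trains well under 1% of the parameters that full fine-tuning of that matrix would.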
## Optimization Settings
| Option | Default | Description |
| ------------------------- | ------- | -------------------------- |
| `train_on_inputs` | `false` | Train on input prompts |
| `group_by_length` | `false` | Group by sequence length |
| `gradient_checkpointing` | `false` | Use gradient checkpointing |
| `early_stopping_patience` | `3` | Early stopping patience |
## Learning Rate Scheduling
| Option | Default | Description |
| -------------------------- | ---------- | -------------------- |
| `lr_scheduler` | `"cosine"` | Scheduler type |
| `lr_scheduler_kwargs` | `{}` | Scheduler parameters |
| `cosine_min_lr_ratio` | `null` | Minimum LR ratio |
| `cosine_constant_lr_ratio` | `null` | Constant LR ratio |
| `lr_div_factor` | `null` | LR division factor |
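
To show how a cosine schedule, warmup, and a minimum LR ratio typically combine, here is a sketch of the common shape: linear warmup to the base learning rate, then cosine decay down to `base_lr * min_lr_ratio`. It is not the trainer's exact implementation. The function `lr_at` and the `total_steps` argument are hypothetical, and the defaults mirror `learning_rate` and `warmup_steps` from the tables above.

```python
import math


def lr_at(step: int, total_steps: int, base_lr: float = 3e-5,
          warmup_steps: int = 100, min_lr_ratio: float = 0.0) -> float:
    """Learning rate at a given step for linear warmup plus cosine decay."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Decay from base_lr down to base_lr * min_lr_ratio.
    return base_lr * (min_lr_ratio + (1.0 - min_lr_ratio) * cosine)


schedule = [lr_at(s, total_steps=1000, min_lr_ratio=0.1) for s in (0, 50, 100, 550, 1000)]
```

Setting a non-zero minimum ratio keeps the learning rate from reaching zero at the end of training, which is the role the table assigns to `cosine_min_lr_ratio`.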
## Optimizer Settings
| Option | Default | Description |
| ---------------------- | ------------ | ------------------- |
| `optimizer` | `"adamw_hf"` | Optimizer choice |
| `optim_args` | `{}` | Optimizer arguments |
| `optim_target_modules` | `[]` | Target modules |
| `weight_decay` | `null` | Weight decay |
| `adam_beta1` | `null` | Adam beta1 |
| `adam_beta2` | `null` | Adam beta2 |
| `adam_epsilon` | `null` | Adam epsilon |
| `max_grad_norm` | `null` | Gradient clipping |
## Attention Implementations
| Option | Default | Description |
| -------------------------- | ------- | ----------------------------- |
| `flash_optimum` | `false` | Use better transformers |
| `xformers_attention` | `false` | Use xformers |
| `flash_attention` | `false` | Use flash attention |
| `flash_attn_cross_entropy` | `false` | Flash attention cross entropy |
| `flash_attn_rms_norm` | `false` | Flash attention RMS norm |
| `flash_attn_fuse_qkv` | `false` | Fuse QKV operations |
| `flash_attn_fuse_mlp` | `false` | Fuse MLP operations |
| `sdp_attention` | `false` | Use scaled dot product |
| `s2_attention` | `false` | Use shifted sparse attention |
## Tokenizer Modifications
| Option | Default | Description |
| ---------------- | ------- | ---------------------------- |
| `special_tokens` | - | Special tokens to add/modify |
| `tokens` | `[]` | Additional tokens |
## Distributed Training
| Option | Default | Description |
| ----------------------- | ------- | --------------------- |
| `fsdp` | `null` | FSDP configuration |
| `fsdp_config` | `null` | FSDP config options |
| `deepspeed` | `null` | Deepspeed config path |
| `ddp_timeout` | `null` | DDP timeout |
| `ddp_bucket_cap_mb` | `null` | DDP bucket capacity |
| `ddp_broadcast_buffers` | `null` | DDP broadcast buffers |
<details>
<summary><h3>Example Configuration Request:</h3></summary>
@@ -299,20 +283,21 @@ Here's a complete example for fine-tuning a LLaMA model using LoRA:
  }
}
```
</details>
### Advanced Features
#### Wandb Integration
- `wandb_project`: Project name for Weights & Biases
- `wandb_entity`: Team name in W&B
- `wandb_watch`: Monitor model with W&B
- `wandb_name`: Name of the W&B run
- `wandb_run_id`: ID for the W&B run
#### Performance Optimization
- `sample_packing`: Enable efficient sequence packing
- `eval_sample_packing`: Use sequence packing during evaluation
- `torch_compile`: Enable PyTorch 2.0 compilation
@@ -336,8 +321,6 @@ The following optimizers are supported:
- `sgd`: Stochastic Gradient Descent
- `adagrad`: Adagrad optimizer
## Notes
- Set `load_in_8bit: true` or `load_in_4bit: true` for memory-efficient training
@@ -347,10 +330,6 @@ The following optimizers are supported:
For more detailed information, please refer to the [documentation](https://axolotl-ai-cloud.github.io/axolotl/docs/config.html).
### Errors:
- If you face any issues with Flash Attention 2, delete your worker and restart it.