remove LICENSE and fix README
@@ -1,21 +0,0 @@
-MIT License
-
-Copyright (c) 2023 runpod-workers
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
@@ -1,15 +1,5 @@
-
-
 <h1>LLM Training- Full finetune, LoRA, QLoRa etc. Llama/Mistral/Gemma</h1>
 
-## RunPod Worker Images
-
-Below is a summary of the available RunPod Worker images, categorized by image stability and CUDA version compatibility.
-
-| Preview Image Tag | Development Image Tag |
------------------------------------|-----------------------------------|
-| `runpod/llm-finetuning:preview` | `runpod/llm-finetuning:dev`
-
 # Configuration Options
 
 This document outlines all available configuration options for training models. The configuration can be provided as a JSON request.
@@ -19,6 +9,7 @@ This document outlines all available configuration options for training models.
 You can use these configuration Options:
 
 1. As a JSON request body:
+
 ```json
 {
 "input": {
@@ -42,7 +33,7 @@ You can use these configuration Options:
 ### Model Configuration
 
 | Option | Description | Default |
-|--------|-------------|---------|
+| ------------------- | --------------------------------------------------------------------------------------------- | -------------------- |
 | `base_model` | Path to the base model (local or HuggingFace) | Required |
 | `base_model_config` | Configuration path for the base model | Same as base_model |
 | `revision_of_model` | Specific model revision from HuggingFace hub | Latest |
@@ -51,12 +42,10 @@ You can use these configuration Options:
 | `tokenizer_type` | Type of tokenizer to use | AutoTokenizer |
 | `hub_model_id` | Repository ID where the model will be pushed on Hugging Face Hub (format: username/repo-name) | Optional |
 
-
-
 ## Model Family Identification
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| -------------------------- | ------- | ------------------------------ |
 | `is_falcon_derived_model` | `false` | Whether model is Falcon-based |
 | `is_llama_derived_model` | `false` | Whether model is LLaMA-based |
 | `is_qwen_derived_model` | `false` | Whether model is Qwen-based |
@@ -65,25 +54,24 @@ You can use these configuration Options:
 ## Model Configuration Overrides
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| ----------------------------------------------- | ---------- | ---------------------------------- |
 | `overrides_of_model_config.rope_scaling.type` | `"linear"` | RoPE scaling type (linear/dynamic) |
 | `overrides_of_model_config.rope_scaling.factor` | `1.0` | RoPE scaling factor |
 
 ### Model Loading Options
 
 | Option | Description | Default |
-|--------|-------------|---------|
+| -------------- | ----------------------------- | ------- |
 | `load_in_8bit` | Load model in 8-bit precision | false |
 | `load_in_4bit` | Load model in 4-bit precision | false |
 | `bf16` | Use bfloat16 precision | false |
 | `fp16` | Use float16 precision | false |
 | `tf32` | Use tensor float 32 precision | false |
 
-
 ## Memory and Device Settings
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| ------------------ | --------- | ----------------------- |
 | `gpu_memory_limit` | `"20GiB"` | GPU memory limit |
 | `lora_on_cpu` | `false` | Load LoRA on CPU |
 | `device_map` | `"auto"` | Device mapping strategy |
@@ -92,7 +80,7 @@ You can use these configuration Options:
 ## Training Hyperparameters
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| ----------------------------- | --------- | --------------------------- |
 | `gradient_accumulation_steps` | `1` | Gradient accumulation steps |
 | `micro_batch_size` | `2` | Batch size per GPU |
 | `eval_batch_size` | `null` | Evaluation batch size |
@@ -121,11 +109,10 @@ datasets:
 train_on_split: train # Dataset split to use
 ```
 
-
 ## Chat Template Settings
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| ------------------------ | -------------------------------- | ---------------------- |
 | `chat_template` | `"tokenizer_default"` | Chat template type |
 | `chat_template_jinja` | `null` | Custom Jinja template |
 | `default_system_message` | `"You are a helpful assistant."` | Default system message |
@@ -133,7 +120,7 @@ datasets:
 ## Dataset Processing
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| ----------------------------- | -------------------------- | --------------------------------- |
 | `dataset_prepared_path` | `"data/last_run_prepared"` | Path for prepared dataset |
 | `push_dataset_to_hub` | `""` | Push dataset to HF hub |
 | `dataset_processes` | `4` | Number of preprocessing processes |
@@ -144,7 +131,7 @@ datasets:
 ## LoRA Configuration
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| -------------------------- | ---------------------- | ------------------------------ |
 | `adapter` | `"lora"` | Adapter type (lora/qlora) |
 | `lora_model_dir` | `""` | Directory with pretrained LoRA |
 | `lora_r` | `8` | LoRA attention dimension |
@@ -156,11 +143,10 @@ datasets:
 | `lora_modules_to_save` | `[]` | Modules to save |
 | `lora_fan_in_fan_out` | `false` | Fan in/out structure |
 
-
 ## Optimization Settings
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| ------------------------- | ------- | -------------------------- |
 | `train_on_inputs` | `false` | Train on input prompts |
 | `group_by_length` | `false` | Group by sequence length |
 | `gradient_checkpointing` | `false` | Use gradient checkpointing |
@@ -169,7 +155,7 @@ datasets:
 ## Learning Rate Scheduling
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| -------------------------- | ---------- | -------------------- |
 | `lr_scheduler` | `"cosine"` | Scheduler type |
 | `lr_scheduler_kwargs` | `{}` | Scheduler parameters |
 | `cosine_min_lr_ratio` | `null` | Minimum LR ratio |
@@ -179,7 +165,7 @@ datasets:
 ## Optimizer Settings
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| ---------------------- | ------------ | ------------------- |
 | `optimizer` | `"adamw_hf"` | Optimizer choice |
 | `optim_args` | `{}` | Optimizer arguments |
 | `optim_target_modules` | `[]` | Target modules |
@@ -192,7 +178,7 @@ datasets:
 ## Attention Implementations
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| -------------------------- | ------- | ----------------------------- |
 | `flash_optimum` | `false` | Use better transformers |
 | `xformers_attention` | `false` | Use xformers |
 | `flash_attention` | `false` | Use flash attention |
@@ -203,18 +189,17 @@ datasets:
 | `sdp_attention` | `false` | Use scaled dot product |
 | `s2_attention` | `false` | Use shifted sparse attention |
 
-
 ## Tokenizer Modifications
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| ---------------- | ------- | ---------------------------- |
 | `special_tokens` | - | Special tokens to add/modify |
 | `tokens` | `[]` | Additional tokens |
 
 ## Distributed Training
 
 | Option | Default | Description |
-|--------|---------|-------------|
+| ----------------------- | ------- | --------------------- |
 | `fsdp` | `null` | FSDP configuration |
 | `fsdp_config` | `null` | FSDP config options |
 | `deepspeed` | `null` | Deepspeed config path |
@@ -222,7 +207,6 @@ datasets:
 | `ddp_bucket_cap_mb` | `null` | DDP bucket capacity |
 | `ddp_broadcast_buffers` | `null` | DDP broadcast buffers |
 
-
 <details>
 <summary><h3>Example Configuration Request:</h3></summary>
 
@@ -299,20 +283,21 @@ Here's a complete example for fine-tuning a LLaMA model using LoRA:
 }
 }
 ```
+
 </details>
 
 ### Advanced Features
 
 #### Wandb Integration
+
 - `wandb_project`: Project name for Weights & Biases
 - `wandb_entity`: Team name in W&B
 - `wandb_watch`: Monitor model with W&B
 - `wandb_name`: Name of the W&B run
 - `wandb_run_id`: ID for the W&B run
 
-
-
 #### Performance Optimization
+
 - `sample_packing`: Enable efficient sequence packing
 - `eval_sample_packing`: Use sequence packing during evaluation
 - `torch_compile`: Enable PyTorch 2.0 compilation
@@ -336,8 +321,6 @@ The following optimizers are supported:
 - `sgd`: Stochastic Gradient Descent
 - `adagrad`: Adagrad optimizer
 
-
-
 ## Notes
 
 - Set `load_in_8bit: true` or `load_in_4bit: true` for memory-efficient training
@@ -347,10 +330,6 @@ The following optimizers are supported:
 
 For more detailed information, please refer to the [documentation](https://axolotl-ai-cloud.github.io/axolotl/docs/config.html).
 
-
 ### Errors:
+
 - if you face any issues with the Flash Attention-2, Delete yoor worker and Re-start.
-
-
-
-