remove LICENSE and fix README

2025-04-14 18:33:27 -08:00
parent c2fc35f520
commit cb7185998b
2 changed files with 128 additions and 170 deletions
--- a/.runpod/LICENSE
+++ b/.runpod/LICENSE
@@ -1,21 +0,0 @@
 MIT License
 Copyright (c) 2023 runpod-workers
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
--- a/.runpod/README.md
+++ b/.runpod/README.md
@@ -1,15 +1,5 @@
 <h1>LLM Training: Full fine-tune, LoRA, QLoRA, etc. Llama/Mistral/Gemma</h1>
 ## RunPod Worker Images
 Below is a summary of the available RunPod Worker images, categorized by image stability and CUDA version compatibility.
 | Preview Image Tag               | Development Image Tag       |
 | ------------------------------- | --------------------------- |
 | `runpod/llm-finetuning:preview` | `runpod/llm-finetuning:dev` |
 # Configuration Options
 This document outlines all available configuration options for training models. The configuration can be provided as a JSON request.
@@ -19,6 +9,7 @@ This document outlines all available configuration options for training models.
 You can use these configuration options:
 1. As a JSON request body:
 ```json
 {
  "input": {
@@ -41,187 +32,180 @@ You can use these configuration Options:
 ### Model Configuration
-| Option | Description | Default |
+| Option              | Description                                                                                   | Default              |
-|--------|-------------|---------|
+| ------------------- | --------------------------------------------------------------------------------------------- | -------------------- |
-| `base_model` | Path to the base model (local or HuggingFace) | Required |
+| `base_model`        | Path to the base model (local or HuggingFace)                                                 | Required             |
-| `base_model_config` | Configuration path for the base model | Same as base_model |
+| `base_model_config` | Configuration path for the base model                                                         | Same as base_model   |
-| `revision_of_model` | Specific model revision from HuggingFace hub | Latest |
+| `revision_of_model` | Specific model revision from HuggingFace hub                                                  | Latest               |
-| `tokenizer_config` | Custom tokenizer configuration path | Optional |
+| `tokenizer_config`  | Custom tokenizer configuration path                                                           | Optional             |
-| `model_type` | Type of model to load | AutoModelForCausalLM |
+| `model_type`        | Type of model to load                                                                         | AutoModelForCausalLM |
-| `tokenizer_type` | Type of tokenizer to use | AutoTokenizer |
+| `tokenizer_type`    | Type of tokenizer to use                                                                      | AutoTokenizer        |
-| `hub_model_id` | Repository ID where the model will be pushed on Hugging Face Hub (format: username/repo-name) | Optional |
+| `hub_model_id`      | Repository ID where the model will be pushed on Hugging Face Hub (format: username/repo-name) | Optional             |
 ## Model Family Identification
-| Option | Default | Description |
+| Option                     | Default | Description                    |
-|--------|---------|-------------|
+| -------------------------- | ------- | ------------------------------ |
-| `is_falcon_derived_model` | `false` | Whether model is Falcon-based |
+| `is_falcon_derived_model`  | `false` | Whether model is Falcon-based  |
-| `is_llama_derived_model` | `false` | Whether model is LLaMA-based |
+| `is_llama_derived_model`   | `false` | Whether model is LLaMA-based   |
-| `is_qwen_derived_model` | `false` | Whether model is Qwen-based |
+| `is_qwen_derived_model`    | `false` | Whether model is Qwen-based    |
 | `is_mistral_derived_model` | `false` | Whether model is Mistral-based |
 ## Model Configuration Overrides
-| Option | Default | Description |
+| Option                                          | Default    | Description                        |
-|--------|---------|-------------|
+| ----------------------------------------------- | ---------- | ---------------------------------- |
-| `overrides_of_model_config.rope_scaling.type` | `"linear"` | RoPE scaling type (linear/dynamic) |
+| `overrides_of_model_config.rope_scaling.type`   | `"linear"` | RoPE scaling type (linear/dynamic) |
-| `overrides_of_model_config.rope_scaling.factor` | `1.0` | RoPE scaling factor |
+| `overrides_of_model_config.rope_scaling.factor` | `1.0`      | RoPE scaling factor                |
 ### Model Loading Options
-| Option | Description | Default |
+| Option         | Description                   | Default |
-|--------|-------------|---------|
+| -------------- | ----------------------------- | ------- |
-| `load_in_8bit` | Load model in 8-bit precision | false |
+| `load_in_8bit` | Load model in 8-bit precision | false   |
-| `load_in_4bit` | Load model in 4-bit precision | false |
+| `load_in_4bit` | Load model in 4-bit precision | false   |
-| `bf16` | Use bfloat16 precision | false |
+| `bf16`         | Use bfloat16 precision        | false   |
-| `fp16` | Use float16 precision | false |
+| `fp16`         | Use float16 precision         | false   |
-| `tf32` | Use tensor float 32 precision | false |
+| `tf32`         | Use tensor float 32 precision | false   |
 ## Memory and Device Settings
-| Option | Default | Description |
+| Option             | Default   | Description             |
-|--------|---------|-------------|
+| ------------------ | --------- | ----------------------- |
-| `gpu_memory_limit` | `"20GiB"` | GPU memory limit |
+| `gpu_memory_limit` | `"20GiB"` | GPU memory limit        |
-| `lora_on_cpu` | `false` | Load LoRA on CPU |
+| `lora_on_cpu`      | `false`   | Load LoRA on CPU        |
-| `device_map` | `"auto"` | Device mapping strategy |
+| `device_map`       | `"auto"`  | Device mapping strategy |
-| `max_memory` | `null` | Max memory per device |
+| `max_memory`       | `null`    | Max memory per device   |
 ## Training Hyperparameters
-| Option | Default | Description |
+| Option                        | Default   | Description                 |
-|--------|---------|-------------|
+| ----------------------------- | --------- | --------------------------- |
-| `gradient_accumulation_steps` | `1` | Gradient accumulation steps |
+| `gradient_accumulation_steps` | `1`       | Gradient accumulation steps |
-| `micro_batch_size` | `2` | Batch size per GPU |
+| `micro_batch_size`            | `2`       | Batch size per GPU          |
-| `eval_batch_size` | `null` | Evaluation batch size |
+| `eval_batch_size`             | `null`    | Evaluation batch size       |
-| `num_epochs` | `4` | Number of training epochs |
+| `num_epochs`                  | `4`       | Number of training epochs   |
-| `warmup_steps` | `100` | Warmup steps |
+| `warmup_steps`                | `100`     | Warmup steps                |
-| `warmup_ratio` | `0.05` | Warmup ratio |
+| `warmup_ratio`                | `0.05`    | Warmup ratio                |
-| `learning_rate` | `0.00003` | Learning rate |
+| `learning_rate`               | `0.00003` | Learning rate               |
-| `lr_quadratic_warmup` | `false` | Quadratic warmup |
+| `lr_quadratic_warmup`         | `false`   | Quadratic warmup            |
-| `logging_steps` | `null` | Logging frequency |
+| `logging_steps`               | `null`    | Logging frequency           |
-| `eval_steps` | `null` | Evaluation frequency |
+| `eval_steps`                  | `null`    | Evaluation frequency        |
-| `evals_per_epoch` | `null` | Evaluations per epoch |
+| `evals_per_epoch`             | `null`    | Evaluations per epoch       |
-| `save_strategy` | `"epoch"` | Checkpoint saving strategy |
+| `save_strategy`               | `"epoch"` | Checkpoint saving strategy  |
-| `save_steps` | `null` | Saving frequency |
+| `save_steps`                  | `null`    | Saving frequency            |
-| `saves_per_epoch` | `null` | Saves per epoch |
+| `saves_per_epoch`             | `null`    | Saves per epoch             |
-| `save_total_limit` | `null` | Maximum checkpoints to keep |
+| `save_total_limit`            | `null`    | Maximum checkpoints to keep |
-| `max_steps` | `null` | Maximum training steps |
+| `max_steps`                   | `null`    | Maximum training steps      |
 ### Dataset Configuration
 ```yaml
 datasets:
-  - path: vicgalle/alpaca-gpt4  # HuggingFace dataset or TODO: You will be able to add the local path. 
+  - path: vicgalle/alpaca-gpt4 # HuggingFace dataset or TODO: You will be able to add the local path.
-    type: alpaca               # Format type (alpaca, gpteacher, oasst, etc.)
+    type: alpaca # Format type (alpaca, gpteacher, oasst, etc.)
-    ds_type: json             # Dataset type
+    ds_type: json # Dataset type
-    data_files: path/to/data  # Source data files
+    data_files: path/to/data # Source data files
-    train_on_split: train     # Dataset split to use
+    train_on_split: train # Dataset split to use
 ```
 ## Chat Template Settings
-| Option | Default | Description |
+| Option                   | Default                          | Description            |
-|--------|---------|-------------|
+| ------------------------ | -------------------------------- | ---------------------- |
-| `chat_template` | `"tokenizer_default"` | Chat template type |
+| `chat_template`          | `"tokenizer_default"`            | Chat template type     |
-| `chat_template_jinja` | `null` | Custom Jinja template |
+| `chat_template_jinja`    | `null`                           | Custom Jinja template  |
 | `default_system_message` | `"You are a helpful assistant."` | Default system message |
 ## Dataset Processing
-| Option | Default | Description |
+| Option                        | Default                    | Description                       |
-|--------|---------|-------------|
+| ----------------------------- | -------------------------- | --------------------------------- |
-| `dataset_prepared_path` | `"data/last_run_prepared"` | Path for prepared dataset |
+| `dataset_prepared_path`       | `"data/last_run_prepared"` | Path for prepared dataset         |
-| `push_dataset_to_hub` | `""` | Push dataset to HF hub |
+| `push_dataset_to_hub`         | `""`                       | Push dataset to HF hub            |
-| `dataset_processes` | `4` | Number of preprocessing processes |
+| `dataset_processes`           | `4`                        | Number of preprocessing processes |
-| `dataset_keep_in_memory` | `false` | Keep dataset in memory |
+| `dataset_keep_in_memory`      | `false`                    | Keep dataset in memory            |
-| `shuffle_merged_datasets` | `true` | Shuffle merged datasets |
+| `shuffle_merged_datasets`     | `true`                     | Shuffle merged datasets           |
-| `dataset_exact_deduplication` | `true` | Deduplicate datasets |
+| `dataset_exact_deduplication` | `true`                     | Deduplicate datasets              |
 ## LoRA Configuration
-| Option | Default | Description |
+| Option                     | Default                | Description                    |
-|--------|---------|-------------|
+| -------------------------- | ---------------------- | ------------------------------ |
-| `adapter` | `"lora"` | Adapter type (lora/qlora) |
+| `adapter`                  | `"lora"`               | Adapter type (lora/qlora)      |
-| `lora_model_dir` | `""` | Directory with pretrained LoRA |
+| `lora_model_dir`           | `""`                   | Directory with pretrained LoRA |
-| `lora_r` | `8` | LoRA attention dimension |
+| `lora_r`                   | `8`                    | LoRA attention dimension       |
-| `lora_alpha` | `16` | LoRA alpha parameter |
+| `lora_alpha`               | `16`                   | LoRA alpha parameter           |
-| `lora_dropout` | `0.05` | LoRA dropout |
+| `lora_dropout`             | `0.05`                 | LoRA dropout                   |
-| `lora_target_modules` | `["q_proj", "v_proj"]` | Modules to apply LoRA |
+| `lora_target_modules`      | `["q_proj", "v_proj"]` | Modules to apply LoRA          |
-| `lora_target_linear` | `false` | Target all linear modules |
+| `lora_target_linear`       | `false`                | Target all linear modules      |
-| `peft_layers_to_transform` | `[]` | Layers to transform |
+| `peft_layers_to_transform` | `[]`                   | Layers to transform            |
-| `lora_modules_to_save` | `[]` | Modules to save |
+| `lora_modules_to_save`     | `[]`                   | Modules to save                |
-| `lora_fan_in_fan_out` | `false` | Fan in/out structure |
+| `lora_fan_in_fan_out`      | `false`                | Fan in/out structure           |
 ## Optimization Settings
-| Option | Default | Description |
+| Option                    | Default | Description                |
-|--------|---------|-------------|
+| ------------------------- | ------- | -------------------------- |
-| `train_on_inputs` | `false` | Train on input prompts |
+| `train_on_inputs`         | `false` | Train on input prompts     |
-| `group_by_length` | `false` | Group by sequence length |
+| `group_by_length`         | `false` | Group by sequence length   |
-| `gradient_checkpointing` | `false` | Use gradient checkpointing |
+| `gradient_checkpointing`  | `false` | Use gradient checkpointing |
-| `early_stopping_patience` | `3` | Early stopping patience |
+| `early_stopping_patience` | `3`     | Early stopping patience    |
 ## Learning Rate Scheduling
-| Option | Default | Description |
+| Option                     | Default    | Description          |
-|--------|---------|-------------|
+| -------------------------- | ---------- | -------------------- |
-| `lr_scheduler` | `"cosine"` | Scheduler type |
+| `lr_scheduler`             | `"cosine"` | Scheduler type       |
-| `lr_scheduler_kwargs` | `{}` | Scheduler parameters |
+| `lr_scheduler_kwargs`      | `{}`       | Scheduler parameters |
-| `cosine_min_lr_ratio` | `null` | Minimum LR ratio |
+| `cosine_min_lr_ratio`      | `null`     | Minimum LR ratio     |
-| `cosine_constant_lr_ratio` | `null` | Constant LR ratio |
+| `cosine_constant_lr_ratio` | `null`     | Constant LR ratio    |
-| `lr_div_factor` | `null` | LR division factor |
+| `lr_div_factor`            | `null`     | LR division factor   |
 ## Optimizer Settings
-| Option | Default | Description |
+| Option                 | Default      | Description         |
-|--------|---------|-------------|
+| ---------------------- | ------------ | ------------------- |
-| `optimizer` | `"adamw_hf"` | Optimizer choice |
+| `optimizer`            | `"adamw_hf"` | Optimizer choice    |
-| `optim_args` | `{}` | Optimizer arguments |
+| `optim_args`           | `{}`         | Optimizer arguments |
-| `optim_target_modules` | `[]` | Target modules |
+| `optim_target_modules` | `[]`         | Target modules      |
-| `weight_decay` | `null` | Weight decay |
+| `weight_decay`         | `null`       | Weight decay        |
-| `adam_beta1` | `null` | Adam beta1 |
+| `adam_beta1`           | `null`       | Adam beta1          |
-| `adam_beta2` | `null` | Adam beta2 |
+| `adam_beta2`           | `null`       | Adam beta2          |
-| `adam_epsilon` | `null` | Adam epsilon |
+| `adam_epsilon`         | `null`       | Adam epsilon        |
-| `max_grad_norm` | `null` | Gradient clipping |
+| `max_grad_norm`        | `null`       | Gradient clipping   |
 ## Attention Implementations
-| Option | Default | Description |
+| Option                     | Default | Description                   |
-|--------|---------|-------------|
+| -------------------------- | ------- | ----------------------------- |
-| `flash_optimum` | `false` | Use better transformers |
+| `flash_optimum`            | `false` | Use better transformers       |
-| `xformers_attention` | `false` | Use xformers |
+| `xformers_attention`       | `false` | Use xformers                  |
-| `flash_attention` | `false` | Use flash attention |
+| `flash_attention`          | `false` | Use flash attention           |
 | `flash_attn_cross_entropy` | `false` | Flash attention cross entropy |
-| `flash_attn_rms_norm` | `false` | Flash attention RMS norm |
+| `flash_attn_rms_norm`      | `false` | Flash attention RMS norm      |
-| `flash_attn_fuse_qkv` | `false` | Fuse QKV operations |
+| `flash_attn_fuse_qkv`      | `false` | Fuse QKV operations           |
-| `flash_attn_fuse_mlp` | `false` | Fuse MLP operations |
+| `flash_attn_fuse_mlp`      | `false` | Fuse MLP operations           |
-| `sdp_attention` | `false` | Use scaled dot product |
+| `sdp_attention`            | `false` | Use scaled dot product        |
-| `s2_attention` | `false` | Use shifted sparse attention |
+| `s2_attention`             | `false` | Use shifted sparse attention  |
 ## Tokenizer Modifications
-| Option | Default | Description |
+| Option           | Default | Description                  |
-|--------|---------|-------------|
+| ---------------- | ------- | ---------------------------- |
-| `special_tokens` | - | Special tokens to add/modify |
+| `special_tokens` | -       | Special tokens to add/modify |
-| `tokens` | `[]` | Additional tokens |
+| `tokens`         | `[]`    | Additional tokens            |
 ## Distributed Training
-| Option | Default | Description |
+| Option                  | Default | Description           |
-|--------|---------|-------------|
+| ----------------------- | ------- | --------------------- |
-| `fsdp` | `null` | FSDP configuration |
+| `fsdp`                  | `null`  | FSDP configuration    |
-| `fsdp_config` | `null` | FSDP config options |
+| `fsdp_config`           | `null`  | FSDP config options   |
-| `deepspeed` | `null` | Deepspeed config path |
+| `deepspeed`             | `null`  | Deepspeed config path |
-| `ddp_timeout` | `null` | DDP timeout |
+| `ddp_timeout`           | `null`  | DDP timeout           |
-| `ddp_bucket_cap_mb` | `null` | DDP bucket capacity |
+| `ddp_bucket_cap_mb`     | `null`  | DDP bucket capacity   |
-| `ddp_broadcast_buffers` | `null` | DDP broadcast buffers |
+| `ddp_broadcast_buffers` | `null`  | DDP broadcast buffers |
 <details>
 <summary><h3>Example Configuration Request:</h3></summary>
@@ -299,20 +283,21 @@ Here's a complete example for fine-tuning a LLaMA model using LoRA:
  }
 }
 ```
 </details>
 ### Advanced Features
 #### Wandb Integration
 - `wandb_project`: Project name for Weights & Biases
 - `wandb_entity`: Team name in W&B
 - `wandb_watch`: Monitor model with W&B
 - `wandb_name`: Name of the W&B run
 - `wandb_run_id`: ID for the W&B run
 #### Performance Optimization
 - `sample_packing`: Enable efficient sequence packing
 - `eval_sample_packing`: Use sequence packing during evaluation
 - `torch_compile`: Enable PyTorch 2.0 compilation
@@ -336,8 +321,6 @@ The following optimizers are supported:
 - `sgd`: Stochastic Gradient Descent
 - `adagrad`: Adagrad optimizer
 ## Notes
 - Set `load_in_8bit: true` or `load_in_4bit: true` for memory-efficient training
@@ -347,10 +330,6 @@ The following optimizers are supported:
 For more detailed information, please refer to the [documentation](https://axolotl-ai-cloud.github.io/axolotl/docs/config.html).
 ### Errors:
 - If you run into issues with Flash Attention 2, delete your worker and restart it.
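
As a recap of the option tables in this README, here is a minimal Python sketch that assembles a flat set of options and serializes it to JSON. The key names and default values come from the tables above. The helper names `build_options` and `effective_batch_size` and the placeholder model ID are hypothetical and exist only for illustration. Pairing `adapter: qlora` with `load_in_4bit: true` is an assumption based on how QLoRA is normally run. The sketch does not show where these options sit inside the `"input"` object of the request, because that part of the request body is elided in this diff.

```python
import json


def build_options(base_model: str, use_qlora: bool = True) -> dict:
    """Assemble a flat dict of training options (hypothetical helper).

    Keys and defaults mirror the README tables; nothing here calls a real API.
    """
    return {
        "base_model": base_model,
        # Assumption: QLoRA is paired with 4-bit loading, plain LoRA is not.
        "adapter": "qlora" if use_qlora else "lora",
        "load_in_4bit": use_qlora,
        "lora_r": 8,
        "lora_alpha": 16,
        "lora_dropout": 0.05,
        "lora_target_modules": ["q_proj", "v_proj"],
        "micro_batch_size": 2,
        "gradient_accumulation_steps": 1,
        "num_epochs": 4,
        "learning_rate": 0.00003,
        "lr_scheduler": "cosine",
        "datasets": [{"path": "vicgalle/alpaca-gpt4", "type": "alpaca"}],
    }


def effective_batch_size(options: dict, num_gpus: int = 1) -> int:
    """Samples per optimizer step: per-GPU batch x accumulation steps x GPUs."""
    return (
        options["micro_batch_size"]
        * options["gradient_accumulation_steps"]
        * num_gpus
    )


if __name__ == "__main__":
    # "your-org/your-base-model" is a placeholder, not a real repository.
    opts = build_options("your-org/your-base-model")
    print(json.dumps(opts, indent=2))
    print("effective batch size on 2 GPUs:", effective_batch_size(opts, num_gpus=2))
```

Keeping the options in a plain dict makes it easy to override a single value (for example `opts["num_epochs"] = 1`) before placing them into the request body.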