remove LICENSE and fix README

2025-04-14 18:33:27 -08:00
parent c2fc35f520
commit cb7185998b
2 changed files with 128 additions and 170 deletions
--- a/.runpod/LICENSE
+++ b/.runpod/LICENSE
@@ -1,21 +0,0 @@
 MIT License
 Copyright (c) 2023 runpod-workers
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights
 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 copies of the Software, and to permit persons to whom the Software is
 furnished to do so, subject to the following conditions:
 The above copyright notice and this permission notice shall be included in all
 copies or substantial portions of the Software.
 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
--- a/.runpod/README.md
+++ b/.runpod/README.md
@@ -1,15 +1,5 @@
 <h1>LLM Training: Full fine-tune, LoRA, QLoRA, etc. (Llama/Mistral/Gemma)</h1>
 ## RunPod Worker Images
 Below is a summary of the available RunPod Worker images, categorized by image stability and CUDA version compatibility.
 | Preview Image Tag               | Development Image Tag       |
 | ------------------------------- | --------------------------- |
 | `runpod/llm-finetuning:preview` | `runpod/llm-finetuning:dev` |
 # Configuration Options
 This document outlines all available configuration options for training models. The configuration can be provided as a JSON request.
@@ -19,6 +9,7 @@ This document outlines all available configuration options for training models.
 You can use these configuration options:
 1. As a JSON request body:
 ```json
 {
  "input": {
@@ -42,7 +33,7 @@ You can use these configuration Options:
 ### Model Configuration
 | Option              | Description                                                                                   | Default              |
-|--------|-------------|---------|
+| ------------------- | --------------------------------------------------------------------------------------------- | -------------------- |
 | `base_model`        | Path to the base model (local or HuggingFace)                                                 | Required             |
 | `base_model_config` | Configuration path for the base model                                                         | Same as base_model   |
 | `revision_of_model` | Specific model revision from HuggingFace hub                                                  | Latest               |
@@ -51,12 +42,10 @@ You can use these configuration Options:
 | `tokenizer_type`    | Type of tokenizer to use                                                                      | AutoTokenizer        |
 | `hub_model_id`      | Repository ID where the model will be pushed on Hugging Face Hub (format: username/repo-name) | Optional             |
 ## Model Family Identification
 | Option                     | Default | Description                    |
-|--------|---------|-------------|
+| -------------------------- | ------- | ------------------------------ |
 | `is_falcon_derived_model`  | `false` | Whether model is Falcon-based  |
 | `is_llama_derived_model`   | `false` | Whether model is LLaMA-based   |
 | `is_qwen_derived_model`    | `false` | Whether model is Qwen-based    |
@@ -65,25 +54,24 @@ You can use these configuration Options:
 ## Model Configuration Overrides
 | Option                                          | Default    | Description                        |
-|--------|---------|-------------|
+| ----------------------------------------------- | ---------- | ---------------------------------- |
 | `overrides_of_model_config.rope_scaling.type`   | `"linear"` | RoPE scaling type (linear/dynamic) |
 | `overrides_of_model_config.rope_scaling.factor` | `1.0`      | RoPE scaling factor                |
 ### Model Loading Options
 | Option         | Description                   | Default |
-|--------|-------------|---------|
+| -------------- | ----------------------------- | ------- |
 | `load_in_8bit` | Load model in 8-bit precision | false   |
 | `load_in_4bit` | Load model in 4-bit precision | false   |
 | `bf16`         | Use bfloat16 precision        | false   |
 | `fp16`         | Use float16 precision         | false   |
 | `tf32`         | Use tensor float 32 precision | false   |
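
As a rough illustration of why the precision options above matter, the sketch below estimates the memory needed to hold the model weights alone in each storage format. The function name and the bytes-per-parameter table are illustrative and not part of the worker. The estimate ignores activations, optimizer state, and quantization overhead, so treat it as a lower bound. `tf32` is left out because it changes how fp32 math is computed rather than how weights are stored.

```python
# Approximate bytes needed to store one parameter in each weight format.
# Illustrative values only; real quantized formats add some metadata overhead.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,   # corresponds to `bf16: true`
    "fp16": 2.0,   # corresponds to `fp16: true`
    "8bit": 1.0,   # corresponds to `load_in_8bit: true`
    "4bit": 0.5,   # corresponds to `load_in_4bit: true`
}


def weight_memory_gib(num_params: int, fmt: str) -> float:
    """Return the weight-only memory footprint in GiB (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[fmt] / 2**30


if __name__ == "__main__":
    # Example: a model with 7 billion parameters.
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: {weight_memory_gib(7_000_000_000, fmt):.1f} GiB")
```

Comparing the result against `gpu_memory_limit` gives a quick sanity check before starting a job, though actual usage during training will be higher.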
 ## Memory and Device Settings
 | Option             | Default   | Description             |
-|--------|---------|-------------|
+| ------------------ | --------- | ----------------------- |
 | `gpu_memory_limit` | `"20GiB"` | GPU memory limit        |
 | `lora_on_cpu`      | `false`   | Load LoRA on CPU        |
 | `device_map`       | `"auto"`  | Device mapping strategy |
@@ -92,7 +80,7 @@ You can use these configuration Options:
 ## Training Hyperparameters
 | Option                        | Default   | Description                 |
-|--------|---------|-------------|
+| ----------------------------- | --------- | --------------------------- |
 | `gradient_accumulation_steps` | `1`       | Gradient accumulation steps |
 | `micro_batch_size`            | `2`       | Batch size per GPU          |
 | `eval_batch_size`             | `null`    | Evaluation batch size       |
@@ -121,11 +109,10 @@ datasets:
    train_on_split: train # Dataset split to use
 ```
 ## Chat Template Settings
 | Option                   | Default                          | Description            |
-|--------|---------|-------------|
+| ------------------------ | -------------------------------- | ---------------------- |
 | `chat_template`          | `"tokenizer_default"`            | Chat template type     |
 | `chat_template_jinja`    | `null`                           | Custom Jinja template  |
 | `default_system_message` | `"You are a helpful assistant."` | Default system message |
@@ -133,7 +120,7 @@ datasets:
 ## Dataset Processing
 | Option                        | Default                    | Description                       |
-|--------|---------|-------------|
+| ----------------------------- | -------------------------- | --------------------------------- |
 | `dataset_prepared_path`       | `"data/last_run_prepared"` | Path for prepared dataset         |
 | `push_dataset_to_hub`         | `""`                       | Push dataset to HF hub            |
 | `dataset_processes`           | `4`                        | Number of preprocessing processes |
@@ -144,7 +131,7 @@ datasets:
 ## LoRA Configuration
 | Option                     | Default                | Description                    |
-|--------|---------|-------------|
+| -------------------------- | ---------------------- | ------------------------------ |
 | `adapter`                  | `"lora"`               | Adapter type (lora/qlora)      |
 | `lora_model_dir`           | `""`                   | Directory with pretrained LoRA |
 | `lora_r`                   | `8`                    | LoRA attention dimension       |
@@ -156,11 +143,10 @@ datasets:
 | `lora_modules_to_save`     | `[]`                   | Modules to save                |
 | `lora_fan_in_fan_out`      | `false`                | Fan in/out structure           |
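
To give a sense of what `lora_r` controls, the sketch below counts the parameters LoRA adds to a single weight matrix. It assumes the standard LoRA formulation, in which a `d_out x d_in` matrix gains two low-rank factors of shapes `r x d_in` and `d_out x r`. The helper name and the 4096-wide example layer are illustrative, not taken from the worker.

```python
def lora_added_params(d_in: int, d_out: int, r: int = 8) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    Hypothetical helper: the factors A (r x d_in) and B (d_out x r)
    together contribute r * (d_in + d_out) parameters.
    """
    return r * (d_in + d_out)


if __name__ == "__main__":
    d_in = d_out = 4096          # example projection size (assumed)
    full = d_in * d_out          # parameters in the frozen base matrix
    for r in (8, 16, 64):
        added = lora_added_params(d_in, d_out, r)
        print(f"r={r}: {added} added params ({added / full:.4%} of the base matrix)")
```

Doubling `lora_r` doubles the adapter size, so the rank trades adapter capacity against memory and checkpoint size.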
 ## Optimization Settings
 | Option                    | Default | Description                |
-|--------|---------|-------------|
+| ------------------------- | ------- | -------------------------- |
 | `train_on_inputs`         | `false` | Train on input prompts     |
 | `group_by_length`         | `false` | Group by sequence length   |
 | `gradient_checkpointing`  | `false` | Use gradient checkpointing |
@@ -169,7 +155,7 @@ datasets:
 ## Learning Rate Scheduling
 | Option                     | Default    | Description          |
-|--------|---------|-------------|
+| -------------------------- | ---------- | -------------------- |
 | `lr_scheduler`             | `"cosine"` | Scheduler type       |
 | `lr_scheduler_kwargs`      | `{}`       | Scheduler parameters |
 | `cosine_min_lr_ratio`      | `null`     | Minimum LR ratio     |
@@ -179,7 +165,7 @@ datasets:
 ## Optimizer Settings
 | Option                 | Default      | Description         |
-|--------|---------|-------------|
+| ---------------------- | ------------ | ------------------- |
 | `optimizer`            | `"adamw_hf"` | Optimizer choice    |
 | `optim_args`           | `{}`         | Optimizer arguments |
 | `optim_target_modules` | `[]`         | Target modules      |
@@ -192,7 +178,7 @@ datasets:
 ## Attention Implementations
 | Option                     | Default | Description                   |
-|--------|---------|-------------|
+| -------------------------- | ------- | ----------------------------- |
 | `flash_optimum`            | `false` | Use better transformers       |
 | `xformers_attention`       | `false` | Use xformers                  |
 | `flash_attention`          | `false` | Use flash attention           |
@@ -203,18 +189,17 @@ datasets:
 | `sdp_attention`            | `false` | Use scaled dot product        |
 | `s2_attention`             | `false` | Use shifted sparse attention  |
 ## Tokenizer Modifications
 | Option           | Default | Description                  |
-|--------|---------|-------------|
+| ---------------- | ------- | ---------------------------- |
 | `special_tokens` | -       | Special tokens to add/modify |
 | `tokens`         | `[]`    | Additional tokens            |
 ## Distributed Training
 | Option                  | Default | Description           |
-|--------|---------|-------------|
+| ----------------------- | ------- | --------------------- |
 | `fsdp`                  | `null`  | FSDP configuration    |
 | `fsdp_config`           | `null`  | FSDP config options   |
 | `deepspeed`             | `null`  | Deepspeed config path |
@@ -222,7 +207,6 @@ datasets:
 | `ddp_bucket_cap_mb`     | `null`  | DDP bucket capacity   |
 | `ddp_broadcast_buffers` | `null`  | DDP broadcast buffers |
 <details>
 <summary><h3>Example Configuration Request:</h3></summary>
@@ -299,20 +283,21 @@ Here's a complete example for fine-tuning a LLaMA model using LoRA:
  }
 }
 ```
 </details>
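
A request body like the one above can also be assembled and checked in code before it is sent. The sketch below is a minimal example, not the worker's own client. The helper name is hypothetical, and placing the options directly under `"input"` is an assumption, since the full layout of the request is not visible here. It checks that `base_model` is present (the table marks it as required) and rejects setting `load_in_8bit` and `load_in_4bit` together, because a model can only be loaded in one quantized format at a time.

```python
import json


def build_request(options: dict) -> str:
    """Serialize training options into a JSON request body (hypothetical helper).

    ASSUMPTION: options sit directly under "input"; adjust the nesting to
    match the actual request schema of your endpoint.
    """
    if "base_model" not in options:
        raise ValueError("`base_model` is required")
    if options.get("load_in_8bit") and options.get("load_in_4bit"):
        raise ValueError("set only one of `load_in_8bit` / `load_in_4bit`")
    return json.dumps({"input": options}, indent=2)


if __name__ == "__main__":
    body = build_request(
        {
            "base_model": "your-org/your-model",  # placeholder model ID
            "adapter": "lora",
            "lora_r": 8,
            "load_in_8bit": True,
            "micro_batch_size": 2,
            "gradient_accumulation_steps": 1,
        }
    )
    print(body)
```

Validating locally catches simple mistakes before a worker is started.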
 ### Advanced Features
 #### Wandb Integration
 - `wandb_project`: Project name for Weights & Biases
 - `wandb_entity`: Team name in W&B
 - `wandb_watch`: Monitor model with W&B
 - `wandb_name`: Name of the W&B run
 - `wandb_run_id`: ID for the W&B run
 #### Performance Optimization
 - `sample_packing`: Enable efficient sequence packing
 - `eval_sample_packing`: Use sequence packing during evaluation
 - `torch_compile`: Enable PyTorch 2.0 compilation
@@ -336,8 +321,6 @@ The following optimizers are supported:
 - `sgd`: Stochastic Gradient Descent
 - `adagrad`: Adagrad optimizer
 ## Notes
 - Set `load_in_8bit: true` or `load_in_4bit: true` for memory-efficient training
@@ -347,10 +330,6 @@ The following optimizers are supported:
 For more detailed information, please refer to the [documentation](https://axolotl-ai-cloud.github.io/axolotl/docs/config.html).
 ### Errors:
 - If you encounter issues with Flash Attention 2, delete your worker and restart it.