Commit Graph

274 Commits

Author SHA1 Message Date
bursteratom
e038410778 lint 2024-11-27 10:24:37 -05:00
Mengqing Cao
838b74d05b Add Ascend NPU support (#1758) 2024-11-20 21:28:41 -05:00
Chirag Jain
0c8b1d824a Update get_unpad_data patching for multipack (#2013)
* Update `get_unpad_data` patching for multipack

* Update src/axolotl/utils/models.py

* Update src/axolotl/utils/models.py

* Add test case

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-11-15 20:35:50 -05:00
Wing Lian
71d4030b79 gradient accumulation tests, embeddings w pad_token fix, smaller models (#2059)
* add more test cases for gradient accumulation and fix zero3

* swap out for smaller model

* fix missing return

* fix missing pad_token in config

* support concurrency for multigpu testing

* cast empty deepspeed to empty string for zero3 check

* fix temp_dir as fixture so parametrize works properly

* fix test file for multigpu evals

* don't use default

* don't use default for fsdp_state_dict_type

* don't use llama tokenizer w smollm

* also automatically cancel multigpu for concurrency
2024-11-14 12:59:00 -05:00
NanoCode012
5c7e89105d Fix: modelloader handling of model_kwargs load_in*bit (#1999)
* fix: load_in_*bit not properly read

* fix: load_*bit check

* fix: typo

* refactor: load * bit handling

* feat: add test dpo lora multi-gpu

* fix: turn off sample packing for dpo

* fix: missing warmup_steps

* fix: test to load in 8bit for lora

* skip 8bit lora on h100, add 4bit lora on h100 to multi gpu tests

* chore: reduce max_steps

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-10-30 14:41:34 -04:00
NanoCode012
bfc77b0f36 Feat: Add support for tokenizer’s or custom jinja chat_template (#1970)
* Allow using tokenizer's default chat template with fallbacks

Summary of changes:

1. Adds `tokenizer_default` as option for `chat_template` in
   `chat_template` prompt strategy that allows using the chat template
   from tokenizer's config.json
2. Allows falling back to chat templates available in axolotl if
   tokenizer does not have a chat template
3. Adds a mistral chat template which supports system message - taken
   from https://github.com/chujiezheng/chat_templates/blob/main/chat_templates/mistral-instruct.jinja

---

Why?

Many popular models are not trained with chatml format. As a result for
the model to correctly learn chatml we have to turn on train_on_inputs
which requires more compute and time. If we can use the model's already
learned chat template we can just learn the output tokens

---

Todo:

- Write tests

* Add tests

* Fix lint and bug post merge from main

* Add option `chat_template_jinja` to provide a jinja template

* remove custom mistral template

* Address review comments and add docs

* Update docs/dataset-formats/conversation.qmd

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* fix: set default to tokenizer template

* Merge branch 'main' into cj_tokenizer_default_prompt_template

* chore: remove redundant function

* fix: re-arrange enum declaration position

* fix: refactor artifact left from main merge

* feat(doc): updated config with chat template options and clarified examples

* chore: clarify doc

* chore: added example for non-default template

* chore: refactor

* fix: test

* fix: config being dropped and unittest to catch that

* chore: lint

* chore: skip duplicate

* fix: rename var after merge

* feat: add test for levy's dpo case

* fix: remove default setting on edge case where chat template overriden in dataset section

* feat: handle sharegpt deprecation better in docs

* feat: add example using fallback

* feat: handles chat_template requiring specific user/assistant order

* fix: update test based on new defaults

* fix: imported name incorrectly updated on merge

* chore: lint

* fix: update dummy message to prevent potential overlap with real content

* fix(doc): formatting

* fix: update bradleyterry to use new chat_template

---------

Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
2024-10-29 10:14:51 +07:00
Wing Lian
e1e0556c99 add option for resizing embeddings when adding new tokens (#2000)
* add option for resizing embeddings when adding new tokens

* let's just be opinonated about this setting and set it to False
2024-10-28 17:02:04 -04:00
Wing Lian
d3c45d27b5 fix zero3 (#1994) 2024-10-28 07:32:49 -04:00
Mengqing Cao
1d6a5e2bd6 Refactor func load_model to class ModelLoader (#1909) 2024-10-25 09:06:56 -04:00
Wing Lian
e1915f5625 Multimodal Vision Llama - rudimentary support (#1940)
---------

Co-authored-by: Sunny <sunny@Sunnys-MacBook-Air.local>
Co-authored-by: sunny <sunnyliu19981005@gmail.com>
2024-10-02 21:02:48 -04:00
Wing Lian
5c42f11411 remove dynamic module loader monkeypatch as this was fixed upstream (#1914) 2024-09-13 22:19:54 -04:00
Aman Gupta Karmani
159b8b9a74 monkey-patch transformers to simplify monkey-patching modeling code (#1877)
* monkey-patch transformers so that monkey-patched modeling code doesnt get overwritten

* unnecessary now

* add comment
2024-08-27 17:22:26 -07:00
Wing Lian
22f4eafa55 simplify logic (#1856) 2024-08-23 20:23:08 -04:00
Wing Lian
1f686c576c Liger Kernel integration (#1861)
* add initial plugin support w Liger kernel patches

* integrate the input args classes

* fix liger plugin and dynamic configuration class

* drop untrainable samples and refactor config plugins integration

* fix incorrect inputs and circular imports

* fix bool comparison

* fix for dropping untraibable tokens

* fix licensing so liger integration is Apache 2.0

* add jamba support

* pylint ignore
2024-08-23 12:21:51 -04:00
Wing Lian
fefa95e350 most model types now support flash attention 2 regardless of multipack support (#1854) 2024-08-22 16:39:23 -04:00
Wing Lian
5b0b774e38 ensure that the bias is also in the correct dtype (#1848) [skip ci]
* ensure that the bias is also in the correct dtype

* add nightly for dpo-qlora-fsdp
2024-08-22 11:45:00 -04:00
Gal Cohen (galco)
5aac4bc284 fix: dont change quant storage dtype in case of fsdp (#1837)
* fix: dont change quant storage dtype in case of fsdp

* fix black

---------

Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-20 12:41:48 -04:00
Wing Lian
5ee4b7325f fix z3 leaf configuration when not using lists (#1817) [skip ci] 2024-08-09 10:54:52 -04:00
Wing Lian
c56e0a79a5 logging improvements (#1808) [skip ci]
* logging improvements

* fix sort
2024-08-06 10:31:50 -04:00
Wing Lian
35d5e59d78 set z3 leaf for deepseek v2 (#1809) [skip ci]
* set z3 leaf for deepseek v2

* add deepseek v2 chat template
2024-08-06 09:30:46 -04:00
Wing Lian
3ebf22464b qlora-fsdp ram efficient loading with hf trainer (#1791)
* fix 405b with lower cpu ram requirements

* make sure to use doouble quant and only skip output embeddings

* set model attributes

* more fixes for sharded fsdp loading

* update the base model in example to use pre-quantized nf4-bf16 weights

* upstream fixes  for qlora+fsdp
2024-07-30 19:21:38 -04:00
Wing Lian
94ba93259f various batch of fixes (#1785)
* various batch of fixes

* more tweaks

* fix autoawq requirement for torch flexibility

* simplify conditionals

* multi-node fixes wip

* bump transformers and include 405b qlora+fsdp yaml
2024-07-28 07:25:54 -04:00
Wing Lian
fe250ada78 fix fsdp loading of models, esp 70b (#1780) 2024-07-23 19:54:28 -04:00
Wing Lian
87455e7f32 swaps to use newer sample packing for mistral (#1773)
* swaps to use newer sample packing for mistral

* fix multipack patch test

* patch the common fa utils

* update for refactor of flash attn unpad

* remove un-needed drop attn mask for mistral

* bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2

* update test
2024-07-23 01:41:11 -04:00
Wing Lian
7830fe04b5 Unsloth rope (#1767)
* Add unsloth rope embeddings support

* support for models weights in 4bit and do some memory gc

* use accelerate logger

* add unsloth llama rms norm optims

* update docs for unsloth

* more docs info
2024-07-18 14:54:41 -04:00
Wing Lian
5f58555bd0 support for llama multipack using updated code/patches (#1754)
* support for llama multipack using updated code/patches

* also support unsloth patches

* incorrect arg

* add config validation for unsloth

* add missing return to validation

* add another missing return to validation
2024-07-16 17:36:29 -04:00
Wing Lian
98af5388ba bump flash attention 2.5.8 -> 2.6.1 (#1738)
* bump flash attention 2.5.8 -> 2.6.1

* use triton implementation of cross entropy from flash attn

* add smoke test for flash attn cross entropy patch

* fix args to xentropy.apply

* handle tuple from triton loss fn

* ensure the patch tests run independently

* use the wrapper already built into flash attn for cross entropy

* mark pytest as forked for patches

* use pytest xdist instead of forked, since cuda doesn't like forking

* limit to 1 process and use dist loadfile for pytest

* change up pytest for fixture to reload transformers w monkeypathc
2024-07-14 19:11:31 -04:00
Wing Lian
a4a5bf057f fixes to prevent vram spike when train starts (#1742) 2024-07-13 09:53:13 -04:00
Wing Lian
a159724e44 bump trl and accelerate for latest releases (#1730)
* bump trl and accelerate for latest releases

* ensure that the CI runs on new gh org

* drop kto_pair support since removed upstream
2024-07-10 11:15:44 -04:00
Wing Lian
c69b7eb2b5 full weights fsdp training seems broken with fsdp_cpu_ram_efficient_loading, disabling for now (#1726) 2024-07-05 09:15:36 -04:00
Ben Redmond
22ae21a6c2 Add KTO support (#1640)
* add kto support

* test cleanup

* fix outdated comment

* fix llama3 ultra

* chore: lint

* update to use rl_beta instead of dpo_beta

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-05-20 16:05:16 -04:00
Wing Lian
8a1572a831 Unsloth optims for Llama (#1609)
* WIP for unsloth integrations

* import the unsloth code in the right context

* add unsloth mlp, qkv, o lora optimizations

* apply unsloth mlp and qkv kernels
2024-05-20 09:55:06 -04:00
NanoCode012
8b9c15b17f feat: exclude mamba blocks for jamba (#1578) 2024-05-07 22:52:57 +09:00
Wing Lian
68601ec6ad make sure everything stays in the same dtype when using dpo + FSDP (#1559) 2024-04-22 16:00:05 -04:00
Wing Lian
6319da1f9b Unsloth gradient checkpointing offload (#1528)
* unsloth gradient checkpointing

* fix validation too

* fixes to make it work with mistral

* monkeypatch the checkpoint fn earlier
2024-04-16 14:53:57 -04:00
Wing Lian
132eb740f0 DBRX Model Support (#1462)
* wip for dbrx finetuning

* add fastcore for parallel loading of sharded weights

* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback

* update to use v2 of the converted model

* more fixes for dbrx loras

* make sure to enable fsdp activation checkpointing

* fix support for 8bit loras too for dbrx

* apply z3 leaf moe fix for DBRX with deepspeed

* don't raise value error since child module searches could fail and be ok

* revert a previous change to fix fsdp

* update mistral/mistral qlora+fsdp yamls

* fix qlora+fsdp quant storage type

* more edge cases for qlora-fsdp

* fixes for fsdp+qlora w optimizer in 8bit

* add bigstral z3 config and make sure to use full_state_dict for fsdp
2024-04-12 09:02:36 -04:00
Wing Lian
2fa65b9599 ignore issues with calculating # params when printing (#1493) 2024-04-08 11:04:22 -04:00
Wing Lian
0b103775ad reduce verbosity of the special tokens (#1472) 2024-04-01 21:47:27 +09:00
Wing Lian
6086be85f7 qwen2_moe support w multipack (#1455) 2024-03-29 11:04:53 -04:00
Wing Lian
05b398a072 fix some of the edge cases for Jamba (#1452)
* fix some of the edge cases for Jamba

* update requirements for jamba
2024-03-29 02:38:02 -04:00
Wing Lian
02af0820f7 Jamba (#1451)
* fixes for larger models

* add qlora example for deepspeed

* add readme for jamba
2024-03-28 21:03:22 -04:00
Wing Lian
4155e9988f fix layer_replication arg to peft (#1446) 2024-03-27 10:18:56 -04:00
Wing Lian
25afd35842 support layer replication for peft and fix rslora integration (#1445) 2024-03-27 10:16:47 -04:00
NanoCode012
ff939d8a64 fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path (#1298)
* fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path

* fix: normalize config
2024-03-25 15:34:54 +09:00
Wing Lian
2a1589f6f6 strip out hacky qlora-fsdp workarounds now that qlora-fsdp fixes are upstreamed (#1428) 2024-03-21 11:56:13 -04:00
NanoCode012
b1e3e1b25f fix(config): passing gradient_checkpoint_kwargs (#1412)
* fix(config): change default use_reentrant to true

* Update trainer_builder.py

* fix: make sure to pass kwargs to enable checkpoint

* chore: lint
2024-03-19 12:57:43 +09:00
Wing Lian
8df7b888ff beta support for multipack with gemmoe: (#1402) 2024-03-14 15:52:23 -04:00
Wing Lian
7659c001aa support for rslora (#1387) [skip ci] 2024-03-10 20:49:45 -04:00
Wing Lian
9b6ee83a73 FDSP + QLoRA (#1378)
* wip qlora + fsdp fixes

* more fixes

* make sure to load the lora 🤦

* only setup quantized meta on non-zero rank:

* only run setup_quantized_peft_meta_for_training for qlora+fsdp

* more fixes for qlora+fsdp

* chore: lint

* add example yml

* support mistral too

* fix for model_type and add mixtral support too

* set cpu_offload: false to reduce vram, constrain new accleerator logic to qlora + fsdp

* refactor for duplicate code
2024-03-08 14:31:01 -05:00
Wing Lian
0cfdb2c90c support for DoRA w/ PEFT (#1363) 2024-03-05 21:20:15 -05:00