Commit Graph

944 Commits

Author SHA1 Message Date
Chiwan Park
2dac1edf72 Fix drop_long_seq bug due to truncation in prompt tokenization strategies when using chat_template (#1867) 2024-08-26 12:56:12 -04:00
Wing Lian
6819c12cee update spectrum authors (#1869) 2024-08-26 12:00:36 -04:00
Wing Lian
8e29bdefdd Spectrum plugin (#1866) 2024-08-25 17:54:02 -04:00
Wing Lian
f245964f22 better handling of llama-3 tool role (#1782) 2024-08-25 12:31:40 -04:00
Wing Lian
22f4eafa55 simplify logic (#1856) 2024-08-23 20:23:08 -04:00
Wing Lian
77a4b9cda2 change up import to prevent AttributeError (#1863)
* change up import to prevent AttributeError

* tweak patching check for updated upstream
2024-08-23 17:00:01 -04:00
Wing Lian
1f686c576c Liger Kernel integration (#1861)
* add initial plugin support w Liger kernel patches

* integrate the input args classes

* fix liger plugin and dynamic configuration class

* drop untrainable samples and refactor config plugins integration

* fix incorrect inputs and circular imports

* fix bool comparison

* fix for dropping untrainable tokens

* fix licensing so liger integration is Apache 2.0

* add jamba support

* pylint ignore
2024-08-23 12:21:51 -04:00
Wing Lian
328fd4b3b7 add axolotl community license (#1862) 2024-08-23 11:40:21 -04:00
Wing Lian
fefa95e350 most model types now support flash attention 2 regardless of multipack support (#1854) 2024-08-22 16:39:23 -04:00
Wing Lian
2f8037fee6 ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed (#1850) [skip ci] 2024-08-22 13:10:40 -04:00
JohanWork
7ed92e61c2 fix: prompt phi (#1845) [skip ci]
* correcting phi system prompt

* phi test

* update

* add test
2024-08-22 11:46:57 -04:00
Wing Lian
9caa3eb699 make the train_on_eos default to turn so all eos tokens are treated the same (#1847) [skip ci] 2024-08-22 11:45:37 -04:00
Wing Lian
5b0b774e38 ensure that the bias is also in the correct dtype (#1848) [skip ci]
* ensure that the bias is also in the correct dtype

* add nightly for dpo-qlora-fsdp
2024-08-22 11:45:00 -04:00
Gal Cohen (galco)
9f917245f6 feat: add jamba chat_template (#1843)
* feat: add jamba chat_template

* fix: black

* feat: jamba fsdp+qlora

---------

Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-21 13:37:17 -04:00
Aman Gupta Karmani
649c19aba3 pretrain: fix with sample_packing=false (#1841) 2024-08-21 13:36:51 -04:00
Gal Cohen (galco)
5aac4bc284 fix: dont change quant storage dtype in case of fsdp (#1837)
* fix: dont change quant storage dtype in case of fsdp

* fix black

---------

Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-20 12:41:48 -04:00
Wing Lian
e29931259b optionally save the final FSDP model as a sharded state dict (#1828)
* efficiently save very large llms when using FSDP

* fix parsing and index of sharded chunks

* only save fsdp on main process

* debugging for rename

* save sharded state dict

* remove unused new param

* get state dict directly

* tweak acc merge fsdp to shard the weight files

* sharded_state_dict alongside save_safetensors seems to hang on checkpoint save
2024-08-19 14:59:24 -04:00
Wing Lian
b1d2921222 add validation to prevent 8bit lora finetuning on H100s (#1827) 2024-08-16 21:32:00 -04:00
Wing Lian
803fed3e90 update sklearn version, torch compile env vars, don't worry about failure on preprocess load model (#1821)
* update sklearn version, torch compile env vars, don't worry about failure on preprocess load model

* There is already a condition check within the function. This outer one is not necessary

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-08-16 10:41:51 -04:00
NanoCode012
68a3c7678a fix: parse model_kwargs (#1825) 2024-08-16 07:51:19 -04:00
NanoCode012
f18925fb4b fix: parse eager_attention (#1824) 2024-08-14 09:46:46 -04:00
Chiwan Park
0801f239cc fix the incorrect max_length for chat template (#1818) 2024-08-09 11:50:31 -04:00
Wing Lian
5ee4b7325f fix z3 leaf configuration when not using lists (#1817) [skip ci] 2024-08-09 10:54:52 -04:00
Wing Lian
c56e0a79a5 logging improvements (#1808) [skip ci]
* logging improvements

* fix sort
2024-08-06 10:31:50 -04:00
Wing Lian
35d5e59d78 set z3 leaf for deepseek v2 (#1809) [skip ci]
* set z3 leaf for deepseek v2

* add deepseek v2 chat template
2024-08-06 09:30:46 -04:00
Wing Lian
fbbeb4fee0 remove un-necessary zero-first guard as it's already only called in a parent fn (#1810) [skip ci] 2024-08-06 09:29:23 -04:00
Wing Lian
ecdda006de One cycle lr (#1803)
* refactor one_cycle lr scheduler so it's reusable in more situations

* fix validation for lr_scheduler

* default to cosine anneal strategy

* one cycle lr expects cos
2024-08-05 13:12:05 -04:00
ripes
7402eb9dcb Fix setting correct repo id when pushing dataset to hub (#1657)
* use the ds hash as the dataset's config_name

* improve logging for loading/pushing ds to hub

* fix missing f string
2024-08-05 12:42:15 -04:00
Wing Lian
78b42a3fe1 fix roles to train defaults and make logging less verbose (#1801) 2024-07-30 20:58:17 -04:00
Wing Lian
3ebf22464b qlora-fsdp ram efficient loading with hf trainer (#1791)
* fix 405b with lower cpu ram requirements

* make sure to use double quant and only skip output embeddings

* set model attributes

* more fixes for sharded fsdp loading

* update the base model in example to use pre-quantized nf4-bf16 weights

* upstream fixes for qlora+fsdp
2024-07-30 19:21:38 -04:00
Adam Brusselback
55cc214c76 Add flexible configuration options for chat_template dataset training (#1756)
* Add flexible configuration options for chat dataset training

- Introduce roles_to_train parameter to set training labels by role
- Add train_on_eos option to configure training on end-of-sequence tokens
- Implement per-message training configuration in dataset
- Allow fine-grained control over training specific portions of messages
- Add message_field_training and message_field_training_detail settings
- Implement mapping between dataset character offsets and tokenized prompt
- Enhance test suite to cover new functionality

* Fix missing field inits, things weren't working from yaml.

* chore: lint

* Revert test repo back to NousResearch after opening PR to fix the tokenizer_config.json.

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-07-28 21:48:57 -04:00
Wing Lian
94ba93259f various batch of fixes (#1785)
* various batch of fixes

* more tweaks

* fix autoawq requirement for torch flexibility

* simplify conditionals

* multi-node fixes wip

* bump transformers and include 405b qlora+fsdp yaml
2024-07-28 07:25:54 -04:00
Wing Lian
6a9cfec222 add support for simpo via cpo trainer (#1772)
* add support for simpo via cpo trainer

* add cpo_alpha / sft_weight from the paper

* make sure to use the right builder for simpo
2024-07-23 21:22:16 -04:00
Wing Lian
fe250ada78 fix fsdp loading of models, esp 70b (#1780) 2024-07-23 19:54:28 -04:00
Wing Lian
87455e7f32 swaps to use newer sample packing for mistral (#1773)
* swaps to use newer sample packing for mistral

* fix multipack patch test

* patch the common fa utils

* update for refactor of flash attn unpad

* remove un-needed drop attn mask for mistral

* bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2

* update test
2024-07-23 01:41:11 -04:00
Keith Stevens
985819d89b Add a chat_template prompt strategy for DPO (#1725)
* Implementing a basic chat_template strategy for DPO datasets

This mimics the sft chat_template strategy such that users can:
* Specify the messages field
* Specify the per message role and content fields
* Specify the chosen and rejected fields
* Let the tokenizer construct the raw prompt
* Ensure the chosen and rejected fields don't have any prefix tokens

* Adding additional dpo chat template unittests

* Rename test class
2024-07-21 09:10:42 -04:00
Wing Lian
fa91b698e9 Fix untrained tokens (#1771)
* fix untrained reserved tokens

* save model after fixing untrained embeddings

* don't need fsdp conditional here
2024-07-19 12:21:37 -04:00
Wing Lian
e4063d60a7 bump transformers and set roundup_power2_divisions for more VRAM improvements, low bit ao optimizers (#1769)
* bump transformers and set roundup_power2_divisions for more VRAM improvements

* support for low bit optimizers from torch ao

* fix check for alternate optimizers and use nous models on hf for llama3

* add missing check for ao_adamw_fp8

* fix check when using custom optimizers w adamw
2024-07-19 00:47:07 -04:00
Wing Lian
7830fe04b5 Unsloth rope (#1767)
* Add unsloth rope embeddings support

* support for models weights in 4bit and do some memory gc

* use accelerate logger

* add unsloth llama rms norm optims

* update docs for unsloth

* more docs info
2024-07-18 14:54:41 -04:00
Wing Lian
c86c32a627 set the number of dataset processes on the DPO Config rather than the trainer (#1762) 2024-07-17 15:38:37 -04:00
Wing Lian
8731b95d04 re-enable PYTORCH_CUDA_ALLOC_CONF expandable_segments (#1765) [skip ci] 2024-07-17 15:38:26 -04:00
Wing Lian
8619b2d855 add torch_compile_mode options (#1763) [skip ci]
* add torch_compile_mode options

* make sure n_gpu is an int
2024-07-17 15:38:07 -04:00
Wing Lian
976f85195a fixes to accelerator so that iterable pretraining datasets work (#1759)
* fixes to accelerator so that iterable pretraining datasets work

* fix the pretraining test params

* split batches, not dispatch batches needs to be set

* update c4 datasets

* set epochs in pretrain config test

* need to set both split_batches and dispatch_batches to false for pretraining

* fix bool val in comment
2024-07-17 10:58:38 -04:00
Wing Lian
152ab76623 fix num gpu check (#1760) 2024-07-17 10:58:14 -04:00
Wing Lian
5f58555bd0 support for llama multipack using updated code/patches (#1754)
* support for llama multipack using updated code/patches

* also support unsloth patches

* incorrect arg

* add config validation for unsloth

* add missing return to validation

* add another missing return to validation
2024-07-16 17:36:29 -04:00
Wing Lian
cfc533a7f7 torch compile and cuda alloc improvements (#1755)
* enable experimental expandable_segments

* hf trainer seems to be missing torch compile

* disable PYTORCH_CUDA_ALLOC_CONF to see if that fixes cicd
2024-07-16 16:00:23 -04:00
Wing Lian
78e12f8ca5 add basic support for the optimi adamw optimizer (#1727)
* add support for optimi_adamw optimizer w kahan summation

* pydantic validator for optimi_adamw

* workaround for setting optimizer for fsdp

* make sure to install optimizer packages

* make sure to have parity for model parameters passed to optimizer

* add smoke test for optimi_adamw optimizer

* don't use foreach optimi by default
2024-07-14 19:12:57 -04:00
Wing Lian
98af5388ba bump flash attention 2.5.8 -> 2.6.1 (#1738)
* bump flash attention 2.5.8 -> 2.6.1

* use triton implementation of cross entropy from flash attn

* add smoke test for flash attn cross entropy patch

* fix args to xentropy.apply

* handle tuple from triton loss fn

* ensure the patch tests run independently

* use the wrapper already built into flash attn for cross entropy

* mark pytest as forked for patches

* use pytest xdist instead of forked, since cuda doesn't like forking

* limit to 1 process and use dist loadfile for pytest

* change up pytest for fixture to reload transformers w monkeypatch
2024-07-14 19:11:31 -04:00
Wing Lian
a4a5bf057f fixes to prevent vram spike when train starts (#1742) 2024-07-13 09:53:13 -04:00
Wing Lian
47e1916484 add tests so CI can catch updates where patches will break with unsloth (#1737) [skip ci] 2024-07-11 16:43:19 -04:00