Commit Graph

146 Commits

Sunny Liu
bf416bdfd0 bump_liger_0.4.2 (#2096) 2024-11-21 13:24:52 -05:00
Wing Lian
d9b71edf84 bump transformers for fsdp-grad-accum fix, remove patch (#2079) 2024-11-19 02:23:09 -05:00
Wing Lian
70cf79ef52 upgrade autoawq==0.2.7.post2 for transformers fix (#2070)
* point to upstream autoawq for transformers fix

* use autoawq 0.2.7 release

* test wheel for awq

* try different format for wheel def

* autoawq re-release

* Add intel_extension_for_pytorch dep

* use a gte (>=) version specifier for ipex

* forcefully remove intel-extension-for-pytorch

* add -y option to pip uninstall for ipex

* use post2 release for autoawq and remove uninstall of ipex
2024-11-18 11:53:37 -05:00
Wing Lian
d42f202046 Fsdp grad accum monkeypatch (#2064) 2024-11-15 19:11:04 -05:00
Wing Lian
0dabde1962 support for schedule free and e2e ci smoke test (#2066) [skip ci]
* support for schedule free and e2e ci smoke test

* set default lr scheduler to constant in test

* ignore duplicate code

* fix quotes for config/dict
2024-11-15 19:10:14 -05:00
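
For context on the schedule-free support above: schedule-free optimizers fold the learning-rate schedule into a weight-averaging step, which is why the test pins the scheduler to constant. A minimal sketch, assuming the `schedulefree` package from facebookresearch (not axolotl's exact wiring):

```python
# Minimal sketch of schedule-free training, assuming the `schedulefree`
# package; axolotl's actual integration wires this through its config.
import torch
import schedulefree

model = torch.nn.Linear(8, 1)
optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=2e-5)

optimizer.train()  # must be called before taking training steps
for _ in range(3):
    x = torch.randn(4, 8)
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
optimizer.eval()  # switches to the averaged weights for evaluation
```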
Wing Lian
2f20cb7ebf upgrade datasets==3.1.0 and add upstream check (#2067) [skip ci] 2024-11-15 19:08:38 -05:00
Wing Lian
2d7830fda6 upgrade to flash-attn 2.7.0 (#2048) 2024-11-14 06:59:25 -05:00
NanoCode012
4e1891b12b feat: upgrade to liger 0.4.1 (#2045) 2024-11-13 10:07:24 -05:00
Wing Lian
fd3b80716a remove fastchat and sharegpt (#2021)
* remove fastchat and sharegpt

* remove imports

* remove more fastchat imports

* chore: remove unused functions

* feat: remove sharegpt and deprecate from docs

* chore: remove unused sharegpt checks

* fix: remove sharegpt type from tests

* feat: add sharegpt deprecation error

* feat: update readme

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2024-11-08 13:45:49 -05:00
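
For context on the "sharegpt deprecation error" above: configs that still use the removed sharegpt dataset type need a clear failure path. A hypothetical sketch of such a guard (names and wording are illustrative, not axolotl's actual code):

```python
# Hypothetical sketch of a sharegpt deprecation guard; the real check in
# axolotl may differ in structure and message.
def check_dataset_type(ds_cfg: dict) -> None:
    ds_type = str(ds_cfg.get("type", ""))
    if ds_type.startswith("sharegpt"):
        raise ValueError(
            "The sharegpt dataset type was removed along with fastchat; "
            "convert the dataset config to the chat_template type instead."
        )

check_dataset_type({"type": "chat_template"})  # passes silently
```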
Sunny Liu
3265b7095e Add weighted optimisation support for trl DPO trainer integration (#2016)
* trl v0.12.0 integration

* update trl version requirements

* linting

* commenting out

* trl version requirement
2024-11-08 11:29:11 -05:00
Wing Lian
02ce520b7e upgrade liger to 0.4.0 (#1973)
* upgrade liger to 0.3.1

* update docs and example

* skip duplicate code check

* Update src/axolotl/integrations/liger/args.py

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* Update README.md

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* add logging

* chore: lint

* add test case

* upgrade liger and transformers

* also upgrade accelerate

* use kwargs to support patch release

* make sure prepared path is empty for test

* use transformers 4.46.1 since 4.46.2 breaks fsdp

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2024-11-07 12:53:34 -05:00
Wing Lian
d3c45d27b5 fix zero3 (#1994) 2024-10-28 07:32:49 -04:00
NanoCode012
2501c1a6a3 Fix: Gradient Accumulation issue (#1980)
* feat: support new arg num_items_in_batch

* use kwargs to manage extra unknown kwargs for now

* upgrade against upstream transformers main

* make sure trl is on latest too

* fix for upgraded trl

* fix: handle trl and transformer signature change

* feat: update trl to handle transformer signature

* RewardDataCollatorWithPadding no longer has max_length

* handle updated signature for tokenizer vs processor class

* invert logic for tokenizer vs processor class

* processing_class, not processor class

* also handle processing class in dpo

* handle model name w model card creation

* upgrade transformers and add a loss check test

* fix install of tbparse requirements

* make sure to add tbparse to req

* feat: revert kwarg to positional kwarg to be explicit

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-10-25 11:28:23 -04:00
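
For context on the fix above: averaging the loss per micro-batch over-weights micro-batches with fewer unmasked tokens; transformers' new `num_items_in_batch` argument lets the loss be normalized by the total token count across all accumulated steps instead. A toy illustration of the difference:

```python
# Toy illustration of the gradient-accumulation loss bug (not axolotl code).
# Each tuple is (summed token loss, unmasked token count) for a micro-batch.
micro_losses = [(12.0, 3), (50.0, 10)]

# buggy: mean of per-micro-batch means over-weights the short batch
buggy = sum(s / n for s, n in micro_losses) / len(micro_losses)            # 4.5

# fixed: normalize by the total token count across accumulated steps
fixed = sum(s for s, _ in micro_losses) / sum(n for _, n in micro_losses)  # ~4.77

print(buggy, fixed)
```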
Wing Lian
955cca41fc don't explicitly set cpu pytorch version (#1986)
* use a constraint file

* use min version of xformers

* don't install autoawq with pytorch 2.5.0

* debugging for errors

* upgrade pip first

* fix action yml

* add back try/except

* retry w/o constraint

* use --no-build-isolation

* show torch version

* install setuptools and wheel

* add back try/except
2024-10-21 19:50:50 -04:00
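
For context: a pip constraint file caps versions without adding anything to the install set. A sketch of the mechanics (file names are illustrative):

```python
# Sketch of the pip constraint-file mechanics; file names are illustrative.
# constraints.txt holds pins like "torch==2.4.1" that apply only if the
# package is pulled in by requirements.txt or its dependencies.
import subprocess

subprocess.run(
    ["pip", "install", "-r", "requirements.txt", "-c", "constraints.txt"],
    check=True,
)
```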
Wing Lian
335027f155 upgrade accelerate to 1.0.1 (#1969) 2024-10-13 20:04:30 -04:00
Wing Lian
ec4272c3a0 add ds zero3 to multigpu biweekly tests (#1900)
* add ds zero3 to multigpu biweekly tests

* fix for upstream api change

* use updated accelerate and fix deepspeed tests

* stringify the Path, and run multigpu tests if the multigpu tests change for a PR

* use correct json rather than yaml

* revert accelerate for deepspeed
2024-10-13 17:34:37 -04:00
Wing Lian
d20b48a61e only install torchao for torch versions >= 2.4.0 (#1963) 2024-10-12 20:53:48 -04:00
Wing Lian
09bf1ceacc update hf deps (#1964)
* update hf deps

* remove deprecated set_caching_enabled
2024-10-12 18:19:48 -04:00
Wing Lian
8159cbd1ab lm_eval harness post train (#1926)
* wip, lm_eval harness post train

* include latex parser

* add dtype and doc

* add validation when doing bench evals

* automatically add test dataset when doing benches
2024-10-10 15:04:17 -04:00
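
The post-train benches above presumably run through lm-evaluation-harness; a minimal sketch of its Python API (model path, dtype, and task list are placeholders):

```python
# Minimal sketch of invoking lm-evaluation-harness after training;
# the model path and task list are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./outputs/final,dtype=bfloat16",
    tasks=["gsm8k"],
)
print(results["results"])
```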
Wing Lian
e8d3da0081 upgrade pytorch from 2.4.0 => 2.4.1 (#1950)
* upgrade pytorch from 2.4.0 => 2.4.1

* update xformers for updated pytorch version

* handle xformers version case for torch==2.3.1
2024-10-09 11:53:56 -04:00
Wing Lian
844331005c bump transformers to 4.45.1 (#1936) 2024-09-30 13:56:12 -04:00
Wing Lian
b98d7d7098 update upstream deps versions and replace lora+ (#1928)
* update upstream deps versions and replace lora+

* typo transformers version
2024-09-26 11:33:41 -04:00
Wing Lian
5c42f11411 remove dynamic module loader monkeypatch as this was fixed upstream (#1914) 2024-09-13 22:19:54 -04:00
Wing Lian
3853ab7ae9 bump accelerate to 0.34.2 (#1901)
* bump accelerate

* add fixture to predownload the test model

* change fixture
2024-09-07 14:39:31 -04:00
Wing Lian
6e354682e3 fix zero3 integration (#1897)
* fix zero3 integration

* bump transformers and accelerate too
2024-09-05 10:58:50 -04:00
Wing Lian
ce33e1ed83 pin liger-kernel to latest 0.2.1 (#1882) [skip ci] 2024-08-30 17:51:18 -04:00
Wing Lian
1f686c576c Liger Kernel integration (#1861)
* add initial plugin support w Liger kernel patches

* integrate the input args classes

* fix liger plugin and dynamic configuration class

* drop untrainable samples and refactor config plugins integration

* fix incorrect inputs and circular imports

* fix bool comparison

* fix for dropping untrainable tokens

* fix licensing so liger integration is Apache 2.0

* add jamba support

* pylint ignore
2024-08-23 12:21:51 -04:00
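
For context on the integration above: Liger swaps stock HF modeling ops for fused Triton kernels. A minimal sketch using liger-kernel's patch API directly (axolotl wraps this behind its plugin system):

```python
# Minimal sketch of what the Liger integration does under the hood:
# patch HF Llama modules with fused Triton kernels via liger-kernel.
from liger_kernel.transformers import apply_liger_kernel_to_llama

apply_liger_kernel_to_llama(
    rope=True,           # fused rotary position embeddings
    rms_norm=True,       # fused RMSNorm
    swiglu=True,         # fused SwiGLU MLP
    cross_entropy=True,  # fused cross-entropy loss
)
# Any LlamaForCausalLM instantiated after this call uses the fused kernels.
```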
Wing Lian
c3fc529bfc numpy 2.1.0 was released, but incompatible with numba (#1849) [skip ci] 2024-08-22 11:44:45 -04:00
Wing Lian
803fed3e90 update sklearn version, torch compile env vars, don't worry about failure on preprocess load model (#1821)
* update sklearn version, torch compile env vars, don't worry about failure on preprocess load model

* There is already a condition check within the function. This outer one is not necessary

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-08-16 10:41:51 -04:00
Wing Lian
1853d6021d bump hf dependencies (#1823)
* bump hf dependencies

* revert optimum version change

* don't bump tokenizers all the way to 0.20 yet since transformers doesn't support that
2024-08-11 16:27:41 -04:00
Wing Lian
850f999a76 update peft and transformers (#1811) 2024-08-06 10:32:05 -04:00
Wing Lian
3ebf22464b qlora-fsdp ram efficient loading with hf trainer (#1791)
* fix 405b with lower cpu ram requirements

* make sure to use double quant and only skip output embeddings

* set model attributes

* more fixes for sharded fsdp loading

* update the base model in example to use pre-quantized nf4-bf16 weights

* upstream fixes for qlora+fsdp
2024-07-30 19:21:38 -04:00
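
The "double quant and only skip output embeddings" change above maps onto a bitsandbytes 4-bit config; a sketch of what such a config looks like (nf4 and bf16 compute assumed from the pre-quantized example the commit references):

```python
# Sketch of a 4-bit quantization config matching the commit's description:
# double quantization enabled, only the output embedding left unquantized.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=["lm_head"],  # skip the output embeddings
)
```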
Wing Lian
94ba93259f various batch of fixes (#1785)
* various batch of fixes

* more tweaks

* fix autoawq requirement for torch flexibility

* simplify conditionals

* multi-node fixes wip

* bump transformers and include 405b qlora+fsdp yaml
2024-07-28 07:25:54 -04:00
Wing Lian
22680913f3 Bump deepspeed 20240727 (#1790)
* pin deepspeed to 0.14.4 otherwise it doesn't play nice with trl

* Add test to import to try to trigger import dependencies
2024-07-27 10:24:11 -04:00
Wing Lian
e6b299dd79 bump flash attention to 2.6.2 (#1781) [skip ci] 2024-07-23 19:54:15 -04:00
Wing Lian
608a2f3180 bump transformers for updated llama 3.1 (#1778)
* bump transformers for updated llama 3.1

* bump for patch fix
2024-07-23 13:21:03 -04:00
Wing Lian
87455e7f32 swaps to use newer sample packing for mistral (#1773)
* swaps to use newer sample packing for mistral

* fix multipack patch test

* patch the common fa utils

* update for refactor of flash attn unpad

* remove un-needed drop attn mask for mistral

* bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2

* update test
2024-07-23 01:41:11 -04:00
Wing Lian
e4063d60a7 bump transformers and set roundup_power2_divisions for more VRAM improvements, low bit ao optimizers (#1769)
* bump transformers and set roundup_power2_divisions for more VRAM improvements

* support for low bit optimizers from torch ao

* fix check for alternate optimizers and use nous models on hf for llama3

* add missing check for ao_adamw_fp8

* fix check when using custom optimizers w adamw
2024-07-19 00:47:07 -04:00
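
Two knobs are referenced above: the CUDA caching allocator's `roundup_power2_divisions` (rounds allocation sizes to reduce fragmentation) and torchao's low-bit optimizers. A sketch; the torchao import path below is its prototype namespace at the time and may have moved since:

```python
# Sketch of the two knobs from the commit; the torchao prototype import
# path may have moved in later releases, and these optimizers are
# intended to run on CUDA.
import os

# must be set before CUDA is first initialized
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "roundup_power2_divisions:16"

import torch
from torchao.prototype.low_bit_optim import AdamW8bit  # also AdamW4bit, AdamWFp8

model = torch.nn.Linear(8, 8)
optimizer = AdamW8bit(model.parameters(), lr=1e-4)
```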
Wing Lian
98af5388ba bump flash attention 2.5.8 -> 2.6.1 (#1738)
* bump flash attention 2.5.8 -> 2.6.1

* use triton implementation of cross entropy from flash attn

* add smoke test for flash attn cross entropy patch

* fix args to xentropy.apply

* handle tuple from triton loss fn

* ensure the patch tests run independently

* use the wrapper already built into flash attn for cross entropy

* mark pytest as forked for patches

* use pytest xdist instead of forked, since cuda doesn't like forking

* limit to 1 process and use dist loadfile for pytest

* change up pytest for fixture to reload transformers w monkeypatch
2024-07-14 19:11:31 -04:00
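
The "wrapper already built into flash attn" above is its Triton cross-entropy module; a minimal sketch of swapping it in (the tensors must live on a CUDA device in practice):

```python
# Minimal sketch of flash-attn's Triton cross-entropy wrapper; requires
# a CUDA device in practice.
import torch
from flash_attn.losses.cross_entropy import CrossEntropyLoss

loss_fn = CrossEntropyLoss(inplace_backward=True)  # saves memory on backward
logits = torch.randn(4, 32000, device="cuda", requires_grad=True)
labels = torch.randint(0, 32000, (4,), device="cuda")
loss = loss_fn(logits, labels)
loss.backward()
```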
Akshaya Shanbhogue
4512738a73 bump xformers to 0.0.27 (#1740)
* Update requirements.txt

Preserve compatibility with torch 2.3.1. [Reference](https://github.com/facebookresearch/xformers/issues/1052)

* fix setup.py to extract the current xformers dep from requirements for replacement

* xformers 0.0.27 wheels not built for torch 2.3.0

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-07-13 14:04:31 -04:00
Wing Lian
a159724e44 bump trl and accelerate for latest releases (#1730)
* bump trl and accelerate for latest releases

* ensure that the CI runs on new gh org

* drop kto_pair support since removed upstream
2024-07-10 11:15:44 -04:00
Wing Lian
c6d83a87c4 add support for .env files for env vars (#1724) 2024-07-02 13:17:40 -04:00
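
.env support typically means loading KEY=VALUE pairs into the environment before the config is read; a sketch assuming the python-dotenv package (axolotl's actual implementation may differ):

```python
# Sketch of .env loading, assuming python-dotenv; axolotl's actual
# implementation may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # reads KEY=VALUE pairs from ./.env into os.environ
print(os.environ.get("HF_TOKEN"))  # e.g. a token defined in .env
```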
Wing Lian
5370cedf0c support for gemma2 w sample packing (#1718) 2024-06-29 01:38:55 -04:00
Wing Lian
851ccb1237 bump deepspeed for fix for grad norm compute putting tensors on different devices (#1699) 2024-06-09 17:13:28 -04:00
Wing Lian
c996881ec2 add support for rpo_alpha (#1681)
* add support for rpo_alpha

* Add smoke test for dpo + nll loss
2024-06-04 16:09:51 -04:00
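
`rpo_alpha` weights an NLL (SFT) term mixed into the DPO loss, which is what the smoke test above exercises; in trl it is a DPOConfig field. A minimal sketch with the other training arguments elided:

```python
# Minimal sketch of enabling rpo_alpha through trl's DPOConfig; all other
# training arguments are elided.
from trl import DPOConfig

config = DPOConfig(
    output_dir="./outputs",
    rpo_alpha=1.0,  # weight of the NLL term added to the DPO loss
)
```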
Wing Lian
ef223519c9 update deps (#1663) [skip ci]
* update deps and tweak logic so axolotl is pip installable

* use vcs url format

* using dependency_links isn't supported per docs
2024-05-28 11:23:34 -04:00
Wing Lian
039e2a0370 bump versions of deps (#1621)
* bump versions of deps

* bump transformers too

* fix xformers deps and include s3fs install
2024-05-15 13:27:44 -04:00
Antoni-Joan Solergibert
b32c08f8cc adding llama3 fastchat conversation monkeypatch (#1539)
* adding llama3 fastchat conversation monkeypatch

* Updated conversation turns to work with PR3259 of FastChat

* fixed bos token

* bump fastchat version

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-05-10 10:40:05 -04:00
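
FastChat conversation templates are looked up by name; a sketch of the template API the monkeypatch targets (the "llama-3" name matches the FastChat PR referenced above):

```python
# Sketch of FastChat's conversation-template API; "llama-3" is the
# template name the monkeypatch targets.
from fastchat.conversation import get_conv_template

conv = get_conv_template("llama-3")
conv.append_message(conv.roles[0], "Hello!")
conv.append_message(conv.roles[1], None)  # leave the assistant turn open
print(conv.get_prompt())
```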
Haoxiang Wang
60f5ce0569 Add support for Gemma chat template (#1530)
* Add support for Gemma chat template

* Update fschat version to include its newest support for Gemma chat style

* pin fastchat to current HEAD

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-21 19:55:40 -04:00
Wing Lian
7d1d22f72f ORPO Trainer replacement (#1551)
* WIP use trl ORPOTrainer

* fixes to make orpo work with trl

* fix the chat template loading

* make sure to handle the special tokens and add_generation for assistant turn too
2024-04-19 17:25:36 -04:00
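
The replacement above swaps axolotl's in-house ORPO implementation for trl's ORPOTrainer; a minimal sketch of that API (model, tokenizer, and dataset are placeholders; newer trl takes `processing_class` instead of `tokenizer`):

```python
# Minimal sketch of trl's ORPOTrainer; model, tokenizer, and train_dataset
# are placeholders for whatever the axolotl config resolves to.
from trl import ORPOConfig, ORPOTrainer

trainer = ORPOTrainer(
    model=model,                  # a causal LM
    args=ORPOConfig(output_dir="./outputs", beta=0.1),
    train_dataset=train_dataset,  # prompt / chosen / rejected columns
    tokenizer=tokenizer,          # processing_class in newer trl
)
trainer.train()
```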