Wing Lian
afb8218c67
fix the monkeypatch
2024-11-19 02:12:33 -05:00
Wing Lian
1ff78d6347
remove temp_dir decorator as we're using fixtures now
2024-11-19 01:28:27 -05:00
Wing Lian
613a217142
monkeypatch for zero3 w 8bit lora
2024-11-19 00:45:20 -05:00
Wing Lian
127953af4e
zero3 can't use 8bit optimizer
2024-11-19 00:45:20 -05:00
Wing Lian
920ea77bdf
reduce number of steps
2024-11-19 00:45:20 -05:00
Wing Lian
ef60e3e851
bi-weekly 8bit lora zero3 check
2024-11-19 00:45:20 -05:00
Wing Lian
c07bd2fa65
Readme updates v2 (#2078)
* update readme logos
* use full logo
* Fix svgs
* add srcset
* resize svgs to match
* Rename file
* align badges center
2024-11-18 14:58:03 -05:00
Wing Lian
ed079d434a
static assets, readme, and badges update v1 (#2077)
2024-11-18 13:59:32 -05:00
Wing Lian
8403c67156
don't build bdist (#2076) [skip ci]
2024-11-18 12:36:03 -05:00
Wing Lian
9871fa060b
optim e2e tests to run a bit faster (#2069) [skip ci]
* optim e2e tests to run a bit faster
* run prequant w/o lora_modules_to_save
* use smollm2
2024-11-18 12:35:31 -05:00
Wing Lian
70cf79ef52
upgrade autoawq==0.2.7.post2 for transformers fix (#2070)
* point to upstream autoawq for transformers fix
* use autoawq 0.2.7 release
* test wheel for awq
* try different format for wheel def
* autoawq re-release
* Add intel_extension_for_pytorch dep
* ipex gte version
* forcefully remove intel-extension-for-pytorch
* add -y option to pip uninstall for ipex
* use post2 release for autoawq and remove uninstall of ipex
2024-11-18 11:53:37 -05:00
Wing Lian
c06b8f0243
increase worker count to 8 for basic pytests (#2075) [skip ci]
2024-11-18 11:52:35 -05:00
Chirag Jain
0c8b1d824a
Update get_unpad_data patching for multipack (#2013)
* Update `get_unpad_data` patching for multipack
* Update src/axolotl/utils/models.py
* Update src/axolotl/utils/models.py
* Add test case
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-11-15 20:35:50 -05:00
NanoCode012
fd70eec577
fix: loading locally downloaded dataset (#2056) [skip ci]
2024-11-15 20:35:26 -05:00
Wing Lian
d42f202046
Fsdp grad accum monkeypatch (#2064)
2024-11-15 19:11:04 -05:00
Wing Lian
0dabde1962
support for schedule free and e2e ci smoke test (#2066) [skip ci]
* support for schedule free and e2e ci smoke test
* set default lr scheduler to constant in test
* ignore duplicate code
* fix quotes for config/dict
2024-11-15 19:10:14 -05:00
Wing Lian
15f1462ccd
support passing trust_remote_code to dataset loading (#2050) [skip ci]
* support passing trust_remote_code to dataset loading
* add doc for trust_remote_code in dataset config
2024-11-15 19:09:48 -05:00
Wing Lian
521e62daf1
remove the bos token from dpo outputs (#1733) [skip ci]
* remove the bos token from dpo outputs
* don't forget to fix prompt_input_ids too
* use processing_class instead of tokenizer
* fix for processing class
2024-11-15 19:09:20 -05:00
Wing Lian
c16ec398d7
update to be deprecated evaluation_strategy (#1682) [skip ci]
* update to be deprecated evaluation_strategy and c4 dataset
* chore: lint
* remap eval strategy to new config and add tests
2024-11-15 19:09:00 -05:00
Wing Lian
2f20cb7ebf
upgrade datasets==3.1.0 and add upstream check (#2067) [skip ci]
2024-11-15 19:08:38 -05:00
Wing Lian
71d4030b79
gradient accumulation tests, embeddings w pad_token fix, smaller models (#2059)
* add more test cases for gradient accumulation and fix zero3
* swap out for smaller model
* fix missing return
* fix missing pad_token in config
* support concurrency for multigpu testing
* cast empty deepspeed to empty string for zero3 check
* fix temp_dir as fixture so parametrize works properly
* fix test file for multigpu evals
* don't use default
* don't use default for fsdp_state_dict_type
* don't use llama tokenizer w smollm
* also automatically cancel multigpu for concurrency
2024-11-14 12:59:00 -05:00
Wing Lian
f3a5d119af
fix env var extraction (#2043) [skip ci]
2024-11-14 12:58:06 -05:00
Wing Lian
ba219b51a5
fix duplicate base build (#2061) [skip ci]
2024-11-14 10:31:19 -05:00
Wing Lian
5be8e13d35
make sure to add tags for versioned tag on cloud docker images (#2060)
2024-11-14 10:24:49 -05:00
Wing Lian
2d7830fda6
upgrade to flash-attn 2.7.0 (#2048)
2024-11-14 06:59:25 -05:00
Wing Lian
5e98cdddac
Grokfast support (#1917)
2024-11-13 17:10:36 -05:00
Sunny Liu
1d7aee0ad2
ADOPT optimizer integration (#2032) [skip ci]
* adopt integration
* stuff
* doc and test for ADOPT
* rearrangement
* fixed formatting
* hacking pre-commit
* chore: lint
* update module doc for adopt optimizer
* remove unnecessary example yaml for adopt optimizer
* skip test adopt if torch<2.5.1
* formatting
* use version.parse
* specifies required torch version for adopt_adamw
---------
Co-authored-by: sunny <sunnyliu19981005@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-11-13 17:10:17 -05:00
Wing Lian
659ee5d723
don't cancel the tests on main automatically for concurrency (#2055) [skip ci]
2024-11-13 17:07:41 -05:00
Sunny Liu
342935cff3
Update unsloth for torch.cuda.amp deprecation (#2042)
* update deprecated unsloth torch cuda amp decorator
* WIP fix torch.cuda.amp deprecation
* lint
* relaxing torch version requirement
* remove use of partial
* remove use of partial
* lint
---------
Co-authored-by: sunny <sunnyliu19981005@gmail.com>
2024-11-13 15:17:34 -05:00
Wing Lian
c5eb9ea2c2
fix push to main and tag semver build for docker ci (#2054)
2024-11-13 14:04:28 -05:00
Wing Lian
f2145a3ccb
add default torch version if not installed, and support for xformers new wheels (#2049)
2024-11-13 13:16:47 -05:00
Wing Lian
010d0e7ff3
retry flaky test_packing_stream_dataset test that times out on read (#2052) [skip ci]
2024-11-13 13:16:16 -05:00
Wing Lian
01881c3113
make sure to tag images in docker for tagged releases (#2051) [skip ci]
* make sure to tag images in docker for tagged releases
* fix tag event
2024-11-13 13:15:49 -05:00
Wing Lian
0e8eb96e07
run pypi release action on tag create w version (#2047)
2024-11-13 10:21:48 -05:00
NanoCode012
4e1891b12b
feat: upgrade to liger 0.4.1 (#2045)
2024-11-13 10:07:24 -05:00
NanoCode012
28924fc791
feat: cancel ongoing tests if new CI is triggered (#2046) [skip ci]
2024-11-13 10:06:59 -05:00
NanoCode012
8c480b2804
fix: inference not using chat_template (#2019) [skip ci]
2024-11-13 10:06:41 -05:00
Oliver Molenschot
a4b1cc6df0
Add example YAML file for training Mistral using DPO (#2029) [skip ci]
* Add example YAML file for training Mistral using DPO
* chore: lint
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update mistral-dpo.yml
Adding qlora and removing role-related data (unnecessary)
* Rename mistral-dpo.yml to mistral-dpo-qlora.yml
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-11-13 10:06:25 -05:00
NanoCode012
7b78a31593
feat: print out dataset length even if not preprocess (#2034) [skip ci]
2024-11-13 10:06:00 -05:00
Wing Lian
810ebc2c0e
invert the string in string check for p2p device check (#2044)
2024-11-12 23:20:47 -05:00
Wing Lian
ad435a3b09
add P2P env when multi-gpu but not the full node (#2041)
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-11-12 17:58:26 -05:00
NanoCode012
9f1cf9b17c
fix: handle sharegpt dataset missing (#2035)
* fix: handle sharegpt dataset missing
* fix: explanation
* feat: add test
2024-11-12 12:51:37 +07:00
Wing Lian
3931a42763
change deprecated modal Stub to App (#2038)
2024-11-11 15:10:34 -05:00
NanoCode012
dc8f9059f7
feat: add metharme chat_template (#2033) [skip ci]
* feat: add metharme chat_template
* fix: add eos token
2024-11-11 15:09:58 -05:00
Wing Lian
234e94e9dd
replace references to personal docker hub to org docker hub (#2036) [skip ci]
2024-11-11 15:09:29 -05:00
Wing Lian
f68fb71005
update actions version for node16 deprecation (#2037) [skip ci]
* update actions version for node16 deprecation
* update pre-commit/action to use 3.0.1 for actions/cache@v4 dep
* update docker/setup-buildx-action too to v3
2024-11-11 15:09:11 -05:00
Wing Lian
9bc3ee6c75
add axolotlai docker hub org to publish list (#2031)
* add axolotlai docker hub org to publish list
* fix to use latest actions docker metadata version
* fix list in yaml for expected format for action
* missed a change
2024-11-11 09:48:19 -05:00
Wing Lian
d356740ffa
move deprecated kwargs from trainer to trainingargs (#2028)
2024-11-10 12:45:47 -05:00
Wing Lian
e4af51eb66
remove direct dependency on fused dense lib (#2027)
v0.5.0
2024-11-08 14:48:04 -05:00
Wing Lian
e20b15bee3
make publish to pypi manually dispatchable as a workflow (#2026) [skip ci]
2024-11-08 14:18:16 -05:00