Salman Mohammadi
e1a8dfbe8c
pinning transformers version
2025-04-08 17:17:23 +01:00
Sung Ching Liu
a8f38c367c
Flex Attention + Packing with BlockMask support (#2363)
2025-04-05 18:02:57 -04:00
Dan Saunders
c907ac173e
adding pre-commit auto-update GH action and bumping plugin versions (#2428)
* adding pre-commit auto-update GH action and bumping plugin versions
* running updated pre-commit plugins
* sorry to revert, but pylint complained
* Update .pre-commit-config.yaml
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-03-21 11:02:43 -04:00
Wing Lian
ffae8d6a95
GRPO (#2307)
2025-02-13 16:01:01 -05:00
Wing Lian
30046315d9
disable ray tests for latest torch release (#2328)
* disable ray tests for latest torch release
* move decorator from class to method
2025-02-12 18:29:02 -05:00
Wing Lian
cf17649ef3
Misc fixes 20250130 (#2301)
* misc fixes for garbage collection and L40S w/ NCCL P2P
* patch bnb fix for triton check
* chore: lint
* change up import
* try patching differently
* remove patch for bnb fix for now
* more verbose checks and tweak train loss threshold
2025-01-31 08:58:04 -05:00
salman
c071a530f7
removing 2.3.1 (#2294)
2025-01-28 23:23:44 -05:00
Wing Lian
dd26cc3c0f
add helper to verify the correct model output file exists (#2245)
* add helper to verify the correct model output file exists
* more checks using helper
* chore: lint
* fix import and relora model check
* workaround for trl trainer saves
* remove stray print
2025-01-13 10:43:29 -05:00
Wing Lian
fb3352e21c
rename liger test so it properly runs in ci (#2246)
2025-01-09 17:31:43 -05:00
Wing Lian
a1790f2652
replace tensorboard checks with helper function (#2120) [skip ci]
* replace tensorboard checks with helper function
* move helper function
* use relative
2024-12-03 21:06:20 -05:00
Sunny Liu
d5f58b6509
Check torch version for ADOPT optimizer + integrating new ADOPT updates (#2104)
* added torch check for adopt, wip
* lint
* gonna put torch version checking somewhere else
* added ENVCapabilities class for torch version checking
* lint + pydantic
* ENVCapabilities -> EnvCapabilities
* forgot to git add v0_4_1/__init__.py
* removed redundancy
* add check if env_capabilities not specified
* make env_capabilities compulsory [skip e2e]
* fixup env_capabilities
* modified test_validation.py to accommodate env_capabilities
* adopt torch version test [skip e2e]
* raise error
* test correct torch version
* test torch version above requirement
* Update src/axolotl/utils/config/models/input/v0_4_1/__init__.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* removed unused is_torch_min
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-12-02 20:15:39 -05:00
Sunny Liu
1d7aee0ad2
ADOPT optimizer integration (#2032) [skip ci]
* adopt integration
* stuff
* doc and test for ADOPT
* rearrangement
* fixed formatting
* hacking pre-commit
* chore: lint
* update module doc for adopt optimizer
* remove unnecessary example yaml for adopt optimizer
* skip test adopt if torch<2.5.1
* formatting
* use version.parse
* specifies required torch version for adopt_adamw
---------
Co-authored-by: sunny <sunnyliu19981005@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-11-13 17:10:17 -05:00
NanoCode012
5c7e89105d
Fix: modelloader handling of model_kwargs load_in*bit (#1999)
* fix: load_in_*bit not properly read
* fix: load_*bit check
* fix: typo
* refactor: load_*bit handling
* feat: add test dpo lora multi-gpu
* fix: turn off sample packing for dpo
* fix: missing warmup_steps
* fix: test to load in 8bit for lora
* skip 8bit lora on h100, add 4bit lora on h100 to multi gpu tests
* chore: reduce max_steps
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-10-30 14:41:34 -04:00
Wing Lian
00568c1539
support for true batches with multipack (#1230)
* support for true batches with multipack
* patch the map dataset fetcher to handle batches with packed indexes
* patch 4d mask creation for sdp attention
* better handling for BetterTransformer
* patch general case for 4d mask
* setup forward patch. WIP
* fix patch file
* support for multipack w/o flash attention for llama
* cleanup
* add warning about bf16 vs fp16 for multipack with sdpa
* bugfixes
* add 4d multipack tests, refactor patches
* update tests and add warnings
* fix e2e file check
* skip sdpa test if not at least torch 2.1.1, update docs
2024-02-01 10:18:42 -05:00
Wing Lian
b3a61e8ce2
add e2e tests for checking functionality of resume from checkpoint (#865)
* use tensorboard to see if resume from checkpoint works
* make sure e2e test is either fp16 or bf16
* set max_steps and save limit so we have the checkpoint when testing resuming
* fix test parameters
2023-11-15 23:05:55 -05:00
Wing Lian
6dc68a653f
use temp_dir kwarg instead
2023-11-06 18:33:01 -05:00
Wing Lian
c74f045ba7
chore: lint
2023-11-06 18:33:01 -05:00