Dan Saunders
79ddaebe9a
Add ruff, remove black, isort, flake8, pylint (#3092)
* black, isort, flake8 -> ruff
* remove unused
* add back needed import
* fix
2025-08-23 23:37:33 -04:00
Dan Saunders
35fdbce102
Ensure device mesh patching is applied (#2842)
* move patches; make patch stronger
* fix broken tests
* guard sequence_parallel_degree comparison against none
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-29 22:16:32 -04:00
NanoCode012
21388cf615
Fix: lora kernel pre-patch applied despite post-patch not applied (#2772)
* fix: do not pre-patch self attention if lora dropout non-zero
* fix: add test to check patch not applied
* fix: test
* fix: test config check
* fix where we check so that tests don't break
* fix: test
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-14 11:54:06 -07:00
Dan Saunders
2962a398b7
Lora kernels fix (#2732)
* fix lora kernel patching and improve test
* simplification
2025-05-28 10:03:43 -04:00
Wing Lian
0f3587174d
swap tinymodels that have safetensors for some ci tests (#2641)
2025-05-07 15:06:07 -04:00
Dan Saunders
ecac731922
auto-enable lora kernels where possible (#2589)
* auto-enable lora kernels where possible
* test
* revert change to example yaml
* naming
* remove print
* slight logic change
2025-04-29 16:18:49 -04:00
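For context on what the kernel-related commits above toggle, a minimal axolotl YAML sketch of a LoRA run with the Triton kernel options enabled might look like the following. This is an illustrative sketch, not an authoritative config: the option names (lora_mlp_kernel, lora_qkv_kernel, lora_o_kernel) follow the kernels introduced in #2324 and auto-enabled in #2589, the base_model value is a placeholder, and config.qmd should be consulted for the options your version actually supports.

```yaml
# Hypothetical minimal sketch, assuming the LoRA kernel flags from #2324/#2589.
base_model: <your-hf-model-id>   # placeholder; any architecture supported by the kernel patches

adapter: lora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.0        # per #2772, the attention pre-patch is skipped when dropout is non-zero
lora_target_linear: true

# Opt-in LoRA Triton kernels; since #2589 these are auto-enabled where possible.
lora_mlp_kernel: true
lora_qkv_kernel: true
lora_o_kernel: true
```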
Wing Lian
07e4f2e25b
support for qwen3 with lora kernels (#2588)
* support for qwen3 with lora kernels
* fix patch
* typo
2025-04-29 15:02:49 -04:00
Wing Lian
198d775d6d
make sure all of the model is on the same device, so this test will pass on multigpu (#2524) [skip ci]
2025-04-15 22:15:42 -07:00
NanoCode012
2c34a4634e
feat: add CCE for gemma3, cohere, and cohere2 (#2443)
* feat: add CCE for gemma3 and cohere1/2
* fix: change from relative import to absolute
* feat: add multipack for cohere&cohere2
* chore: improve comments
* fix: add gemma3_text
* feat: add cohere2 example
* fix: cohere forward
* fix: patch for cohere2
* feat: add command r v01 qlora sample
* chore: lint
* feat: upgrade gemma3 and gemma2 patch to use logits_to_keep
* chore: lint
* fix: add deprecate_kwarg decorator
* fix: add cce for gemma3 conditionalgeneration
* fix: gemma3 patch to defer logits calculation
* fix: patch gemma3 if given as model
* fix: remove not working config
* fix: update comments to clarify changes
* feat(doc): add supported models to readme
* fix: address difference in our cohere patch
* feat: add mistral3
* feat: add gemma
* feat(doc): update README to include gemma and mistral3 in supported models
* fix: gemma patch
* fix: import
* fix: gemma patch to be standalone
* fix: gemma3 warns that final_logit_softcapping is not supported
* feat: add mllama CCE
* chore: add abbreviation to doc
* fix: remove unneeded gemma3 eager warning
* fix: save processor if available
* fix: enable save processor on merge
* fix: wrong env meaning
2025-03-26 18:13:51 -04:00
NanoCode012
9f00465a5c
Feat: Add support for gemma3_text and add e2e for gemma2 (#2406)
2025-03-22 20:33:21 -04:00
Dan Saunders
c907ac173e
adding pre-commit auto-update GH action and bumping plugin versions (#2428)
* adding pre-commit auto-update GH action and bumping plugin versions
* running updated pre-commit plugins
* sorry to revert, but pylint complained
* Update .pre-commit-config.yaml
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-03-21 11:02:43 -04:00
Dan Saunders
3d8425fa91
Activation function Triton kernels, LoRA custom autograd functions (#2324)
* LoRA + activation fn Triton kernels: initial commit
* implementing optims
* finalizing MLP LoRA kernels and progress on QKV / W kernels
* updates
* O projection optim
* adding monkey patching logic
* doc strings, typing, pre-commit fixes
* updates
* adding lora 8b kernels example
* working on fsdp support
* tests and fixes
* small fixes, getting tests to pass, adding doc strings
* integration tests for LoRA patching
* config.qmd
* remove unneeded pytest fixture
* fix
* review comments first pass
* improving tests, attention class agnostic patching
* adding support for more archs
* wip SiLU / GELU impls
* improved testing, small updates, etc.
* slightly updating docs
* rebase
* fixing test_attention_patching_integration
* additional review comments, fixing test in CI (hopefully)
* isolating problematic patching test
* relaxing allclose threshold to reduce flakiness
* fixing accidental change
* adding model arch agnostic attention class fetching
* removing unused activations
2025-02-17 14:23:15 -05:00