Wing Lian
af8d257aa2
make pad_to_sequence_len default to the same value as sample_packing ( #2941 ) [skip ci]
...
* make pad_to_sequence_len default to the same value as sample_packing
* remove duplicate validation
* fix test
* update description meta
Co-authored-by: NanoCode012 <nano@axolotl.ai >
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-07-21 11:40:56 -04:00
Dan Saunders
10ba1622f7
checkpoint model on first step callback ( #2906 )
...
* checkpoint model on first step callback
* remove debug
* add test cases; update existing tests not to save on first step
* move test out of solo
* delete
* default to False
* typo
2025-07-15 15:00:48 -04:00
Wing Lian
dd8bad06d0
remove strict=false from example yamls [skip ci] ( #2523 ) [skip ci]
2025-04-12 07:25:11 -07:00
Wing Lian
9f824ef76a
simplify the example configs to be more minimal and less daunting ( #2486 ) [skip ci]
...
* simplify the example configs to be more minimal and less daunting
* drop empty s2_attention from example yamls
2025-04-04 13:47:26 -04:00
NanoCode012
2c34a4634e
feat: add CCE for gemma3, cohere, and cohere2 ( #2443 )
...
* feat: add CCE for gemma3 and cohere1/2
* fix: change from relative import to absolute
* feat: add multipack for cohere&cohere2
* chore: improve comments
* fix: add gemma3_text
* feat: add cohere2 example
* fix: cohere forward
* fix: patch for cohere2
* feat: add command r v01 qlora sample
* chore: lint
* feat: upgrade gemma3 and gemma2 patch to use logits_to_keep
* chore: lint
* fix: add deprecate_kwarg decorator
* fix: add cce for gemma3 conditionalgeneration
* fix: gemma3 patch to defer logits calculation
* fix: patch gemma3 if given as model
* fix: remove not working config
* fix: update comments to clarify changes
* feat(doc): add supported models to readme
* fix: address difference in our cohere patch
* feat: add mistral3
* feat: add gemma
* feat(doc): update README to include gemma and mistral3 in supported models
* fix: gemma patch
* fix: import
* fix: gemma patch to be standalone
* fix: gemma3 warn about not support final_logit_softcapping
* feat: add mllama CCE
* chore: add abbireviation to doc
* fix: remove unneeded gemma3 eager warning
* fix: save processor if available
* fix: enable save processor on merge
* fix: wrong env meaning
2025-03-26 18:13:51 -04:00