feat: add CCE for gemma3, cohere, and cohere2 (#2443)

* feat: add CCE for gemma3 and cohere1/2
* fix: change from relative import to absolute
* feat: add multipack for cohere&cohere2
* chore: improve comments
* fix: add gemma3_text
* feat: add cohere2 example
* fix: cohere forward
* fix: patch for cohere2
* feat: add command r v01 qlora sample
* chore: lint
* feat: upgrade gemma3 and gemma2 patch to use logits_to_keep
* chore: lint
* fix: add deprecate_kwarg decorator
* fix: add cce for gemma3 conditionalgeneration
* fix: gemma3 patch to defer logits calculation
* fix: patch gemma3 if given as model
* fix: remove not working config
* fix: update comments to clarify changes
* feat(doc): add supported models to readme
* fix: address difference in our cohere patch
* feat: add mistral3
* feat: add gemma
* feat(doc): update README to include gemma and mistral3 in supported models
* fix: gemma patch
* fix: import
* fix: gemma patch to be standalone
* fix: gemma3 warn about not support final_logit_softcapping
* feat: add mllama CCE
* chore: add abbreviation to doc
* fix: remove unneeded gemma3 eager warning
* fix: save processor if available
* fix: enable save processor on merge
* fix: wrong env meaning
@@ -1,6 +1,6 @@
 # Cut Cross Entropy
 
-Cut Cross Entropy reduces VRAM usage through optimization on the cross-entropy operation during loss calculation.
+Cut Cross Entropy (CCE) reduces VRAM usage through optimization on the cross-entropy operation during loss calculation.
 
 See https://github.com/apple/ml-cross-entropy
 
@@ -29,6 +29,20 @@ plugins:
 cut_cross_entropy: true
 ```
 
+## Supported Models
+
+- llama
+- phi3
+- gemma
+- gemma2
+- gemma3
+- gemma3_text
+- mistral
+- mistral3
+- qwen2
+- cohere
+- cohere2
+
 ## Citation
 
 ```bib