* guard return if ring attn alrady registered
* add docs link, bits in multi-gpu docs, remove save model callback (subsumed by HF trainers)
* configurable heads_k_stride from ring-flash-attn hf adapter
* fix: update chat_template
* fix: handle gemma3 showing a lot of no content for turn 0
* fix: remove unknown config from examples
* fix: test
* fix: temporary disable gemma2 test
* fix: stop overwriting config.text_config unnecessarily
* fix: handling of set cache to the text_config section
* feat: add liger gemma support and bump liger to 0.5.5
* fix: add double use_cache setting
* fix: add support for final_logit_softcap in CCE for gemma2/3
* fix: set use_cache before model load
* feat: add missing layernorm override
* fix: handle gemma3 rmsnorm
* fix: use wrapper to pass dim as hidden_size
* fix: change dim to positional
* fix: patch with wrong mlp
* chore: refactor use_cache handling
* fix import issues
* fix tests.e2e.utils import
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
* hf offline decorator for tests to workaround rate limits
* fail quicker so we can see logs
* try new cache name
* limit files downloaded
* phi mini predownload
* offline decorator for phi tokenizer
* handle meta llama 8b offline too
* make sure to return fixtures if they are wrapped too
* more fixes
* more things offline
* more offline things
* fix the env var
* fix the model name
* handle gemma also
* force reload of modules to recheck offline status
* prefetch mistral too
* use reset_sessions so hub picks up offline mode
* more fixes
* rename so it doesn't seem like a context manager
* fix backoff
* switch out tinyshakespeare dataset since it runs a py script to fetch data and doesn't work offline
* include additional dataset
* more fixes
* more fixes
* replace tiny shakespeaere dataset
* skip some tests for now
* use more robust check using snapshot download to determine if a dataset name is on the hub
* typo for skip reason
* use local_files_only
* more fixtures
* remove local only
* use tiny shakespeare as pretrain dataset and streaming can't be offline even if precached
* make sure fixtures aren't offline
improve the offline reset
try bumping version of datasets
reorder reloading and setting
prime a new cache
run the tests now with fresh cache
try with a static cache
* now run all the ci again with hopefully a correct cache
* skip wonky tests for now
* skip wonky tests for now
* handle offline mode for model card creation
* add 12.8.1 cuda to the base matrix
* use nightly
* bump deepspeed and set no binary
* deepspeed binary fixes hopefully
* install deepspeed by itself
* multiline fix
* make sure ninja is installed
* try with reversion of packaging/setuptools/wheel install
* use license instead of license-file
* try rolling back packaging and setuptools versions
* comment out license for validation for now
* make sure packaging version is consistent
* more parity across tests and docker images for packaging/setuptools
* use default torch fused adamw optimizer as default as adamw_hf is deprecated
* make sure to have latest packaging installed
* bump packagingin requirements.txt too
* pass additional info for fix untrained tokens when using distributed + offloading
* use latest version of vendored lib
* use v0.0.5 of contribs lgpl
* fix for no bad tokens and add tests
* use release
* add multigpu test too
* make sure the multigpu zero3 test actually uses zero3
* add muon optimizer
optimizer_cls_and_kwargs is on trainer_kwargs
only add adamw_kwargs if they're non-null
fix mocks
better handling of override and check the optimizer
unwrap optimizer
* fix import
* override special tokens mock code
* fix(doc): remove duplicate config
* feat: replace added_tokens in tokenizer and add test
* make sure to run tokenizer modification on rank 0 only
* use is local main process instead
* feat: rename config
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>