* Added chatgml3 conversation type for training models like TinyLLama
* Added chatgml3 conversation type for training models like TinyLLama with lint
* Added chatgml3 conversation type for training models like TinyLLama with lint
* bump transformers and update attention class map name
* also run the tests in docker
* add mixtral e2e smoke test
* fix base name for docker image in test
* mixtral lora doesn't seem to work, at least check qlora
* add testcase for mixtral w sample packing
* check monkeypatch for flash attn multipack
* also run the e2e tests in docker
* use all gpus to run tests in docker ci
* use privileged mode too for docker w gpus
* rename the docker e2e actions for gh ci
* set privileged mode for docker and update mixtral model self attn check
* use fp16/bf16 for mixtral w fa2
* skip e2e tests on docker w gpus for now
* tests to validate mistral and mixtral patches
* fix rel import
* add config to model card
* rm space
* apply black formatting
* apply black formatting
* fix formatting
* check for cfg attribute
* add version
* add version
* put the config in a collapsible element
* put the config in a collapsible element
* Feat: Auto add to modules_to_save when adding tokens
* fix: swap to error instead of warning
* feat: add check when special_tokens differ and add test
* add torch to requirements.txt at build time to force version to stick
* fix xformers check
* better handling of xformers based on installed torch version
* fix for ci w/o torch
* start at index 0
* add test to check for missing turns
* apply black
* Update test_prompt_tokenizers.py
* Update src/axolotl/monkeypatch/fastchat_conversation_turns.py
Co-authored-by: Motoki Wu <tokestermw@gmail.com>
* fix linting
* apply black
* add more tests for llama/sharegpt
* make logic clearer
---------
Co-authored-by: Motoki Wu <tokestermw@gmail.com>
* fix: switch to using the HuggingFace Transformers NEFT implementation
* linter
* add support for noisy_embedding_alpha with a warning about it being renamed
* restore pre/posttrain_hooks
* move validation of NEFT noise alpha into validate_config()
* linter
* add check for zero3
* freeze parameters
* fixes for deepspeed loading
* fix model parameter check
* unfrozen parameters in example mixtral and logging when unfreezing
* Respect sequence_len in config for `type: llama2_chat`
It was hardcoded to `4096` I am not sure why? This updates it to pull from the config.
cc: @winglian
* Update llama2_chat.py
* apply black formatting
* fix tokenizer
* update test data
* lint fixtures
* mixtral multipack
* use mixtral model
* sample yml
* calculate cu_seqlens properly
* use updated flash ettention setting
* attn var checks
* force use of flash attention 2 for packing
* lint
* disable future fix for now
* update support table