* Allow usage of native Mistral FA when no sample_packing
* fix: do not apply custom patch when sample_pack off
* chore: lint
* chore: pin transformer to v4.35.0.dev0
* fix: split sample_packing to separate test
* add mistral monkeypatch
* add arg for decoder attention masl
* fix lint for duplicate code
* make sure to update transformers too
* tweak install for e2e
* move mistral patch to conditional
* use fastchat conversations template
* require fastchat (fschat) pip install
* handle roles dynamically from conversation
* tweak fastchat conversation with a monkeypatch to get individual turns
* fix up so it works with multiple conversation styles, and don't strip the turns
* fix sharegpt fixture now that we're using a more correct tokenization
* use a new prompter and support fastchat conversation type
* use sharegpt from prompt strategies now
* update docs, add chatml template
* add a newline after im_end token
* ensure we correctly set system message
* update per PR feedback to handle deprecated sharegpt types
* don't add duplicate wandb req
* make sharegpt fields configurable from yml
* llama2 fixes
* don't fail fatally when turns are improper
* Feat: Add support for upstream FA2
* chore: add is_falcon_derived_model: true to examples
* chore: add config to readme for documentation
* feat: add extra model types
* fix: remove old falcon flash patch
* chore: pin transformers and accelerate
* update readme to point to direct link to runpod template, cleanup install instrucitons
* default install flash-attn and auto-gptq now too
* update readme w flash-attn extra
* fix version in setup
* auto gptq support
* more tweaks and add yml
* remove old gptq docker
* don't need explicit peft install for tests
* fix setup.py to use extra index url
install torch for tests
fix cuda version for autogptq index
set torch in requirements so that it installs properly
move gptq install around to work with github cicd
* gptq doesn't play well with sample packing
* address pr feedback
* remove torch install for now
* set quantization_config from model config
* Fix the implementation for getting quant config from model config
* add mmlu callback
* use hf dataset for mmlu evals
* default to mmlu-zs
* make sure to define all the explicit positional args
* include metrics in callback
* another callback fix for collator max len attribute
* fix mmlu evals
* sample benchmarks, ensure we drop long samples
* fix the data file
* fix elif and add better messaging
* more fixes
* rename mmlu to bench
* more fixes
* dataset handling and aggregate across benchmark
* better handling when no subjects
* benchmark callback has its own dataloader and collator
* fixes
* updated dataset
* more fixes
* missing transformers import
* improve support for customized dataset for bench evals
* gather benchmarks from all ranks
* fix for gather across multiple gpus
* flash attn pip
* add packaging
* add packaging to apt get
* install flash attn in dockerfile
* remove unused whls
* add wheel
* clean up pr
fix packaging requirement for ci
upgrade pip for ci
skip build isolation for requiremnents to get flash-attn working
install flash-attn seperately
* install wheel for ci
* no flash-attn for basic cicd
* install flash-attn as pip extras
---------
Co-authored-by: Ubuntu <mgh@mgh-vm.wsyvwcia0jxedeyrchqg425tpb.ax.internal.cloudapp.net>
Co-authored-by: mhenrichsen <some_email@hey.com>
Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix attetion mask with packing
* set position ids and use block diagonal attn mask
* fix expand mask for multiple batch items, make sure we pad position_ids
* don't move masks to cpu
* use multi pack dataloader w random sampler
* add position_ids back
* more fixes for dataloader integration
* est total tokens, fix field loop
* more fixes, position_ids seems broken
* more fixes for sample packing
* use distributed sampler, avoid accelerate prepare
* use accelerator prepare for dataloader
* fix for position_ids w packing
* Update src/axolotl/utils/dataloader.py
* validation for sample packing and doc
* more fixes for 4k and optimizations
* optimized expand mask fn
* better handling of variance in multipack dataloader length and trainer hanging when it runs out of data
* fix rounding of len of batches to int
* better handling so that all devices have the same dataloader len
* fix step calc for packing
* pass sample packing efficiency to training args
* add a test for the mask expansion for sequence packing
* only process eval dataset for packing if not None
* don't split batches when packing
* weighted CE losses
* weighted CEL fixes
* limit packing to sequences of max seq len
* seq_len_multiple for packing
* make sure the chunk size is an int
* sample_packing_seq_len_multiplier config
* use cumulative seq len with var len flash attn v2 w packing
* properly calculate max len
* fix flash-attn, xformers, packing, support chatml
* fix chatml system prompt for openorca, legacy tokenizer opts
* add chatml
* add unit tests for cum seq lens, add ability to build cu_seq_lens from positional ids, fix prompt test
* fix test and pylint checks
* more packing and dataset optimizations and fixes
* filter w multiple cpus
* more fixes and optimizations
* fixes and go back to distributed sampler since batch sampler won't work
* fix counts by accounting for num devices
* fix steps calculation
* previous accelerate is still most performant
* add numba to requirements.
* use custom distributed checks
* fix sampler to prevent overfit w new epochs
* let's not cleanup the cached datasets
* calculate cum seq lens with pos_ids instead of mask, simplify packing params, fix distributed barrier
* speed optimizations and set accelerate fsdp env vars
* optimize dataset concatenation?
* more optimizations for dataset handling
* fix import for annotation
* manual pre-commit fixes
* another sum optimization and bug fix for calc steps
* fix packing estimations
* fix formatting
* pylint problems
* add back flash attention branch for handling unpacked sequences seperately
* Address PR feedback
* add optional sample packing config params to readme