* update readme to point to direct link to runpod template, cleanup install instrucitons
* default install flash-attn and auto-gptq now too
* update readme w flash-attn extra
* fix version in setup
* set early stopping metric to check
* tweak how load_best_model_at_end gets set for early stopping
* add validation for earl;y stopping patience
* remove negation
* save results to metrics in callback
* move early stopping callback after the benchmark evals
* broadcast metrics so early stopping works
* auto gptq support
* more tweaks and add yml
* remove old gptq docker
* don't need explicit peft install for tests
* fix setup.py to use extra index url
install torch for tests
fix cuda version for autogptq index
set torch in requirements so that it installs properly
move gptq install around to work with github cicd
* gptq doesn't play well with sample packing
* address pr feedback
* remove torch install for now
* set quantization_config from model config
* Fix the implementation for getting quant config from model config
* use flash_attn xentropy when available
* use flash_attn.ops.rms_norm when available
* log when xentropy is not found
* log how to install RMSNorm
* add quotes so pip install works
* fix: bad dtype for full finetune
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Update models.py
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Added "eval_" prefix
* Added total bench accuracy and renamed the previous one to bench_average_accuracy. Changed naming to use bench_split instead of always using eval_ prefix.
* add mmlu callback
* use hf dataset for mmlu evals
* default to mmlu-zs
* make sure to define all the explicit positional args
* include metrics in callback
* another callback fix for collator max len attribute
* fix mmlu evals
* sample benchmarks, ensure we drop long samples
* fix the data file
* fix elif and add better messaging
* more fixes
* rename mmlu to bench
* more fixes
* dataset handling and aggregate across benchmark
* better handling when no subjects
* benchmark callback has its own dataloader and collator
* fixes
* updated dataset
* more fixes
* missing transformers import
* improve support for customized dataset for bench evals
* gather benchmarks from all ranks
* fix for gather across multiple gpus