* add mmlu callback
* use hf dataset for mmlu evals
* default to mmlu-zs
* make sure to define all the explicit positional args
* include metrics in callback
* another callback fix for collator max len attribute
* fix mmlu evals
* sample benchmarks, ensure we drop long samples
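Dropping over-long samples before benchmark evaluation can be sketched as below (helper name hypothetical, not the actual callback code):

```python
def drop_long_samples(dataset, max_len):
    # Filter out benchmark rows whose tokenized input exceeds the model
    # context; long rows would otherwise be truncated or crash the collator.
    before = len(dataset)
    dataset = [row for row in dataset if len(row["input_ids"]) <= max_len]
    dropped = before - len(dataset)
    if dropped:
        print(f"dropped {dropped} samples longer than {max_len} tokens")
    return dataset
```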
* fix the data file
* fix elif and add better messaging
* more fixes
* rename mmlu to bench
* more fixes
* dataset handling and aggregate across benchmark
* better handling when no subjects
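Aggregating per-subject accuracies across the whole benchmark, with a fallback bucket when rows carry no subject, might look roughly like this (names and row schema are assumptions, not the actual callback):

```python
def aggregate_benchmark(results):
    """results: list of {"subject": str | None, "correct": bool} rows."""
    by_subject = {}
    for row in results:
        # fall back to a single bucket when the dataset has no subjects
        subject = row.get("subject") or "all"
        by_subject.setdefault(subject, []).append(row["correct"])
    metrics = {
        f"bench_{subj}_acc": sum(hits) / len(hits)
        for subj, hits in by_subject.items()
    }
    # overall accuracy aggregated across every subject
    total = [row["correct"] for row in results]
    if total:
        metrics["bench_acc"] = sum(total) / len(total)
    return metrics
```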
* benchmark callback has its own dataloader and collator
* fixes
* updated dataset
* more fixes
* missing transformers import
* improve support for customized dataset for bench evals
* gather benchmarks from all ranks
* fix for gather across multiple gpus
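A minimal sketch of gathering per-rank benchmark rows onto every rank, falling back to the local results when `torch.distributed` is not initialized (single-GPU or CPU runs); the real callback may gather tensors or use a different collective:

```python
def gather_benchmark_results(local_results):
    # Import lazily so the helper degrades gracefully without torch.
    try:
        import torch.distributed as dist
    except ImportError:
        return local_results
    if not (dist.is_available() and dist.is_initialized()):
        return local_results
    gathered = [None] * dist.get_world_size()
    # all_gather_object handles arbitrary picklable payloads across ranks
    dist.all_gather_object(gathered, local_results)
    return [row for rank_rows in gathered for row in rank_rows]
```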
* Add Metharme tokenizing strategy
This strategy accounts for how the Metharme JSONLs are formatted, and it appends duplicated EOS tokens, which can help trim model output length.
I haven't had a chance to test this yet, and probably won't for quite a while, so I'm committing it now.
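The duplicated-EOS idea can be sketched as follows (function name and tokenizer interface are hypothetical, not the actual strategy class):

```python
def tokenize_with_duplicated_eos(tokenizer, prompt, response, num_eos=2):
    # Append the EOS token more than once at the end of each example;
    # the duplication can help the model learn to stop, trimming output length.
    input_ids = tokenizer(prompt)["input_ids"] + tokenizer(response)["input_ids"]
    input_ids += [tokenizer.eos_token_id] * num_eos
    return input_ids
```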
* Redo Metharme tokenizing strategy
* fix: oops
* Rearrange a conditional
* chore: reformat code in accordance with linter
* chore: Make lint not freak out
* chore: fix lint
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* recast loralayer, norm, lmhead + embed token weights per original qlora
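The recast follows the postprocessing in the original qlora repo: LoRA layers to the compute dtype, norms kept in float32 for stability, and the `lm_head`/`embed_tokens` weights to the compute dtype. A rough sketch (the LoraLayer branch is left as a comment since peft may not be installed):

```python
import torch
import torch.nn as nn

def recast_for_qlora(model, compute_dtype=torch.bfloat16):
    for name, module in model.named_modules():
        # In the real code, LoRA layers are also cast, e.g.:
        # if isinstance(module, peft.tuners.lora.LoraLayer):
        #     module.to(compute_dtype)
        if "norm" in name:
            module.to(torch.float32)  # keep norms in full precision
        if ("lm_head" in name or "embed_tokens" in name) and hasattr(module, "weight"):
            module.to(compute_dtype)
    return model
```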
* try again for the fix
* refactor torch dtype picking
* linter fixes
* missing import for LoraLayer
* fix install for tests now that peft is involved
* support user defined prompters, pretokenized datasets in config, local parquet, local arrow files
* fix user defined dataset types
* fix for system prompts
* fix tests
* fix checks for parquet and arrow
* realize that d.data_files isn't actually used
* add documentation for ds_type to add support for parquet and arrow
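Inferring a `datasets` builder name from a local file's extension, so parquet and arrow files load via the matching loader, might look like this (helper name is hypothetical; an explicit `ds_type` in the config would override it):

```python
def infer_ds_type(data_file):
    # Map local file extensions to a builder name as used by
    # `datasets.load_dataset` ("parquet", "arrow", "json", ...).
    if data_file.endswith(".parquet"):
        return "parquet"
    if data_file.endswith(".arrow"):
        return "arrow"
    if data_file.endswith((".json", ".jsonl")):
        return "json"
    return "json"  # default when the extension is unrecognized
```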
* split sdp attn into its own patch
* sync xformers patch to follow shared format and be diffable
* update flash-attn patch for 70B/GQA and inference using helper from flash-attn tests
* speed up flash-attn inference
* fix patch to check position ids and don't use multipack for evals
* copy LlamaModel.forward and LlamaDecoderLayer.forward into monkeypatch
* update forwards so we only calculate cu_seqlens once
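Computing `cu_seqlens` once from the position ids, which reset to zero at each packed-sequence boundary, can be sketched as below (function name hypothetical; the flash-attn varlen kernels expect int32 cumulative lengths with a leading zero):

```python
import torch

def get_cu_seqlens_from_pos_ids(position_ids):
    # Zeros in the flattened position ids mark packed-sequence starts.
    pos = position_ids.flatten()
    starts = torch.nonzero(pos == 0).flatten()
    ends = torch.cat([starts[1:], torch.tensor([pos.numel()])])
    seqlens = ends - starts
    # cumulative lengths with a leading 0, e.g. [0, len0, len0+len1, ...]
    cu_seqlens = torch.nn.functional.pad(torch.cumsum(seqlens, dim=0), (1, 0))
    return cu_seqlens.to(torch.int32), int(seqlens.max())
```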
* enable eval dataloader using multipack again
* fix the patch to work properly and work with FSDP
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>