* add mmlu callback
* use hf dataset for mmlu evals
* default to mmlu-zs
* make sure to define all the explicit positional args
* include metrics in callback
* another callback fix for collator max len attribute
* fix mmlu evals
* sample benchmarks, ensure we drop long samples
* fix the data file
* fix elif and add better messaging
* more fixes
* rename mmlu to bench
* more fixes
* dataset handling and aggregate across benchmark
* better handling when no subjects
* benchmark callback has its own dataloader and collator
* fixes
* updated dataset
* more fixes
* missing transformers import
* improve support for customized dataset for bench evals
* gather benchmarks from all ranks
* fix for gather across multiple gpus
packaging
peft @ git+https://github.com/huggingface/peft.git
transformers @ git+https://github.com/huggingface/transformers.git
bitsandbytes>=0.41.1
accelerate @ git+https://github.com/huggingface/accelerate@2a289f6108e77a77a4efffb3f6316bc98538413b
addict
fire
PyYAML>=6.0
datasets
flash-attn>=2.0.8
sentencepiece
wandb
einops
xformers
optimum
hf_transfer
colorama
numba
numpy>=1.24.4
# qlora things
bert-score==0.3.13
evaluate==0.4.0
rouge-score==0.1.2
scipy
scikit-learn==1.2.2
pynvml
art
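One pitfall with a requirements file this long is listing the same project twice (for example, an unpinned entry plus a pinned one like `evaluate==0.4.0`): pip aborts with "Double requirement given". A small stdlib-only sanity check can flag duplicates before installing — a sketch, where `duplicate_requirements` is a hypothetical helper name, not part of this repo:

```python
import re
from collections import Counter

def duplicate_requirements(lines):
    """Return project names that appear more than once in a requirements file."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # the project name ends at the first extras/version/URL/marker character
        name = re.split(r"[\s\[<>=!~@;]", line, maxsplit=1)[0].lower()
        names.append(name)
    return sorted(n for n, c in Counter(names).items() if c > 1)

print(duplicate_requirements(["evaluate", "scipy", "evaluate==0.4.0"]))
# → ['evaluate']
```

The regex split handles the entry styles used above: plain names (`scipy`), version specifiers (`bitsandbytes>=0.41.1`), and PEP 508 direct references (`peft @ git+https://...`).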