axolotl/requirements.txt
Wing Lian 132eb740f0 DBRX Model Support (#1462)
* wip for dbrx finetuning

* add fastcore for parallel loading of sharded weights

* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback

* update to use v2 of the converted model

* more fixes for dbrx loras

* make sure to enable fsdp activation checkpointing

* fix support for 8bit loras too for dbrx

* apply z3 leaf moe fix for DBRX with deepspeed

* don't raise value error since child module searches could fail and be ok

* revert a previous change to fix fsdp

* update mistral/mistral qlora+fsdp yamls

* fix qlora+fsdp quant storage type

* more edge cases for qlora-fsdp

* fixes for fsdp+qlora w optimizer in 8bit

* add bigstral z3 config and make sure to use full_state_dict for fsdp
2024-04-12 09:02:36 -04:00


--extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
packaging==23.2
peft==0.10.0
transformers @ git+https://github.com/huggingface/transformers.git@43d17c18360ac9c3d3491389328e2fe55fe8f9ce
tokenizers==0.15.0
bitsandbytes==0.43.0
accelerate==0.28.0
deepspeed==0.13.1
pydantic==2.6.3
addict
fire
PyYAML>=6.0
requests
datasets>=2.15.0
flash-attn==2.5.5
sentencepiece
wandb
einops
xformers==0.0.22
optimum==1.16.2
hf_transfer
colorama
numba
numpy>=1.24.4
# qlora things
evaluate==0.4.1
scipy
scikit-learn==1.2.2
pynvml
art
fschat==0.2.36
gradio==3.50.2
tensorboard
mamba-ssm==1.2.0.post1
# remote filesystems
s3fs
gcsfs
# adlfs
trl @ git+https://github.com/huggingface/trl.git@0ee349dcd43b0f4b3169449f16751c38ac4a609f
zstandard==0.22.0
fastcore
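
The file above mixes several requirement forms that pip treats differently: a pip option line (`--extra-index-url`), exact pins (`==`), minimum versions (`>=`), direct git references pinned to a commit (`name @ git+…@sha`), and unpinned names (e.g. `fastcore`). As a sketch only, a hypothetical helper (not part of axolotl) that classifies lines in this format using just the standard library:

```python
def classify_requirement(line: str) -> str:
    """Classify a requirements.txt line by the specifier form it uses.

    Categories mirror the forms seen in axolotl's requirements.txt:
    pip options, comments, git commit pins, exact pins, minimum
    versions, and bare (unpinned) names.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return "comment"
    if line.startswith("--"):
        return "pip-option"        # e.g. --extra-index-url ...
    if "@ git+" in line:
        return "git-pin"           # direct reference to a specific commit
    if "==" in line:
        return "exact-pin"
    if ">=" in line:
        return "minimum-version"
    return "unpinned"

# Examples drawn from the file above:
print(classify_requirement("peft==0.10.0"))    # exact-pin
print(classify_requirement("PyYAML>=6.0"))     # minimum-version
print(classify_requirement("fastcore"))        # unpinned
```

Exact pins and git commit pins make the environment reproducible; the handful of unpinned names (`addict`, `fire`, `sentencepiece`, `fastcore`, …) will float to whatever version is current at install time.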