Wing Lian
3ebf22464b
qlora-fsdp ram efficient loading with hf trainer (#1791)
* fix 405b with lower cpu ram requirements
* make sure to use double quant and only skip output embeddings
* set model attributes
* more fixes for sharded fsdp loading
* update the base model in example to use pre-quantized nf4-bf16 weights
* upstream fixes for qlora+fsdp
2024-07-30 19:21:38 -04:00