axolotl

tocmo0nlord/axolotl

Commit Graph

Author

SHA1

Message

Date

Wing Lian

71d4030b79

gradient accumulation tests, embeddings w pad_token fix, smaller models (#2059)

* add more test cases for gradient accumulation and fix zero3

* swap out for smaller model

* fix missing return

* fix missing pad_token in config

* support concurrency for multigpu testing

* cast empty deepspeed to empty string for zero3 check

* fix temp_dir as fixture so parametrize works properly

* fix test file for multigpu evals

* don't use default

* don't use default for fsdp_state_dict_type

* don't use llama tokenizer w smollm

* also automatically cancel multigpu for concurrency

2024-11-14 12:59:00 -05:00

NanoCode012

5c7e89105d

Fix: modelloader handling of model_kwargs load_in*bit (#1999)

* fix: load_in_*bit not properly read

* fix: load_*bit check

* fix: typo

* refactor: load * bit handling

* feat: add test dpo lora multi-gpu

* fix: turn off sample packing for dpo

* fix: missing warmup_steps

* fix: test to load in 8bit for lora

* skip 8bit lora on h100, add 4bit lora on h100 to multi gpu tests

* chore: reduce max_steps

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>

2024-10-30 14:41:34 -04:00

Wing Lian

5b0b774e38

ensure that the bias is also in the correct dtype (#1848) [skip ci]

* ensure that the bias is also in the correct dtype

* add nightly for dpo-qlora-fsdp

2024-08-22 11:45:00 -04:00

3 Commits