axolotl

Wing Lian 71d4030b79 gradient accumulation tests, embeddings w pad_token fix, smaller models (#2059)

* add more test cases for gradient accumulation and fix zero3

* swap out for smaller model

* fix missing return

* fix missing pad_token in config

* support concurrency for multigpu testing

* cast empty deepspeed to empty string for zero3 check

* fix temp_dir as fixture so parametrize works properly

* fix test file for multigpu evals

* don't use default

* don't use default for fsdp_state_dict_type

* don't use llama tokenizer w smollm

* also automatically cancel multigpu for concurrency

2024-11-14 12:59:00 -05:00
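
The "cast empty deepspeed to empty string for zero3 check" item above can be illustrated with a minimal sketch. The helper name `uses_zero3` and the example config path are assumptions made for illustration; they are not taken from the commit itself. The point is only that an unset (`None`) DeepSpeed setting is coerced to `""` before a substring check, so the check returns `False` instead of raising a `TypeError`.

```python
from typing import Optional


def uses_zero3(deepspeed: Optional[str]) -> bool:
    """Hypothetical helper: report whether a DeepSpeed config path looks like ZeRO stage 3.

    An unset value (None) is cast to an empty string first, so the
    substring test below is always applied to a str.
    """
    return "zero3" in str(deepspeed or "")


# No DeepSpeed config set: the check is simply False.
print(uses_zero3(None))
# Illustrative config path containing "zero3".
print(uses_zero3("deepspeed_configs/zero3_bf16.json"))
```

Without the cast, `"zero3" in None` would fail because `None` does not support the `in` operator.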

__init__.py

Attempt to run multigpu in PR CI for now to ensure it works (#1815) [skip ci]

2024-08-09 11:50:13 -04:00

test_eval.py

gradient accumulation tests, embeddings w pad_token fix, smaller models (#2059)

2024-11-14 12:59:00 -05:00

test_llama.py

gradient accumulation tests, embeddings w pad_token fix, smaller models (#2059)

2024-11-14 12:59:00 -05:00

test_qwen2.py

gradient accumulation tests, embeddings w pad_token fix, smaller models (#2059)

2024-11-14 12:59:00 -05:00