VED
dcf24fd24e
feat: save checkpoint after training started (#3233)
* add:config parameters for checkpoint
* callback main
* test file_type fix
* lint
* unit
* simplify dict/obj handeling
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Delete tests/e2e/integrations/__init__.py
* remove hard code path in test
* device check
* lint
* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* lint-2
* remove: singal based checkpoints
* lint
* remove signal tests
* add:is_main_process
* lint
* addis_d:istributed() for tests
* remove nested is_main_process
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* add user_defined_filename
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>