feat: save checkpoint after training started (#3233)

* add: config parameters for checkpoint

* callback main

* test file_type fix

* lint

* unit

* simplify dict/obj handling

* Update src/axolotl/utils/schemas/dynamic_checkpoint.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Delete tests/e2e/integrations/__init__.py

* remove hard code path in test

* device check

* lint

* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update src/axolotl/utils/schemas/dynamic_checkpoint.py

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* lint-2

* remove: signal-based checkpoints

* lint

* remove signal tests

* add: is_main_process

* lint

* add: is_distributed() for tests

* remove nested is_main_process

* Update src/axolotl/utils/schemas/dynamic_checkpoint.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* Update src/axolotl/utils/schemas/dynamic_checkpoint.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* add user_defined_filename

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
VED
2025-11-13 20:51:05 +05:30
committed by GitHub
parent 49b8107989
commit dcf24fd24e
6 changed files with 567 additions and 0 deletions