* current
not clean working version
move torch trainer to do_cli
update code with config changes and clean up
edit config
cleanup
add run name to trainer
* address comments
* use axolotl train in multigpu tests and add ray tests for multi-gpu
* accelerate uses underscores for main_process_port arg
* chore: lint
* fix order of accelerate args
* include ray train in docker images
* current
not clean working version
move torch trainer to do_cli
update code with config changes and clean up
edit config
cleanup
add run name to trainer
* address comments
* use axolotl train in multigpu tests and add ray tests for multi-gpu
* accelerate uses underscores for main_process_port arg
* chore: lint
* fix order of accelerate args
* include ray train in docker images
* fix bf16 resolution behavior
* move dtype logic
* x
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* rename
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* add to sidebar
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* Apply suggestions from code review
Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>
* Update docs/ray-integration.qmd
Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>
* pre-commit fixes
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* use output_dir instead of hardcoded saves path
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* bugfix storage dir
* change type\ for resources_per_worker
---------
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* support for custom lr groups for non-embedding modules
invert name check for group modules
include lr_groups in training args
additional conditional for creating optimizer
fix regular params as w weight decay
fix lookup and add docs
* address pr feedback
* fix for pretrain with packing
* fix model name and loss expected
* make sure to check with micro batch size for pretraining
* change loss threshholds based on parametrization
* make tests smaller for CI
* fix pretrain packing
* fix pretrain packing test
* address pr feedback
* fix: use text_column even when not packing for pretraining
* feat: update test to check when not packing
* chore: lint
* Update src/axolotl/utils/data/pretraining.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* add helper to verify the correct model output file exists
* more checks using helper
* chore: lint
* fix import and relora model check
* workaround for trl trainer saves
* remove stray print