* current
not yet a clean working version
move torch trainer to do_cli
update code with config changes and clean up
edit config
cleanup
add run name to trainer
* address comments
* use axolotl train in multigpu tests and add ray tests for multi-gpu
* accelerate uses underscores for main_process_port arg
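
For context, a minimal sketch of the underscore form when assembling the command programmatically; the helper below is hypothetical, not the repo's actual launcher:

```python
# Sketch only: accelerate's CLI spells this flag with underscores,
# not hyphens, which is the point of the fix above.
import subprocess

def launch(script: str, port: int = 29500) -> None:
    cmd = [
        "accelerate", "launch",
        "--main_process_port", str(port),  # underscores, not --main-process-port
        script,
    ]
    subprocess.run(cmd, check=True)
```
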
* chore: lint
* fix order of accelerate args
* include ray train in docker images
* fix bf16 resolution behavior
* move dtype logic
* x
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* rename
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* add to sidebar
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* Apply suggestions from code review
Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>
* Update docs/ray-integration.qmd
Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>
* pre-commit fixes
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* use output_dir instead of hardcoded saves path
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* bugfix storage dir
* change type for resources_per_worker
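
A hedged sketch of the kind of type change this refers to, assuming the field mirrors Ray Train's ScalingConfig, where resources_per_worker is a mapping from resource names to quantities; the model name is hypothetical:

```python
from typing import Optional

from pydantic import BaseModel

class RayTrainConfig(BaseModel):  # hypothetical model, for illustration only
    num_workers: int = 1
    # a mapping like {"GPU": 1} rather than a plain int, matching
    # Ray Train's ScalingConfig.resources_per_worker
    resources_per_worker: Optional[dict] = None
```
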
---------
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* fix build w/ pyproject to respect installed torch version
* include in manifest
* disable duplicate code check for now
* move parser so it can be found
* add checks for correct pytorch version so this doesn't slip by again
* feat: add cut_cross_entropy
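
For readers unfamiliar with the integration, the core primitive the cut-cross-entropy package exposes looks roughly like this; shapes and dtypes are illustrative, and a CUDA device is assumed:

```python
import torch
from cut_cross_entropy import linear_cross_entropy

# fused linear + cross-entropy: the full logits matrix is never materialized
embeddings = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)
classifier = torch.randn(32000, 4096, device="cuda", dtype=torch.bfloat16)
labels = torch.randint(0, 32000, (8,), device="cuda")
loss = linear_cross_entropy(embeddings, classifier, labels)
```
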
* fix: add to input
* fix: remove from setup.py
* feat: refactor into an integration
* chore: ignore lint
* feat: add test for cce
* fix: set max_steps for liger test
* chore: Update base model following suggestion
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* chore: update special_tokens following suggestion
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* chore: remove with_temp_dir following comments
* fix: plugins aren't loaded
* chore: update quotes in error message
* chore: lint
* chore: lint
* feat: enable FA on test
* chore: refactor get_pytorch_version
* fix: lock cce commit version
* fix: remove subclassing UT
* fix: downcast even if not using FA and config check
* feat: add test to check different attentions
* feat: add install to CI
* chore: refactor to use parametrize for attention
* fix: pytest not detecting test
* feat: handle torch lower than 2.4
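
A rough sketch of the version gating these commits describe; get_pytorch_version is named in the refactor above, but its exact signature here is an assumption:

```python
import torch

def get_pytorch_version() -> tuple[int, int]:
    # parse "2.4.1+cu121" -> (2, 4)
    major, minor = torch.__version__.split(".")[:2]
    return int(major), int(minor)

if get_pytorch_version() >= (2, 4):
    ...  # use the torch>=2.4-only code path
else:
    ...  # fallback for older torch
```
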
* fix args/kwargs to match docs
* use release version cut-cross-entropy==24.11.4
* fix quotes
* fix: use named params for clarity for modal builder
* fix: handle install from pip
* fix: test check only top level module install
* fix: re-add import check
* uninstall existing version if no transformers submodule in cce
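
A hedged sketch of that check: detect a stale cut-cross-entropy install that predates its transformers integration, and uninstall it so a compatible version can be installed. The module names match the cce package; the helper itself is illustrative:

```python
import importlib.util
import subprocess
import sys

def stale_cce_installed() -> bool:
    # installed, but missing the transformers integration submodule
    if importlib.util.find_spec("cut_cross_entropy") is None:
        return False
    return importlib.util.find_spec("cut_cross_entropy.transformers") is None

if stale_cce_installed():
    subprocess.run(
        [sys.executable, "-m", "pip", "uninstall", "-y", "cut-cross-entropy"],
        check=True,
    )
```
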
* move dataset fixtures into the cache
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* see if unsloth installs cleanly in ci
* check unsloth install on regular tests, not sdist
* fix ampere check exception for ci
* use cached_property instead
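
Illustrative only (class and attribute names are made up): functools.cached_property runs the device query once per instance and then serves the cached value:

```python
from functools import cached_property

import torch

class GpuChecks:  # hypothetical class for illustration
    @cached_property
    def is_ampere_or_newer(self) -> bool:
        # computed once per instance, then cached as a plain attribute
        if not torch.cuda.is_available():
            return False
        major, _minor = torch.cuda.get_device_capability()
        return major >= 8  # Ampere GPUs report compute capability 8.x
```
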
* add an e2e test for unsloth qlora
* reduce seq len and micro batch size to prevent oom in ci
* add checks for fp16 and sdp_attention
* pin unsloth to a specific release
* add unsloth to docker image too
* fix flash attn xentropy patch
* fix loss, add check for loss when using fa_xentropy
* fix special tokens for test
* typo
* test fa xentropy with and without gradient accum
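
For reference, the fused loss that the fa_xentropy path swaps in comes from flash-attn; a minimal sketch assuming a CUDA device (the patching itself is axolotl-internal):

```python
import torch
from flash_attn.losses.cross_entropy import CrossEntropyLoss as FAXEntropy

loss_fn = FAXEntropy(inplace_backward=True)  # fused cross-entropy kernel
logits = torch.randn(8, 32000, device="cuda", dtype=torch.float16, requires_grad=True)
labels = torch.randint(0, 32000, (8,), device="cuda")
loss = loss_fn(logits, labels)
```
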
* pr feedback changes
* add support for optimi_adamw optimizer w kahan summation
* pydantic validator for optimi_adamw
* workaround for setting optimizer for fsdp
* make sure to install optimizer packages
* make sure to have parity for model parameters passed to optimizer
* add smoke test for optimi_adamw optimizer
* don't use foreach optimi by default
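
Taken together, the optimi_adamw commits above amount to roughly this when the optimizer is built; a sketch against the torch-optimi package, with a stand-in model and placeholder lr:

```python
import torch
from optimi import AdamW  # torch-optimi package (assumed installed)

model = torch.nn.Linear(16, 16)  # stand-in model
optimizer = AdamW(
    model.parameters(),
    lr=2e-5,
    kahan_sum=True,  # Kahan (compensated) summation for low-precision weights
    foreach=False,   # foreach stays opt-in, per the change above
)
```
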
* support galore once upstreamed into transformers
* update module name for llama in readme and fix typing for all linear
* bump trl for deprecation fixes from newer transformers
* include galore as an extra and install in docker image
* fix optim_args type
* fix optim_args
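
For context on the optim_args fixes: in transformers, optim_args is a single comma-separated string of key=value pairs, so a GaLore setup looks roughly like this (target modules and values are illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="galore_adamw",
    optim_target_modules=["q_proj", "v_proj", "mlp"],
    optim_args="rank=128, update_proj_gap=200, scale=0.25",
)
```
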
* update dependencies for galore
* add galore to cicd dockerfile
* make mlflow optional
* fix xformers
don't patch swiglu if xformers not working
fix the check for xformers swiglu
* fix install of xformers with extra index url for docker builds
* fix docker build arg quoting
* add torch to requirements.txt at build time to force version to stick
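
A hypothetical build-time helper in the spirit of that change: append the already-installed torch version to requirements.txt so dependency resolution can't swap it out later:

```python
import torch

def pin_installed_torch(path: str = "requirements.txt") -> None:
    version = torch.__version__.split("+")[0]  # drop local tag, e.g. "+cu118"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"torch=={version}\n")
```
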
* fix xformers check
* better handling of xformers based on installed torch version
* fix for ci w/o torch
* update readme to link directly to the runpod template, clean up install instructions
* install flash-attn and auto-gptq by default now too
* update readme w flash-attn extra
* fix version in setup
* auto gptq support
* more tweaks and add yml
* remove old gptq docker
* don't need explicit peft install for tests
* fix setup.py to use extra index url
install torch for tests
fix cuda version for autogptq index
set torch in requirements so that it installs properly
move gptq install around to work with github cicd
* gptq doesn't play well with sample packing
* address pr feedback
* remove torch install for now
* set quantization_config from model config
* Fix the implementation for getting quant config from model config
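
A rough sketch of the fixed lookup: read quantization_config off the checkpoint's own config instead of rebuilding it by hand. The repo id below is a placeholder:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("some-org/some-gptq-model")  # placeholder id
quantization_config = getattr(config, "quantization_config", None)
if quantization_config is not None:
    ...  # pass it through when loading the model
```
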
* flash attn pip
* add packaging
* add packaging to apt get
* install flash attn in dockerfile
* remove unused whls
* add wheel
* clean up pr
fix packaging requirement for ci
upgrade pip for ci
skip build isolation for requirements to get flash-attn working
install flash-attn separately
* install wheel for ci
* no flash-attn for basic cicd
* install flash-attn as pip extras
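
A sketch of what installing flash-attn as a pip extra can look like in setup.py; the extra's name and version pin are illustrative:

```python
from setuptools import setup

setup(
    name="axolotl",
    extras_require={
        # enables: pip install "axolotl[flash-attn]"
        "flash-attn": ["flash-attn>=2.0.0"],
    },
)
```
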
---------
Co-authored-by: Ubuntu <mgh@mgh-vm.wsyvwcia0jxedeyrchqg425tpb.ax.internal.cloudapp.net>
Co-authored-by: mhenrichsen <some_email@hey.com>
Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan>
Co-authored-by: Wing Lian <wing.lian@gmail.com>