* add 12.8.1 cuda to the base matrix
* use nightly
* bump deepspeed and set no binary
* deepspeed binary fixes hopefully
* install deepspeed by itself
* multiline fix
* make sure ninja is installed
* try with reversion of packaging/setuptools/wheel install
* use license instead of license-file
* try rolling back packaging and setuptools versions
* comment out license for validation for now
* make sure packaging version is consistent
* more parity across tests and docker images for packaging/setuptools
* current
not clean working version
move torch trainer to do_cli
update code with config changes and clean up
edit config
cleanup
add run name to trainer
* address comments
* use axolotl train in multigpu tests and add ray tests for multi-gpu
* accelerate uses underscores for main_process_port arg
* chore: lint
* fix order of accelerate args
* include ray train in docker images
* fix bf16 resolution behavior
* move dtype logic
* x
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* rename
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* add to sidebar
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* Apply suggestions from code review
Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>
* Update docs/ray-integration.qmd
Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>
* pre-commit fixes
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
* use output_dir instead of hardcoded saves path
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* bugfix storage dir
* change type for resources_per_worker
---------
Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* need to update deepspeed version in extras too
* fix patch import
* fix monkeypatch reloading in tests and deepspeed patch
* remove duplicated functionality fixture
* reset LlamaForCausalLM too in fixtures for cce patch
* reset llama attn too
* disable xformers patch for cce
* skip problematic test on low usage functionality
* Fix broken CLI; remove duplicate metadata from setup.py
* Adding tests.yml CLI check
* updating
* remove test with requests to github due to rate limiting
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
use a constraint file
use min version of xformers
don't install autoawq with pytorch 2.5.0
debugging for errors
upgrade pip first
fix action yml
add back try/except
retry w/o constraint
use --no-build-isolation
show torch version
install setuptools and wheel
add back try/except
* add support for optimi_adamw optimizer w kahan summation
* pydantic validator for optimi_adamw
* workaround for setting optimizer for fsdp
* make sure to install optimizer packages
* make sure to have parity for model parameters passed to optimizer
* add smoke test for optimi_adamw optimizer
* don't use foreach optimi by default
* bump flash attention 2.5.8 -> 2.6.1
* use triton implementation of cross entropy from flash attn
* add smoke test for flash attn cross entropy patch
* fix args to xentropy.apply
* handle tuple from triton loss fn
* ensure the patch tests run independently
* use the wrapper already built into flash attn for cross entropy
* mark pytest as forked for patches
* use pytest xdist instead of forked, since cuda doesn't like forking
* limit to 1 process and use dist loadfile for pytest
* change up pytest for fixture to reload transformers w monkeypatch
* Update requirements.txt
Preserve compatibility with torch 2.3.1. [Reference](https://github.com/facebookresearch/xformers/issues/1052)
* fix setup.py to extract the current xformers dep from requirements for replacement
* xformers 0.0.27 wheels not built for torch 2.3.0
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* support galore once upstreamed into transformers
* update module name for llama in readme and fix typing for all linear
* bump trl for deprecation fixes from newer transformers
* include galore as an extra and install in docker image
* fix optim_args type
* fix optim_args
* update dependencies for galore
* add galore to cicd dockerfile
* make mlflow optional
* fix xformers
don't patch swiglu if xformers not working
fix the check for xformers swiglu
* fix install of xformers with extra index url for docker builds
* fix docker build arg quoting
* add mps support
* linter stuff
* CI fixes
* install packaging for various tests
* Update setup.py
* Revert "install packaging for various tests"
This reverts commit 980e7aa44d.
* Revert "CI fixes"
This reverts commit 4609e3b166.
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* restore to current phi modeling code from phi-2
* enable gradient checkpointing
* don't cast everything to float32 all the time
* gradient checkpointing for phi2 ParallelBlock module too
* fix enabling flash attn for phi2
* add comment about import
* fix phi2 example
* fix model type check for tokenizer
* revert float32 -> bf16 casting changes
* support fused dense flash attn
* fix the repo for flash-attn
* add package name for subdir pkg
* fix the data collator when not using sample packing
* install packaging for pytests in ci
* also fix setup to not install flash attn fused dense subdir if not extras
* split out the fused-dense-lib in extra requires
* don't train w group_by_length for phi
* update integration test to use phi2
* set max steps and save steps for phi e2e tests
* try to work around save issue in ci
* skip phi2 e2e test for now
* add torch to requirements.txt at build time to force version to stick
* fix xformers check
* better handling of xformers based on installed torch version
* fix for ci w/o torch
* support for mamba
* more mamba fixes
* use fork for mamba kwargs fix
* grad checkpointing doesn't work
* fix extras for mamba
* mamba loss fix
* use fp32 and remove verbose logging
* mamba fixes
* fix collator for mamba
* set model_type on training_args
* don't save safetensors for mamba
* update mamba config to disable safetensor checkpoints, install for tests
* no evals for mamba tests
* handle save_pretrained
* handle unused safetensors arg