VED
cd856b45b1
feat:add support dataset_num_processes (#3129) [skip ci]
...
* feat:add support dataset_num_processes
* chore
* required changes
* requested changes
* required changes
* required changes
* required changes
* elif get_default_process_count()
* add:del data
* Update cicd/Dockerfile.jinja
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Update cicd/single_gpu.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2025-10-13 17:18:12 +07:00
Wing Lian
130637a3fa
upgrade transformers to 4.57.0 (#3201)
...
* upgrade transformers to 4.57.0
* remove deprecated autoawq and use latest peft
* remove autoawq from setuptools script
* fix imports
* make sure torchvision is installed
* remove support for BetterTransformer
* skip fsdp_qlora_prequant test
* more robust error reporting
2025-10-08 08:43:46 -04:00
Wing Lian
06bebcb65f
run cu128-2.8.0 e2e tests on B200 (#3126)
...
* run cu128-2.8.0 e2e tests on B200
* not an int 🤦
* fix yaml
2025-09-02 13:13:23 -04:00
Dan Saunders
79ddaebe9a
Add ruff, remove black, isort, flake8, pylint (#3092)
...
* black, isort, flake8 -> ruff
* remove unused
* add back needed import
* fix
2025-08-23 23:37:33 -04:00
salman
294c7fe7a6
Distributed/ND-Parallel (#2977)
2025-07-31 15:25:02 -04:00
Wing Lian
c6d69d5c1b
release v0.11.0 (#2875)
...
* release v0.11.0
* don't build vllm into release for now
* remove 2.5.1 references
* smollm3 multipack support
* fix ordering of e2e tests
2025-07-09 09:22:35 -04:00
Wing Lian
7563e1bd30
set a different triton cache for each test to avoid blocking writes to cache (#2843)
...
* set a different triton cache for each test to avoid blocking writes to cache
* set log level
* disable debug logging for filelock
2025-06-29 22:05:21 -04:00
Wing Lian
cb03c765a1
add uv tooling for e2e gpu tests (#2750)
...
* add uv tooling for e2e gpu tests
* fixes from PR feedback
* simplify check
* fix env var
* make sure to use uv for other install
* use raw_dockerfile_image
* Fix import
* fix args to experimental dockerfile image call
* use updated modal versions
2025-06-05 07:25:06 -07:00
Wing Lian
ecc719f5c7
add support for base image with uv (#2691)
2025-06-02 12:48:55 -07:00
Wing Lian
c7b6790614
Various fixes for CI, save_only_model for RL, prevent packing multiprocessing deadlocks (#2661)
...
* lean mistral ft tests, remove e2e torch 2.4.1 test
* make sure to pass save_only_model for RL
* more tests to make ci leaner, add cleanup to modal ci
* fix module for import in e2e tests
* use mp spawn to prevent deadlocks with packing
* make sure cleanup shell script is executable when cloned out
2025-05-12 10:51:18 -04:00