Commit Graph

10 Commits

Author SHA1 Message Date
Dan Saunders
fc973f4322 CLI Implementation with Click (#2107)
* Initial CLI implementation with click package

* Adding fetch command for pulling examples and deepspeed configs

* Automating default options for CliArgs classes

* Mimicking existing no config behavior

* bugfix in choose_config

* Updating fetch to sync instead of re-download

* bugfix

* isort fix

* fixing yaml isort order

* pre-commit fixes

* simplifying argument parsing -- pass through kwargs to do_cli

* make accelerate launch default for non-preprocess commands

* fixing arg handling

* testing None placeholder approach

* removing hacky --use-gpu argument to preprocess command

* Adding brief README documentation for CLI

* remove (New)

* Initial CLI pytest tests

* progress on CLI pytest

* adding inference CLI tests; cleanup

* Refactor train CLI tests to remove various mocking

* Major CLI test refactor; adding remaining CLI codepath test coverage

* pytest fixes

* remove integration markers

* parallelizing examples, deepspeed config downloads; rename test to match other CLI test naming

* moving cli pytest due to isolation issues; cleanup

* testing fixes; various minor improvements

* fix

* tests fix

* Update tests/cli/conftest.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

---------

Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-12-05 22:11:48 -05:00
Wing Lian
9f6d0b5587 use pytest sugar and verbose for more info during ci (#2112) [skip ci]
* use pytest sugar and verbose for more info during ci

* also run test suite when test requirements or cicd.sh changes

* also on PR too
2024-12-02 20:14:40 -05:00
Wing Lian
ce5bcff750 various tests fixes for flakey tests (#2110)
* add mhenrichsen/alpaca_2k_test with revision dataset download fixture for flaky tests

* log slowest tests

* pin pynvml==11.5.3

* fix load local hub path

* optimize for speed w smaller models and val_set_size

* replace pynvml

* make the resume from checkpoint e2e faster

* make tests smaller
2024-12-02 17:28:58 -05:00
Wing Lian
c06b8f0243 increase worker count to 8 for basic pytests (#2075) [skip ci]
2024-11-18 11:52:35 -05:00
Mengqing Cao
1d6a5e2bd6 Refactor func load_model to class ModelLoader (#1909)
2024-10-25 09:06:56 -04:00
Wing Lian
0aeb277456 add e2e smoke tests for llama liger integration (#1884)
* add e2e smoke tests for llama liger integration

* fix import

* don't use __main__ for test

* consolidate line
2024-09-01 19:29:37 -04:00
Wing Lian
54392ac8a6 Attempt to run multigpu in PR CI for now to ensure it works (#1815) [skip ci]
* Attempt to run multigpu in PR CI for now to ensure it works

* fix yaml file

* forgot to include multigpu tests

* fix call to cicd.multigpu

* dump dictdefault to dict for yaml conversion

* use to_dict instead of casting

* 16bit-lora w flash attention, 8bit lora seems problematic

* add llama fsdp test

* more tests

* Add test for qlora + fsdp with prequant

* limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test

* move multigpu tests to biweekly
2024-08-09 11:50:13 -04:00
Wing Lian
e1725aef2b update modal package and don't cache pip install (#1757)
* update modal package and cleanup pip cache

* more verbosity on the test
2024-07-16 14:45:38 -04:00
Wing Lian
fe650dd326 make sure the CI fails when pytest script fails (#1669)
* make sure the pytest script fails

* make sure the defaults come through for tests

* make sure tensorboard is loaded for test assertion
2024-05-29 10:12:11 -04:00
Wing Lian
00018629e7 run tests again on Modal (#1289) [skip ci]
* run tests again on Modal

* make sure to run the full suite of tests on modal

* run cicd steps via shell script

* run tests in different runs

* increase timeout

* split tests into steps on modal

* increase workflow timeout

* retry doing this with only a single script

* fix yml launch for modal ci

* reorder tests to run on modal

* skip dpo tests on modal

* run on L4s, A10G takes too long

* increase CPU and RAM for modal test

* run modal tests on A100s

* skip phi test on modal

* env not arg in modal dockerfile

* upgrade pydantic and fastapi for modal tests

* cleanup stray character

* use A10s instead of A100 for modal
2024-02-29 14:26:26 -05:00