Commit Graph

1350 Commits

Author SHA1 Message Date
Casper Hansen
10328b3429 Simplify creating parameters 2024-03-18 12:32:59 +00:00
Casper Hansen
5bfc470d57 Stop transformers from using all memory 2024-03-18 11:47:47 +00:00
Casper Hansen
04168801c9 Simplify conversion + more debug 2024-03-17 20:21:46 +00:00
Casper
d43a79b7bf device_map auto 2024-03-17 19:52:56 +01:00
Casper
884d81331e Initialize ParallelExperts on device of first expert 2024-03-17 19:51:31 +01:00
Casper
2ea75b4160 temporary: inference validation script 2024-03-17 19:48:52 +01:00
Casper Hansen
035e680631 Update test 2024-03-15 13:58:12 +00:00
Casper Hansen
26fc10df01 Refactor names, bugfixes 2024-03-15 12:39:11 +00:00
Casper Hansen
1bc008e901 Refactor creating FusedExperts 2024-03-15 11:59:56 +00:00
Casper Hansen
3f7ed6a784 Bugfixes, test green 2024-03-15 11:48:46 +00:00
Casper
feea977923 initial implementation, untested 2024-03-15 11:54:36 +01:00
Wing Lian
8df7b888ff beta support for multipack with gemmoe: (#1402) 2024-03-14 15:52:23 -04:00
Sebastian Raschka
6366b0c212 Fix Gemma 7b qlora.yml (#1405) 2024-03-14 15:44:38 -04:00
Seungduk Kim
05bcc9ea56 Train parameters exclusively in specific ranges (#1390)
* Train parameters exclusively in specific ranges

* Fix the style and update docs

* Update yaml example
2024-03-14 11:05:42 -04:00
Chirag Jain
3bd8203c35 Don't disable existing loggers when configuring axolotl logging (#1395) 2024-03-14 11:05:21 -04:00
Hamel Husain
8b12468230 Add QLoRA + FSDP Docs (#1403)
* pre commit

* Update fsdp_qlora.md
2024-03-14 11:04:51 -04:00
Chirag Jain
0976781e15 Update ChatTemplate enum to include alpaca and gemma (#1396) 2024-03-13 11:06:02 -04:00
Wing Lian
8a82d2e0a4 add handling for argilla dpo-mix (#1397) 2024-03-12 17:17:10 -04:00
Wing Lian
4326520829 chore: lint (#1389) 2024-03-10 21:02:55 -04:00
Brian Fitzgerald
b7d8a7dc4d Add Glaive conversation format support (#1365)
* Add Glaive conversation format support

* fix black formatting errors

* Fix black and pylint formatting errors

* only set role_key_tool if provided in the dataset constructor

* Update src/axolotl/prompt_strategies/sharegpt.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* sharegpt test

* tokenizer test

* fix formatting

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-10 20:50:25 -04:00
Seungduk Kim
b0ee9ec734 Set gradient_clipping to auto in DeepSpeed configs (#1382) [skip ci] 2024-03-10 20:50:12 -04:00
David Baker
0bc114d2e1 Fix pydantic configuration for the max_memory input (#1385) [skip ci]
* Fix pydantic configuration for the max_memory input

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-10 20:50:04 -04:00
Wing Lian
7659c001aa support for rslora (#1387) [skip ci] 2024-03-10 20:49:45 -04:00
Wing Lian
3fd8093717 validation for fsdp and deepspeed (#1388) [skip ci]
* validation for fsdp and deepspeed

* make sure to return data
2024-03-10 20:49:25 -04:00
Wing Lian
9b6ee83a73 FDSP + QLoRA (#1378)
* wip qlora + fsdp fixes

* more fixes

* make sure to load the lora 🤦

* only setup quantized meta on non-zero rank:

* only run setup_quantized_peft_meta_for_training for qlora+fsdp

* more fixes for qlora+fsdp

* chore: lint

* add example yml

* support mistral too

* fix for model_type and add mixtral support too

* set cpu_offload: false to reduce vram, constrain new accleerator logic to qlora + fsdp

* refactor for duplicate code
2024-03-08 14:31:01 -05:00
Wing Lian
638c2dafb5 JarvisLabs (#1372)
* add Jarvis cloud gpu and sponsorship

* whitespace
2024-03-07 10:47:32 -05:00
Wing Lian
58b0d4b0d8 update flash attention for gemma support: (#1368) 2024-03-06 10:08:54 -05:00
Hamel Husain
ed70a08348 add docs for input_output format (#1367) [skip ci]
* add docs

* add docs

* run linter
2024-03-06 09:09:49 -05:00
Wing Lian
0cfdb2c90c support for DoRA w/ PEFT (#1363) 2024-03-05 21:20:15 -05:00
Nicolas Rojas
37657473c8 Remove unsupported python version 3.9 from README (#1364) [skip ci] 2024-03-05 21:19:36 -05:00
Eric Hartford
e0f1895408 add starcoder2 (#1349)
* add starcoder2

* Apply suggestions from code review

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* chore: lint

* Apply suggestions from code review

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-03-05 19:49:17 -05:00
Sebastian Raschka
8984bf1722 Update tinyllama lora.yml to fix eval packing issue (#1362) 2024-03-05 14:36:29 -05:00
Wing Lian
2598c9f045 allow the sharegpt handler to also better handle datasets destined for openai finetuning (#1361)
* allow the sharegpt handler to also better handle datasets destined for openai finetuning

* make sure to support system role
2024-03-05 11:43:33 -05:00
Wing Lian
decb66e170 lora+ support (#1352)
* lora+ support

* optimizer should default to None

* include mit license
2024-03-05 07:29:23 -05:00
Wing Lian
4d09b42ee3 plain input/output prompt strategy w/o chat templates (#1346)
* plain input/output prompt strategy w/o chat templates

* disable duplicate code check

* make sure to add an eos/eot token to the end of the output so it will stop

* multi turn segement support and test
2024-03-04 16:25:16 -05:00
Chirag Jain
b5b44925ec Fix validation for early stopping (#1358) 2024-03-03 22:15:18 -05:00
NanoCode012
170d4d7092 chore: enable sample_packing for Gemma (#1351) 2024-03-01 21:56:22 -05:00
Wing Lian
00018629e7 run tests again on Modal (#1289) [skip ci]
* run tests again on Modal

* make sure to run the full suite of tests on modal

* run cicd steps via shell script

* run tests in different runs

* increase timeout

* split tests into steps on modal

* increase workflow timeout

* retry doing this with only a single script

* fix yml launch for modal ci

* reorder tests to run on modal

* skip dpo tests on modal

* run on L4s, A10G takes too long

* increase CPU and RAM for modal test

* run modal tests on A100s

* skip phi test on modal

* env not arg in modal dockerfile

* upgrade pydantic and fastapi for modal tests

* cleanup stray character

* use A10s instead of A100 for modal
2024-02-29 14:26:26 -05:00
Wing Lian
6b3b271925 fix for protected model_ namespace w pydantic (#1345) 2024-02-28 15:07:49 -05:00
Chirag Jain
3a5a2d2f34 Fix use_mlflow to be bool instead of str (#1344) 2024-02-28 12:58:29 -05:00
Wing Lian
6d4bbb877f deprecate py 3.9 support, set min pytorch version (#1343) [skip ci] 2024-02-28 12:58:05 -05:00
Wing Lian
0f985e12fe more fixes 20240228 (#1342) [skip ci]
* add missing evals_per_epoch setting

* more pydantic fixes

* more fixes

* move test from normalization to validation

* increase eval size for sample packing tests
2024-02-28 12:57:45 -05:00
Wing Lian
c1a7b3dd69 add gemma instruct chat template (#1341)
* add gemma instruct chat template

* support for chat tempalte strategy too
2024-02-27 17:20:01 -05:00
Ikko Eltociear Ashimine
2b9687f341 Update fastchat_conversation_turns.py (#1294) [skip ci]
seperated -> separated
2024-02-27 09:06:10 -05:00
Wing Lian
2c9c88b32a fix steps check for anneal on first cycle (#1316) 2024-02-27 08:56:08 -05:00
Hamel Husain
5265cd6b2c Update debugging.md (#1339) [skip ci] 2024-02-27 15:47:31 +09:00
NanoCode012
5be8b555a0 fix: checkpoint saving with deepspeed (#1321) 2024-02-27 15:46:44 +09:00
Maxime
0f6af36d50 Mps mistral lora (#1292) [skip ci]
* Lora example for Mistral on MPS backend

* Add some MPS documentation

* Update examples/mistral/lora-mps.yml

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update examples/mistral/lora-mps.yml

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update README.md

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 22:39:57 -05:00
Wing Lian
3f69571943 more pydantic fixes (#1338) 2024-02-26 22:39:13 -05:00
nopperl
1e3d5305d3 Support user-defined prompt processing strategies for dpo (#1248)
* support user-defined prompt processing strategies for dpo

* interpret dict dataset types as user-defined

* fix lint errors

* setup pydantic config for validation of User defined DPO

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 18:49:34 -05:00