Wing Lian
05b398a072
fix some of the edge cases for Jamba ( #1452 )
...
* fix some of the edge cases for Jamba
* update requirements for jamba
2024-03-29 02:38:02 -04:00
Keith Stevens
e634118f90
Support loading datasets saved via save_to_disk ( #1432 )
...
* Support loading datasets saved via save_to_disk
* Adding comprehensive unittests
* Fix dataset tests due to new hash changes
2024-03-29 00:19:36 -04:00
Wing Lian
02af0820f7
Jamba ( #1451 )
...
* fixes for larger models
* add qlora example for deepspeed
* add readme for jamba
2024-03-28 21:03:22 -04:00
Wing Lian
4155e9988f
fix layer_replication arg to peft ( #1446 )
2024-03-27 10:18:56 -04:00
Wing Lian
25afd35842
support layer replication for peft and fix rslora integration ( #1445 )
2024-03-27 10:16:47 -04:00
Wing Lian
da265dd796
fix for accelerate env var for auto bf16, add new base image and expand torch_cuda_arch_list support ( #1413 )
2024-03-26 16:46:19 -04:00
WenboPan
e07347b188
Remove seq_len arg in rotary_emb ( #1443 )
...
* remove seq_len in llama rotary_emb
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-26 15:19:44 -04:00
Far El
bcdc9b1601
Fix falcon tokenization step ( #1441 ) [skip ci]
...
* Fix falcon tokenization step
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-26 15:19:34 -04:00
Wing Lian
601b77bc9d
make sure to capture non-null defaults from config validation ( #1415 )
2024-03-26 15:18:47 -04:00
NanoCode012
ff939d8a64
fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path ( #1298 )
...
* fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path
* fix: normalize config
2024-03-25 15:34:54 +09:00
Wing Lian
34ba634b8c
Fix ORPO multi gpu ( #1433 )
...
* don't drop attention_mask for orpo
* handle multi-gpu cases better for orpo
* revert change to not drop the attention_mask from inputs for orpo
2024-03-22 15:22:58 -07:00
Wing Lian
2a1589f6f6
strip out hacky qlora-fsdp workarounds now that qlora-fsdp fixes are upstreamed ( #1428 )
2024-03-21 11:56:13 -04:00
Younes Belkada
7d55607368
HF / FEAT: Optimize HF tags ( #1425 ) [skip ci]
...
* optimize tags
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-21 11:55:56 -04:00
Wing Lian
7803f0934f
fixes for dpo and orpo template loading ( #1424 )
2024-03-20 11:36:24 -04:00
Wing Lian
dd449c5cd8
support galore once upstreamed into transformers ( #1409 )
...
* support galore once upstreamed into transformers
* update module name for llama in readme and fix typing for all linear
* bump trl for deprecation fixes from newer transformers
* include galore as an extra and install in docker image
* fix optim_args type
* fix optim_args
* update dependencies for galore
* add galore to cicd dockerfile
2024-03-19 09:26:35 -04:00
NanoCode012
40a88e8c4a
Feat: Add sharegpt multirole ( #1137 )
...
* feat(prompt): support multiple roles for sharegpt
* fix: add handling of empty role back
* feat: rebased and allowed more dynamic roles via config
* fix: variable
* chore: update message
* feat: add vicuna format
* fix: JSON serializable error
* fix: typing
* fix: don't remap for unknown keys
* fix: add roles to pydantic
* feat: add test
* chore: remove leftover print
* chore: remove leftover comment
* chore: remove print
* fix: update test to use chatml
2024-03-19 20:51:49 +09:00
Seungduk Kim
43bdc5d3de
Add a config not to shuffle merged dataset ( #1394 ) [skip ci]
...
* Add a config not to shuffle merged dataset
* Update README.md
* Update src/axolotl/utils/config/models/input/v0_4_1/__init__.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* invert the condition name
* update README
* info -> debug
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-19 20:51:00 +09:00
NanoCode012
b1e3e1b25f
fix(config): passing gradient_checkpoint_kwargs ( #1412 )
...
* fix(config): change default use_reentrant to true
* Update trainer_builder.py
* fix: make sure to pass kwargs to enable checkpoint
* chore: lint
2024-03-19 12:57:43 +09:00
Wing Lian
2ea70ebbd8
ORPO ( #1419 )
...
* orpo trainer
* rl handling for orpo
* support for remove_unused_columns
* orpo fixes
* fix loader for orpo
* chore: lint
* fix default for remove_unused_columns
* roll ORPO into the main AxolotlTrainer so it can be compatible with some of the other techniques like relora
* better handling of system message for orpo
* revert system prompt changes for chat templates
* no need for else condition
* split dataset parsing into its own component
2024-03-18 13:10:00 -04:00
NanoCode012
d485a08393
chore(script): remove redundant setting ( #1411 )
2024-03-16 21:10:38 +09:00
Wing Lian
8df7b888ff
beta support for multipack with gemmoe: ( #1402 )
2024-03-14 15:52:23 -04:00
Seungduk Kim
05bcc9ea56
Train parameters exclusively in specific ranges ( #1390 )
...
* Train parameters exclusively in specific ranges
* Fix the style and update docs
* Update yaml example
2024-03-14 11:05:42 -04:00
Chirag Jain
3bd8203c35
Don't disable existing loggers when configuring axolotl logging ( #1395 )
2024-03-14 11:05:21 -04:00
Chirag Jain
0976781e15
Update ChatTemplate enum to include alpaca and gemma ( #1396 )
2024-03-13 11:06:02 -04:00
Wing Lian
8a82d2e0a4
add handling for argilla dpo-mix ( #1397 )
2024-03-12 17:17:10 -04:00
Wing Lian
4326520829
chore: lint ( #1389 )
2024-03-10 21:02:55 -04:00
Brian Fitzgerald
b7d8a7dc4d
Add Glaive conversation format support ( #1365 )
...
* Add Glaive conversation format support
* fix black formatting errors
* Fix black and pylint formatting errors
* only set role_key_tool if provided in the dataset constructor
* Update src/axolotl/prompt_strategies/sharegpt.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* sharegpt test
* tokenizer test
* fix formatting
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-10 20:50:25 -04:00
David Baker
0bc114d2e1
Fix pydantic configuration for the max_memory input ( #1385 ) [skip ci]
...
* Fix pydantic configuration for the max_memory input
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-10 20:50:04 -04:00
Wing Lian
7659c001aa
support for rslora ( #1387 ) [skip ci]
2024-03-10 20:49:45 -04:00
Wing Lian
3fd8093717
validation for fsdp and deepspeed ( #1388 ) [skip ci]
...
* validation for fsdp and deepspeed
* make sure to return data
2024-03-10 20:49:25 -04:00
Wing Lian
9b6ee83a73
FSDP + QLoRA ( #1378 )
...
* wip qlora + fsdp fixes
* more fixes
* make sure to load the lora 🤦
* only setup quantized meta on non-zero rank:
* only run setup_quantized_peft_meta_for_training for qlora+fsdp
* more fixes for qlora+fsdp
* chore: lint
* add example yml
* support mistral too
* fix for model_type and add mixtral support too
* set cpu_offload: false to reduce vram, constrain new accelerator logic to qlora + fsdp
* refactor for duplicate code
2024-03-08 14:31:01 -05:00
Wing Lian
0cfdb2c90c
support for DoRA w/ PEFT ( #1363 )
2024-03-05 21:20:15 -05:00
Eric Hartford
e0f1895408
add starcoder2 ( #1349 )
...
* add starcoder2
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* chore: lint
* Apply suggestions from code review
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-03-05 19:49:17 -05:00
Wing Lian
2598c9f045
allow the sharegpt handler to also better handle datasets destined for openai finetuning ( #1361 )
...
* allow the sharegpt handler to also better handle datasets destined for openai finetuning
* make sure to support system role
2024-03-05 11:43:33 -05:00
Wing Lian
decb66e170
lora+ support ( #1352 )
...
* lora+ support
* optimizer should default to None
* include mit license
2024-03-05 07:29:23 -05:00
Wing Lian
4d09b42ee3
plain input/output prompt strategy w/o chat templates ( #1346 )
...
* plain input/output prompt strategy w/o chat templates
* disable duplicate code check
* make sure to add an eos/eot token to the end of the output so it will stop
* multi turn segment support and test
2024-03-04 16:25:16 -05:00
Chirag Jain
b5b44925ec
Fix validation for early stopping ( #1358 )
2024-03-03 22:15:18 -05:00
Wing Lian
6b3b271925
fix for protected model_ namespace w pydantic ( #1345 )
2024-02-28 15:07:49 -05:00
Chirag Jain
3a5a2d2f34
Fix use_mlflow to be bool instead of str ( #1344 )
2024-02-28 12:58:29 -05:00
Wing Lian
0f985e12fe
more fixes 20240228 ( #1342 ) [skip ci]
...
* add missing evals_per_epoch setting
* more pydantic fixes
* more fixes
* move test from normalization to validation
* increase eval size for sample packing tests
2024-02-28 12:57:45 -05:00
Wing Lian
c1a7b3dd69
add gemma instruct chat template ( #1341 )
...
* add gemma instruct chat template
* support for chat template strategy too
2024-02-27 17:20:01 -05:00
Ikko Eltociear Ashimine
2b9687f341
Update fastchat_conversation_turns.py ( #1294 ) [skip ci]
...
seperated -> separated
2024-02-27 09:06:10 -05:00
Wing Lian
2c9c88b32a
fix steps check for anneal on first cycle ( #1316 )
2024-02-27 08:56:08 -05:00
Wing Lian
3f69571943
more pydantic fixes ( #1338 )
2024-02-26 22:39:13 -05:00
nopperl
1e3d5305d3
Support user-defined prompt processing strategies for dpo ( #1248 )
...
* support user-defined prompt processing strategies for dpo
* interpret dict dataset types as user-defined
* fix lint errors
* setup pydantic config for validation of User defined DPO
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 18:49:34 -05:00
Maxime
16482796b0
add lion-pytorch optimizer ( #1299 ) [skip ci]
...
* add lion-pytorch optimizer
* update pydantic to support lion optimizer
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 18:45:14 -05:00
Wing Lian
269c5436ea
hotfix to exclude_unset from pydantic config when converting back to a dict ( #1334 )
2024-02-26 15:06:25 -05:00
Wing Lian
e7eed203d8
hotfix for missing outputs params ( #1333 )
2024-02-26 14:36:37 -05:00
Wing Lian
cf002312e0
hotfix for lora rank ( #1332 )
2024-02-26 14:28:43 -05:00
Wing Lian
7de912e097
hotfix for capabilities loading ( #1331 )
2024-02-26 14:24:28 -05:00