Commit Graph

1085 Commits

Wing Lian
7d1d22f72f ORPO Trainer replacement (#1551)
* WIP use trl ORPOTrainer

* fixes to make orpo work with trl

* fix the chat template loading

* make sure to handle the special tokens and add_generation for assistant turn too
2024-04-19 17:25:36 -04:00
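The commit above hands ORPO runs to trl's `ORPOTrainer`. A minimal, hypothetical axolotl config for such a run might look like the sketch below — the key names (`rl`, `remove_unused_columns`, the dataset `type`) are assumptions based on axolotl's RL options, not verified against this commit:

```yaml
rl: orpo                      # route training through the trl-backed ORPO path
chat_template: chatml         # template applied per turn, incl. assistant turns
remove_unused_columns: false  # ORPO needs the extra preference columns
datasets:
  - path: your-org/preference-dataset  # hypothetical preference dataset
    type: chat_template.default        # assumed type name
```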
Wing Lian
6319da1f9b Unsloth gradient checkpointing offload (#1528)
* unsloth gradient checkpointing

* fix validation too

* fixes to make it work with mistral

* monkeypatch the checkpoint fn earlier
2024-04-16 14:53:57 -04:00
Wing Lian
132eb740f0 DBRX Model Support (#1462)
* wip for dbrx finetuning

* add fastcore for parallel loading of sharded weights

* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback

* update to use v2 of the converted model

* more fixes for dbrx loras

* make sure to enable fsdp activation checkpointing

* fix support for 8bit loras too for dbrx

* apply z3 leaf moe fix for DBRX with deepspeed

* don't raise value error since child module searches could fail and be ok

* revert a previous change to fix fsdp

* update mistral/mistral qlora+fsdp yamls

* fix qlora+fsdp quant storage type

* more edge cases for qlora-fsdp

* fixes for fsdp+qlora w optimizer in 8bit

* add bigstral z3 config and make sure to use full_state_dict for fsdp
2024-04-12 09:02:36 -04:00
Thomas Capelle
5ed29393e3 Update SaveAxolotlConfigtoWandBCallback to use artifact instead of save (#1483)
* deprecated wandb.save

* also use wandb.save for axolotl yaml

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-09 18:58:38 -04:00
Wing Lian
da9b1a3196 use locale-agnostic separator to make large numbers easier to read (#1503) 2024-04-09 17:28:43 -04:00
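The locale-agnostic separator here is presumably Python's `,` format spec, which groups thousands with commas regardless of the system locale; a minimal sketch:

```python
# Python's "," format spec inserts comma thousands-separators regardless of
# the system locale, unlike locale-aware formatting such as
# locale.format_string("%d", n, grouping=True).
num_params = 7241732096  # illustrative parameter count

formatted = f"{num_params:,}"
print(formatted)  # 7,241,732,096
```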
DavidFarago
057fa44191 WIP: Support table logging for mlflow, too (#1506)
* WIP: Support table logging for mlflow, too

Create a `LogPredictionCallback` for both "wandb" and "mlflow" if
specified.

In `log_prediction_callback_factory`, build a generic table and make it
backend-specific only when the newly added `logger` argument is set to
"wandb" or "mlflow", respectively.

See https://github.com/OpenAccess-AI-Collective/axolotl/issues/1505

* chore: lint

* add additional clause for mlflow as it's optional

* Fix circular imports

---------

Co-authored-by: Dave Farago <dfarago@innoopract.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-09 17:28:27 -04:00
Scott Fleming
8fa0785f74 Correctly handle splits for datasets.arrow_dataset.Dataset objects (#1504)
* Correctly handle splits for datasets.arrow_dataset.Dataset objects

The `load_tokenized_prepared_datasets` function's logic for loading a dataset from a local path always checks whether a split is in the dataset. However, if the dataset was loaded via `load_from_disk` and is an Arrow-based dataset, *there is no* split information. In that case, `split in ds` presumably scans all the rows and columns of the Arrow dataset object looking for e.g. 'train' (assuming `split == 'train'`), which causes the program to hang.

See https://chat.openai.com/share/0d567dbd-d60b-4079-9040-e1de58a4dff3 for context.

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-09 16:40:26 -04:00
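The fix above can be illustrated with stand-in classes (hypothetical names; the real types are `datasets.Dataset` and `datasets.DatasetDict`): only probe for a split when the object actually carries splits, since `split in ds` on a bare Arrow dataset scans rows instead.

```python
class Dataset:
    """Stand-in for datasets.Dataset: a bare table with no split info."""
    def __init__(self, rows):
        self.rows = rows
    def __contains__(self, item):
        # membership on a bare Dataset scans its rows -- the behavior
        # that made `split in ds` hang on large datasets
        return item in self.rows

class DatasetDict(dict):
    """Stand-in for datasets.DatasetDict: split name -> Dataset."""

def select_split(ds, split="train"):
    # Only ask for a split when the object actually has splits.
    if isinstance(ds, DatasetDict):
        if split in ds:
            return ds[split]
        raise ValueError(f"dataset has no {split!r} split")
    return ds  # a bare Dataset *is* the data; don't search its rows

print(select_split(DatasetDict(train=Dataset(["a"]))).rows)  # ['a']
print(select_split(Dataset(["a", "b"])).rows)  # ['a', 'b']
```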
Wing Lian
4313b1a6a0 Print versions (#1496)
* print out dependency versions for easier debugging

* improve readability
2024-04-09 11:05:15 -04:00
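A dependency-version dump like the one added here can be sketched with the standard library alone (the function name is illustrative, not the commit's):

```python
from importlib.metadata import PackageNotFoundError, version

def report_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    out = {}
    for name in packages:
        try:
            out[name] = version(name)
        except PackageNotFoundError:
            out[name] = None
    return out

# Print each dependency's version (or None) for easier bug reports.
for name, ver in report_versions(["torch", "transformers", "peft"]).items():
    print(f"{name}=={ver}")
```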
Wing Lian
ff01c45127 add field to sft dataset pydantic for completion support (#1497) 2024-04-08 21:37:54 -04:00
Wing Lian
2fa65b9599 ignore issues with calculating # params when printing (#1493) 2024-04-08 11:04:22 -04:00
xzuyn
9430b6e868 Remove validate_quantized_dora (#1485)
DoRA with quantized layers is supported with PEFT 0.10.0
2024-04-08 01:25:23 -04:00
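With the standalone validation removed, quantized DoRA should be configurable directly; a hypothetical config fragment (assuming axolotl's `peft_use_dora` key and PEFT >= 0.10.0):

```yaml
adapter: qlora
load_in_4bit: true
peft_use_dora: true   # DoRA on quantized layers, supported since PEFT 0.10.0
```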
Wing Lian
934fc851da drop empty token from beginning if tokenizer has no bos_token (in the case of qwen) (#1490) 2024-04-06 19:55:19 -07:00
NanoCode012
bda48f0150 fix: reduce sample_packing warning (#1484) 2024-04-06 21:04:07 +09:00
NanoCode012
bf4cd67252 feat: validate sample packing requires flash_attention (#1465)
* feat: validate sample packing requires flash_attention

* fix: check for sdp_attn per suggestion

* feat: add FA to tests
2024-04-05 12:47:32 +09:00
Wing Lian
05b0b7e8ca add support for cohere chat template (#1478) 2024-04-04 18:20:50 -07:00
Wing Lian
87ca3f98c6 don't use deepspeed or fsdp when merging loras (#1479) 2024-04-04 18:20:32 -07:00
Wing Lian
e0fcef403f refactor utils.data module for line count linter (#1476) 2024-04-04 16:33:42 -07:00
Wing Lian
5aa50974ce Pretrain multipack v2 (#1470) 2024-04-02 05:42:16 -07:00
Nick Doiron
586bd8d221 fix pretraining_ on odd datasets (#1463)
* can configure name of split of pretraining dataset

* streaming data and dataset map

* text column customized

* allow text_column to be set in pretrain

* pretrain type

* load a bit of the dataset

* fix dataset where splits have separate configs

* ok name param here is the config

* whitespace
2024-04-01 20:48:59 -07:00
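The steps above suggest a pretraining dataset entry roughly like the following sketch — every value is hypothetical, and the key names (`name` as the per-split config, `text_column` for a custom text field) are assumptions read off the bullet list, not verified:

```yaml
pretraining_dataset:
  - path: some-org/streaming-corpus   # hypothetical streaming dataset
    name: en                          # per-split config ("name param here is the config")
    type: pretrain
    text_column: content              # custom text column instead of the default "text"
```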
Wing Lian
0b103775ad reduce verbosity of the special tokens (#1472) 2024-04-01 21:47:27 +09:00
Wing Lian
0ddfb24fcf LISA (#1469)
* add lisa support

* fix default and fix attribute traversal for layers

* improve lisa callback logging

* fix LISA by ensuring params are not frozen during __init__

* example config for lisa

---------

Co-authored-by: Aman Karmani <aman@tmm1.net>
2024-04-01 04:54:53 -07:00
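LISA (layerwise importance-sampled training) periodically freezes all transformer layers and unfreezes a small random subset, which the callback here manages. A pure-Python sketch of that sampling step, using stub classes rather than axolotl's actual implementation:

```python
import random

class Layer:
    """Minimal stand-in for a transformer block's parameter group."""
    def __init__(self):
        self.requires_grad = True

def lisa_step(layers, n_active=2, rng=None):
    """Freeze every layer, then unfreeze a random subset for the next
    optimization interval (illustrative sketch; names are not axolotl's)."""
    rng = rng or random.Random(0)
    for layer in layers:
        layer.requires_grad = False          # freeze everything first
    for idx in rng.sample(range(len(layers)), n_active):
        layers[idx].requires_grad = True     # unfreeze the sampled layers

layers = [Layer() for _ in range(32)]
lisa_step(layers, n_active=2)
print(sum(l.requires_grad for l in layers))  # 2
```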
Wing Lian
6086be85f7 qwen2_moe support w multipack (#1455) 2024-03-29 11:04:53 -04:00
Wing Lian
05b398a072 fix some of the edge cases for Jamba (#1452)
* fix some of the edge cases for Jamba

* update requirements for jamba
2024-03-29 02:38:02 -04:00
Keith Stevens
e634118f90 Support loading datasets saved via save_to_disk (#1432)
* Support loading datasets saved via save_to_disk

* Adding comprehensive unittests

* Fix dataset tests due to new hash changes
2024-03-29 00:19:36 -04:00
Wing Lian
02af0820f7 Jamba (#1451)
* fixes for larger models

* add qlora example for deepspeed

* add readme for jamba
2024-03-28 21:03:22 -04:00
Wing Lian
4155e9988f fix layer_replication arg to peft (#1446) 2024-03-27 10:18:56 -04:00
Wing Lian
25afd35842 support layer replication for peft and fix rslora integration (#1445) 2024-03-27 10:16:47 -04:00
Wing Lian
da265dd796 fix for accelerate env var for auto bf16, add new base image and expand torch_cuda_arch_list support (#1413) 2024-03-26 16:46:19 -04:00
WenboPan
e07347b188 Remove seq_len arg in rotary_emb (#1443)
* remove seq_len in llama rotary_emb

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-26 15:19:44 -04:00
Far El
bcdc9b1601 Fix falcon tokenization step (#1441) [skip ci]
* Fix falcon tokenization step

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-26 15:19:34 -04:00
Wing Lian
601b77bc9d make sure to capture non-null defaults from config validation (#1415) 2024-03-26 15:18:47 -04:00
NanoCode012
ff939d8a64 fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path (#1298)
* fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path

* fix: normalize config
2024-03-25 15:34:54 +09:00
Wing Lian
34ba634b8c Fix ORPO multi gpu (#1433)
* don't drop attention_mask for orpo

* handle multi-gpu cases better for orpo

* revert change to not drop the attention_mask from inputs for orpo
2024-03-22 15:22:58 -07:00
Wing Lian
2a1589f6f6 strip out hacky qlora-fsdp workarounds now that qlora-fsdp fixes are upstreamed (#1428) 2024-03-21 11:56:13 -04:00
Younes Belkada
7d55607368 HF / FEAT: Optimize HF tags (#1425) [skip ci]
* optimize tags

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-21 11:55:56 -04:00
Wing Lian
7803f0934f fixes for dpo and orpo template loading (#1424) 2024-03-20 11:36:24 -04:00
Wing Lian
dd449c5cd8 support galore once upstreamed into transformers (#1409)
* support galore once upstreamed into transformers

* update module name for llama in readme and fix typing for all linear

* bump trl for deprecation fixes from newer transformers

* include galore as an extra and install in docker image

* fix optim_args type

* fix optim_args

* update dependencies for galore

* add galore to cicd dockerfile
2024-03-19 09:26:35 -04:00
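Once upstreamed, GaLore is selected through the optimizer fields; a hypothetical config sketch (values illustrative; key names assumed to mirror transformers' `optim`, `optim_args`, and `optim_target_modules`, with `optim_args` as a string per the "fix optim_args type" step):

```yaml
optimizer: galore_adamw
optim_args: "rank=256,update_proj_gap=200,scale=0.25"  # passed through to GaLore
optim_target_modules:
  - self_attn
  - mlp
```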
NanoCode012
40a88e8c4a Feat: Add sharegpt multirole (#1137)
* feat(prompt): support multiple roles for sharegpt

* fix: add handling of empty role back

* feat: rebased and allowed more dynamic roles via config

* fix: variable

* chore: update message

* feat: add vicuna format

* fix: JSON serializable error

* fix: typing

* fix: don't remap for unknown keys

* fix: add roles to pydantic

* feat: add test

* chore: remove leftover print

* chore: remove leftover comment

* chore: remove print

* fix: update test to use chatml
2024-03-19 20:51:49 +09:00
Seungduk Kim
43bdc5d3de Add a config not to shuffle merged dataset (#1394) [skip ci]
* Add a config not to shuffle merged dataset

* Update README.md

* Update src/axolotl/utils/config/models/input/v0_4_1/__init__.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* invert the condition name

* update README

* info -> debug

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-19 20:51:00 +09:00
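The resulting flag (name assumed from the "invert the condition name" step) would be used like:

```yaml
shuffle_merged_datasets: false  # keep the merged dataset in its original order
```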
NanoCode012
b1e3e1b25f fix(config): passing gradient_checkpoint_kwargs (#1412)
* fix(config): change default use_reentrant to true

* Update trainer_builder.py

* fix: make sure to pass kwargs to enable checkpoint

* chore: lint
2024-03-19 12:57:43 +09:00
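With this fix the kwargs actually reach the checkpointing-enable call; a sketch of the relevant config fragment (keys assumed to match the commit's `gradient_checkpointing_kwargs`):

```yaml
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true   # the default this change passes through
```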
Wing Lian
2ea70ebbd8 ORPO (#1419)
* orpo trainer

* rl handling for orpo

* support for remove_unused_columns

* orpo fixes

* fix loader for orpo

* chore: lint

* fix default for remove_unused_columns

* roll ORPO into the main AxolotlTrainer so it can be compatible with some of the other techniques like relora

* better handling of system message for orpo

* revert system prompt changes for chat templates

* no need for else condition

* split dataset parsing into its own component
2024-03-18 13:10:00 -04:00
NanoCode012
d485a08393 chore(script): remove redundant setting (#1411) 2024-03-16 21:10:38 +09:00
Wing Lian
8df7b888ff beta support for multipack with gemmoe (#1402) 2024-03-14 15:52:23 -04:00
Seungduk Kim
05bcc9ea56 Train parameters exclusively in specific ranges (#1390)
* Train parameters exclusively in specific ranges

* Fix the style and update docs

* Update yaml example
2024-03-14 11:05:42 -04:00
Chirag Jain
3bd8203c35 Don't disable existing loggers when configuring axolotl logging (#1395) 2024-03-14 11:05:21 -04:00
Chirag Jain
0976781e15 Update ChatTemplate enum to include alpaca and gemma (#1396) 2024-03-13 11:06:02 -04:00
Wing Lian
8a82d2e0a4 add handling for argilla dpo-mix (#1397) 2024-03-12 17:17:10 -04:00
Wing Lian
4326520829 chore: lint (#1389) 2024-03-10 21:02:55 -04:00
Brian Fitzgerald
b7d8a7dc4d Add Glaive conversation format support (#1365)
* Add Glaive conversation format support

* fix black formatting errors

* Fix black and pylint formatting errors

* only set role_key_tool if provided in the dataset constructor

* Update src/axolotl/prompt_strategies/sharegpt.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* sharegpt test

* tokenizer test

* fix formatting

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-10 20:50:25 -04:00
David Baker
0bc114d2e1 Fix pydantic configuration for the max_memory input (#1385) [skip ci]
* Fix pydantic configuration for the max_memory input

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-10 20:50:04 -04:00