axolotl

Author	SHA1	Message	Date
Ben Redmond	22ae21a6c2	Add KTO support (#1640 ) * add kto support * test cleanup * fix outdated comment * fix llama3 ultra * chore: lint * update to use rl_beta instead of dpo_beta --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-05-20 16:05:16 -04:00
JohanWork	601c08b4c2	ADD: warning hub model (#1301 ) * update warning for save_strategy * update * clean up * update * Update test_validation.py * fix validation step * update * test_validation * update * fix * fix --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-05-01 01:05:12 +09:00
NanoCode012	bf4cd67252	feat: validate sample packing requires flash_attention (#1465 ) * feat: validate sample packing requires flash_attention * fix: check for sdp_attn per suggestion * feat: add FA to tests	2024-04-05 12:47:32 +09:00
Wing Lian	601b77bc9d	make sure to capture non-null defaults from config validation (#1415 )	2024-03-26 15:18:47 -04:00
Wing Lian	6b3b271925	fix for protected model_ namespace w pydantic (#1345 )	2024-02-28 15:07:49 -05:00
Wing Lian	0f985e12fe	more fixes 20240228 (#1342 ) [skip ci] * add missing evals_per_epoch setting * more pydantic fixes * more fixes * move test from normalization to validation * increase eval size for sample packing tests	2024-02-28 12:57:45 -05:00
Wing Lian	cc3cebfa70	Pydantic 2.x cfg (#1239 ) * WIP conversion to use pydantic for config validation * wip, more fields, add capabilities * wip * update pydantic validation to match existing tests * tweak requirements * setup deprecated paams pydantic model * more validations * wrap up rest of the validations * flesh out the rest of the options from the readme into pydantic * fix model validators as class methods remember to return in validator missing return add missing relora attributes fix test for DictDefault change fix sys template for mistral from fastchat change in PR 2872 fix test for batch size warning * more missing attributes for cfg * updates from PR feedback * fix validation for datasets and pretrain datasets * fix test for lora check	2024-02-26 12:24:14 -05:00
Wing Lian	4cb7900a56	Peft lotfq (#1222 ) * loftq support for lora * fix loftq check * update readme for loftq * readability cleanup * use peft main for loftq fixes, remove unnecessary special tokens * remove unused test from older deprecation	2024-01-28 18:50:08 -05:00
JohanWork	af29d81f80	ADD: warning if hub_model_id ist set but not any save strategy (#1202 ) * warning if hub model id set but no save * add warning * move the warning * add test * allow more public methods for tests for now * fix tests --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-26 10:38:55 -05:00
Wing Lian	814aee6603	Phi2 multipack (#1173 ) * phi2 multipack * update validation and examples for phi * more updates to phi examples * make sure to use the correct collator for phi multipack * phi needs attention mask now for multipack * if the special token already exists in the tokenizer, don't require in lora modules to save * fix qlora yml for phi, fix phi test validation * test qlora too * make sure flash attention is enabled for the test * don't use remote code for phi anymore * reduce sequence len for sample packing phi	2024-01-23 12:54:36 -05:00
Wing Lian	2ce5c0d68a	Deprecate max packed sequence len (#1141 )	2024-01-20 05:11:50 -05:00
xzuyn	8487b97cf3	Add `layers_to_transform` for `lora_config` (#1118 )	2024-01-15 21:29:55 -05:00
Wing Lian	78c5b1979e	add gptneox embeddings, fix phi2 inputs, also fix the casting (#1083 )	2024-01-10 22:32:43 -05:00
Wing Lian	0f100800e3	be more robust about checking embedding modules for lora finetunes (#1074 ) [skip ci] * be more robust about checking embedding modules for lora finetunes * update dynamic error message	2024-01-09 22:58:54 -05:00
Wing Lian	788649fe95	attempt to also run e2e tests that needs gpus (#1070 ) * attempt to also run e2e tests that needs gpus * fix stray quote * checkout specific github ref * dockerfile for tests with proper checkout ensure wandb is dissabled for docker pytests clear wandb env after testing clear wandb env after testing make sure to provide a default val for pop tryin skipping wandb validation tests explicitly disable wandb in the e2e tests explicitly report_to None to see if that fixes the docker e2e tests split gpu from non-gpu unit tests skip bf16 check in test for now build docker w/o cache since it uses branch name ref revert some changes now that caching is fixed skip bf16 check if on gpu w support * pytest skip for auto-gptq requirements * skip mamba tests for now, split multipack and non packed lora llama tests * split tests that use monkeypatches * fix relative import for prev commit * move other tests using monkeypatches to the correct run	2024-01-09 21:23:23 -05:00
NanoCode012	1ffa3866f2	Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787 ) * Feat: Auto add to modules_to_save when adding tokens * fix: swap to error instead of warning * feat: add check when special_tokens differ and add test	2023-12-22 21:49:07 +09:00
NanoCode012	a1da39cd48	Feat(wandb): Refactor to be more flexible (#767 ) * Feat: Update to handle wandb env better * chore: rename wandb_run_id to wandb_name * feat: add new recommendation and update config * fix: indent and pop disabled env if project passed * feat: test env set for wandb and recommendation * feat: update to use wandb_name and allow id * chore: add info to readme	2023-12-04 22:17:25 +09:00
NanoCode012	fb12895a17	Feat: Add warmup_ratio (#893 ) * Feat: Add warmup_ratio * fix: update readme with more details on conflict	2023-11-25 12:15:43 +09:00
NanoCode012	44c9d0151a	Fix: Warn when fullfinetune without adapter (#770 )	2023-10-22 15:41:43 -04:00
NanoCode012	9923b72649	Fix: eval table conflict with eval_sample_packing (#769 )	2023-10-23 01:18:12 +09:00
NanoCode012	383f88d7a7	Fix(cfg): Add validation for save_strategy and eval_strategy (#633 ) * Fix(cfg): Check save_strategy cfg conflict with save_steps * Fix(cfg): Check evaluation_strategy cfg conflict with eval_steps * chore: add extra check for steps only	2023-09-28 10:14:41 +09:00
Wing Lian	e7d3e2dbb6	use fastchat conversations template (#578 ) * use fastchat conversations template * require fastchat (fschat) pip install * handle roles dynamically from conversation * tweak fastchat conversation with a monkeypatch to get individual turns * fix up so it works with multiple conversation styles, and don't strip the turns * fix sharegpt fixture now that we're using a more correct tokenization * use a new prompter and support fastchat conversation type * use sharegpt from prompt strategies now * update docs, add chatml template * add a newline after im_end token * ensure we correctly set system message * update per PR feedback to handle deprecated sharegpt types * don't add duplicate wandb req * make sharegpt fields configurable from yml * llama2 fixes * don't fail fatally when turns are improper	2023-09-27 12:10:45 -04:00
NanoCode012	cfbce020e9	Fix: Fail bf16 check when running on cpu during merge (#631 )	2023-09-25 13:48:18 +09:00
Wing Lian	343714972b	recommend padding when using sample packing (#531 )	2023-09-06 17:00:21 -04:00
Aman Karmani	8cec513447	extract module for working with cfg	2023-08-12 18:25:27 -07:00
Wing Lian	2bb0b78975	Attention mask and position id fixes for packing (#285 ) * fix attetion mask with packing * set position ids and use block diagonal attn mask * fix expand mask for multiple batch items, make sure we pad position_ids * don't move masks to cpu * use multi pack dataloader w random sampler * add position_ids back * more fixes for dataloader integration * est total tokens, fix field loop * more fixes, position_ids seems broken * more fixes for sample packing * use distributed sampler, avoid accelerate prepare * use accelerator prepare for dataloader * fix for position_ids w packing * Update src/axolotl/utils/dataloader.py * validation for sample packing and doc * more fixes for 4k and optimizations * optimized expand mask fn * better handling of variance in multipack dataloader length and trainer hanging when it runs out of data * fix rounding of len of batches to int * better handling so that all devices have the same dataloader len * fix step calc for packing * pass sample packing efficiency to training args * add a test for the mask expansion for sequence packing * only process eval dataset for packing if not None * don't split batches when packing * weighted CE losses * weighted CEL fixes * limit packing to sequences of max seq len * seq_len_multiple for packing * make sure the chunk size is an int * sample_packing_seq_len_multiplier config * use cumulative seq len with var len flash attn v2 w packing * properly calculate max len * fix flash-attn, xformers, packing, support chatml * fix chatml system prompt for openorca, legacy tokenizer opts * add chatml * add unit tests for cum seq lens, add ability to build cu_seq_lens from positional ids, fix prompt test * fix test and pylint checks * more packing and dataset optimizations and fixes * filter w multiple cpus * more fixes and optimizations * fixes and go back to distributed sampler since batch sampler won't work * fix counts by accounting for num devices * fix steps calculation * previous accelerate is still most performant * add numba to requirements. * use custom distributed checks * fix sampler to prevent overfit w new epochs * let's not cleanup the cached datasets * calculate cum seq lens with pos_ids instead of mask, simplify packing params, fix distributed barrier * speed optimizations and set accelerate fsdp env vars * optimize dataset concatenation? * more optimizations for dataset handling * fix import for annotation * manual pre-commit fixes * another sum optimization and bug fix for calc steps * fix packing estimations * fix formatting * pylint problems * add back flash attention branch for handling unpacked sequences seperately * Address PR feedback * add optional sample packing config params to readme	2023-08-12 15:14:56 -04:00
Wing Lian	19cf0bda99	params are adam_, not adamw_	2023-07-08 12:13:39 -04:00
Wing Lian	ad5ca4f734	Additional test case per pr	2023-06-15 10:12:47 -04:00
Wing Lian	cb9d3af5c0	add validation and tests for adamw hyperparam	2023-06-15 09:39:42 -04:00
Wing Lian	fd2c9814c9	Merge branch 'main' into flash-optimum	2023-06-12 13:12:15 -04:00
Wing Lian	14668fa54e	new validation for mpt w grad checkpoints	2023-06-11 09:26:10 -04:00
Wing Lian	eea2731a5e	add streaming dataset support for pretraining datasets	2023-06-10 14:23:56 -04:00
NanoCode012	babf0fdb71	Validate falcon with fsdp	2023-06-09 00:29:04 +09:00
NanoCode012	3c71c8debe	Update doc for grad_accu and add validation tests for batch size	2023-06-01 06:13:47 +09:00
Wing Lian	6fa40bf8ad	black formatting	2023-05-30 23:33:37 -04:00
Wing Lian	3aad5f3b3e	add support for gradient accumulation steps	2023-05-30 23:24:37 -04:00
NanoCode012	37293dce07	Apply isort then black	2023-05-31 02:53:53 +09:00
NanoCode012	0dd35c74af	Ignore unsupported-binary-operation	2023-05-31 02:53:53 +09:00
NanoCode012	b832a0ac62	Black formatting	2023-05-31 02:53:53 +09:00
NanoCode012	1f3c3f5ea0	Lint validation	2023-05-31 02:53:53 +09:00
Wing Lian	fd5f9656a2	update for pr feedback	2023-05-28 14:23:27 -04:00
Wing Lian	1c33eb88a7	new hf_use_auth_token setting so login to hf isn't required	2023-05-28 13:08:49 -04:00
NanoCode012	52dd92a0cd	Feat: Update validate_config and add tests	2023-05-29 00:25:54 +09:00

43 Commits