Commit Graph

28 Commits

Author SHA1 Message Date
NanoCode012
1ffa3866f2 Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787)
* Feat: Auto add to modules_to_save when adding tokens

* fix: swap to error instead of warning

* feat: add check when special_tokens differ and add test
2023-12-22 21:49:07 +09:00
NanoCode012
a1da39cd48 Feat(wandb): Refactor to be more flexible (#767)
* Feat: Update to handle wandb env better

* chore: rename wandb_run_id to wandb_name

* feat: add new recommendation and update config

* fix: indent and pop disabled env if project passed

* feat: test env set for wandb and recommendation

* feat: update to use wandb_name and allow id

* chore: add info to readme
2023-12-04 22:17:25 +09:00
NanoCode012
fb12895a17 Feat: Add warmup_ratio (#893)
* Feat: Add warmup_ratio

* fix: update readme with more details on conflict
2023-11-25 12:15:43 +09:00
NanoCode012
44c9d0151a Fix: Warn when fullfinetune without adapter (#770) 2023-10-22 15:41:43 -04:00
NanoCode012
9923b72649 Fix: eval table conflict with eval_sample_packing (#769) 2023-10-23 01:18:12 +09:00
NanoCode012
383f88d7a7 Fix(cfg): Add validation for save_strategy and eval_strategy (#633)
* Fix(cfg): Check save_strategy cfg conflict with save_steps

* Fix(cfg): Check evaluation_strategy cfg conflict with eval_steps

* chore: add extra check for steps only
2023-09-28 10:14:41 +09:00
Wing Lian
e7d3e2dbb6 use fastchat conversations template (#578)
* use fastchat conversations template

* require fastchat (fschat) pip install

* handle roles dynamically from conversation

* tweak fastchat conversation with a monkeypatch to get individual turns

* fix up so it works with multiple conversation styles, and don't strip the turns

* fix sharegpt fixture now that we're using a more correct tokenization

* use a new prompter and support fastchat conversation type

* use sharegpt from prompt strategies now

* update docs, add chatml template

* add a newline after im_end token

* ensure we correctly set system message

* update per PR feedback to handle deprecated sharegpt types

* don't add duplicate wandb req

* make sharegpt fields configurable from yml

* llama2 fixes

* don't fail fatally when turns are improper
2023-09-27 12:10:45 -04:00
NanoCode012
cfbce020e9 Fix: Fail bf16 check when running on cpu during merge (#631) 2023-09-25 13:48:18 +09:00
Wing Lian
343714972b recommend padding when using sample packing (#531) 2023-09-06 17:00:21 -04:00
Aman Karmani
8cec513447 extract module for working with cfg 2023-08-12 18:25:27 -07:00
Wing Lian
2bb0b78975 Attention mask and position id fixes for packing (#285)
* fix attetion mask with packing

* set position ids and use block diagonal attn mask

* fix expand mask for multiple batch items, make sure we pad position_ids

* don't move masks to cpu

* use multi pack dataloader w random sampler

* add position_ids back

* more fixes for dataloader integration

* est total tokens, fix field loop

* more fixes, position_ids seems broken

* more fixes for sample packing

* use distributed sampler, avoid accelerate prepare

* use accelerator prepare for dataloader

* fix for position_ids w packing

* Update src/axolotl/utils/dataloader.py

* validation for sample packing and doc

* more fixes for 4k and optimizations

* optimized expand mask fn

* better handling of variance in multipack dataloader length and trainer hanging when it runs out of data

* fix rounding of len of batches to int

* better handling so that all devices have the same dataloader len

* fix step calc for packing

* pass sample packing efficiency to training args

* add a test for the mask expansion for sequence packing

* only process eval dataset for packing if not None

* don't split batches when packing

* weighted CE losses

* weighted CEL fixes

* limit packing to sequences of max seq len

* seq_len_multiple for packing

* make sure the chunk size is an int

* sample_packing_seq_len_multiplier config

* use cumulative seq len with var len flash attn v2 w packing

* properly calculate max len

* fix flash-attn, xformers, packing, support chatml

* fix chatml system prompt for openorca, legacy tokenizer opts

* add chatml

* add unit tests for cum seq lens, add ability to build cu_seq_lens from positional ids, fix prompt test

* fix test and pylint checks

* more packing and dataset optimizations and fixes

* filter w multiple cpus

* more fixes and optimizations

* fixes and go back to distributed sampler since batch sampler won't work

* fix counts by accounting for num devices

* fix steps calculation

* previous accelerate is still most performant

* add numba to requirements.

* use custom distributed checks

* fix sampler to prevent overfit w new epochs

* let's not cleanup the cached datasets

* calculate cum seq lens with pos_ids instead of mask, simplify packing params, fix distributed barrier

* speed optimizations and set accelerate fsdp env vars

* optimize dataset concatenation?

* more optimizations for dataset handling

* fix import for annotation

* manual pre-commit fixes

* another sum optimization and bug fix for calc steps

* fix packing estimations

* fix formatting

* pylint problems

* add back flash attention branch for handling unpacked sequences seperately

* Address PR feedback

* add optional sample packing config params to readme
2023-08-12 15:14:56 -04:00
Wing Lian
19cf0bda99 params are adam_*, not adamw_* 2023-07-08 12:13:39 -04:00
Wing Lian
ad5ca4f734 Additional test case per pr 2023-06-15 10:12:47 -04:00
Wing Lian
cb9d3af5c0 add validation and tests for adamw hyperparam 2023-06-15 09:39:42 -04:00
Wing Lian
fd2c9814c9 Merge branch 'main' into flash-optimum 2023-06-12 13:12:15 -04:00
Wing Lian
14668fa54e new validation for mpt w grad checkpoints 2023-06-11 09:26:10 -04:00
Wing Lian
eea2731a5e add streaming dataset support for pretraining datasets 2023-06-10 14:23:56 -04:00
NanoCode012
babf0fdb71 Validate falcon with fsdp 2023-06-09 00:29:04 +09:00
NanoCode012
3c71c8debe Update doc for grad_accu and add validation tests for batch size 2023-06-01 06:13:47 +09:00
Wing Lian
6fa40bf8ad black formatting 2023-05-30 23:33:37 -04:00
Wing Lian
3aad5f3b3e add support for gradient accumulation steps 2023-05-30 23:24:37 -04:00
NanoCode012
37293dce07 Apply isort then black 2023-05-31 02:53:53 +09:00
NanoCode012
0dd35c74af Ignore unsupported-binary-operation 2023-05-31 02:53:53 +09:00
NanoCode012
b832a0ac62 Black formatting 2023-05-31 02:53:53 +09:00
NanoCode012
1f3c3f5ea0 Lint validation 2023-05-31 02:53:53 +09:00
Wing Lian
fd5f9656a2 update for pr feedback 2023-05-28 14:23:27 -04:00
Wing Lian
1c33eb88a7 new hf_use_auth_token setting so login to hf isn't required 2023-05-28 13:08:49 -04:00
NanoCode012
52dd92a0cd Feat: Update validate_config and add tests 2023-05-29 00:25:54 +09:00