Wing Lian
125cccb786
Refactor train cfg cli (#499)
* wip to cleanup cfg cli options
* fix launcher
* fix cli args
2023-08-29 05:37:53 -07:00
Aman Karmani
fd55bc87e2
use math.ceil instead of round /cc #498
2023-08-29 01:03:41 +00:00
Birch-san
8e197f6fb4
pad_to_worst_case_seq_len boolean, for testing memory limits (#498)
* pad_to_worst_case_seq_len boolean, for testing memory limits
* remove collator_pad_to_longest option since it does nothing
see docs: https://huggingface.co/docs/transformers/main_classes/data_collator#transformers.DataCollatorWithPadding.padding
True and "longest" mean the same thing
* rename to `pad_to_sequence_len`, and ensure 64 alignment
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
2023-08-28 18:47:16 -04:00
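The 64-alignment above, together with the follow-up switch from `round` to `math.ceil`, can be sketched as below (a minimal illustration with a hypothetical helper name, not the repo's actual code):

```python
import math

def pad_to_multiple(seq_len: int, multiple: int = 64) -> int:
    """Round a sequence length up to the next multiple of `multiple`
    (hypothetical helper illustrating why math.ceil is needed)."""
    return math.ceil(seq_len / multiple) * multiple

# round() can round *down*, under-allocating the padded length:
# round(70 / 64) * 64 == 64, shorter than the sequence itself,
# while math.ceil(70 / 64) * 64 == 128.
print(pad_to_multiple(70))   # → 128
print(pad_to_multiple(128))  # → 128
```

Using `math.ceil` guarantees the padded length is never smaller than the input, which is the point of the `/cc #498` fix above.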
Aman Karmani
267b7b24e5
simplify linear layer locator
2023-08-28 09:45:16 -04:00
Wing Lian
98bf76e236
fsdp requires params to be the same type too (#493)
2023-08-28 04:33:50 -04:00
NanoCode012
4c37bd0b54
Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer (#489)
2023-08-28 09:39:10 +09:00
Aman Karmani
3a011ea1ef
fix condition and add logging
2023-08-27 20:09:26 +00:00
Aman Karmani
1f613e5aa7
Merge branch 'main' into patch-4
2023-08-27 19:57:34 +00:00
Aman Karmani
f319b0bc67
rename var and reformat
2023-08-27 19:55:11 +00:00
Maxime
7fd662dd89
Update src/axolotl/utils/models.py
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:43 +02:00
Maxime
9e699683d7
Update src/axolotl/utils/models.py
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:37 +02:00
Aman Karmani
868530c39c
let transformers handle adamw_bnb_8bit
2023-08-26 21:40:12 +00:00
Maxime
d03887fad5
ignore: address pr review
2023-08-26 22:45:45 +02:00
Maxime
a184549e4c
ignore: linter
2023-08-26 22:36:14 +02:00
Maxime
f311df9462
fix: finetune model inference needs the dtype fix to work with flash-attn
2023-08-26 22:34:11 +02:00
Wing Lian
31f3e71764
fix checkpoints on multigpu (#481)
2023-08-26 12:00:03 -04:00
Wing Lian
0b7ba57ec4
fix types w/ lora (#478)
2023-08-25 02:03:24 -04:00
NanoCode012
71bd06243c
Fix(tokenizer): Fix condition to add pad token (#477)
* Fix(tokenizer): Fix condition to add pad token
* chore: fix lint
2023-08-25 14:30:50 +09:00
Wing Lian
cb9797ef5a
improve llama pad token handling (#475)
* improve llama pad token handling
* tweak logic to not clobber
2023-08-24 13:20:35 -04:00
Charles O. Goddard
bde3c5a478
ReLoRA implementation (with quantization) (#322)
* Experimental ReLoRA (+qlora) implementation
* Add CPU offload
* Remove local config
* Fix saving logic
* Remove redundant assert
* Fix logic errors
* Move ReLoRA into its own trainer class with a method override to create the proper scheduler
* Formatting & typing fixes
* Use safe_serialization
* Don't allow fsdp/deepspeed with ReLoRA
* Fix cpu-offload logic, enable multi gpu
* Document parameters and add comment
* Fix merge issue
* Smooth over some sharp edges
* Implement resume from checkpoint for relora
* Address review comments
* Fix saving logic
* Add necessary metadata to safetensors
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-23 23:07:18 -04:00
Wing Lian
c69faee7a7
workaround so training doesn't hang when packed dataloader batches aren't even (#461)
* workaround so training doesn't hang when packed dataloader batches aren't even
* don't bother labeling anything in the no-op data
2023-08-23 10:39:11 -04:00
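With sample packing, different ranks can end up with different numbers of batches, and a rank that finishes early hangs the others at the next collective op. The padding idea behind the workaround can be sketched in pure Python (all names hypothetical; the real fix lives in the multipack dataloader):

```python
def pad_with_noop_batches(per_rank_batches):
    """Pad every rank's batch list to the same length with no-op batches,
    so all ranks execute the same number of training steps.
    Hypothetical sketch of the workaround, not axolotl's actual code."""
    longest = max(len(batches) for batches in per_rank_batches)
    # no-op data carries no labels, so it contributes nothing to the loss
    noop = {"input_ids": [], "labels": []}
    return [
        batches + [noop] * (longest - len(batches))
        for batches in per_rank_batches
    ]

# Three ranks with uneven batch counts (3, 2 and 1) get padded to 3 each:
ranks = [["b0", "b1", "b2"], ["b3", "b4"], ["b5"]]
padded = pad_with_noop_batches(ranks)
print([len(b) for b in padded])  # → [3, 3, 3]
```

The second bullet above ("don't bother labeling anything in the no-op data") is why the filler batch carries empty labels: it must step the optimizer loop without affecting gradients.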
TearGosling
f4746507f6
feat: add Metharme prompt strategy (#446)
* Add Metharme tokenizing strategy
This strategy accounts for how the Metharme JSONLs are formatted, and adds duplicated EOS tokens, which can help trim model output length.
I haven't gotten the chance to test this yet, and probably won't have the chance for quite a bit, so I'm committing this now.
* Redo Metharme tokenizing strategy
lol
* fix: oops
* Rearrange a conditional
* chore: reformat code in accordance with linter
* chore: Make lint not freak out
* chore: fix lint
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-08-22 11:21:45 +09:00
Wing Lian
96deb6bd67
recast loralayer, norm, lmhead + embed token weights per original qlora (#393)
* recast loralayer, norm, lmhead + embed token weights per original qlora
* try again for the fix
* refactor torch dtype picking
* linter fixes
* missing import for LoraLayer
* fix install for tests now that peft is involved
2023-08-21 18:41:12 -04:00
Wing Lian
50682a3c06
always drop samples that are too long (#452)
2023-08-21 16:43:33 -04:00
Wing Lian
5a1985ba24
set env var for FSDP layer to wrap (#453)
2023-08-21 16:43:22 -04:00
Aman Karmani
a213d9972a
fix eval regression introduced in 13f7efaf74
2023-08-21 10:40:06 -07:00
Wing Lian
fbf49a4770
is_causal fix for evals?
2023-08-21 10:36:26 -04:00
Wing Lian
58cf7e7fed
add missing positional arg (#450)
2023-08-21 04:10:19 -04:00
Wing Lian
ee262818ef
fix evals (#447)
2023-08-20 23:39:42 -04:00
Wing Lian
9d629d8bff
gracefully handle empty input (#442)
2023-08-20 09:18:18 -04:00
Wing Lian
d2e7f27240
support user defined prompters, pretokenized datasets in config, local parquet, local arrow files (#348)
* support user defined prompters, pretokenized datasets in config, local parquet, local arrow files
* fix user defined dataset types
* fix for system prompts
* fix tests
* fix checks for parquet and arrow
* aha moment that d.data_files isn't used
* add documentation for ds_type to add support for parquet and arrow
2023-08-20 09:17:49 -04:00
Wing Lian
f733d0f31e
disable eval using multipack for now (#437)
2023-08-19 10:35:04 -04:00
Wing Lian
008505c8ae
fix comma, not a tuple (#436)
2023-08-19 00:57:40 -04:00
Wing Lian
b3f5e00ff5
use save_strategy from config if available (#434)
* use save_strategy from config if available
* update docs for save_strategy
2023-08-18 20:28:23 -04:00
Wing Lian
5247c5004e
set env for FSDP offload params (#433)
2023-08-18 20:28:09 -04:00
Aman Gupta Karmani
06edf175ac
standardize attn hijack patches (#381)
* split sdp attn into its own patch
* sync xformers patch to follow shared format and be diffable
* update flash-attn patch for 70B/GQA and inference using helper from flash-attn tests
* speed up flash-attn inference
* fix patch to check position ids and don't use multipack for evals
* copy LlamaModel.forward and LlamaDecoderLayer.forward into monkeypatch
* update forwards so we only calculate cu_seqlens once
* enable eval dataloader using multipack again
* fix the patch to work properly and work with FSDP
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-18 12:54:16 -04:00
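The `cu_seqlens` mentioned in the flash-attn patch bullets are cumulative sequence-length boundaries that delimit packed sequences in a flattened batch; computing them once per forward pass, as the commit does, avoids repeated work in every decoder layer. A minimal sketch of the format (pure Python, not the patch itself):

```python
from itertools import accumulate

def cu_seqlens_from_lengths(seq_lens):
    """Cumulative sequence-length boundaries for a packed batch, in the
    [0, l0, l0+l1, ...] shape that varlen attention kernels expect.
    Illustrative sketch; the patch derives these from position ids."""
    return [0] + list(accumulate(seq_lens))

# Three sequences of lengths 3, 5 and 2 packed into one row of 10 tokens:
print(cu_seqlens_from_lengths([3, 5, 2]))  # → [0, 3, 8, 10]
```

Each adjacent pair of entries marks one sequence's start and end offset in the packed row, so attention can be masked per-sequence without materializing a full block-diagonal mask.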
mhenrichsen
0a228479b3
adds color (#425)
* adds color
* chore: lint
* fix for colorama
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-18 10:59:43 -04:00
Wing Lian
1b7e8604bb
fix orca prompts (#422)
2023-08-16 11:21:03 -04:00
NanoCode012
c01015f33f
Fix(config): Update handling of deepspeed config (#404)
* Fix(config): Update handling of deepspeed config
* feat: auto set deepspeed env if deepspeed passed
* fix: update new deepspeed instructions
2023-08-16 01:22:43 +09:00
Wing Lian
da10af03e9
fix eval steps and strategy (#403)
2023-08-15 07:28:50 -04:00
Wing Lian
85cf4f8e2c
better handling of empty input ids when tokenizing (#395)
* better handling of empty input ids when tokenizing
* Add warning if tokenizer resulted in empty result
* fix len comparison for linter
2023-08-15 01:09:59 -04:00
Aman Karmani
2e22404d2d
add utils.data.prepare_dataset
2023-08-14 21:28:29 -07:00
Wing Lian
fc2d6be96d
use context manager to run things on rank0 before others (#397)
2023-08-15 00:10:47 -04:00
Wing Lian
1687be6a35
don't use mask expansion for inference (#392)
2023-08-14 20:52:54 -04:00
Gabriel Puliatti
3c2ad00d07
Feat(config): add max steps (#387)
2023-08-14 11:19:29 -04:00
florian peyron
5d48a10548
Added "epoch" evaluation_strategy (#388)
2023-08-14 10:59:23 -04:00
NanoCode012
73a0b6ead5
Feat(config): Add hub_strategy (#386)
2023-08-14 07:12:55 -04:00
florian peyron
63fdb5a7fb
Error msg for sharegpt if conv has less than 2 msg (#379)
2023-08-14 17:40:40 +09:00
Wing Lian
919246fbc1
don't pass rope_scaling kwarg if it's None (#383)
2023-08-13 18:57:38 -04:00
Charles Goddard
15f6e57eaa
Fix crash when running without CUDA
2023-08-13 13:36:40 -07:00