Commit Graph

866 Commits

kingbri
995557bdf3 Prompters: ShareGPT: Allow for custom system prompts
If a system prompt is present in a conversation, add it instead of
using the default.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-09-01 13:53:05 -04:00
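
A minimal sketch of the logic this commit describes, with illustrative names (not axolotl's actual API): prefer a conversation's own system prompt over the prompter's default.

    DEFAULT_SYSTEM_PROMPT = "A chat between a curious user and an assistant."

    def build_turns(conversation: list) -> list:
        """Use the conversation's own system prompt if present, else the default."""
        turns = list(conversation)
        if not turns or turns[0].get("from") != "system":
            turns.insert(0, {"from": "system", "value": DEFAULT_SYSTEM_PROMPT})
        return turns
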
Maxime
1991946c5a fix: bad dtype for full finetune (#504)
* fix: bad dtype for full finetune

* Update src/axolotl/utils/models.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* Update models.py

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-09-01 07:11:45 -07:00
NanoCode012
f51c9c56c6 Fix(doc): Inform Windows users to use WSL/docker (#518) 2023-09-01 00:08:21 -07:00
Wing Lian
7710e81f50 log supervised token count (#448) 2023-08-31 15:45:23 -07:00
Tom Jobbins
48434bec54 Debug tokenization output: Add ability to output text only (no tokens), and/or specify num samples to see (#511) 2023-08-31 14:26:52 -07:00
Jan Philipp Harries
396a7a74fc Added advanced DDP args (#515)
* add ddp_config

* add advanced ddp config

* add ddp_config

* add advanced ddp config

---------

Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>
2023-08-31 10:37:47 -07:00
Wing Lian
b21e4a20fe split train from other cli options (#503) 2023-08-30 22:01:47 -07:00
Alpay Ariyak
42f9642792 Changed Bench Eval to report metrics correctly by split. Added total accuracy and renamed the previously used bench_accuracy to bench_average_accuracy. (#512)
* Added "eval_" prefix

* Added total bench accuracy and renamed the previous one to bench_average_accuracy. Changed naming to use bench_split instead of always using eval_ prefix.
2023-08-30 22:00:50 -07:00
Wing Lian
c56b450cf5 drop empty tokenized rows too (#509) 2023-08-30 06:55:26 -07:00
Aman Gupta Karmani
1e07c162f1 set zero3 optimizer betas to auto so they inherit from HF trainer config (#507) 2023-08-30 08:10:33 -04:00
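
DeepSpeed's "auto" values let the HF Trainer fill in hyperparameters at runtime instead of hard-coding them in the JSON. A hedged sketch of the relevant ZeRO-3 config fragment, shown as a Python dict with illustrative values:

    zero3_config = {
        "zero_optimization": {"stage": 3},
        "optimizer": {
            "type": "AdamW",
            "params": {
                "lr": "auto",
                "betas": "auto",  # inherited from the trainer's adam_beta1/adam_beta2
                "eps": "auto",
                "weight_decay": "auto",
            },
        },
    }
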
Wing Lian
76576323df add eval benchmark callback (#441)
* add mmlu callback

* use hf dataset for mmlu evals

* default to mmlu-zs

* make sure to define all the explicit positional args

* include metrics in callback

* another callback fix for collator max len attribute

* fix mmlu evals

* sample benchmarks, ensure we drop long samples

* fix the data file

* fix elif and add better messaging

* more fixes

* rename mmlu to bench

* more fixes

* dataset handling and aggregate across benchmark

* better handling when no subjects

* benchmark callback has its own dataloader and collator

* fixes

* updated dataset

* more fixes

* missing transformers import

* improve support for customized dataset for bench evals

* gather benchmarks from all ranks

* fix for gather across multiple gpus
2023-08-29 13:24:19 -07:00
Wing Lian
548787daae customizable ascii art (#506) 2023-08-29 10:13:42 -07:00
Wing Lian
5ac3392075 support for datasets with multiple names (#480)
* support for datasets with multiple names

* update docs
2023-08-29 06:18:17 -07:00
Aman Gupta Karmani
e356b297cb remove --force-reinstall from Dockerfile to ensure correct pytorch version (#492) 2023-08-29 06:17:51 -07:00
NanoCode012
48c56470d0 Fix(doc): Clarify no amp to full yaml docs (#496) 2023-08-29 06:17:37 -07:00
Maxime
36b2e1cfee tweak: use default config file when only one file is present (#501) 2023-08-29 06:17:10 -07:00
Wing Lian
125cccb786 Refactor train cfg cli (#499)
* wip to cleanup cfg cli options

* fix launcher

* fix cli args
2023-08-29 05:37:53 -07:00
Aman Karmani
fd55bc87e2 use math.ceil instead of round /cc #498 2023-08-29 01:03:41 +00:00
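
Why ceil rather than round when sizing steps or buckets: round() can come up short (Python 3 also rounds halves to the nearest even number).

    import math

    samples, batch_size = 10, 4
    round(samples / batch_size)      # 2 -- one batch short of covering all samples
    math.ceil(samples / batch_size)  # 3 -- covers every sample
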
Birch-san
8e197f6fb4 pad_to_worst_case_seq_len boolean, for testing memory limits (#498)
* pad_to_worst_case_seq_len boolean, for testing memory limits

* remove collator_pad_to_longest option since it does nothing

see docs: https://huggingface.co/docs/transformers/main_classes/data_collator#transformers.DataCollatorWithPadding.padding

True and "longest" mean the same thing

* rename to `pad_to_sequence_len`, and ensure 64 alignment

---------

Co-authored-by: Aman Karmani <aman@tmm1.net>
2023-08-28 18:47:16 -04:00
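
The equivalence the commit cites, per the linked DataCollatorWithPadding docs, plus a worst-case collator in the spirit of pad_to_sequence_len (a sketch, not axolotl's actual wiring):

    from transformers import AutoTokenizer, DataCollatorWithPadding

    tok = AutoTokenizer.from_pretrained("gpt2")
    tok.pad_token = tok.eos_token

    # padding=True and padding="longest" are synonyms, hence the removed option.
    dynamic = DataCollatorWithPadding(tok, padding="longest")

    # Worst-case padding for memory-limit testing: fixed max_length, 64-aligned.
    worst_case = DataCollatorWithPadding(
        tok, padding="max_length", max_length=2048, pad_to_multiple_of=64
    )
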
Aman Karmani
267b7b24e5 simplify linear layer locator 2023-08-28 09:45:16 -04:00
Wing Lian
98bf76e236 fsdp requires params be the same type too (#493) 2023-08-28 04:33:50 -04:00
NanoCode012
4c37bd0b54 Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer (#489) 2023-08-28 09:39:10 +09:00
Aman Gupta Karmani
f144e98a32 Merge pull request #485 from maximegmd/patch-4
fix: finetune model inference needs the dtype fix to work with flash-attn
2023-08-27 16:27:47 -04:00
Aman Karmani
3a011ea1ef fix condition and add logging 2023-08-27 20:09:26 +00:00
Aman Karmani
1f613e5aa7 Merge branch 'main' into patch-4 2023-08-27 19:57:34 +00:00
Aman Karmani
f319b0bc67 rename var and reformat 2023-08-27 19:55:11 +00:00
Maxime
7fd662dd89 Update src/axolotl/utils/models.py
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:43 +02:00
Maxime
9e699683d7 Update src/axolotl/utils/models.py
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:37 +02:00
mhenrichsen
35130711d6 Feat(cfg): Add code-llama configs for all sizes (#479)
* configs for all sizes

* update tokenizer type

---------

Co-authored-by: mhenrichsen <some_email@hey.com>
2023-08-27 10:20:17 +09:00
mhenrichsen
3fc9006298 Feat(deepspeed): Add zero2 config (#476)
* zero2 config

* config added

* linting

---------

Co-authored-by: mhenrichsen <some_email@hey.com>
2023-08-27 10:10:33 +09:00
NanoCode012
ad8be435ad Feat(doc): Update eval_steps doc (#487) 2023-08-27 10:09:09 +09:00
Charles O. Goddard
fe4d6baf92 Add example Llama 2 ReLoRA config (#471)
* Add example Llama 2 ReLoRA config

* Use adamw_bnb_8bit in example relora config
2023-08-27 10:08:34 +09:00
Aman Gupta Karmani
f31301063d Merge pull request #486 from OpenAccess-AI-Collective/adam-bnb-simpler
let transformers handle adamw_bnb_8bit
2023-08-26 20:44:19 -04:00
Aman Karmani
868530c39c let transformers handle adamw_bnb_8bit 2023-08-26 21:40:12 +00:00
Maxime
d03887fad5 ignore: address pr review 2023-08-26 22:45:45 +02:00
Maxime
17605b85d8 fix: inference did not move the model to the correct device (#483) 2023-08-26 16:40:56 -04:00
Maxime
a184549e4c ignore: linter 2023-08-26 22:36:14 +02:00
Maxime
f311df9462 fix: finetune model inference needs the dtype fix to work with flash-attn 2023-08-26 22:34:11 +02:00
Maxime
c500d02517 Fix missing 'packaging' wheel (#482) 2023-08-26 12:02:15 -04:00
Wing Lian
31f3e71764 fix checkpoints on multigpu (#481) 2023-08-26 12:00:03 -04:00
Aman Gupta Karmani
56c4a94caf Merge pull request #484 from OpenAccess-AI-Collective/reqs
allow newer deps in requirements.txt
2023-08-26 11:13:41 -04:00
Aman Karmani
c29117a0d7 allow newer deps 2023-08-26 15:06:05 +00:00
Wing Lian
0b7ba57ec4 fix types w/ lora (#478) 2023-08-25 02:03:24 -04:00
NanoCode012
71bd06243c Fix(tokenizer): Fix condition to add pad token (#477)
* Fix(tokenizer): Fix condition to add pad token

* chore: fix lint
2023-08-25 14:30:50 +09:00
Wing Lian
cb9797ef5a improve llama pad token handling (#475)
* improve llama pad token handling

* tweak logic to not clobber
2023-08-24 13:20:35 -04:00
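
The usual non-clobbering pattern behind this commit and the related tokenizer fixes in #477/#489 (a sketch with an illustrative model name, not the exact axolotl code):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    # Only add a pad token when the tokenizer lacks one, so an existing
    # pad token is never clobbered; then keep the embedding table in sync.
    if tok.pad_token is None:
        tok.add_special_tokens({"pad_token": "<pad>"})
        model.resize_token_embeddings(len(tok))
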
Charles O. Goddard
bde3c5a478 ReLoRA implementation (with quantization) (#322)
* Experimental ReLoRA (+qlora) implementation

* Add CPU offload

* Remove local config

* Fix saving logic

* Remove redundant assert

* Fix logic errors

* Move ReLoRA into its own trainer class with a method override to create the proper scheduler

* Formatting & typing fixes

* Use safe_serialization

* Don't allow fsdp/deepspeed with ReLoRA

* Fix cpu-offload logic, enable multi gpu

* Document parameters and add comment

* Fix merge issue

* Smooth over some sharp edges

* Implement resume from checkpoint for relora

* Address review comments

* Fix saving logic

* Add necessary metadata to safetensors

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-23 23:07:18 -04:00
NanoCode012
55c23c7bcb Fix(doc): Clarify config (#466) 2023-08-23 11:56:01 -04:00
Wing Lian
c69faee7a7 workaround so training doesn't hang when packed dataloader batches aren't even (#461)
* workaround so training doesn't hang when packed dataloader batches aren't even

* don't bother labeling anything in the no-op data
2023-08-23 10:39:11 -04:00
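
A hedged sketch of the idea (not the exact implementation): when one rank's packed dataloader runs out early, feed it a filler batch whose labels are all -100, the ignore_index of the HF causal-LM loss, so every rank steps in lockstep while the filler carries no label signal.

    import torch

    def make_noop_batch(seq_len: int, pad_token_id: int) -> dict:
        ids = torch.full((1, seq_len), pad_token_id, dtype=torch.long)
        return {
            "input_ids": ids,
            "attention_mask": torch.ones_like(ids),
            # -100 everywhere: every position is masked out of the loss.
            "labels": torch.full((1, seq_len), -100, dtype=torch.long),
        }
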
Wing Lian
d5dcf9c350 fix test fixture b/c hf trainer tokenization changed (#464) 2023-08-23 04:04:49 -04:00
TearGosling
f4746507f6 feat: add Metharme prompt strategy (#446)
* Add Metharme tokenizing strategy

This strategy accounts for how the Metharme JSONLs are formatted, and adds duplicated EOS tokens, which can help trim model output length.
I haven't gotten the chance to test this yet, and probably won't for quite a bit, so I'm committing it now.

* Redo Metharme tokenizing strategy

lol

* fix: oops

* Rearrange a conditional

* chore: reformat code in accordance with linter

* chore: Make lint not freak out

* chore: fix lint

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-08-22 11:21:45 +09:00