Wing Lian
8b16ecd448
updated dataset
2023-08-28 05:39:13 -04:00
Wing Lian
f5db88a10d
fixes
2023-08-28 05:39:13 -04:00
Wing Lian
99d844f215
benchmark callback has its own dataloader and collator
2023-08-28 05:39:13 -04:00
Wing Lian
aefd4d74fa
better handling when no subjects
2023-08-28 05:39:13 -04:00
Wing Lian
24b0e93235
dataset handling and aggregate across benchmark
2023-08-28 05:39:13 -04:00
Wing Lian
2455254b92
more fixes
2023-08-28 05:39:13 -04:00
Wing Lian
918e040601
rename mmlu to bench
2023-08-28 05:39:13 -04:00
Wing Lian
ef062d8fcb
more fixes
2023-08-28 05:39:13 -04:00
Wing Lian
d4c8b66f3d
fix elif and add better messaging
2023-08-28 05:39:13 -04:00
Wing Lian
64e9824d3e
fix the data file
2023-08-28 05:39:13 -04:00
Wing Lian
1134654c98
sample benchmarks, ensure we drop long samples
2023-08-28 05:39:13 -04:00
Wing Lian
2fc756c289
fix mmlu evals
2023-08-28 05:39:13 -04:00
Wing Lian
943b84c490
another callback fix for collator max len attribute
2023-08-28 05:39:13 -04:00
Wing Lian
6f166464d8
include metrics in callback
2023-08-28 05:39:13 -04:00
Wing Lian
e3b07402a7
make sure to define all the explicit positional args
2023-08-28 05:39:13 -04:00
Wing Lian
8d3c8a3eab
default to mmlu-zs
2023-08-28 05:39:13 -04:00
Wing Lian
c30120e684
use hf dataset for mmlu evals
2023-08-28 05:39:13 -04:00
Wing Lian
9aed60fa54
add mmlu callback
2023-08-28 05:39:12 -04:00
Wing Lian
98bf76e236
fsdp requires params be the same type too (#493)
2023-08-28 04:33:50 -04:00
NanoCode012
4c37bd0b54
Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer (#489)
2023-08-28 09:39:10 +09:00
Aman Gupta Karmani
f144e98a32
Merge pull request #485 from maximegmd/patch-4
...
fix: finetune model inference needs the dtype fix to work with flash-attn
2023-08-27 16:27:47 -04:00
Aman Karmani
3a011ea1ef
fix condition and add logging
2023-08-27 20:09:26 +00:00
Aman Karmani
1f613e5aa7
Merge branch 'main' into patch-4
2023-08-27 19:57:34 +00:00
Aman Karmani
f319b0bc67
rename var and reformat
2023-08-27 19:55:11 +00:00
Maxime
7fd662dd89
Update src/axolotl/utils/models.py
...
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:43 +02:00
Maxime
9e699683d7
Update src/axolotl/utils/models.py
...
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:37 +02:00
mhenrichsen
35130711d6
Feat(cfg): Add code-llama configs for all sizes (#479)
...
* configs for all sizes
* update tokenizer type
---------
Co-authored-by: mhenrichsen <some_email@hey.com>
2023-08-27 10:20:17 +09:00
mhenrichsen
3fc9006298
Feat(deepspeed): Add zero2 config (#476)
...
* zero2 config
* config added
* linting
---------
Co-authored-by: mhenrichsen <some_email@hey.com>
2023-08-27 10:10:33 +09:00
NanoCode012
ad8be435ad
Feat(doc): Update eval_steps doc (#487)
2023-08-27 10:09:09 +09:00
Charles O. Goddard
fe4d6baf92
Add example Llama 2 ReLoRA config (#471)
...
* Add example Llama 2 ReLoRA config
* Use adamw_bnb_8bit in example relora config
2023-08-27 10:08:34 +09:00
Aman Gupta Karmani
f31301063d
Merge pull request #486 from OpenAccess-AI-Collective/adam-bnb-simpler
...
let transformers handle adamw_bnb_8bit
2023-08-26 20:44:19 -04:00
Aman Karmani
868530c39c
let transformers handle adamw_bnb_8bit
2023-08-26 21:40:12 +00:00
Maxime
d03887fad5
ignore: address pr review
2023-08-26 22:45:45 +02:00
Maxime
17605b85d8
fix: inference did not move the model to the correct device (#483)
2023-08-26 16:40:56 -04:00
Maxime
a184549e4c
ignore: linter
2023-08-26 22:36:14 +02:00
Maxime
f311df9462
fix: finetune model inference needs the dtype fix to work with flash-attn
2023-08-26 22:34:11 +02:00
Maxime
c500d02517
Fix missing 'packaging' wheel (#482)
2023-08-26 12:02:15 -04:00
Wing Lian
31f3e71764
fix checkpoints on multigpu (#481)
2023-08-26 12:00:03 -04:00
Aman Gupta Karmani
56c4a94caf
Merge pull request #484 from OpenAccess-AI-Collective/reqs
...
allow newer deps in requirements.txt
2023-08-26 11:13:41 -04:00
Aman Karmani
c29117a0d7
allow newer deps
2023-08-26 15:06:05 +00:00
Wing Lian
0b7ba57ec4
fix types w lora (#478)
2023-08-25 02:03:24 -04:00
NanoCode012
71bd06243c
Fix(tokenizer): Fix condition to add pad token (#477)
...
* Fix(tokenizer): Fix condition to add pad token
* chore: fix lint
2023-08-25 14:30:50 +09:00
Wing Lian
cb9797ef5a
improve llama pad token handling (#475)
...
* improve llama pad token handling
* tweak logic to not clobber
2023-08-24 13:20:35 -04:00
Charles O. Goddard
bde3c5a478
ReLoRA implementation (with quantization) (#322)
...
* Experimental ReLoRA (+qlora) implementation
* Add CPU offload
* Remove local config
* Fix saving logic
* Remove redundant assert
* Fix logic errors
* Move ReLoRA into its own trainer class with a method override to create the proper scheduler
* Formatting & typing fixes
* Use safe_serialization
* Don't allow fsdp/deepspeed with ReLoRA
* Fix cpu-offload logic, enable multi gpu
* Document parameters and add comment
* Fix merge issue
* Smooth over some sharp edges
* Implement resume from checkpoint for relora
* Address review comments
* Fix saving logic
* Add necessary metadata to safetensors
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-23 23:07:18 -04:00
NanoCode012
55c23c7bcb
Fix(doc): Clarify config (#466)
2023-08-23 11:56:01 -04:00
Wing Lian
c69faee7a7
workaround so training doesn't hang when packed dataloader batches aren't even (#461)
...
* workaround so training doesn't hang when packed dataloader batches aren't even
* don't bother labeling anything in the no-op data
2023-08-23 10:39:11 -04:00
Wing Lian
d5dcf9c350
fix test fixture b/c hf trainer tokenization changed (#464)
2023-08-23 04:04:49 -04:00
TearGosling
f4746507f6
feat: add Metharme prompt strategy (#446)
...
* Add Metharme tokenizing strategy
This strategy accounts for how the Metharme JSONLs are formatted as well as adds duplicated EOS tokens which can help trim model output length.
I haven't gotten the chance to test this yet, and probably won't have the chance for quite a bit, so I'm committing this now.
* Redo Metharme tokenizing strategy
lol
* fix: oops
* Rearrange a conditional
* chore: reformat code in accordance with linter
* chore: Make lint not freak out
* chore: fix lint
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-08-22 11:21:45 +09:00
Wing Lian
96deb6bd67
recast loralayer, norm, lmhead + embed token weights per original qlora (#393)
...
* recast loralayer, norm, lmhead + embed token weights per original qlora
* try again for the fix
* refactor torch dtype picking
* linter fixes
* missing import for LoraLayer
* fix install for tests now that peft is involved
2023-08-21 18:41:12 -04:00
Wing Lian
50682a3c06
always drop samples that are too long (#452)
2023-08-21 16:43:33 -04:00