Wing Lian | c3de28942c | fix for gather across multiple gpus | 2023-08-29 06:57:28 -07:00
Wing Lian | 45848a9285 | gather benchmarks from all ranks | 2023-08-28 11:29:59 -04:00
Wing Lian | d6cea18034 | improve support for customized dataset for bench evals | 2023-08-28 06:03:53 -04:00
Wing Lian | 606846e0a5 | missing transformers import | 2023-08-28 05:43:19 -04:00
Wing Lian | a6c9223114 | more fixes | 2023-08-28 05:39:13 -04:00
Wing Lian | 8b16ecd448 | updated dataset | 2023-08-28 05:39:13 -04:00
Wing Lian | f5db88a10d | fixes | 2023-08-28 05:39:13 -04:00
Wing Lian | 99d844f215 | benchmark callback has its own dataloader and collator | 2023-08-28 05:39:13 -04:00
Wing Lian | aefd4d74fa | better handling when no subjects | 2023-08-28 05:39:13 -04:00
Wing Lian | 24b0e93235 | dataset handling and aggregate across benchmark | 2023-08-28 05:39:13 -04:00
Wing Lian | 2455254b92 | more fixes | 2023-08-28 05:39:13 -04:00
Wing Lian | 918e040601 | rename mmlu to bench | 2023-08-28 05:39:13 -04:00
Wing Lian | ef062d8fcb | more fixes | 2023-08-28 05:39:13 -04:00
Wing Lian | d4c8b66f3d | fix elif and add better messaging | 2023-08-28 05:39:13 -04:00
Wing Lian | 64e9824d3e | fix the data file | 2023-08-28 05:39:13 -04:00
Wing Lian | 1134654c98 | sample benchmarks, ensure we drop long samples | 2023-08-28 05:39:13 -04:00
Wing Lian | 2fc756c289 | fix mmlu evals | 2023-08-28 05:39:13 -04:00
Wing Lian | 943b84c490 | another callback fix for collator max len attribute | 2023-08-28 05:39:13 -04:00
Wing Lian | 6f166464d8 | include metrics in callback | 2023-08-28 05:39:13 -04:00
Wing Lian | e3b07402a7 | make sure to define all the explicit positional args | 2023-08-28 05:39:13 -04:00
Wing Lian | 8d3c8a3eab | default to mmlu-zs | 2023-08-28 05:39:13 -04:00
Wing Lian | c30120e684 | use hf dataset for mmlu evals | 2023-08-28 05:39:13 -04:00
Wing Lian | 9aed60fa54 | add mmlu callback | 2023-08-28 05:39:12 -04:00
Wing Lian | 98bf76e236 | fsdp requires params be the same type too (#493) | 2023-08-28 04:33:50 -04:00
NanoCode012 | 4c37bd0b54 | Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer (#489) | 2023-08-28 09:39:10 +09:00
Aman Gupta Karmani | f144e98a32 | Merge pull request #485 from maximegmd/patch-4 | 2023-08-27 16:27:47 -04:00
    fix: finetune model inference needs the dtype fix to work with flash-attn
Aman Karmani | 3a011ea1ef | fix condition and add logging | 2023-08-27 20:09:26 +00:00
Aman Karmani | 1f613e5aa7 | Merge branch 'main' into patch-4 | 2023-08-27 19:57:34 +00:00
Aman Karmani | f319b0bc67 | rename var and reformat | 2023-08-27 19:55:11 +00:00
Maxime | 7fd662dd89 | Update src/axolotl/utils/models.py | 2023-08-27 21:01:43 +02:00
    Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
Maxime | 9e699683d7 | Update src/axolotl/utils/models.py | 2023-08-27 21:01:37 +02:00
    Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
mhenrichsen | 35130711d6 | Feat(cfg): Add code-llama configs for all sizes (#479) | 2023-08-27 10:20:17 +09:00
    * configs for all sizes
    * update tokenizer type
    Co-authored-by: mhenrichsen <some_email@hey.com>
mhenrichsen | 3fc9006298 | Feat(deepspeed): Add zero2 config (#476) | 2023-08-27 10:10:33 +09:00
    * zero2 config
    * config added
    * linting
    Co-authored-by: mhenrichsen <some_email@hey.com>
NanoCode012 | ad8be435ad | Feat(doc): Update eval_steps doc (#487) | 2023-08-27 10:09:09 +09:00
Charles O. Goddard | fe4d6baf92 | Add example Llama 2 ReLoRA config (#471) | 2023-08-27 10:08:34 +09:00
    * Add example Llama 2 ReLoRA config
    * Use adamw_bnb_8bit in example relora config
Aman Gupta Karmani | f31301063d | Merge pull request #486 from OpenAccess-AI-Collective/adam-bnb-simpler | 2023-08-26 20:44:19 -04:00
    let transformers handle adamw_bnb_8bit
Aman Karmani | 868530c39c | let transformers handle adamw_bnb_8bit | 2023-08-26 21:40:12 +00:00
Maxime | d03887fad5 | ignore: address pr review | 2023-08-26 22:45:45 +02:00
Maxime | 17605b85d8 | fix: inference did not move the model to the correct device (#483) | 2023-08-26 16:40:56 -04:00
Maxime | a184549e4c | ignore: linter | 2023-08-26 22:36:14 +02:00
Maxime | f311df9462 | fix: finetune model inference needs the dtype fix to work with flash-attn | 2023-08-26 22:34:11 +02:00
Maxime | c500d02517 | Fix missing 'packaging' wheel (#482) | 2023-08-26 12:02:15 -04:00
Wing Lian | 31f3e71764 | fix checkpints on multigpu (#481) | 2023-08-26 12:00:03 -04:00
Aman Gupta Karmani | 56c4a94caf | Merge pull request #484 from OpenAccess-AI-Collective/reqs | 2023-08-26 11:13:41 -04:00
    allow newer deps in requirements.txt
Aman Karmani | c29117a0d7 | allow newer deps | 2023-08-26 15:06:05 +00:00
Wing Lian | 0b7ba57ec4 | fix types w lora (#478) | 2023-08-25 02:03:24 -04:00
NanoCode012 | 71bd06243c | Fix(tokenizer): Fix condition to add pad token (#477) | 2023-08-25 14:30:50 +09:00
    * Fix(tokenizer): Fix condition to add pad token
    * chore: fix lint
Wing Lian | cb9797ef5a | improve llama pad token handling (#475) | 2023-08-24 13:20:35 -04:00
    * improve llama pad token handling
    * tweak logic to not clobber
Charles O. Goddard | bde3c5a478 | ReLoRA implementation (with quantization) (#322) | 2023-08-23 23:07:18 -04:00
    * Experimental ReLoRA (+qlora) implementation
    * Add CPU offload
    * Remove local config
    * Fix saving logic
    * Remove redundant assert
    * Fix logic errors
    * Move ReLoRA into its own trainer class with a method override to create the proper scheduler
    * Formatting & typing fixes
    * Use safe_serialization
    * Don't allow fsdp/deepspeed with ReLoRA
    * Fix cpu-offload logic, enable multi gpu
    * Document parameters and add comment
    * Fix merge issue
    * Smooth over some sharp edges
    * Implement resume from checkpoint for relora
    * Address review comments
    * Fix saving logic
    * Add necessary metadata to safetensors
    Co-authored-by: Wing Lian <wing.lian@gmail.com>
NanoCode012 | 55c23c7bcb | Fix(doc): Clarify config (#466) | 2023-08-23 11:56:01 -04:00