Wing Lian
881d333b84
wip for new datasets abstractions
2023-09-05 16:37:48 -04:00
Wing Lian
3355706e22
Add support for GPTQ using native transformers/peft ( #468 )
...
* auto gptq support
* more tweaks and add yml
* remove old gptq docker
* don't need explicit peft install for tests
* fix setup.py to use extra index url
install torch for tests
fix cuda version for autogptq index
set torch in requirements so that it installs properly
move gptq install around to work with GitHub CI/CD
* gptq doesn't play well with sample packing
* address pr feedback
* remove torch install for now
* set quantization_config from model config
* Fix the implementation for getting quant config from model config
2023-09-05 12:43:22 -04:00
mhenrichsen
daa4faca12
Merge pull request #520 from bdashore3/sharegpt-fixes
...
Allow for custom system prompts with ShareGPT
2023-09-05 09:02:55 +02:00
Aman Karmani
fc8766e502
reorg a bit
2023-09-05 02:21:24 +00:00
Aman Gupta Karmani
72a6fe1c1f
use flash_attn rmsnorm when available ( #526 )
...
* use flash_attn xentropy when available
* use flash_attn.ops.rms_norm when available
* log when xentropy is not found
* log how to install RMSNorm
* add quotes so pip install works
2023-09-04 19:44:51 -04:00
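The "when available" commits above follow a common optional-dependency pattern: probe for the fused-kernel package, import it if present, otherwise log how to install it and fall back. A minimal sketch (function names and the install hint are illustrative, not taken from the repo):

```python
import importlib.util
import logging

log = logging.getLogger(__name__)


def fused_op_available(module_name: str) -> bool:
    """Return True when an optional fused-kernel package can be imported."""
    return importlib.util.find_spec(module_name) is not None


def pick_rms_norm():
    # Prefer flash_attn's fused RMSNorm when installed, else fall back.
    if fused_op_available("flash_attn"):
        from flash_attn.ops.rms_norm import RMSNorm  # path named in the commit
        return RMSNorm
    # Quotes matter here, per the "add quotes so pip install works" commit.
    log.warning('flash_attn not found; try: pip install "flash-attn"')
    return None
```

Wrapping the probe in a helper keeps the fallback path testable without the optional package installed.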
Aman Gupta Karmani
5fe30b1497
use flash_attn xentropy when available ( #525 )
...
* use flash_attn xentropy when available
* log when xentropy is not found
2023-09-04 17:49:16 -04:00
Aman Gupta Karmani
44454ae4c4
move is_llama_derived_model into normalize_config ( #524 )
2023-09-04 00:19:03 -04:00
Wing Lian
09f154397e
No gather single gpu ( #523 )
...
* don't attempt to gather on multi-gpu
* also check distributed status in bench callback
2023-09-03 23:24:28 -04:00
kingbri
995557bdf3
Prompters: ShareGPT: Allow for custom system prompts
...
If a system prompt is present in a conversation, add it instead of
using the default.
Signed-off-by: kingbri <bdashore3@proton.me>
2023-09-01 13:53:05 -04:00
Maxime
1991946c5a
fix: bad dtype for full finetune ( #504 )
...
* fix: bad dtype for full finetune
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Update models.py
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-09-01 07:11:45 -07:00
NanoCode012
f51c9c56c6
Fix(doc): Inform Windows users to use WSL/docker ( #518 )
2023-09-01 00:08:21 -07:00
Wing Lian
7710e81f50
log supervised token count ( #448 )
2023-08-31 15:45:23 -07:00
Tom Jobbins
48434bec54
Debug tokenization output: Add ability to output text only (no tokens), and/or specify num samples to see ( #511 )
2023-08-31 14:26:52 -07:00
Jan Philipp Harries
396a7a74fc
Added advanced DDP args ( #515 )
...
* add ddp_config
* add advanced ddp config
* add ddp_config
* add advanced ddp config
---------
Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>
2023-08-31 10:37:47 -07:00
Wing Lian
b21e4a20fe
split train from other cli options ( #503 )
2023-08-30 22:01:47 -07:00
Alpay Ariyak
42f9642792
Changed Bench Eval to report metrics correctly by split. Added total accuracy and renamed previously used bench_accuracy to bench_average_accuracy. ( #512 )
...
* Added "eval_" prefix
* Added total bench accuracy and renamed the previous one to bench_average_accuracy. Changed naming to use bench_split instead of always using eval_ prefix.
2023-08-30 22:00:50 -07:00
Wing Lian
c56b450cf5
drop empty tokenized rows too ( #509 )
2023-08-30 06:55:26 -07:00
Aman Gupta Karmani
1e07c162f1
set zero3 optimizer betas to auto so they inherit from HF trainer config ( #507 )
2023-08-30 08:10:33 -04:00
Wing Lian
76576323df
add eval benchmark callback ( #441 )
...
* add mmlu callback
* use hf dataset for mmlu evals
* default to mmlu-zs
* make sure to define all the explicit positional args
* include metrics in callback
* another callback fix for collator max len attribute
* fix mmlu evals
* sample benchmarks, ensure we drop long samples
* fix the data file
* fix elif and add better messaging
* more fixes
* rename mmlu to bench
* more fixes
* dataset handling and aggregate across benchmark
* better handling when no subjects
* benchmark callback has its own dataloader and collator
* fixes
* updated dataset
* more fixes
* missing transformers import
* improve support for customized dataset for bench evals
* gather benchmarks from all ranks
* fix for gather across multiple gpus
2023-08-29 13:24:19 -07:00
Wing Lian
548787daae
customizable ascii art ( #506 )
2023-08-29 10:13:42 -07:00
Wing Lian
5ac3392075
support for datasets with multiple names ( #480 )
...
* support for datasets with multiple names
* update docs
2023-08-29 06:18:17 -07:00
Aman Gupta Karmani
e356b297cb
remove --force-reinstall from Dockerfile to ensure correct pytorch version ( #492 )
2023-08-29 06:17:51 -07:00
NanoCode012
48c56470d0
Fix(doc): Clarify no amp to full yaml docs ( #496 )
2023-08-29 06:17:37 -07:00
Maxime
36b2e1cfee
tweak: use default config file when only one file is present ( #501 )
2023-08-29 06:17:10 -07:00
Wing Lian
125cccb786
Refactor train cfg cli ( #499 )
...
* wip to cleanup cfg cli options
* fix launcher
* fix cli args
2023-08-29 05:37:53 -07:00
Aman Karmani
fd55bc87e2
use math.ceil instead of round /cc #498
2023-08-29 01:03:41 +00:00
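The `math.ceil` swap matters because Python's `round` uses banker's rounding and can round a partial final batch down, dropping a step; `ceil` always counts it. A minimal sketch (names are illustrative):

```python
import math


def num_steps(num_samples: int, batch_size: int) -> int:
    # A partial final batch still costs one optimizer step, so round up.
    return math.ceil(num_samples / batch_size)
```

For 10 samples with batch size 4, `round(10 / 4)` yields 2 (2.5 rounds to the nearest even integer) and silently loses a step, while `num_steps(10, 4)` correctly yields 3.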
Birch-san
8e197f6fb4
pad_to_worst_case_seq_len boolean, for testing memory limits ( #498 )
...
* pad_to_worst_case_seq_len boolean, for testing memory limits
* remove collator_pad_to_longest option since it does nothing
see docs: https://huggingface.co/docs/transformers/main_classes/data_collator#transformers.DataCollatorWithPadding.padding
True and "longest" mean the same thing
* rename to `pad_to_sequence_len`, and ensure 64 alignment
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
2023-08-28 18:47:16 -04:00
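The 64 alignment mentioned in the rename above is plain round-up arithmetic on the sequence length; a sketch of how such alignment is typically computed (helper name assumed, not from the repo):

```python
def align_to_multiple(length: int, multiple: int = 64) -> int:
    # Round up so padded tensors land on hardware-friendly boundaries.
    return ((length + multiple - 1) // multiple) * multiple
```

A length already on the boundary is left unchanged; anything else is padded up to the next multiple.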
Aman Karmani
267b7b24e5
simplify linear layer locator
2023-08-28 09:45:16 -04:00
Wing Lian
98bf76e236
fsdp requires params be the same type too ( #493 )
2023-08-28 04:33:50 -04:00
NanoCode012
4c37bd0b54
Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer ( #489 )
2023-08-28 09:39:10 +09:00
Aman Gupta Karmani
f144e98a32
Merge pull request #485 from maximegmd/patch-4
...
fix: finetune model inference needs the dtype fix to work with flash-attn
2023-08-27 16:27:47 -04:00
Aman Karmani
3a011ea1ef
fix condition and add logging
2023-08-27 20:09:26 +00:00
Aman Karmani
1f613e5aa7
Merge branch 'main' into patch-4
2023-08-27 19:57:34 +00:00
Aman Karmani
f319b0bc67
rename var and reformat
2023-08-27 19:55:11 +00:00
Maxime
7fd662dd89
Update src/axolotl/utils/models.py
...
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:43 +02:00
Maxime
9e699683d7
Update src/axolotl/utils/models.py
...
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:37 +02:00
mhenrichsen
35130711d6
Feat(cfg): Add code-llama configs for all sizes ( #479 )
...
* configs for all sizes
* update tokenizer type
---------
Co-authored-by: mhenrichsen <some_email@hey.com>
2023-08-27 10:20:17 +09:00
mhenrichsen
3fc9006298
Feat(deepspeed): Add zero2 config ( #476 )
...
* zero2 config
* config added
* linting
---------
Co-authored-by: mhenrichsen <some_email@hey.com>
2023-08-27 10:10:33 +09:00
NanoCode012
ad8be435ad
Feat(doc): Update eval_steps doc ( #487 )
2023-08-27 10:09:09 +09:00
Charles O. Goddard
fe4d6baf92
Add example Llama 2 ReLoRA config ( #471 )
...
* Add example Llama 2 ReLoRA config
* Use adamw_bnb_8bit in example relora config
2023-08-27 10:08:34 +09:00
Aman Gupta Karmani
f31301063d
Merge pull request #486 from OpenAccess-AI-Collective/adam-bnb-simpler
...
let transformers handle adamw_bnb_8bit
2023-08-26 20:44:19 -04:00
Aman Karmani
868530c39c
let transformers handle adamw_bnb_8bit
2023-08-26 21:40:12 +00:00
Maxime
d03887fad5
ignore: address pr review
2023-08-26 22:45:45 +02:00
Maxime
17605b85d8
fix: inference did not move the model to the correct device ( #483 )
2023-08-26 16:40:56 -04:00
Maxime
a184549e4c
ignore: linter
2023-08-26 22:36:14 +02:00
Maxime
f311df9462
fix: finetune model inference needs the dtype fix to work with flash-attn
2023-08-26 22:34:11 +02:00
Maxime
c500d02517
Fix missing 'packaging' wheel ( #482 )
2023-08-26 12:02:15 -04:00
Wing Lian
31f3e71764
fix checkpoints on multigpu ( #481 )
2023-08-26 12:00:03 -04:00
Aman Gupta Karmani
56c4a94caf
Merge pull request #484 from OpenAccess-AI-Collective/reqs
...
allow newer deps in requirements.txt
2023-08-26 11:13:41 -04:00
Aman Karmani
c29117a0d7
allow newer deps
2023-08-26 15:06:05 +00:00