Commit Graph

894 Commits

Author SHA1 Message Date
Wing Lian
bf0804447c fix wandb so mypy doesn't complain (#562)
* fix wandb so mypy doesn't complain

* fix wandb so mypy doesn't complain

* no need for mypy override anymore
2023-09-13 10:36:16 -04:00
Glavin Wiechert
5b67ea98a6 Add training callback to send predictions to WandB table (#521)
* WIP Add training callback to send predictions to WandB table

* WIP improve wandb table reporting callback

* WIP improve wandb table reporting callback (cont)

* Add VSCode launching for debugging

* Add tiny llama example

* WIP attempt to improve post-eval prediction generation for table

* WIP attempt to improve post-eval prediction generation for table - part 2

* WIP batch generation

* WIP attempt to handle sample_packing using position_ids for wandb prediction table

* WIP add code for debugging

* Fix sample_packing support for wandb prediction table

* Clean up code for PR review

* Add eval_table_size, eval_table_max_new_tokens configs & clean up code

* Clean up PR, delete VSCode config, add tiny-llama example

* Add eval_table_size, eval_table_max_new_tokens documentation. Fix linting/formatting
2023-09-13 09:51:08 -04:00
Jan Philipp Harries
2f586d18db Fix pretraining with iterable/streaming Dataset (#556)
* return without packing prep/len

* fix remove columns

* fix encode arguments

* add error when max steps not set

* fix test

---------

Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>
2023-09-13 00:16:40 -04:00
Wing Lian
9845c5e12d document that packaging needs to be installed before flash-attn (#559) 2023-09-12 12:18:30 -04:00
Wing Lian
772cd870d4 fix the sed command to replace the version w the tag
Some checks failed
pre-commit / pre-commit (push) Has been cancelled
publish pypi / Upload release to PyPI (push) Has been cancelled
PyTest / test (3.10) (push) Has been cancelled
PyTest / test (3.9) (push) Has been cancelled
v0.3.0
2023-09-11 13:44:19 -04:00
Wing Lian
6c5fbe6223 add long_description for pypi push (#555) 2023-09-11 13:34:29 -04:00
Wing Lian
bcbc9597e9 replace tags, build dist for pypi publish (#553)
* replace tags, build dist for pypi publish

* missing trailing comma
2023-09-11 13:25:41 -04:00
The Objective Dad
6d57f2f0f0 ergonomic update to optimizer config doc (#548) 2023-09-11 12:35:45 -04:00
Wing Lian
20ed4c1f9e pypi on tag push (#552) 2023-09-11 10:33:42 -04:00
Wing Lian
c5dedb17ad remove with section, doesn't seem to work (#551) 2023-09-11 10:27:17 -04:00
Wing Lian
b56503d423 publish to pypi workflow on tagged release (#549) 2023-09-11 09:44:47 -04:00
Wing Lian
a94f9cb99e fix for quant config from model (#540) 2023-09-10 12:40:52 -04:00
dongxiaolong
c1921c9acb Update requirements.txt (#543)
fix fsdp
2023-09-08 16:07:11 -04:00
Wing Lian
0b4cf5bc8c workaround for md5 variations (#533)
* workaround for md5 variations

* refactor the prepared hash too
2023-09-08 16:01:05 -04:00
SlapDrone
78ee2cdab2 add git environment variables to compose: avoid checkout failure error 128 on build (#534) 2023-09-08 15:59:49 -04:00
Wing Lian
34c0a86a11 update readme to point to direct link to runpod template, cleanup install instrucitons (#532)
* update readme to point to direct link to runpod template, cleanup install instrucitons

* default install flash-attn and auto-gptq now too

* update readme w flash-attn extra

* fix version in setup
2023-09-08 11:58:54 -04:00
The Objective Dad
5e2d8a42d9 Adding NCCL Timeout Guide (#536)
* fixes NCCL_P2P_LEVEL=NVL #429

* adding more insights into verious values of NCCL_P2P_LEVEL
2023-09-08 11:57:47 -04:00
Wing Lian
e30f1e3cf7 Early stopping metric (#537)
* set early stopping metric to check

* tweak how load_best_model_at_end gets set for early stopping

* add validation for earl;y stopping patience

* remove negation

* save results to metrics in callback

* move early stopping callback after the benchmark evals

* broadcast metrics so early stopping works
2023-09-08 11:57:02 -04:00
Wing Lian
343714972b recommend padding when using sample packing (#531) 2023-09-06 17:00:21 -04:00
Wing Lian
245c5c41e2 log rank too (#527) 2023-09-06 08:37:51 -04:00
Wing Lian
a546ca2813 misc fixes/improvements (#513)
fix per pr feedback
2023-09-05 16:40:13 -04:00
Wing Lian
3355706e22 Add support for GPTQ using native transformers/peft (#468)
* auto gptq support

* more tweaks and add yml

* remove old gptq docker

* don't need explicit peft install for tests

* fix setup.py to use extra index url

install torch for tests
fix cuda version for autogptq index
set torch in requirements so that it installs properly
move gptq install around to work with github cicd

* gptq doesn't play well with sample packing

* address pr feedback

* remove torch install for now

* set quantization_config from model config

* Fix the implementation for getting quant config from model config
2023-09-05 12:43:22 -04:00
mhenrichsen
daa4faca12 Merge pull request #520 from bdashore3/sharegpt-fixes
Allow for custom system prompts with ShareGPT
2023-09-05 09:02:55 +02:00
Aman Karmani
fc8766e502 reorg a bit 2023-09-05 02:21:24 +00:00
Aman Gupta Karmani
72a6fe1c1f use flash_attn rmsnorm when available (#526)
* use flash_attn xentropy when available

* use flash_attn.ops.rms_norm when available

* log when xentropy is not found

* log how to install RMSNorm

* add quotes so pip install works
2023-09-04 19:44:51 -04:00
Aman Gupta Karmani
5fe30b1497 use flash_attn xentropy when available (#525)
* use flash_attn xentropy when available

* log when xentropy is not found
2023-09-04 17:49:16 -04:00
Aman Gupta Karmani
44454ae4c4 move is_llama_derived_model into normalize_config (#524) 2023-09-04 00:19:03 -04:00
Wing Lian
09f154397e No gather single gpu (#523)
* don't attempt to gather on multi-gpu

* also check distributed status in bench callback
2023-09-03 23:24:28 -04:00
kingbri
995557bdf3 Prompters: ShareGPT: Allow for custom system prompts
If a system prompt is present in a conversation, add it instead of
using the default.

Signed-off-by: kingbri <bdashore3@proton.me>
2023-09-01 13:53:05 -04:00
Maxime
1991946c5a fix: bad dtype for full finetune (#504)
* fix: bad dtype for full finetune

* Update src/axolotl/utils/models.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* Update models.py

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-09-01 07:11:45 -07:00
NanoCode012
f51c9c56c6 Fix(doc): Inform Windows users to use WSL/docker (#518) 2023-09-01 00:08:21 -07:00
Wing Lian
7710e81f50 log supervised token count (#448) 2023-08-31 15:45:23 -07:00
Tom Jobbins
48434bec54 Debug tokenization output: Add ability to output text only (no tokens), and/or specify num samples to see (#511) 2023-08-31 14:26:52 -07:00
Jan Philipp Harries
396a7a74fc Added advanced DDP args (#515)
* add ddp_config

* add advanced ddp config

* add ddp_config

* add advanced ddp config

---------

Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>
2023-08-31 10:37:47 -07:00
Wing Lian
b21e4a20fe split train from other cli options (#503) 2023-08-30 22:01:47 -07:00
Alpay Ariyak
42f9642792 Changed Bench Eval to report metrics correctly by split. Added total accuracy and renamed previously used bench_accuracy to bench_average_accuracy. (#512)
* Added "eval_" prefix

* Added total bench accuracy and renamed the previous one to bench_average_accuracy. Changed naming to use bench_split instead of always using eval_ prefix.
2023-08-30 22:00:50 -07:00
Wing Lian
c56b450cf5 drop empty tokenized rows too (#509) 2023-08-30 06:55:26 -07:00
Aman Gupta Karmani
1e07c162f1 set zero3 optimizer betas to auto so they inherit from HF trainer config (#507) 2023-08-30 08:10:33 -04:00
Wing Lian
76576323df add eval benchmark callback (#441)
* add mmlu callback

* use hf dataset for mmlu evals

* default to mmlu-zs

* make sure to define all the explicit positional args

* include metrics in callback

* another callback fix for collator max len attribute

* fix mmlu evals

* sample benchmarks, ensure we drop long samples

* fix the data file

* fix elif and add better messaging

* more fixes

* rename mmlu to bench

* more fixes

* dataset handling and aggregate across benchmark

* better handling when no subjects

* benchmark callback has its own dataloader and collator

* fixes

* updated dataset

* more fixes

* missing transformers import

* improve support for customized dataset for bench evals

* gather benchmarks from all ranks

* fix for gather across multiple gpus
2023-08-29 13:24:19 -07:00
Wing Lian
548787daae customizable ascii art (#506) 2023-08-29 10:13:42 -07:00
Wing Lian
5ac3392075 support for datasets with multiple names (#480)
* support for datasets with multiple names

* update docs
2023-08-29 06:18:17 -07:00
Aman Gupta Karmani
e356b297cb remove --force-reinstall from Dockerfile to ensure correct pytorch version (#492) 2023-08-29 06:17:51 -07:00
NanoCode012
48c56470d0 Fix(doc): Clarify no amp to full yaml docs (#496) 2023-08-29 06:17:37 -07:00
Maxime
36b2e1cfee tweak: use default config file when only one file is present (#501) 2023-08-29 06:17:10 -07:00
Wing Lian
125cccb786 Refactor train cfg cli (#499)
* wip to cleanup cfg cli options

* fix launcher

* fix cli args
2023-08-29 05:37:53 -07:00
Aman Karmani
fd55bc87e2 use math.ceil instead of round /cc #498 2023-08-29 01:03:41 +00:00
Birch-san
8e197f6fb4 pad_to_worst_case_seq_len boolean, for testing memory limits (#498)
* pad_to_worst_case_seq_len boolean, for testing memory limits

* remove collator_pad_to_longest option since it does nothing

see docs: https://huggingface.co/docs/transformers/main_classes/data_collator#transformers.DataCollatorWithPadding.padding

True and "longest" mean the same thing

* rename to `pad_to_sequence_len, and ensure 64 alignment

---------

Co-authored-by: Aman Karmani <aman@tmm1.net>
2023-08-28 18:47:16 -04:00
Aman Karmani
267b7b24e5 simplify linear layer locator 2023-08-28 09:45:16 -04:00
Wing Lian
98bf76e236 fsdp requires params be the same type too (#493) 2023-08-28 04:33:50 -04:00
NanoCode012
4c37bd0b54 Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer (#489) 2023-08-28 09:39:10 +09:00