Wing Lian
ca476d7f8e
don't load the actual model when pre-loading to load modeling code
2023-09-20 13:37:32 -04:00
Wing Lian
faecff9798
support to disable exllama for gptq ( #604 )
...
* support to disable exllama for gptq
* update property instead of item
* fix config key
2023-09-19 17:51:08 -04:00
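The follow-up items ("update property instead of item", "fix config key") hint at the difference between poking a raw dict entry and setting a typed attribute on a quantization-config object. A minimal stand-in sketch (the class and helper below are hypothetical, not axolotl's or transformers' actual code):

```python
# Illustrative sketch only: a stand-in for a GPTQ quantization config,
# showing why routing overrides through a property beats mutating a dict item.
class GPTQQuantConfig:
    def __init__(self, bits=4, disable_exllama=False):
        self.bits = bits
        self._disable_exllama = disable_exllama

    @property
    def disable_exllama(self):
        return self._disable_exllama

    @disable_exllama.setter
    def disable_exllama(self, value):
        # validation lives in one place when callers go through the property
        if not isinstance(value, bool):
            raise TypeError("disable_exllama must be a bool")
        self._disable_exllama = value


def apply_user_overrides(cfg, overrides):
    """Copy recognized override keys onto the config via its properties."""
    for key, value in overrides.items():
        if hasattr(cfg, key):
            setattr(cfg, key, value)  # goes through the property setter
    return cfg
```

Using the property setter means a mistyped value fails loudly at config time instead of surfacing later inside the quantized model load.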
bofeng huang
aa656e04bd
Delete duplicate lines ( #606 )
2023-09-19 16:40:05 -04:00
Wing Lian
1eebbd09c3
improve handling for empty text on the tokenization step ( #502 )
2023-09-19 08:09:56 -04:00
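A minimal sketch of the "empty text" guard idea, with a stand-in tokenizer (not axolotl's actual tokenization code): whitespace-only rows yield a well-formed empty encoding instead of producing malformed examples downstream.

```python
# Illustrative sketch: guard the tokenization step against empty rows.
def tokenize_row(text, tokenize_fn):
    if not text or not text.strip():
        # return a consistent empty encoding rather than crashing
        return {"input_ids": [], "attention_mask": []}
    return tokenize_fn(text)


def fake_tokenizer(text):
    # stand-in tokenizer: one "token id" per whitespace-separated word
    ids = [hash(w) % 1000 for w in text.split()]
    return {"input_ids": ids, "attention_mask": [1] * len(ids)}
```

Downstream filtering can then drop rows whose `input_ids` came back empty.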
Wing Lian
62a774140b
Fix for check with cfg and merge_lora ( #600 )
2023-09-18 21:14:32 -04:00
Wing Lian
31b9e0c6e8
minor tweaks to simplify ( #597 )
2023-09-18 11:45:44 -04:00
Wing Lian
6b9b229356
btlm and falcon monkey patches for flash attn ( #566 )
2023-09-17 13:49:18 -04:00
Wing Lian
131afdbd89
add bf16 check ( #587 )
2023-09-17 13:49:03 -04:00
NanoCode012
00dce35fb2
Feat(data): Allow loading local csv and text ( #594 )
...
* Feat(data): Allow loading local csv and text
* chore: update readme for loading data
2023-09-17 11:32:27 -04:00
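A stdlib-only sketch of what "loading local csv and text" amounts to; axolotl itself routes this through the `datasets` library, so treat this stand-in helper as illustrative only.

```python
import csv
import io

# Illustrative sketch: turn a local csv or plain-text stream into
# row dictionaries, one record per csv row or non-empty text line.
def load_local(stream, fmt):
    if fmt == "csv":
        reader = csv.DictReader(stream)
        return [dict(row) for row in reader]
    if fmt == "text":
        return [{"text": line.rstrip("\n")} for line in stream if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")
```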
Wing Lian
b15b19eb8d
gather/broadcast the max value of the packing efficiency automatically ( #463 )
2023-09-17 11:08:18 -04:00
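A single-process stand-in for the distributed step this commit describes: each rank estimates a packing-efficiency value, the group takes the max, and every rank proceeds with the same agreed number. The real change would use `torch.distributed` collectives; this only shows the agreement logic.

```python
# Illustrative sketch: simulate "gather the per-rank estimates, take the
# max, broadcast it back" without any actual process group.
def agree_on_packing_efficiency(per_rank_estimates):
    agreed = max(per_rank_estimates)
    return [agreed] * len(per_rank_estimates)
```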
Wing Lian
ab534d75ba
don't add position_ids for evals ( #591 )
2023-09-16 16:11:57 -04:00
Wing Lian
21ec195c9f
optionally configure sample packing for evals ( #589 )
2023-09-16 00:09:48 -04:00
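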
Wing Lian
62eaee7649
make phi training work with Loras ( #588 )
...
* validation for phi loras
* fix model config class check
* update readme for phi training
2023-09-15 20:51:55 -04:00
Jan Philipp Harries
be75668400
set fsdp state dict ( #584 )
...
Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>
2023-09-15 17:47:36 -04:00
Wing Lian
aeec7c4688
pop block_cls since it's not an actual kwarg
2023-09-15 15:54:06 -04:00
Wing Lian
360788296a
don't resize embeddings if it's already large enough ( #577 )
...
* don't resize embeddings if it's already large enough
* make sure to tie weights, even if we aren't resizing
2023-09-15 15:47:09 -04:00
Wing Lian
12a2dbbc2c
Support Sample packing for phi arch ( #586 )
...
* phi sequence packing
* sample packing fixes
* fix linting
* fix inference and phi e2e tests
* update phi example now that sample packing works
* wandb import keeps getting moved around
2023-09-15 15:46:54 -04:00
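A minimal greedy sketch of sample packing in general (not the phi-specific patch above): concatenate short sequences into one row up to `max_seq_len`, restarting `position_ids` at 0 for each packed sample so the model treats them as separate sequences.

```python
# Illustrative greedy sample-packing sketch.
def pack_samples(samples, max_seq_len):
    rows, cur_ids, cur_pos = [], [], []
    for ids in samples:
        if len(ids) > max_seq_len:
            ids = ids[:max_seq_len]  # assumption: truncate oversized samples
        if len(cur_ids) + len(ids) > max_seq_len:
            rows.append({"input_ids": cur_ids, "position_ids": cur_pos})
            cur_ids, cur_pos = [], []
        cur_ids.extend(ids)
        cur_pos.extend(range(len(ids)))  # positions restart per sample
    if cur_ids:
        rows.append({"input_ids": cur_ids, "position_ids": cur_pos})
    return rows
```

The per-sample position reset is why the log also has commits like "don't add position_ids for evals": packing and plain batching need different position handling.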
Wing Lian
f7a22632d7
support custom field for completion from yml ( #580 )
...
* support custom field for completion from yml
* remove legacy completion check and add doc
* update README docs
2023-09-15 07:48:21 -04:00
Wing Lian
8dcd40ac78
prevent cli functions from getting fired on import ( #581 )
2023-09-15 04:03:32 -04:00
Wing Lian
861cecac2a
refactor scripts/finetune.py into new cli modules ( #550 )
...
* refactor scripts/finetune.py into new cli modules
* continue to support scripts/finetune.py
* update readme with updated cli commands
* Update scripts/finetune.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-09-15 01:43:52 -04:00
Wing Lian
24146733db
E2e device cuda ( #575 )
...
* use torch.cuda.current_device() instead of local_rank
* ignore NVML errors for gpu stats
* llama lora packing e2e tests
2023-09-14 22:49:27 -04:00
Wing Lian
c6d870b91d
mypy wandb ignore ( #572 )
...
* mypy wandb ignore
* fix isort for wandb
2023-09-14 11:17:30 -04:00
Wing Lian
115795079d
remove columns after tokenizing for pretraining ( #571 )
2023-09-14 11:08:22 -04:00
Wing Lian
3fbde762ab
fix save_steps so it doesn't get duplicated ( #567 )
2023-09-13 20:40:33 -04:00
Wing Lian
f6060a664e
Model parallel ( #538 )
...
* model-parallel for single process
* fix device/device_map
* fix handling for device
2023-09-13 11:45:30 -04:00
Wing Lian
a4e1bb6606
let hf trainer handle torch compile ( #516 )
...
* let hf trainer handle torch compile
* remove torch compile checks, include option for backend
* suppress torch errors to get further
* require min torch version of 2.1.0 for torch compile to work
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
2023-09-13 11:42:12 -04:00
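The "require min torch version of 2.1.0" bullet boils down to a version gate before enabling `torch.compile`. A hedged stdlib sketch of that comparison (the helper name is made up; the real change delegates compilation to the HF Trainer):

```python
# Illustrative sketch: parse a version string like "2.1.0+cu118" and
# check it against a minimum before enabling torch.compile.
def torch_compile_supported(torch_version, minimum=(2, 1, 0)):
    parts = []
    for piece in torch_version.split("+")[0].split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts[:3]) >= minimum
```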
Wing Lian
36e53c7442
improve how we setup eval/save strategies and steps ( #547 )
...
* set up save and eval strategies to be consistent with trainer logic
* add comments
* better eval handling
2023-09-13 11:37:23 -04:00
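Keeping save/eval settings "consistent with trainer logic" is essentially a normalization step: explicit steps imply a `"steps"` strategy, and a missing strategy falls back to epoch boundaries. A hypothetical sketch of that resolution (not the actual axolotl logic):

```python
# Illustrative sketch: normalize a (strategy, steps) pair the way a
# trainer expects them to agree.
def resolve_strategy(strategy, steps):
    if steps:
        return "steps", steps
    if strategy in (None, "no"):
        return ("no", None) if strategy == "no" else ("epoch", None)
    return strategy, None
```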
Wing Lian
e7aa7b1a1e
gracefully handle length feature used for group by ( #565 )
2023-09-13 11:23:30 -04:00
Wing Lian
e5bb22a56b
add optimization for group-by-len ( #563 )
2023-09-13 10:57:12 -04:00
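For context, group-by-length batches together examples of similar token counts so less padding is wasted; the commit's optimization concerns computing those lengths efficiently. A minimal sketch of the grouping idea itself:

```python
# Illustrative sketch: order example indices by length, then slice into
# batches so each batch has similar lengths and minimal padding.
def batches_by_length(lengths, batch_size):
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
```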
Wing Lian
bf0804447c
fix wandb so mypy doesn't complain ( #562 )
...
* fix wandb so mypy doesn't complain
* fix wandb so mypy doesn't complain
* no need for mypy override anymore
2023-09-13 10:36:16 -04:00
Glavin Wiechert
5b67ea98a6
Add training callback to send predictions to WandB table ( #521 )
...
* WIP Add training callback to send predictions to WandB table
* WIP improve wandb table reporting callback
* WIP improve wandb table reporting callback (cont)
* Add VSCode launching for debugging
* Add tiny llama example
* WIP attempt to improve post-eval prediction generation for table
* WIP attempt to improve post-eval prediction generation for table - part 2
* WIP batch generation
* WIP attempt to handle sample_packing using position_ids for wandb prediction table
* WIP add code for debugging
* Fix sample_packing support for wandb prediction table
* Clean up code for PR review
* Add eval_table_size, eval_table_max_new_tokens configs & clean up code
* Clean up PR, delete VSCode config, add tiny-llama example
* Add eval_table_size, eval_table_max_new_tokens documentation. Fix linting/formatting
2023-09-13 09:51:08 -04:00
Jan Philipp Harries
2f586d18db
Fix pretraining with iterable/streaming Dataset ( #556 )
...
* return without packing prep/len
* fix remove columns
* fix encode arguments
* add error when max steps not set
* fix test
---------
Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>
2023-09-13 00:16:40 -04:00
Wing Lian
a94f9cb99e
fix for quant config from model ( #540 )
2023-09-10 12:40:52 -04:00
Wing Lian
0b4cf5bc8c
workaround for md5 variations ( #533 )
...
* workaround for md5 variations
* refactor the prepared hash too
2023-09-08 16:01:05 -04:00
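The commit doesn't state the cause of the md5 variations, but a common source of hash drift for "the same" dataset config is non-canonical serialization (dict ordering, whitespace). A hedged sketch of hashing a canonical form instead:

```python
import hashlib
import json

# Illustrative sketch: hash a canonical JSON serialization of the config
# so logically identical configs always produce the same digest.
def dataset_config_hash(config):
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```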
Wing Lian
e30f1e3cf7
Early stopping metric ( #537 )
...
* set early stopping metric to check
* tweak how load_best_model_at_end gets set for early stopping
* add validation for early stopping patience
* remove negation
* save results to metrics in callback
* move early stopping callback after the benchmark evals
* broadcast metrics so early stopping works
2023-09-08 11:57:02 -04:00
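The wiring above adjusts how the transformers `EarlyStoppingCallback` plugs into training; the underlying patience logic can be sketched in a few lines (a stand-in class, not the library's implementation):

```python
# Illustrative early-stopping tracker: stop after `patience` consecutive
# evaluations without improvement on the chosen metric.
class EarlyStopper:
    def __init__(self, patience, greater_is_better=False):
        self.patience = patience
        self.greater_is_better = greater_is_better
        self.best = None
        self.bad_evals = 0

    def update(self, metric):
        """Record a new eval metric; return True when training should stop."""
        improved = (
            self.best is None
            or (metric > self.best if self.greater_is_better else metric < self.best)
        )
        if improved:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

The "broadcast metrics" bullet matters because in multi-GPU runs every rank must see the same metric, or ranks would disagree about when to stop.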
Wing Lian
343714972b
recommend padding when using sample packing ( #531 )
2023-09-06 17:00:21 -04:00
Wing Lian
245c5c41e2
log rank too ( #527 )
2023-09-06 08:37:51 -04:00
Wing Lian
a546ca2813
misc fixes/improvements ( #513 )
...
fix per pr feedback
2023-09-05 16:40:13 -04:00
Wing Lian
3355706e22
Add support for GPTQ using native transformers/peft ( #468 )
...
* auto gptq support
* more tweaks and add yml
* remove old gptq docker
* don't need explicit peft install for tests
* fix setup.py to use extra index url
install torch for tests
fix cuda version for autogptq index
set torch in requirements so that it installs properly
move gptq install around to work with github cicd
* gptq doesn't play well with sample packing
* address pr feedback
* remove torch install for now
* set quantization_config from model config
* Fix the implementation for getting quant config from model config
2023-09-05 12:43:22 -04:00
mhenrichsen
daa4faca12
Merge pull request #520 from bdashore3/sharegpt-fixes
...
Allow for custom system prompts with ShareGPT
2023-09-05 09:02:55 +02:00
Aman Karmani
fc8766e502
reorg a bit
2023-09-05 02:21:24 +00:00
Aman Gupta Karmani
72a6fe1c1f
use flash_attn rmsnorm when available ( #526 )
...
* use flash_attn xentropy when available
* use flash_attn.ops.rms_norm when available
* log when xentropy is not found
* log how to install RMSNorm
* add quotes so pip install works
2023-09-04 19:44:51 -04:00
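The pattern in this commit is "try the fused flash-attn op, fall back to a plain implementation, and log how to install the fast path". A sketch with a pure-Python RMSNorm over a list of floats standing in for the tensor version (the fallback math is standard; the import path is only attempted, and may differ from the real module layout):

```python
import math

# Plain fallback: RMS-normalize a list of floats.
def rms_norm(values, eps=1e-6):
    scale = math.sqrt(sum(v * v for v in values) / len(values) + eps)
    return [v / scale for v in values]

# Prefer the fused kernel when the optional dependency is installed.
try:
    from flash_attn.ops.rms_norm import rms_norm as fused_rms_norm  # type: ignore
except ImportError:
    fused_rms_norm = None
    print('flash-attn RMSNorm not found; install flash-attn for the fused kernel')
```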
Aman Gupta Karmani
5fe30b1497
use flash_attn xentropy when available ( #525 )
...
* use flash_attn xentropy when available
* log when xentropy is not found
2023-09-04 17:49:16 -04:00
Aman Gupta Karmani
44454ae4c4
move is_llama_derived_model into normalize_config ( #524 )
2023-09-04 00:19:03 -04:00
Wing Lian
09f154397e
No gather single gpu ( #523 )
...
* don't attempt to gather on multi-gpu
* also check distributed status in bench callback
2023-09-03 23:24:28 -04:00
kingbri
995557bdf3
Prompters: ShareGPT: Allow for custom system prompts
...
If a system prompt is present in a conversation, add it instead of
using the default.
Signed-off-by: kingbri <bdashore3@proton.me>
2023-09-01 13:53:05 -04:00
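A minimal sketch of the ShareGPT change described above: if the conversation opens with a `system` turn, use it; otherwise fall back to the default. The `from`/`value` keys match the ShareGPT format; the default prompt string here is a made-up placeholder.

```python
# Assumption: placeholder default, not axolotl's actual system prompt.
DEFAULT_SYSTEM_PROMPT = "A chat between a curious user and an assistant."

def resolve_system_prompt(conversation):
    # Illustrative sketch: honor a leading "system" turn when present.
    if conversation and conversation[0].get("from") == "system":
        return conversation[0]["value"]
    return DEFAULT_SYSTEM_PROMPT
```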
Maxime
1991946c5a
fix: bad dtype for full finetune ( #504 )
...
* fix: bad dtype for full finetune
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Update models.py
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-09-01 07:11:45 -07:00
Wing Lian
7710e81f50
log supervised token count ( #448 )
2023-08-31 15:45:23 -07:00
Tom Jobbins
48434bec54
Debug tokenization output: Add ability to output text only (no tokens), and/or specify num samples to see ( #511 )
2023-08-31 14:26:52 -07:00
Jan Philipp Harries
396a7a74fc
Added advanced DDP args ( #515 )
...
* add ddp_config
* add advanced ddp config
* add ddp_config
* add advanced ddp config
---------
Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>
2023-08-31 10:37:47 -07:00