Wing Lian
75e4fc2825
wip more tp fixes
2023-11-01 01:45:36 -04:00
Wing Lian
e13c2fd6b1
getting better
2023-10-31 22:23:40 -04:00
Wing Lian
8a21e14a21
load to cpu first
2023-10-31 22:23:15 -04:00
Wing Lian
9c52a83403
load model faster w low_cpu_mem_usage
2023-10-31 22:23:15 -04:00
Wing Lian
fb8ee37ca6
wip tp
2023-10-31 22:23:14 -04:00
Wing Lian
65f3a4f703
tensor-parallel support
2023-10-31 22:21:40 -04:00
NanoCode012
10388a8daf
fix(tokenizer): update log order after update (#806)
2023-10-31 13:21:20 +09:00
NanoCode012
637ed095a0
fix(config): Set eos/bos to tokenizer if different (#801)
* fix(config): Set eos/bos to tokenizer if different
* chore: fix lint
2023-10-29 21:32:37 +09:00
Wing Lian
827ec3d274
refactor neft patch to be more re-usable similar to trl's impl (#796)
2023-10-29 04:33:13 -04:00
NanoCode012
11d1d607db
chore: refactor truthy check and fix mypy (#780)
2023-10-24 12:28:40 +09:00
Casper
15d3a654bf
Implement fused modules (#747)
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-21 16:08:25 -04:00
NanoCode012
440c3ab527
Fix(model): Linear detected and added to target module with rope linear (#738)
* Fix(model): Linear detected and added to target module with rope linear
* fix: exclude layer instead
2023-10-18 22:13:20 -04:00
Maxime
3bd9528390
add noisy embedding (#721)
* add noisy embedding
* fix format
* Update README.md
* Update README.md
* linter issues
* caseus fixes
---------
Co-authored-by: Maxime <maxime@nope.no>
2023-10-13 10:00:42 -04:00
NanoCode012
669f1d052c
Fix: Higher vram usage for mistral and sample_packing (#691)
* Fix: Higher vram usage for mistral and sample_packing
* chore: update comment
* chore: lint
2023-10-06 12:33:43 -04:00
Wing Lian
2d60ba3a6e
flash_attention + sample packing for stablelm 3b (#671)
* stablelm epoch fa patch
* is causal for fa
* working stablelm fa w packing
* chore: pre-commit linting
2023-10-05 16:03:43 -04:00
NanoCode012
eb480dfd68
Fix: ValueError when FA + Mistral when padding_side=right (#681)
* Fix: ValueError when FA + Mistral when padding_side=right
* fix: remove tokenizer class check
2023-10-06 04:12:54 +09:00
NanoCode012
e0b7eeabfd
Fix(tokenizer): Set rstrip,lstrip,norm to False (#678)
2023-10-06 03:50:49 +09:00
NanoCode012
e62d5901b5
chore: Clean up repetitive model kwargs (#670)
2023-10-04 20:41:26 +09:00
NanoCode012
697c50d408
Feat: Allow usage of native Mistral FA when no sample_packing (#669)
* Allow usage of native Mistral FA when no sample_packing
* fix: do not apply custom patch when sample_pack off
* chore: lint
* chore: pin transformer to v4.35.0.dev0
* fix: split sample_packing to separate test
2023-10-04 20:40:47 +09:00
Wing Lian
f34648c8b9
remove patch fix for phi (#664)
2023-10-02 21:07:41 -04:00
Wing Lian
b6ab8aad62
Mistral flash attn packing (#646)
* add mistral monkeypatch
* add arg for decoder attention mask
* fix lint for duplicate code
* make sure to update transformers too
* tweak install for e2e
* move mistral patch to conditional
2023-09-27 18:41:00 -04:00
Wing Lian
895f0a0723
skip some flash attn patches unless explicitly enabled (#643)
* skip some flash attn patches if explicitly disabled
* make the other patches optional
2023-09-27 12:11:07 -04:00
NanoCode012
19a600a8b8
Feat: Add support for upstream FA2 (#626)
* Feat: Add support for upstream FA2
* chore: add is_falcon_derived_model: true to examples
* chore: add config to readme for documentation
* feat: add extra model types
* fix: remove old falcon flash patch
* chore: pin transformers and accelerate
2023-09-26 09:53:28 -04:00
Wing Lian
03e59077a0
misc fixes to add gptq tests (#621)
* misc fixes to add gptq tests
* set bf16 needed for fa2
2023-09-21 21:52:12 -04:00
Wing Lian
faecff9798
support to disable exllama for gptq (#604)
* support to disable exllama for gptq
* update property instead of item
* fix config key
2023-09-19 17:51:08 -04:00
bofeng huang
aa656e04bd
Delete duplicate lines (#606)
2023-09-19 16:40:05 -04:00
Wing Lian
6b9b229356
btlm and falcon monkey patches for flash attn (#566)
2023-09-17 13:49:18 -04:00
Wing Lian
62eaee7649
make phi training work with Loras (#588)
* validation for phi loras
* fix model config class check
* update readme for phi training
2023-09-15 20:51:55 -04:00
Wing Lian
360788296a
don't resize embeddings if it's already large enough (#577)
* don't resize embeddings if it's already large enough
* make sure to tie weights, even if we aren't resizing
2023-09-15 15:47:09 -04:00
Wing Lian
12a2dbbc2c
Support Sample packing for phi arch (#586)
* phi sequence packing
* sample packing fixes
* fix linting
* fix inference and phi e2e tests
* update phi example now that sample packing works
* wandb import keeps getting moved around
2023-09-15 15:46:54 -04:00
Glavin Wiechert
5b67ea98a6
Add training callback to send predictions to WandB table (#521)
* WIP Add training callback to send predictions to WandB table
* WIP improve wandb table reporting callback
* WIP improve wandb table reporting callback (cont)
* Add VSCode launching for debugging
* Add tiny llama example
* WIP attempt to improve post-eval prediction generation for table
* WIP attempt to improve post-eval prediction generation for table - part 2
* WIP batch generation
* WIP attempt to handle sample_packing using position_ids for wandb prediction table
* WIP add code for debugging
* Fix sample_packing support for wandb prediction table
* Clean up code for PR review
* Add eval_table_size, eval_table_max_new_tokens configs & clean up code
* Clean up PR, delete VSCode config, add tiny-llama example
* Add eval_table_size, eval_table_max_new_tokens documentation. Fix linting/formatting
2023-09-13 09:51:08 -04:00
Wing Lian
a94f9cb99e
fix for quant config from model (#540)
2023-09-10 12:40:52 -04:00
Wing Lian
3355706e22
Add support for GPTQ using native transformers/peft (#468)
* auto gptq support
* more tweaks and add yml
* remove old gptq docker
* don't need explicit peft install for tests
* fix setup.py to use extra index url
install torch for tests
fix cuda version for autogptq index
set torch in requirements so that it installs properly
move gptq install around to work with github cicd
* gptq doesn't play well with sample packing
* address pr feedback
* remove torch install for now
* set quantization_config from model config
* Fix the implementation for getting quant config from model config
2023-09-05 12:43:22 -04:00
Maxime
1991946c5a
fix: bad dtype for full finetune (#504)
* fix: bad dtype for full finetune
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Update models.py
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-09-01 07:11:45 -07:00
Wing Lian
125cccb786
Refactor train cfg cli (#499)
* wip to cleanup cfg cli options
* fix launcher
* fix cli args
2023-08-29 05:37:53 -07:00
Aman Karmani
267b7b24e5
simplify linear layer locator
2023-08-28 09:45:16 -04:00
Wing Lian
98bf76e236
fsdp requires params be the same type too (#493)
2023-08-28 04:33:50 -04:00
NanoCode012
4c37bd0b54
Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer (#489)
2023-08-28 09:39:10 +09:00
Aman Karmani
3a011ea1ef
fix condition and add logging
2023-08-27 20:09:26 +00:00
Aman Karmani
f319b0bc67
rename var and reformat
2023-08-27 19:55:11 +00:00
Maxime
7fd662dd89
Update src/axolotl/utils/models.py
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:43 +02:00
Maxime
9e699683d7
Update src/axolotl/utils/models.py
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:37 +02:00
Maxime
d03887fad5
ignore: address pr review
2023-08-26 22:45:45 +02:00
Maxime
a184549e4c
ignore: linter
2023-08-26 22:36:14 +02:00
Maxime
f311df9462
fix: finetune model inference needs the dtype fix to work with flash-attn
2023-08-26 22:34:11 +02:00
Wing Lian
0b7ba57ec4
fix types w lora (#478)
2023-08-25 02:03:24 -04:00
NanoCode012
71bd06243c
Fix(tokenizer): Fix condition to add pad token (#477)
* Fix(tokenizer): Fix condition to add pad token
* chore: fix lint
2023-08-25 14:30:50 +09:00
Wing Lian
cb9797ef5a
improve llama pad token handling (#475)
* improve llama pad token handling
* tweak logic to not clobber
2023-08-24 13:20:35 -04:00
Wing Lian
96deb6bd67
recast loralayer, norm, lmhead + embed token weights per original qlora (#393)
* recast loralayer, norm, lmhead + embed token weights per original qlora
* try again for the fix
* refactor torch dtype picking
* linter fixes
* missing import for LoraLayer
* fix install for tests now that peft is involved
2023-08-21 18:41:12 -04:00
Wing Lian
ee262818ef
fix evals (#447)
2023-08-20 23:39:42 -04:00