Mads Henrichsen
c371d6b546
cpu offloading
2023-12-31 12:02:29 +01:00
Mads Henrichsen
d6273188f0
fft
2023-12-31 07:42:46 +01:00
Mads Henrichsen
72797b04a5
fix modules
2023-12-31 07:40:33 +01:00
Mads Henrichsen
de47bb5eb0
better lr
2023-12-30 22:36:50 +01:00
Mads Henrichsen
c04df54b4b
new lr
2023-12-30 21:36:01 +01:00
Mads Henrichsen
e3716db386
small batch size
2023-12-30 13:20:45 +01:00
Mads Henrichsen
97943d8fc4
model revision
2023-12-30 12:55:17 +01:00
Mads Henrichsen
9d3f80cd40
disable packing
2023-12-30 12:51:03 +01:00
Mads Henrichsen
bfae79a634
trust
2023-12-30 12:47:50 +01:00
Mads Henrichsen
5a85ee16eb
yayi2
2023-12-30 12:43:46 +01:00
Tazik Shahjahan
3678a6c41d
Fix: bf16 support for inference (#981)
* Fix: bf16 torch dtype
* simplify casting to device and dtype
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-29 16:15:53 -06:00
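For context on the fix above: loading a model for inference should honor the configured bf16 dtype instead of falling back to float32. A minimal sketch of the underlying transformers call (the checkpoint id is a placeholder, not something from this PR):

```python
# Load a model for inference in bfloat16 rather than the float32 default.
# Standard transformers API; the checkpoint id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the dtype selected when bf16 is enabled
    device_map="auto",
)
```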
mhenrichsen
f8ae59b0a8
Adds chat templates (#1022)
2023-12-29 15:44:23 -06:00
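The feature above builds on the tokenizer chat-template machinery in transformers. A minimal sketch of that API (the model id and messages are illustrative only):

```python
# Render a message list into the model's expected prompt format using
# the tokenizer's built-in chat template. Standard transformers API.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "teknium/OpenHermes-2.5-Mistral-7B"  # placeholder checkpoint
)
messages = [{"role": "user", "content": "What does sample packing do?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```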
Hamel Husain
4f4d638b84
[WandB] Push axolotl config to top level wandb files (#1014)
2023-12-29 10:52:12 -08:00
Wing Lian
ba043a361e
add ultrachat prompt strategies (#996)
2023-12-29 12:23:29 -06:00
NanoCode012
41353d2ea0
feat: expose bnb kwargs (#1018)
* feat: expose bnb kwargs
* chore: added examples and link per suggestion
* Uncomment defaults per suggestion for readability
Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
---------
Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
2023-12-29 18:16:26 +09:00
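The kwargs exposed above are forwarded to the bitsandbytes quantization config. A sketch of the underlying transformers API; the particular values are illustrative, not defaults from this PR:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# BitsAndBytesConfig is where bnb kwargs ultimately land.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,   # nested quantization
    bnb_4bit_quant_type="nf4",        # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # placeholder checkpoint
    quantization_config=bnb_config,
)
```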
NanoCode012
f6ecf14dd4
feat: remove need to add load_in* during merge (#1017)
2023-12-29 18:15:30 +09:00
Hamel Husain
dec66d7c53
[Docs] Nit: Remind people to auth to wandb if they are going to use it (#1013)
2023-12-28 18:00:16 -08:00
Hamel Husain
76357dc5da
Update README.md (#1012)
2023-12-28 18:00:02 -08:00
Wing Lian
70b46ca4f4
remove landmark attn and xpos rope implementations (#1010)
2023-12-27 21:07:27 -08:00
Hamel Husain
85dd4d525b
add config to model card (#1005)
* add config to model card
* rm space
* apply black formatting
* apply black formatting
* fix formatting
* check for cfg attribute
* add version
* add version
* put the config in a collapsible element
* put the config in a collapsible element
2023-12-27 21:25:33 -06:00
Kevin Sydney
384b817dc0
Set eval_sample_packing to false in mistral config.yaml (#1003)
Without eval_sample_packing set to false, a ValueError occurs when the eval dataset split is too small for sample_packing.
2023-12-27 16:11:55 -08:00
Younes Belkada
db9094df0f
FEAT: add tagging support to axolotl (#1004)
* add tagging support to axolotl
* chore: lint
* fix method w self
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-27 16:25:20 -06:00
Evan Griffiths
6ef46f8dca
Add an example config for finetuning a 34B model on a 24GB GPU (#1000)
* Add an example config for finetuning a 34B model on a 24GB GPU
* Remove wandb project
2023-12-25 10:29:55 -08:00
Wing Lian
628b754824
set output_router_logits for mixtral config (#995)
2023-12-22 12:57:02 -05:00
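For reference, output_router_logits makes Mixtral return its router logits so the MoE load-balancing auxiliary loss is applied during training. A sketch with the transformers Mixtral config (assumes transformers >= 4.36):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "mistralai/Mixtral-8x7B-v0.1"
config = AutoConfig.from_pretrained(model_id)
config.output_router_logits = True  # adds the router aux loss to the training loss
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```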
Wing Lian
37820f6540
support for cuda 12.1 (#989)
2023-12-22 11:08:22 -05:00
NanoCode012
7d4185ffcb
chore: Update transformers to latest (#986)
2023-12-23 00:29:36 +09:00
mhenrichsen
93ebec1ac5
change val size (#992)
2023-12-22 16:18:16 +01:00
Hamel Husain
2e61dc3180
Add tests to Docker (#993)
2023-12-22 06:37:20 -08:00
NanoCode012
1ffa3866f2
Feat: Warn users to add to modules_to_save when adding tokens or switching special_tokens (#787)
* Feat: Auto add to modules_to_save when adding tokens
* fix: swap to error instead of warning
* feat: add check when special_tokens differ and add test
2023-12-22 21:49:07 +09:00
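Background for the check above: adding or changing special tokens resizes the embedding matrix and LM head, so a LoRA run must also train and save those modules. A sketch with transformers + peft (module names assume a Llama-style architecture; the checkpoint and token are placeholders):

```python
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# New special tokens enlarge the vocabulary...
added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|im_start|>"]}
)
if added:
    model.resize_token_embeddings(len(tokenizer))

lora_config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],
    # ...so embed_tokens/lm_head must be trained too; omitting this is
    # exactly the misconfiguration the PR now errors on
    modules_to_save=["embed_tokens", "lm_head"],
)
```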
Hamel Husain
62ba1609b6
bump actions versions
2023-12-21 08:54:08 -08:00
Hamel Husain
7bbaac98f7
fix mistral prompt assembly (#982)
* fix mistral prompts
* fix spacing
* remove elif
2023-12-21 08:00:55 -08:00
Wing Lian
161bcb6517
Dockerfile torch fix (#987)
* add torch to requirements.txt at build time to force version to stick
* fix xformers check
* better handling of xformers based on installed torch version
* fix for ci w/o torch
2023-12-21 09:38:20 -05:00
Ikko Eltociear Ashimine
d25c34caa6
Update README.md (#966)
2023-12-17 09:51:25 -05:00
NanoCode012
13e938149d
fix: add lr scheduler kwargs to Trainer (#972)
2023-12-17 18:48:28 +09:00
Wing Lian
85de004dd4
fix nccl build in dockerfile (#970)
2023-12-16 19:12:01 -05:00
Wing Lian
80ec7af358
update to latest nccl in docker image (#965)
2023-12-16 18:31:25 -05:00
dumpmemory
f28e75513b
update transformers to fix checkpoint saving (#963)
2023-12-15 21:03:17 -05:00
Hamel Husain
5ada140ff0
Fix prompt assembly for llama (#952)
* start at index 0
* add test to check for missing turns
* apply black
* Update test_prompt_tokenizers.py
* Update src/axolotl/monkeypatch/fastchat_conversation_turns.py
Co-authored-by: Motoki Wu <tokestermw@gmail.com>
* fix linting
* apply black
* add more tests for llama/sharegpt
* make logic clearer
---------
Co-authored-by: Motoki Wu <tokestermw@gmail.com>
2023-12-14 10:03:59 -08:00
Hamel Husain
712fd27b3f
Add docs (#947)
* move section
* update README
* update README
* update README
* update README
* update README
* Update README.md
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-13 14:22:52 -08:00
kallewoof
ef24342538
fix: switch to using the HuggingFace Transformers NEFT implementation (#941)
* fix: switch to using the HuggingFace Transformers NEFT implementation
* linter
* add support for noisy_embedding_alpha with a warning about it being renamed
* restore pre/posttrain_hooks
* move validation of NEFT noise alpha into validate_config()
* linter
2023-12-13 17:15:34 -05:00
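After this change the NEFT noise injection is delegated to transformers rather than a local monkeypatch. The upstream knob it maps to, as a sketch (assumes transformers >= 4.35; the alpha value is illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    neftune_noise_alpha=5.0,  # upstream NEFTune: uniform noise added to embeddings
)
```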
Wing Lian
5ea3aa31f0
Fix Deepspeed loading (#950)
* add check for zero3
* freeze parameters
* fixes for deepspeed loading
* fix model parameter check
* unfrozen parameters in example mixtral and logging when unfreezing
2023-12-13 16:03:23 -05:00
Wing Lian
f1f60cb5b2
Flash attn hotfix (#951)
* use previous arg
* use eager to use legacy attention that can be patched
2023-12-13 13:42:23 -05:00
kallewoof
450e04d3c4
fix: remove excessive newlines in system prompt(s) for alpaca (#936)
2023-12-13 16:40:02 +09:00
Juraj Bednar
b0cf397ecb
More hints on what to do with CUDA out-of-memory errors (#925)
2023-12-13 16:38:38 +09:00
Wing Lian
5f79b8242f
new evals_per_epoch and saves_per_epoch to make things cleaner (#944)
* new evals_per_epoch and saves_per_epoch to make things cleaner
* update per PR feedback
2023-12-12 15:35:23 -05:00
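The new options express eval/save frequency per epoch instead of raw step counts. A hedged sketch of the conversion idea (the helper and its rounding are assumptions, not axolotl's exact code):

```python
def steps_per_interval(total_steps: int, num_epochs: int, per_epoch: int) -> int:
    # hypothetical helper: turn "N evals (or saves) per epoch" into the
    # step interval the HF Trainer expects; actual rounding may differ
    steps_per_epoch = total_steps // num_epochs
    return max(steps_per_epoch // per_epoch, 1)

# e.g. 1000 total steps over 2 epochs with evals_per_epoch=4
print(steps_per_interval(1000, 2, 4))  # -> every 125 steps
```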
Hamel Husain
f1de29dd1e
Respect sequence_len in config for type: llama2_chat (#926)
* Respect sequence_len in config for `type: llama2_chat`
It was hardcoded to `4096` (I am not sure why); this updates it to pull from the config.
cc: @winglian
* Update llama2_chat.py
* apply black formatting
* fix tokenizer
* update test data
* lint fixtures
2023-12-12 09:39:22 -08:00
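The gist of that fix, as a hypothetical before/after sketch (names only mirror the PR description, not axolotl's actual code):

```python
HARDCODED = 4096  # what the llama2_chat strategy used before this PR

def max_length(cfg) -> int:
    # after the fix: honor the user's configured sequence_len, falling
    # back to the old constant only when it is unset
    return getattr(cfg, "sequence_len", None) or HARDCODED
```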
Wing Lian
7fabc4d95e
Mixtral official (#942)
* multipack support for official mixtral implementation
* fix patch to load multipack for mixtral
* chore: lint
2023-12-11 23:44:33 -05:00
Motoki Wu
9a5eb3990c
Update requirements.txt (#940)
2023-12-11 22:57:28 -05:00
Casper
86487c2e96
Mixtral: More correct MoE, lower loss (#932)
* More correct MoE
* Fix formatting
2023-12-10 10:34:25 -05:00
Wing Lian
35f9b0f149
update to latest transformers for mixtral support (#929)
* update to latest transformers for mixtral support
* pin transformers
* fix typo
2023-12-10 10:32:27 -05:00