Wing Lian
998763bade
ia3 keeps casting to float32, handle it here for now
2023-10-18 22:17:38 -04:00
Wing Lian
c8e42a0f4f
fix load_in_8bit check
2023-10-18 22:17:38 -04:00
Wing Lian
1da328eb9a
prepare ia3 for 8bit
2023-10-18 22:17:38 -04:00
Wing Lian
2d7cccfc8e
add ia3 peft support
2023-10-18 22:17:38 -04:00
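Taken together, the four IA3 commits above wire a new PEFT adapter type into the config. A hypothetical sketch of enabling it follows; the `adapter: ia3` value, the model, and the 8-bit pairing are assumptions inferred from the commit messages, not a shipped example:

```yaml
# Hypothetical sketch; `adapter: ia3` and the 8-bit combination are assumptions
# based on the commit messages above, not a documented example config.
base_model: openlm-research/open_llama_3b   # illustrative model choice
adapter: ia3
load_in_8bit: true   # "prepare ia3 for 8bit" suggests this path is handled
```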
NanoCode012
440c3ab527
Fix(model): Linear detected and added to target module with rope linear ( #738 )
* Fix(model): Linear detected and added to target module with rope linear
* fix: exclude layer instead
2023-10-18 22:13:20 -04:00
Napuh
992d57f20a
catch ConnectionError when checking dataset from HuggingFace ( #743 )
2023-10-18 22:11:54 -04:00
mhenrichsen
91a016f410
badge ( #739 )
* badge
* fixed text
2023-10-18 10:21:34 -04:00
Casper
a045db0214
Mistral: Sliding Window Attention with Flash Attention and Sample Packing ( #732 )
* Implement Mistral FA + SWA + Sample Packing
* Handle unbroadcastable tensor
* chore: lint
* Simplify _prepare_decoder_attention_mask
* Uncomment window size
* Upgrade flash-attn to minimum of 2.3.0 to support SWA
* Add original condition to avoid error during inference
* chore: lint
* use torchscript to prevent oom
* chore: pylint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-16 15:13:46 -04:00
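A rough sketch of the configuration this PR makes workable, using existing axolotl keys; the model path is illustrative, and per the commit notes flash-attn must be at least 2.3.0:

```yaml
base_model: mistralai/Mistral-7B-v0.1   # illustrative Mistral checkpoint
flash_attention: true   # Flash Attention, now with Mistral's sliding window
sample_packing: true    # multipack sample packing on top of FA + SWA
# requires flash-attn >= 2.3.0 for sliding window attention support
```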
Casper
e1b214c62b
Clarify custom format example ( #729 )
* Clarify custom prompt format
* Simplify format
2023-10-14 09:28:12 -04:00
Wing Lian
3553172e3c
fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention ( #728 )
2023-10-14 09:27:07 -04:00
Wing Lian
7f2027d93f
tweak for xformers install w pytorch 2.1.0 ( #727 )
2023-10-13 15:21:17 -04:00
Wing Lian
8d288a2ad4
workaround for installing xformers w torch 2.1.0 ( #725 )
2023-10-13 11:19:30 -04:00
Wing Lian
f30afe4544
misc sharegpt fixes ( #723 )
* support for sharegpt with assistant talking first, better masking of assistant token, allow remap of roles from dataset
* invalid role is actually not possible
* update tokenized fixture for corrected labels
2023-10-13 11:04:39 -04:00
Wing Lian
bfbdba8614
pin xformers >= 0.0.22 ( #724 )
2023-10-13 10:27:56 -04:00
Maxime
3bd9528390
add noisy embedding ( #721 )
* add noisy embedding
* fix format
* Update README.md
* Update README.md
* linter issues
* caseus fixes
---------
Co-authored-by: Maxime <maxime@nope.no>
2023-10-13 10:00:42 -04:00
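A sketch of switching the new option on; the key name `noisy_embedding_alpha` and the value 5 are assumptions based on the PR title, so check the README section this PR adds for the exact spelling:

```yaml
noisy_embedding_alpha: 5   # assumed option name; NEFTune-style noise scale
```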
Wing Lian
2aa1f71464
fix pytorch 2.1.0 build, add multipack docs ( #722 )
2023-10-13 08:57:28 -04:00
Wing Lian
1c412c7e9d
improve handling of the prepared ds path and other cfg defaults ( #701 )
2023-10-13 07:46:07 -04:00
Jan Philipp Harries
490923fb78
Save Axolotl config as WandB artifact ( #716 )
2023-10-11 07:28:12 -04:00
NanoCode012
5855dded3d
fix(doc): update default doc according to arg ( #714 )
2023-10-10 21:51:56 +09:00
atgctg
ace70b33c6
Fix: lowercase True values in config ( #713 )
* Fix: lowercase `True` values in config
* Fix: lowercase `True` values in config
2023-10-10 21:32:20 +09:00
NanoCode012
11c48c5e03
fix(doc): Add note on inference w sample packing ( #712 )
2023-10-10 21:08:17 +09:00
lukemarsden
295b2662e1
Get qlora mistral-7b fine tuning working on a single 4090 ( #708 )
2023-10-10 15:14:23 +09:00
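For reference, a single-4090 Mistral-7B QLoRA setup in the spirit of this commit might look like the fragment below; the specific values are assumptions, not the exact config it ships:

```yaml
base_model: mistralai/Mistral-7B-v0.1
adapter: qlora
load_in_4bit: true            # 4-bit base weights keep the 7B model within 24 GB
gradient_checkpointing: true  # trade compute for activation memory
micro_batch_size: 1
```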
seungduk.kim.2304
77c84e02fd
Update README with some explanations ( #700 )
* Update README with some explanations
* revert commit-hook change
* add more explanation about batch size and gradient accum
* do not use latex format
* decorate
* git hook again
* Attach a link that explains about LoRA hyperparameters
* update table of content
* Explanation about lora_modules_to_save
2023-10-08 13:37:54 -04:00
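The core relationship the new batch size / gradient accumulation section explains, shown with real axolotl keys (the numbers are arbitrary):

```yaml
# effective batch size = micro_batch_size * gradient_accumulation_steps * num GPUs
micro_batch_size: 2            # per-device batch for each forward/backward pass
gradient_accumulation_steps: 4 # steps accumulated before each optimizer update
# e.g. on 2 GPUs: 2 * 4 * 2 = 16 samples per optimizer update
```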
mhenrichsen
f91db198f3
fix unneeded space ( #699 )
2023-10-07 14:19:25 -04:00
Wing Lian
7f2618b5f4
add docker images for pytorch 2.1.0 ( #697 )
2023-10-07 12:23:31 -04:00
Wing Lian
aca0398315
apex not needed as amp is part of pytorch ( #696 )
2023-10-07 12:20:45 -04:00
mhenrichsen
29b8f46aed
Merge pull request #693 from OpenAccess-AI-Collective/update-mistral-example
update mistral lr, sample pack
2023-10-07 11:04:58 +02:00
mhenrichsen
83a950bb87
lint
2023-10-07 11:04:35 +02:00
Wing Lian
de87ea68f6
fix multiline for docker ( #694 )
2023-10-06 22:38:15 -04:00
mhenrichsen
4c8ddf2c6f
new lr, sample pack
2023-10-06 22:58:13 +02:00
NanoCode012
669f1d052c
Fix: Higher vram usage for mistral and sample_packing ( #691 )
* Fix: Higher vram usage for mistral and sample_packing
* chore: update comment
* chore: lint
2023-10-06 12:33:43 -04:00
Abhishek Mishra
d4a88e4eca
Adding qlora config for Mistral ( #675 )
* Adding qlora config for Mistral
Contains a fix for the Mistral FA issue: "ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input."
The fix for now is to set sample_packing: true and pad_to_sequence_len: true
* Renamed to qlora.yml
2023-10-06 21:05:56 +09:00
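The workaround named in the commit body, as the two lines would read in the new qlora.yml:

```yaml
# Workaround from the commit body for the Mistral FA padding_side ValueError.
sample_packing: true
pad_to_sequence_len: true
```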
Wing Lian
2d60ba3a6e
flash_attention + sample packing for stablelm 3b ( #671 )
* stablelm epoch fa patch
* is causal for fa
* working stablelm fa w packing
* chore: pre-commit linting
2023-10-05 16:03:43 -04:00
NanoCode012
eb480dfd68
Fix: ValueError when FA + Mistral when padding_side=right ( #681 )
* Fix: ValueError when FA + Mistral when padding_side=right
* fix: remove tokenizer class check
2023-10-06 04:12:54 +09:00
NanoCode012
133e676bcc
Feat: Set WORKDIR to /workspace/axolotl ( #679 )
2023-10-06 04:09:14 +09:00
NanoCode012
69fac9a020
Fix: Future deprecation warning with use_auth_token ( #680 )
2023-10-06 03:56:18 +09:00
NanoCode012
e0b7eeabfd
Fix(tokenizer): Set rstrip,lstrip,norm to False ( #678 )
2023-10-06 03:50:49 +09:00
NanoCode012
43856c0a39
Fix(version): Update FA to work with Mistral SWA ( #673 )
2023-10-04 21:32:19 +09:00
NanoCode012
e62d5901b5
chore: Clean up repetitive model kwargs ( #670 )
2023-10-04 20:41:26 +09:00
NanoCode012
697c50d408
Feat: Allow usage of native Mistral FA when no sample_packing ( #669 )
* Allow usage of native Mistral FA when no sample_packing
* fix: do not apply custom patch when sample_pack off
* chore: lint
* chore: pin transformers to v4.35.0.dev0
* fix: split sample_packing to separate test
2023-10-04 20:40:47 +09:00
NanoCode012
90e0d673f7
Feat: Add config yaml to section for reprod in bug-report.yaml ( #667 )
* Update bug-report.yaml
* Update bug-report.yaml
* Update bug-report.yaml
2023-10-03 23:38:42 +09:00
Wing Lian
2642caedf2
refactor to set eval_batch_size earlier if unset, so we can warn if mismatched ( #662 )
2023-10-02 21:08:07 -04:00
Wing Lian
f34648c8b9
remove patch fix for phi ( #664 )
2023-10-02 21:07:41 -04:00
Wing Lian
e50a64e85e
prepared dataset caching, other misc fixes ( #665 )
* prepared dataset caching, other misc fixes
* also don't load from disk cache unless explicit
2023-10-02 21:07:24 -04:00
Wing Lian
f4868d733c
make sure we also run CI tests when requirements.txt changes ( #663 )
2023-10-02 08:43:40 -04:00
Napuh
a7e56d83c2
removed duplicate on requirements.txt ( #661 )
2023-10-02 08:40:05 -04:00
Wing Lian
5b0bc48fbc
add mistral e2e tests ( #649 )
* mistral e2e tests
* make sure to enable flash attention for the e2e tests
* use latest transformers full sha
* uninstall first
2023-09-29 00:22:40 -04:00
Kyle Corbitt
9ec20777ba
Make dataset_processes configurable ( #651 )
I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have some kind of bug where if I try to run `datasets.filter` with too high a `num_proc`, it throws an error and dies.
This PR adds a new configuration option `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If not set, it defaults to the current behavior of using `os.cpu_count()`.
2023-09-29 00:22:22 -04:00
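Per the PR description, usage is a single key; the value shown is arbitrary:

```yaml
# Explicitly cap dataset map/filter parallelism; omitting the key falls back
# to os.cpu_count() per the PR description.
dataset_processes: 4
```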
ich
590d6032fd
Fix bug when using pretokenized datasets ( #652 )
* fix pretokenized datasets readme
* check if dataset type is not set to handle pretokenized datasets
2023-09-28 22:54:10 -04:00
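A sketch of what the fixed code path expects; leaving `type` unset marks the dataset as pretokenized (the path and column names here are assumptions):

```yaml
datasets:
  - path: ./data/pretokenized   # illustrative path
    # no `type:` key - per this fix, an unset type is handled as pretokenized,
    # with rows assumed to already contain input_ids / attention_mask / labels
```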
Wing Lian
409ca0f21c
add support for defined train split ( #654 )
2023-09-28 20:14:14 -04:00