Wing Lian
2d8def68dc
simplify by removing duplicate base_model_config (#772)
2023-10-23 01:42:38 -04:00
atgctg
ace70b33c6
Fix: lowercase True values in config (#713)
* Fix: lowercase `True` values in config
2023-10-10 21:32:20 +09:00
lukemarsden
295b2662e1
Get qlora mistral-7b fine tuning working on a single 4090 (#708)
2023-10-10 15:14:23 +09:00
mhenrichsen
f91db198f3
fix unneeded space (#699)
2023-10-07 14:19:25 -04:00
mhenrichsen
83a950bb87
lint
2023-10-07 11:04:35 +02:00
mhenrichsen
4c8ddf2c6f
new lr, sample pack
2023-10-06 22:58:13 +02:00
NanoCode012
669f1d052c
Fix: Higher vram usage for mistral and sample_packing (#691)
* Fix: Higher vram usage for mistral and sample_packing
* chore: update comment
* chore: lint
2023-10-06 12:33:43 -04:00
Abhishek Mishra
d4a88e4eca
Adding qlora config for Mistral (#675)
* Adding qlora config for Mistral
Contains a fix for the Mistral FA issue: `ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.`
The fix for now is to set `sample_packing: true` and `pad_to_sequence_len: true`
* Renamed to qlora.yml
2023-10-06 21:05:56 +09:00
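The workaround in the commit body above amounts to a small config fragment. A minimal sketch is shown below; the two keys are the ones named in the commit message, while the surrounding file (the renamed `qlora.yml`) contains many more options not reproduced here:

```yaml
# Fragment of a Mistral QLoRA config (sketch, not the full qlora.yml).
# sample_packing + pad_to_sequence_len work around the Flash Attention
# padding_side='right' ValueError described in the commit message.
sample_packing: true
pad_to_sequence_len: true
```

With sample packing enabled, padding to the full sequence length keeps batch shapes consistent, which sidesteps the right-padding path that triggers the error.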
Wing Lian
e50a64e85e
prepared dataset caching, other misc fixes (#665)
* prepared dataset caching, other misc fixes
* also don't load from disk cache unless explicit
2023-10-02 21:07:24 -04:00
Adarsh Shirawalmath
b88f51512a
Update mistral/README.md (#647)
2023-09-28 10:24:56 -04:00
NanoCode012
eb41f76f92
Feat: Add example for Mistral (#644)
* Feat: Add example for Mistral
* chore: turn off flash
* chore: add is_mistral_derived_model
* chore: update following PR
2023-09-28 20:15:00 +09:00