kallewoof
58ec8b1113
feature: loss watchdog for terminating training runs that are failing (#899)
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
2023-12-04 07:54:34 -05:00
Wing Lian
f544ab2bed
don't compile deepspeed or bitsandbytes from source (#837)
2023-11-08 19:49:55 -05:00
Wing Lian
8b79ff0e94
fix eval_steps to be a sane default (#797)
* fix eval_steps to be a sane default
* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
Wing Lian
9b43e7ea15
disable eval table w sample packing in examples (#778)
2023-10-23 09:18:44 -04:00
Wing Lian
2d8def68dc
simplify by removing duplicate base_model_config (#772)
2023-10-23 01:42:38 -04:00
atgctg
ace70b33c6
Fix: lowercase `True` values in config (#713)
* Fix: lowercase `True` values in config
* Fix: lowercase `True` values in config
2023-10-10 21:32:20 +09:00
lukemarsden
295b2662e1
Get qlora mistral-7b fine tuning working on a single 4090 (#708)
2023-10-10 15:14:23 +09:00
NanoCode012
669f1d052c
Fix: Higher vram usage for mistral and sample_packing (#691)
* Fix: Higher vram usage for mistral and sample_packing
* chore: update comment
* chore: lint
2023-10-06 12:33:43 -04:00
Abhishek Mishra
d4a88e4eca
Adding qlora config for Mistral (#675)
* Adding qlora config for Mistral
Contains a fix for the Mistral Flash Attention (FA) issue: `ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.`
The fix for now is to set `sample_packing: true` and `pad_to_sequence_len: true`.
* Renamed to qlora.yml
2023-10-06 21:05:56 +09:00