* make pad_to_sequence_len default to the same value as sample_packing
* remove duplicate validation
* fix test
* update description meta
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* checkpoint model on first step callback
* remove debug
* add test cases; update existing tests not to save on first step
* move test out of solo
* delete
* default to False
* typo
* Add CausalLMBenchEvalCallback for measuring seq2seq performance
* Fix code for pre-commit
* Fix typing and improve logging
* eval_sample_packing must be false with CausalLMBenchEvalCallback
* set fp16 to false if bf16, update bf16: auto in example YAMLs
* unset fp16 so that it fallsback properly if bf16 isn't available
* Update README.md [skip-ci]
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* test that bf16 disables fp16
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* Feat: Update to handle wandb env better
* chore: rename wandb_run_id to wandb_name
* feat: add new recommendation and update config
* fix: indent and pop disabled env if project passed
* feat: test env set for wandb and recommendation
* feat: update to use wandb_name and allow id
* chore: add info to readme
* Adding qlora config for Mistral
Contains fix for Mistral FA issue - ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.
Fix for now is to set sample_packing: true and pad_to_sequence_len: true
* Renamed to qlora.yml