Commit Graph

37 Commits

Author SHA1 Message Date
Motoki Wu
98c25e15cb Add ORPO example and e2e test (#1572)
* add example for mistral orpo

* sample_packing: false for orpo

* go to load_dataset (since load_rl_datasets require a transfom_fn, which only dpo uses currently)
2024-04-27 12:07:06 -04:00
Wing Lian
c10563c444 fix broken linting (#1541)
* chore: lint

* include examples in yaml check

* mistral decided to gate their models...

* more mistral models that were gated
2024-04-19 01:03:04 -04:00
Atlas
0eadfc8c86 Create mixtral_22.yml (#1514) [skip ci]
Code sourced from here:

https://twitter.com/mattshumer_/status/1778135774887567712
2024-04-17 01:16:00 -04:00
Wing Lian
132eb740f0 DBRX Model Support (#1462)
* wip for dbrx finetuning

* add fastcore for parallel loading of sharded weights

* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback

* update to use v2 of the converted model

* more fixes for dbrx loras

* make sure to enable fsdp activation checkpointing

* fix support for 8bit loras too for dbrx

* apply z3 leaf moe fix for DBRX with deepspeed

* don't raise value error since child module searches could fail and be ok

* revert a previous change to fix fsdp

* update mistral/mistral qlora+fsdp yamls

* fix qlora+fsdp quant storage type

* more edge cases for qlora-fsdp

* fixes for fsdp+qlora w optimizer in 8bit

* add bigstral z3 config and make sure to use full_state_dict for fsdp
2024-04-12 09:02:36 -04:00
NanoCode012
f1ebaa07c6 chore(config): refactor old mistral config (#1435)
* chore(config): refactor old mistral config

* chore: add link to colab on readme
2024-03-25 12:00:44 +09:00
Seungduk Kim
05bcc9ea56 Train parameters exclusively in specific ranges (#1390)
* Train parameters exclusively in specific ranges

* Fix the style and update docs

* Update yaml example
2024-03-14 11:05:42 -04:00
Wing Lian
9b6ee83a73 FDSP + QLoRA (#1378)
* wip qlora + fsdp fixes

* more fixes

* make sure to load the lora 🤦

* only setup quantized meta on non-zero rank:

* only run setup_quantized_peft_meta_for_training for qlora+fsdp

* more fixes for qlora+fsdp

* chore: lint

* add example yml

* support mistral too

* fix for model_type and add mixtral support too

* set cpu_offload: false to reduce vram, constrain new accleerator logic to qlora + fsdp

* refactor for duplicate code
2024-03-08 14:31:01 -05:00
Maxime
0f6af36d50 Mps mistral lora (#1292) [skip ci]
* Lora example for Mistral on MPS backend

* Add some MPS documentation

* Update examples/mistral/lora-mps.yml

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update examples/mistral/lora-mps.yml

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update README.md

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-02-26 22:39:57 -05:00
NanoCode012
a7a9a1433a fix(examples): remove is_*_derived as it's parsed automatically (#1297) 2024-02-22 00:52:46 +09:00
Leonardo Emili
5a5d47458d Add seq2seq eval benchmark callback (#1274)
* Add CausalLMBenchEvalCallback for measuring seq2seq performance

* Fix code for pre-commit

* Fix typing and improve logging

* eval_sample_packing must be false with CausalLMBenchEvalCallback
2024-02-13 08:24:30 -08:00
Wing Lian
54d2ac155b Mixtral fixes 20240124 (#1192) [skip ci]
* mixtral nccl fixes

* make sure to patch for z3
2024-01-24 14:59:57 -05:00
Tilemachos Chatzipapas
cc250391a0 Fine-Tuning Mistral-7b for Real-World Chatbot Applications Using Axolotl (Lora used) (#1155)
* Mistral-7b finetune example using axolotl with code,config,data

* Corrected the path for huggingface dataset

* Update data.jsonl

* chore: lint

---------

Co-authored-by: twenty8th <twenty8th@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-23 07:32:21 -05:00
Wing Lian
782b6a4216 set fp16 to false if bf16, update bf16: auto in example YAMLs (#1122) [skip ci]
* set fp16 to false if bf16, update bf16: auto in example YAMLs

* unset fp16 so that it fallsback properly if bf16 isn't available

* Update README.md [skip-ci]

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* test that bf16 disables fp16

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-01-22 18:44:01 -05:00
Kevin Sydney
384b817dc0 Set eval_sample_packing to false in mistral config.yaml (#1003)
Without eval_sampling_packing set to false, ValueError occurs with eval dataset split is too small for sample_packing.
2023-12-27 16:11:55 -08:00
Wing Lian
628b754824 set output_router_logits for mixtral config: (#995) 2023-12-22 12:57:02 -05:00
mhenrichsen
93ebec1ac5 change val size (#992) 2023-12-22 16:18:16 +01:00
Wing Lian
5ea3aa31f0 Fix Deepspeed loading (#950)
* add check for zero3

* freeze parameters

* fixes for deepspeed loading

* fix model parameter check

* unfrozen parameters in example mixtral and logging when unfreezing
2023-12-13 16:03:23 -05:00
Wing Lian
5f79b8242f new evals_per_epoch and saves_per_epoch to make things cleaner (#944)
* new evals_per_epoch and saves_per_epoch to make things cleaner

* update per PR feedback
2023-12-12 15:35:23 -05:00
Wing Lian
7fabc4d95e Mixtral official (#942)
* multipack support for official mixtral implementation

* fix patch to load multipack for mixtral

* chore: lint
2023-12-11 23:44:33 -05:00
Wing Lian
35f9b0f149 update to latest transformers for mixstral support (#929)
* update to latest transformers for mixstral support

* pin transformers

* fix typo
2023-12-10 10:32:27 -05:00
Wing Lian
68b227a7d8 Mixtral multipack (#928)
* mixtral multipack

* use mixtral model

* sample yml

* calculate cu_seqlens properly

* use updated flash ettention setting

* attn var checks

* force use of flash attention 2 for packing

* lint

* disable future fix for now

* update support table
2023-12-09 21:26:30 -05:00
NanoCode012
a1da39cd48 Feat(wandb): Refactor to be more flexible (#767)
* Feat: Update to handle wandb env better

* chore: rename wandb_run_id to wandb_name

* feat: add new recommendation and update config

* fix: indent and pop disabled env if project passed

* feat: test env set for wandb and recommendation

* feat: update to use wandb_name and allow id

* chore: add info to readme
2023-12-04 22:17:25 +09:00
kallewoof
58ec8b1113 feature: loss watchdog for terminating training runs that are failing (#899)
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
2023-12-04 07:54:34 -05:00
Wing Lian
f544ab2bed don't compile deepspeed or bitsandbytes from source (#837) 2023-11-08 19:49:55 -05:00
Wing Lian
8b79ff0e94 fix eval_steps to be a sane default (#797)
* fix eval_steps to be a sane default

* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
Wing Lian
9b43e7ea15 disable eval table w sample packing in examples (#778) 2023-10-23 09:18:44 -04:00
Wing Lian
2d8def68dc simplify by removing duplicate base_model_config (#772) 2023-10-23 01:42:38 -04:00
atgctg
ace70b33c6 Fix: lowercase True values in config (#713)
* Fix: lowercase `True` values in config

* Fix: lowercase `True` values in config
2023-10-10 21:32:20 +09:00
lukemarsden
295b2662e1 Get qlora mistral-7b fine tuning working on a single 4090 (#708) 2023-10-10 15:14:23 +09:00
mhenrichsen
f91db198f3 fix unneeded space (#699) 2023-10-07 14:19:25 -04:00
mhenrichsen
83a950bb87 lint 2023-10-07 11:04:35 +02:00
mhenrichsen
4c8ddf2c6f new lr, sample pack 2023-10-06 22:58:13 +02:00
NanoCode012
669f1d052c Fix: Higher vram usage for mistral and sample_packing (#691)
* Fix: Higher vram usage for mistral and sample_packing

* chore: update comment

* chore: lint
2023-10-06 12:33:43 -04:00
Abhishek Mishra
d4a88e4eca Adding qlora config for Mistral (#675)
* Adding qlora config for Mistral

Contains fix for Mistral FA issue - ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to  call tokenizer.padding_side  = 'left' before tokenizing the input.

Fix for now is to set sample_packing: true and pad_to_sequence_len: true

* Renamed to qlora.yml
2023-10-06 21:05:56 +09:00
Wing Lian
e50a64e85e prepared dataset caching, other misc fixes (#665)
* prepared dataset caching, other misc fixes

* also don't load from disk cache unless explicit
2023-10-02 21:07:24 -04:00
Adarsh Shirawalmath
b88f51512a Update mistral/README.md (#647) 2023-09-28 10:24:56 -04:00
NanoCode012
eb41f76f92 Feat: Add example for Mistral (#644)
* Feat: Add example for Mistral

* chore: turn off flash

* chore: add is_mistral_derived_model

* chore: update following PR
2023-09-28 20:15:00 +09:00