axolotl

Author	SHA1	Message	Date
Wing Lian	0ddfb24fcf	LISA (#1469 ) * add lisa support * fix default and fix attribute traversal for layers * improve lisa callback logging * fix LISA by ensuring params are not frozen during __init__ * example config for lisa --------- Co-authored-by: Aman Karmani <aman@tmm1.net>	2024-04-01 04:54:53 -07:00
Wing Lian	6086be85f7	qwen2_moe support w multipack (#1455 )	2024-03-29 11:04:53 -04:00
Wing Lian	05b398a072	fix some of the edge cases for Jamba (#1452 ) * fix some of the edge cases for Jamba * update requirements for jamba	2024-03-29 02:38:02 -04:00
Wing Lian	02af0820f7	Jamba (#1451 ) * fixes for larger models * add qlora example for deepspeed * add readme for jamba	2024-03-28 21:03:22 -04:00
Satpal Singh Rathore	c19d060a74	turn sample_packing on for training (#1438 ) [skip ci]	2024-03-26 15:19:04 -04:00
NanoCode012	f1ebaa07c6	chore(config): refactor old mistral config (#1435 ) * chore(config): refactor old mistral config * chore: add link to colab on readme	2024-03-25 12:00:44 +09:00
Wing Lian	2a1589f6f6	strip out hacky qlora-fsdp workarounds now that qlora-fsdp fixes are upstreamed (#1428 )	2024-03-21 11:56:13 -04:00
Sebastian Raschka	6366b0c212	Fix Gemma 7b qlora.yml (#1405 )	2024-03-14 15:44:38 -04:00
Seungduk Kim	05bcc9ea56	Train parameters exclusively in specific ranges (#1390 ) * Train parameters exclusively in specific ranges * Fix the style and update docs * Update yaml example	2024-03-14 11:05:42 -04:00
Wing Lian	9b6ee83a73	FDSP + QLoRA (#1378 ) * wip qlora + fsdp fixes * more fixes * make sure to load the lora 🤦 * only setup quantized meta on non-zero rank: * only run setup_quantized_peft_meta_for_training for qlora+fsdp * more fixes for qlora+fsdp * chore: lint * add example yml * support mistral too * fix for model_type and add mixtral support too * set cpu_offload: false to reduce vram, constrain new accleerator logic to qlora + fsdp * refactor for duplicate code	2024-03-08 14:31:01 -05:00
Eric Hartford	e0f1895408	add starcoder2 (#1349 ) * add starcoder2 * Apply suggestions from code review Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * chore: lint * Apply suggestions from code review Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> --------- Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-03-05 19:49:17 -05:00
Sebastian Raschka	8984bf1722	Update tinyllama lora.yml to fix eval packing issue (#1362 )	2024-03-05 14:36:29 -05:00
NanoCode012	170d4d7092	chore: enable sample_packing for Gemma (#1351 )	2024-03-01 21:56:22 -05:00
Maxime	0f6af36d50	Mps mistral lora (#1292 ) [skip ci] * Lora example for Mistral on MPS backend * Add some MPS documentation * Update examples/mistral/lora-mps.yml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * Update examples/mistral/lora-mps.yml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * Update README.md --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-02-26 22:39:57 -05:00
Nathan Cooper	f30d062b48	Add StableLM 2 Example Scripts (#1327 ) [skip ci] * Add StableLM examples and configurations * Add FFT and LORA configuration files and modify readme with usage	2024-02-26 18:44:25 -05:00
Wing Lian	2752d5f958	multipack for gemma (#1313 ) * multipack for gemma * chore: lint * handle cache_position kwarg in updated llama modeling * add position_ids to rotary embed call for updated llama modeling	2024-02-21 19:24:21 -05:00
Monk	9e300aca0c	Adding Google's gemma Model (#1312 )	2024-02-21 12:56:47 -05:00
Jared Palmer	6ab69ec5f8	Add instructions for playing with qlora model to colab example (#1290 ) * Add instructions for playing with qlora model to colab example * Update examples/colab-notebooks/colab-axolotl-example.ipynb Co-authored-by: JohanWork <39947546+JohanWork@users.noreply.github.com> --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> Co-authored-by: JohanWork <39947546+JohanWork@users.noreply.github.com>	2024-02-22 02:46:27 +09:00
NanoCode012	a7a9a1433a	fix(examples): remove is_*_derived as it's parsed automatically (#1297 )	2024-02-22 00:52:46 +09:00
Leonardo Emili	5a5d47458d	Add seq2seq eval benchmark callback (#1274 ) * Add CausalLMBenchEvalCallback for measuring seq2seq performance * Fix code for pre-commit * Fix typing and improve logging * eval_sample_packing must be false with CausalLMBenchEvalCallback	2024-02-13 08:24:30 -08:00
Maxime	fac2d98c26	Add MPS support (#1264 ) * add mps support * linter stuff * CI fixes * install packaging for various tests * Update setup.py * Revert "install packaging for various tests" This reverts commit `980e7aa44d`. * Revert "CI fixes" This reverts commit `4609e3b166`. --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-02-12 08:30:32 -05:00
JohanWork	1c7ed26785	lock pytorch (#1247 ) [skip ci]	2024-02-06 07:48:26 -05:00
Wing Lian	c7cf3810bd	Pretrain transforms (#1261 ) * wip for pretraining/iterable data with arbitrary prompt strategies * more fixes, wip * more fixes for custom pretraining * iterable ds wrapper not needed * remove extra features * chore: lint * update pretraning example yml * fix order for partials * fixup for tests	2024-02-06 00:37:03 -05:00
Wing Lian	4cb7900a56	Peft lotfq (#1222 ) * loftq support for lora * fix loftq check * update readme for loftq * readability cleanup * use peft main for loftq fixes, remove unnecessary special tokens * remove unused test from older deprecation	2024-01-28 18:50:08 -05:00
Igor Berlenko	5407ddd233	Update qlora.yml - remove `max_packed_sequence_len` (#1210 ) [skip ci]	2024-01-26 07:43:05 -05:00
JohanWork	ee0b5f60e5	add colab example (#1196 ) [skip ci]	2024-01-24 20:09:09 -05:00
Wing Lian	54d2ac155b	Mixtral fixes 20240124 (#1192 ) [skip ci] * mixtral nccl fixes * make sure to patch for z3	2024-01-24 14:59:57 -05:00
Wing Lian	814aee6603	Phi2 multipack (#1173 ) * phi2 multipack * update validation and examples for phi * more updates to phi examples * make sure to use the correct collator for phi multipack * phi needs attention mask now for multipack * if the special token already exists in the tokenizer, don't require in lora modules to save * fix qlora yml for phi, fix phi test validation * test qlora too * make sure flash attention is enabled for the test * don't use remote code for phi anymore * reduce sequence len for sample packing phi	2024-01-23 12:54:36 -05:00
Tilemachos Chatzipapas	cc250391a0	Fine-Tuning Mistral-7b for Real-World Chatbot Applications Using Axolotl (Lora used) (#1155 ) * Mistral-7b finetune example using axolotl with code,config,data * Corrected the path for huggingface dataset * Update data.jsonl * chore: lint --------- Co-authored-by: twenty8th <twenty8th@users.noreply.github.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-23 07:32:21 -05:00
Wing Lian	e799e08d3c	Falcon embeddings (#1149 ) [skip docker] * also fix multipack for falcon and add smoke tests * make sure to handle special tokens and added tokens for lora * fix reference to model_type * fix tests for falcon * fix stray typo * fixes for smoke tests	2024-01-22 21:01:42 -05:00
Wing Lian	782b6a4216	set fp16 to false if bf16, update bf16: auto in example YAMLs (#1122 ) [skip ci] * set fp16 to false if bf16, update bf16: auto in example YAMLs * unset fp16 so that it fallsback properly if bf16 isn't available * Update README.md [skip-ci] Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * test that bf16 disables fp16 --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-01-22 18:44:01 -05:00
Joe Cummings	1d70f24b50	Add shifted sparse attention (#973 ) [skip-ci] * Add s2_attn to hijack flash code * Refactor code to account for s2_attn * Add test for models utils * Add ``s2_attention`` option to llama configs * Add ``s2_attention`` option to README config * Format code to appease linter * chore: lint * Remove xpos and llama-landmark [bad merge] * add e2e smoke tests for shifted sparse attention * remove stray patch from merge * update yml with link to paper for s2_attention/longlora * fix assertion check for full fine tune * increase sequence len for tests and PR feedback updates * reduce context len to 16k for tests * reduce context len to 16k for tests * reduce batch size for larger context len and udpate test to check message * fix test for message --------- Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-18 10:16:07 -05:00
Wing Lian	c1b741d9fb	pin model_revision for phi2 (#1123 )	2024-01-14 17:31:51 -05:00
Wing Lian	732851f105	Phi2 rewrite (#1058 ) * restore to current phi modeling code from phi-2 * enable gradient checkpointing * don't cast everything to float32 all the time * gradient checkpointing for phi2 ParallelBlock module too * fix enabling flash attn for phi2 * add comment about import * fix phi2 example * fix model type check for tokenizer * revert float32 -> bf16 casting changes * support fused dense flash attn * fix the repo for flash-attn * add package name for subdir pkg * fix the data collator when not using sample packing * install packaging for pytests in ci * also fix setup to not install flash attn fused dense subdir if not extras * split out the fused-dense-lib in extra requires * don't train w group_by_length for phi * update integration test to use phi2 * set max steps and save steps for phi e2e tests * try to workaround ssave issue in ci * skip phi2 e2e test for now	2024-01-08 14:04:22 -05:00
JinK	553c80f79a	streaming multipack for pretraining dataset (#959 ) * [Feat] streaming multipack * WIP make continued pretraining work w multipack * fix up hadrcoding, lint * fix dict check * update test for updated pretraining multipack code * fix hardcoded data collator fix for multipack pretraining * fix the collator to be the max length for multipack pretraining * don't bother with latest tag for test * cleanup docker build/test --------- Co-authored-by: jinwonkim93@github.com <jinwonkim> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-05 22:13:21 -05:00
NanoCode012	8ba27f3bde	fix: lint (#1037 )	2024-01-03 10:23:44 -05:00
Tim Dolan	c75f916745	added tiny llama examples for lora and qlora (#1027 ) * added tiny llama examples for lora and qlora * corrected yml files and removed tiny-llama.yml from llama-2 example	2024-01-02 20:00:37 -05:00
Kevin Sydney	384b817dc0	Set eval_sample_packing to false in mistral config.yaml (#1003 ) Without eval_sampling_packing set to false, ValueError occurs with eval dataset split is too small for sample_packing.	2023-12-27 16:11:55 -08:00
Evan Griffiths	6ef46f8dca	Add an example config for finetuning a 34B model on a 24GB GPU (#1000 ) * Add an example config for finetuning a 34B model on a 24GB GPU * Remore wandb project	2023-12-25 10:29:55 -08:00
Wing Lian	628b754824	set output_router_logits for mixtral config: (#995 )	2023-12-22 12:57:02 -05:00
mhenrichsen	93ebec1ac5	change val size (#992 )	2023-12-22 16:18:16 +01:00
Wing Lian	5ea3aa31f0	Fix Deepspeed loading (#950 ) * add check for zero3 * freeze parameters * fixes for deepspeed loading * fix model parameter check * unfrozen parameters in example mixtral and logging when unfreezing	2023-12-13 16:03:23 -05:00
Wing Lian	5f79b8242f	new evals_per_epoch and saves_per_epoch to make things cleaner (#944 ) * new evals_per_epoch and saves_per_epoch to make things cleaner * update per PR feedback	2023-12-12 15:35:23 -05:00
Wing Lian	7fabc4d95e	Mixtral official (#942 ) * multipack support for official mixtral implementation * fix patch to load multipack for mixtral * chore: lint	2023-12-11 23:44:33 -05:00
Wing Lian	35f9b0f149	update to latest transformers for mixstral support (#929 ) * update to latest transformers for mixstral support * pin transformers * fix typo	2023-12-10 10:32:27 -05:00
Wing Lian	68b227a7d8	Mixtral multipack (#928 ) * mixtral multipack * use mixtral model * sample yml * calculate cu_seqlens properly * use updated flash ettention setting * attn var checks * force use of flash attention 2 for packing * lint * disable future fix for now * update support table	2023-12-09 21:26:30 -05:00
Wing Lian	40a6362c92	support for mamba (#915 ) * support for mamba * more mamba fixes * use fork for mamba kwargs fix * grad checkpointing doesn't work * fix extras for mamaba * mamba loss fix * use fp32 and remove verbose logging * mamba fixes * fix collator for mamba * set model_type on training_args * don't save safetensors for mamba * update mamba config to disable safetensor checkpooints, install for tests * no evals for mamba tests * handle save_pretrained * handle unused safetensors arg	2023-12-09 12:10:41 -05:00
NanoCode012	a1da39cd48	Feat(wandb): Refactor to be more flexible (#767 ) * Feat: Update to handle wandb env better * chore: rename wandb_run_id to wandb_name * feat: add new recommendation and update config * fix: indent and pop disabled env if project passed * feat: test env set for wandb and recommendation * feat: update to use wandb_name and allow id * chore: add info to readme	2023-12-04 22:17:25 +09:00
kallewoof	58ec8b1113	feature: loss watchdog for terminating training runs that are failing (#899 ) Co-authored-by: Karl-Johan Alm <kalle@gmail.com>	2023-12-04 07:54:34 -05:00
NanoCode012	a48dbf6561	fix: remove FA for qwen examples (#900 ) * fix: remove FA for qwen lora * fix: remove FA for qlora	2023-11-27 21:23:54 +09:00

138 Commits