Commit Graph

99 Commits

Author SHA1 Message Date
Antoni-Joan Solergibert
b32c08f8cc adding llama3 fastchat conversation monkeypatch (#1539)
* adding llama3 fastchat conversation monkeypatch

* Updated conversation turns to work with PR 3259 of FastChat

* fixed bos token

* bump fastchat version

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-05-10 10:40:05 -04:00
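A "monkeypatch" like the FastChat conversation patch above boils down to replacing a module-level attribute at import time so downstream code picks up the new behavior. A minimal self-contained sketch of the pattern (the module and template names here are hypothetical stand-ins, not axolotl's actual code, which patches `fastchat.conversation`):

```python
import types

# Stand-in for a third-party module whose template lookup we want to override.
fake_fastchat = types.ModuleType("fake_fastchat")
fake_fastchat.get_template = lambda name: {"sep": "\n", "bos": ""}

original_get_template = fake_fastchat.get_template

def patched_get_template(name):
    # Override only the template we care about; delegate everything else.
    if name == "llama3":
        return {"sep": "<|eot_id|>", "bos": "<|begin_of_text|>"}
    return original_get_template(name)

fake_fastchat.get_template = patched_get_template  # the actual monkeypatch

print(fake_fastchat.get_template("llama3")["bos"])  # <|begin_of_text|>
```

Because the patch delegates for unknown names, other callers of the module keep working; this is why such patches tend to break when the upstream library changes shape (hence the "bump fastchat version" follow-up).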
Haoxiang Wang
60f5ce0569 Add support for Gemma chat template (#1530)
* Add support for Gemma chat template

* Update fschat version to include its newest support for Gemma chat style

* pin fastchat to current HEAD

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-21 19:55:40 -04:00
Wing Lian
7d1d22f72f ORPO Trainer replacement (#1551)
* WIP use trl ORPOTrainer

* fixes to make orpo work with trl

* fix the chat template loading

* make sure to handle the special tokens and add_generation for assistant turn too
2024-04-19 17:25:36 -04:00
NanoCode012
59ef25470c fix(packages): lock datasets version (#1545) 2024-04-19 20:42:10 +09:00
Wing Lian
132eb740f0 DBRX Model Support (#1462)
* wip for dbrx finetuning

* add fastcore for parallel loading of sharded weights

* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback

* update to use v2 of the converted model

* more fixes for dbrx loras

* make sure to enable fsdp activation checkpointing

* fix support for 8bit loras too for dbrx

* apply z3 leaf moe fix for DBRX with deepspeed

* don't raise value error since child module searches could fail and be ok

* revert a previous change to fix fsdp

* update mistral/mistral qlora+fsdp yamls

* fix qlora+fsdp quant storage type

* more edge cases for qlora-fsdp

* fixes for fsdp+qlora w optimizer in 8bit

* add bigstral z3 config and make sure to use full_state_dict for fsdp
2024-04-12 09:02:36 -04:00
Wing Lian
5aa50974ce Pretrain multipack v2 (#1470) 2024-04-02 05:42:16 -07:00
Wing Lian
6086be85f7 qwen2_moe support w multipack (#1455) 2024-03-29 11:04:53 -04:00
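"Multipack" in the commits above refers to packing several short training sequences into one fixed-length block so padding tokens are minimized. A rough first-fit sketch of the idea (simplified; axolotl's actual multipack sampler is more involved and batch-aware):

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit packing: group sequence indices into bins whose
    total token count stays within max_len."""
    bins = []  # each bin: [remaining capacity, list of sequence indices]
    for idx, length in enumerate(lengths):
        for b in bins:
            if b[0] >= length:
                b[0] -= length
                b[1].append(idx)
                break
        else:
            bins.append([max_len - length, [idx]])
    return [b[1] for b in bins]

# Pack sequences of lengths 5, 3, 7, 2 into blocks of 8 tokens.
print(pack_sequences([5, 3, 7, 2], 8))  # → [[0, 1], [2], [3]]
```

Packing like this is why attention masks and (for MoE models like qwen2_moe) routing logic need model-specific patches: each block now contains multiple independent sequences.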
Wing Lian
05b398a072 fix some of the edge cases for Jamba (#1452)
* fix some of the edge cases for Jamba

* update requirements for jamba
2024-03-29 02:38:02 -04:00
Wing Lian
2a1589f6f6 strip out hacky qlora-fsdp workarounds now that qlora-fsdp fixes are upstreamed (#1428) 2024-03-21 11:56:13 -04:00
Wing Lian
dd449c5cd8 support galore once upstreamed into transformers (#1409)
* support galore once upstreamed into transformers

* update module name for llama in readme and fix typing for all linear

* bump trl for deprecation fixes from newer transformers

* include galore as an extra and install in docker image

* fix optim_args type

* fix optim_args

* update dependencies for galore

* add galore to cicd dockerfile
2024-03-19 09:26:35 -04:00
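The "fix optim_args type" commits above deal with GaLore's extra optimizer arguments (e.g. `rank`, `update_proj_gap`, `scale`), which may arrive either as a mapping or as a comma-separated string. One way such key=value strings might be coerced into typed kwargs — a hypothetical helper, not the actual implementation:

```python
def parse_optim_args(optim_args):
    """Parse 'rank=64,update_proj_gap=200,scale=0.25' into typed kwargs.
    Accepts a dict (already parsed) or a comma-separated key=value string."""
    if isinstance(optim_args, dict):
        return optim_args
    kwargs = {}
    for pair in optim_args.split(","):
        key, _, value = pair.strip().partition("=")
        try:
            parsed = int(value)      # prefer int ...
        except ValueError:
            try:
                parsed = float(value)  # ... then float ...
            except ValueError:
                parsed = value         # ... fall back to string
        kwargs[key] = parsed
    return kwargs

print(parse_optim_args("rank=64,update_proj_gap=200,scale=0.25"))
# → {'rank': 64, 'update_proj_gap': 200, 'scale': 0.25}
```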
Wing Lian
9b6ee83a73 FSDP + QLoRA (#1378)
* wip qlora + fsdp fixes

* more fixes

* make sure to load the lora 🤦

* only set up quantized meta on non-zero rank

* only run setup_quantized_peft_meta_for_training for qlora+fsdp

* more fixes for qlora+fsdp

* chore: lint

* add example yml

* support mistral too

* fix for model_type and add mixtral support too

* set cpu_offload: false to reduce VRAM, constrain new accelerator logic to qlora + fsdp

* refactor for duplicate code
2024-03-08 14:31:01 -05:00
Wing Lian
58b0d4b0d8 update flash attention for gemma support (#1368) 2024-03-06 10:08:54 -05:00
Wing Lian
0cfdb2c90c support for DoRA w/ PEFT (#1363) 2024-03-05 21:20:15 -05:00
Wing Lian
00018629e7 run tests again on Modal (#1289) [skip ci]
* run tests again on Modal

* make sure to run the full suite of tests on modal

* run cicd steps via shell script

* run tests in different runs

* increase timeout

* split tests into steps on modal

* increase workflow timeout

* retry doing this with only a single script

* fix yml launch for modal ci

* reorder tests to run on modal

* skip dpo tests on modal

* run on L4s, A10G takes too long

* increase CPU and RAM for modal test

* run modal tests on A100s

* skip phi test on modal

* env not arg in modal dockerfile

* upgrade pydantic and fastapi for modal tests

* cleanup stray character

* use A10s instead of A100 for modal
2024-02-29 14:26:26 -05:00
NanoCode012
5be8b555a0 fix: checkpoint saving with deepspeed (#1321) 2024-02-27 15:46:44 +09:00
Wing Lian
cc3cebfa70 Pydantic 2.x cfg (#1239)
* WIP conversion to use pydantic for config validation

* wip, more fields, add capabilities

* wip

* update pydantic validation to match existing tests

* tweak requirements

* set up deprecated params pydantic model

* more validations

* wrap up rest of the validations

* flesh out the rest of the options from the readme into pydantic

* fix model validators as class methods

remember to return in validator
missing return
add missing relora attributes
fix test for DictDefault change
fix sys template for mistral from fastchat change in PR 2872
fix test for batch size warning

* more missing attributes for cfg

* updates from PR feedback

* fix validation for datasets and pretrain datasets

* fix test for lora check
2024-02-26 12:24:14 -05:00
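The Pydantic 2.x migration above replaces ad-hoc config checks with declarative validators on a config model. The shape of that kind of cross-field validation, sketched here with a stdlib dataclass so the example stands alone without pydantic (field names are an illustrative subset, not the full config schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AxolotlConfigSketch:
    # Hypothetical subset of fields; the real pydantic model has many more.
    load_in_8bit: bool = False
    load_in_4bit: bool = False
    adapter: Optional[str] = None

    def __post_init__(self):
        # Mirrors the kind of checks a pydantic model_validator enforces.
        if self.load_in_8bit and self.load_in_4bit:
            raise ValueError("load_in_8bit and load_in_4bit are mutually exclusive")
        if self.load_in_4bit and self.adapter != "qlora":
            raise ValueError("load_in_4bit requires adapter: qlora")

cfg = AxolotlConfigSketch(load_in_4bit=True, adapter="qlora")  # validates on construction
```

Centralizing checks like this is what lets the follow-up commits ("more validations", "fix validation for datasets") accrete rules in one place instead of scattering them through the trainer setup.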
Wing Lian
5894f0e57e make mlflow optional (#1317)
* make mlflow optional

* fix xformers

don't patch swiglu if xformers not working
fix the check for xformers swiglu

* fix install of xformers with extra index url for docker builds

* fix docker build arg quoting
2024-02-26 11:41:33 -05:00
Wing Lian
2752d5f958 multipack for gemma (#1313)
* multipack for gemma

* chore: lint

* handle cache_position kwarg in updated llama modeling

* add position_ids to rotary embed call for updated llama modeling
2024-02-21 19:24:21 -05:00
Leonardo Emili
5a5d47458d Add seq2seq eval benchmark callback (#1274)
* Add CausalLMBenchEvalCallback for measuring seq2seq performance

* Fix code for pre-commit

* Fix typing and improve logging

* eval_sample_packing must be false with CausalLMBenchEvalCallback
2024-02-13 08:24:30 -08:00
Hamel Husain
9bca7db133 add support for https remote yamls (#1277) 2024-02-08 20:02:17 -08:00
Wing Lian
c67fb71583 Peft deepspeed resume (#1227)
* import deepspeed integration

* monkeypatch peft adapter with deepspeed for resume from checkpoint

* fix patch

* fix patches attempt 2

* make sure to set lora_model_dir

* skip pylint for deepspeed.utils

* pick up upstream fix in transformers

* remove monkeypatch for deepspeed/peft fix

* no need to set the lora_model_dir on resume

* unset load_in_*bit when using quant config

* guard before del

* better handling of load_in* kwargs
2024-01-31 18:13:29 -05:00
Wing Lian
4cb7900a56 Peft loftq (#1222)
* loftq support for lora

* fix loftq check

* update readme for loftq

* readability cleanup

* use peft main for loftq fixes, remove unnecessary special tokens

* remove unused test from older deprecation
2024-01-28 18:50:08 -05:00
Wing Lian
8da1633124 Revert "run PR e2e docker CI tests in Modal" (#1220) [skip ci] 2024-01-26 16:50:44 -05:00
Wing Lian
36d053f6f0 run PR e2e docker CI tests in Modal (#1217) [skip ci]
* wip modal for ci

* handle falcon layernorms better

* update

* rebuild the template each time with the pseudo-ARGS

* fix ref

* update tests to use modal

* cleanup ci script

* make sure to install jinja2 also

* kickoff the gh action on gh hosted runners and specify num gpus
2024-01-26 16:13:27 -05:00
Wing Lian
a01b998c0f Update deps 202401 (#1204) [skip ci]
* update deps

* xformers fix too
2024-01-25 10:11:49 -05:00
Wing Lian
8a49309489 upgrade deepspeed to 0.13.1 for mixtral fixes (#1189) [skip ci]
* upgrade deepspeed to 0.13.1 for mixtral fixes

* move deepspeed-kernels install to setup.py
2024-01-24 14:26:40 -05:00
Wing Lian
f5a828aa20 Qwen2 (#1166)
* qwen2 multipack support

* fix qwen derived model check so it doesn't break qwen2

* fixes to ensure qwen2 packing works

* bump requirements for qwen2

* requirements typo
2024-01-22 18:24:15 -05:00
Casper
91502b98d4 Remove fused-dense-lib from requirements.txt (#1087) 2024-01-10 21:26:41 +01:00
NanoCode012
d69ba2b0b7 fix: warn user to install mamba_ssm package (#1019) 2024-01-10 02:50:56 -05:00
Wing Lian
9e3f0cb5a7 pin accelerate for deepspeed fix (#1080) 2024-01-10 00:50:04 -05:00
Casper
9be92d1448 Separate AutoGPTQ dep to pip install -e .[auto-gptq] (#1077)
* Separate AutoGPTQ dep to `pip install -e .[auto-gptq]`

* Fix code review
2024-01-09 23:39:25 +01:00
Wing Lian
d7057ccd36 paired kto support (#1069) 2024-01-09 13:30:45 -05:00
mtenenholtz
768d348f42 update peft to 0.7.0 (#1073) 2024-01-09 12:22:14 -05:00
Johan Hansson
090c24dcb0 Add: mlflow for experiment tracking (#1059) [skip ci]
* Update requirements.txt

adding mlflow

* Update __init__.py

Imports for mlflow

* Update README.md

* Create mlflow_.py (#1)

* Update README.md

* fix precommits

* Update README.md

Update mlflow_tracking_uri

* Update trainer_builder.py

update trainer building

* chore: lint

* make ternary a bit more readable

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-09 09:34:09 -05:00
Wing Lian
732851f105 Phi2 rewrite (#1058)
* restore to current phi modeling code from phi-2

* enable gradient checkpointing

* don't cast everything to float32 all the time

* gradient checkpointing for phi2 ParallelBlock module too

* fix enabling flash attn for phi2

* add comment about import

* fix phi2 example

* fix model type check for tokenizer

* revert float32 -> bf16 casting changes

* support fused dense flash attn

* fix the repo for flash-attn

* add package name for subdir pkg

* fix the data collator when not using sample packing

* install packaging for pytests in ci

* also fix setup to not install flash attn fused dense subdir if not extras

* split out the fused-dense-lib in extra requires

* don't train w group_by_length for phi

* update integration test to use phi2

* set max steps and save steps for phi e2e tests

* try to work around save issue in CI

* skip phi2 e2e test for now
2024-01-08 14:04:22 -05:00
Wing Lian
f243c2186d RL/DPO (#935)
* ipo-dpo trainer

* fix missing abstract method

* chatml template, grad checkpointing kwargs support

* fix steps calc for RL and add dataloader kwargs

* wip to fix dpo and start ppo

* more fixes

* refactor to generalize map fn

* fix dataset loop and handle argilla pref dataset

* set training args

* load reference model on separate GPU if more than one device

* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo

* fixes for rl training

* support for ipo from yaml

* set dpo training args from the config, add tests

* chore: lint

* set sequence_len for model in test

* add RLHF docs
2024-01-04 18:22:55 -05:00
Wing Lian
bcc78d8fa3 bump transformers and update attention class map name (#1023)
* bump transformers and update attention class map name

* also run the tests in docker

* add mixtral e2e smoke test

* fix base name for docker image in test

* mixtral lora doesn't seem to work, at least check qlora

* add testcase for mixtral w sample packing

* check monkeypatch for flash attn multipack

* also run the e2e tests in docker

* use all gpus to run tests in docker ci

* use privileged mode too for docker w gpus

* rename the docker e2e actions for gh ci

* set privileged mode for docker and update mixtral model self attn check

* use fp16/bf16 for mixtral w fa2

* skip e2e tests on docker w gpus for now

* tests to validate mistral and mixtral patches

* fix rel import
2024-01-03 12:11:04 -08:00
NanoCode012
7d4185ffcb chore: Update transformers to latest (#986) 2023-12-23 00:29:36 +09:00
dumpmemory
f28e75513b update transformers to fix checkpoint saving (#963) 2023-12-15 21:03:17 -05:00
Wing Lian
7fabc4d95e Mixtral official (#942)
* multipack support for official mixtral implementation

* fix patch to load multipack for mixtral

* chore: lint
2023-12-11 23:44:33 -05:00
Motoki Wu
9a5eb3990c Update requirements.txt (#940) 2023-12-11 22:57:28 -05:00
Wing Lian
35f9b0f149 update to latest transformers for Mixtral support (#929)
* update to latest transformers for Mixtral support

* pin transformers

* fix typo
2023-12-10 10:32:27 -05:00
Wing Lian
6a4562ac08 update datasets version to cut down the warnings due to pyarrow arg change (#897)
* update datasets to cut down the warnings

* set versions for tokenizers and gradio

* upgrade transformers to latest version
2023-11-25 16:30:00 -05:00
Wing Lian
0de1457189 try #2: pin hf transformers and accelerate to latest release, don't reinstall pytorch (#867)
* isolate torch from the requirements.txt

* fix typo for removed line ending

* pin transformers and accelerate to latest releases

* try w auto-gptq==0.5.1

* update README to remove manual peft install

* pin xformers to 0.0.22

* bump flash-attn to 2.3.3

* pin flash attn to exact version
2023-11-16 10:42:36 -05:00
NanoCode012
3cc67d2cdd Feat: Add dataset loading from S3, GCS (#765)
* Feat: Add dataset loading from S3, GCS

* chore: update docs

* chore: add more info on cloud loading
2023-11-16 14:33:58 +09:00
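The S3/GCS dataset loading feature above hinges on routing a dataset path to the right loader based on its URL scheme. A hypothetical helper illustrating the dispatch (the real feature relies on fsspec-backed filesystems under Hugging Face `datasets`):

```python
from urllib.parse import urlparse

# Schemes treated as remote storage (illustrative set, not the exact list).
REMOTE_SCHEMES = {"s3", "gs", "gcs", "https"}

def classify_dataset_path(path):
    """Return 'remote' for cloud/URL paths, 'local' for filesystem paths."""
    scheme = urlparse(path).scheme
    return "remote" if scheme in REMOTE_SCHEMES else "local"

print(classify_dataset_path("s3://my-bucket/train.jsonl"))  # remote
print(classify_dataset_path("data/train.jsonl"))            # local
```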
Wing Lian
b3a61e8ce2 add e2e tests for checking functionality of resume from checkpoint (#865)
* use tensorboard to see if resume from checkpoint works

* make sure e2e test is either fp16 or bf16

* set max_steps and save limit so we have the checkpoint when testing resuming

* fix test parameters
2023-11-15 23:05:55 -05:00
Bryan Thornbury
105d0b350b Pin optimum package (#838) 2023-11-09 22:36:15 -05:00
Wing Lian
f544ab2bed don't compile deepspeed or bitsandbytes from source (#837) 2023-11-08 19:49:55 -05:00
Jason Stillerman
738a057674 Feat: Added Gradio support (#812)
* Added gradio support

* queuing and title

* pre-commit run
2023-11-04 23:59:22 -04:00
NanoCode012
6459ac7357 fix: pin autogptq (#818) 2023-11-03 10:14:55 -04:00