axolotl

Author	SHA1	Message	Date
Zac Brannelly	73f1bdaa15	Fix bug preventing model_kwargs being injected (#1262 )	2024-02-07 09:38:35 -05:00
Philip May	13eea21f9b	Add more save strategies for DPO training. (#1255 ) * Set save_strategy and save_steps in HFDPOTrainerBuilder * fix doublicate save_steps	2024-02-06 00:38:43 -05:00
Chirag Jain	1072f28874	Fix typo `bloat16` -> `bfloat16` (#1257 )	2024-02-06 00:38:14 -05:00
Wing Lian	c7cf3810bd	Pretrain transforms (#1261 ) * wip for pretraining/iterable data with arbitrary prompt strategies * more fixes, wip * more fixes for custom pretraining * iterable ds wrapper not needed * remove extra features * chore: lint * update pretraning example yml * fix order for partials * fixup for tests	2024-02-06 00:37:03 -05:00
Wing Lian	8c2e05ade3	relora: magnitude pruning of the optimizer (#1245 ) * magnitude pruning of the optimizer * add alpaca chat template and fix relora patch * fix handling of lora adapter for relora * fix merge and save call * fixes for 8-bit lora merge * save intermediate checkpoint adapters * auto merge * fix eval check * handle relora annealing * fix anneal step logic * chore: lint * misx fix * fix types * Update tests/e2e/test_relora_llama.py * check for safetensors saved from relora	2024-02-06 00:35:30 -05:00
NanoCode012	2d65f470d5	fix(model): apply gate fp32 only for mixtral (#1241 ) * fix(model): apply gate fp32 only for mixtral * Update src/axolotl/utils/models.py * fix gate layer check --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-02-01 13:55:05 -05:00
Wing Lian	00568c1539	support for true batches with multipack (#1230 ) * support for true batches with multipack * patch the map dataset fetcher to handle batches with packed indexes * patch 4d mask creation for sdp attention * better handling for BetterTransformer * patch general case for 4d mask * setup forward patch. WIP * fix patch file * support for multipack w/o flash attention for llama * cleanup * add warning about bf16 vs fp16 for multipack with sdpa * bugfixes * add 4d multipack tests, refactor patches * update tests and add warnings * fix e2e file check * skip sdpa test if not at least torch 2.1.1, update docs	2024-02-01 10:18:42 -05:00
Wing Lian	c67fb71583	Peft deepspeed resume (#1227 ) * import deepspeed integration * monkeypatch peft adapater with deepspeed for resume from checkpoint * fix patch * fix patches attempt 2 * make sure to set lora_model_dir * skip pylint for deepspeed.utils * pick up upstream fix in transformers * remove monkeypatch for deepspeed/peft fix * no need to set the lora_model_dir on resume * unset load_in_bit when using quant config guard before del * better handling of load_in* kwargs	2024-01-31 18:13:29 -05:00
DreamGenX	25e037fe2d	Support for additional_special_tokens (#1221 ) [skip ci] * Support for additional_special_tokens * Support for additional_special_tokens. Adjust whitespace. * Support for additional_special_tokens. Use correct quotes. * Support for additional_special_tokens. Safe pop. * Support for additional_special_tokens. nt. * Support for additional_special_tokens. cfg.special_tokens may be None. * add token if not in vocabulary when adding additional_special_tokens * fix logic for copy/pasta * bugfix for popping from config and tokenizer reload * no need to add tokens manually now with previous bugfix --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-31 18:13:13 -05:00
DreamGenX	5787e1a23f	Fix and document test_datasets (#1228 ) * Make sure test_dataset are used and treat val_set_size. * Add test_datasets docs. * Apply suggestions from code review --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-31 06:48:57 -05:00
xhedit	8608d8003e	Fix typo (#1231 ) [skip ci]	2024-01-31 06:46:55 -05:00
Wing Lian	4cb7900a56	Peft lotfq (#1222 ) * loftq support for lora * fix loftq check * update readme for loftq * readability cleanup * use peft main for loftq fixes, remove unnecessary special tokens * remove unused test from older deprecation	2024-01-28 18:50:08 -05:00
Filippo Broggini	18f811978c	FEAT: add tagging support to axolotl for DPOTrainer (#1209 ) * Add AxolotlDPOTrainer * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-26 20:01:57 -05:00
Wing Lian	8da1633124	Revert "run PR e2e docker CI tests in Modal" (#1220 ) [skip ci]	2024-01-26 16:50:44 -05:00
Wing Lian	36d053f6f0	run PR e2e docker CI tests in Modal (#1217 ) [skip ci] * wip modal for ci * handle falcon layernorms better * update * rebuild the template each time with the pseudo-ARGS * fix ref * update tests to use modal * cleanup ci script * make sure to install jinja2 also * kickoff the gh action on gh hosted runners and specify num gpus	2024-01-26 16:13:27 -05:00
JohanWork	af29d81f80	ADD: warning if hub_model_id ist set but not any save strategy (#1202 ) * warning if hub model id set but no save * add warning * move the warning * add test * allow more public methods for tests for now * fix tests --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-26 10:38:55 -05:00
DreamGenX	62ca4a2b71	Respect sliding_window=None (#1214 )	2024-01-26 07:43:37 -05:00
Wing Lian	e923e62d24	more checks and fixes for deepspeed and fsdp (#1208 ) [skip ci]	2024-01-25 20:01:45 -05:00
Wing Lian	ba944e6554	workaround for transformers bug requireing do_sample for saveing pretrained (#1206 )	2024-01-25 11:34:41 -05:00
Wing Lian	badda3783b	make sure to register the base chatml template even if no system message is provided (#1207 )	2024-01-25 10:38:08 -05:00
Wing Lian	33e117088f	precompute dpo logprobs setting and fixes (#1199 ) [skip ci] * add support for precompute_ref_log_probs for dpo * add chatml.icr type for argilla orca dpo * update inline doc * also set use_reentrant to false for dpo when not set * don't set use_reentrant to true for rl * make sure to set gradient checkpointing too	2024-01-25 09:31:55 -05:00
Ricardo Dominguez-Olmedo	b4ac96adef	fix learning rate scheduler's warnings (#1135 ) [skip ci] * fix schedulers warnings * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-25 07:09:34 -05:00
mhenrichsen	98b4762077	Feat/chatml add system message (#1117 ) * add system message to template * readme update * added code to register new system message * register chatml template for test --------- Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-25 08:24:27 +01:00
NanoCode012	08719b9609	fix(log): improve warning to clarify that lora_modules_to_save expect a list (#1197 )	2024-01-24 20:08:34 -05:00
Wing Lian	54d2ac155b	Mixtral fixes 20240124 (#1192 ) [skip ci] * mixtral nccl fixes * make sure to patch for z3	2024-01-24 14:59:57 -05:00
Oleh Kuznetsov	af0243021c	Standardize system prompt format for AlpacaPrompter (#1190 ) [skip ci]	2024-01-24 14:27:01 -05:00
Wing Lian	5bce45f800	more dpo fixes for dataset loading and docs (#1185 ) [skip ci] * more dpo fixes for dataset loading and docs * preprocess dpo datasets	2024-01-24 14:23:55 -05:00
Wing Lian	d85d4942cf	report min lenght of tokenized data (#1186 ) [skip ci]	2024-01-24 09:17:50 -05:00
Agung Baptiso Sorlawan	02f2c720fc	Fix generation_config validation raises Exception for do_merge_lora (#1184 )	2024-01-24 00:42:15 -05:00
James Wade	71141deb18	Add support for offline mode with HF_HUB_OFFLINE envvar (#1182 ) * Add support for offline mode with HF_HUB_OFFLINE envvar * Apply styling * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-24 00:41:47 -05:00
Wing Lian	59a31fe613	DPO fixes v2 (#1174 ) * check for length before trying to remove it * add validation for sample packing with RLHF	2024-01-23 12:56:24 -05:00
Wing Lian	814aee6603	Phi2 multipack (#1173 ) * phi2 multipack * update validation and examples for phi * more updates to phi examples * make sure to use the correct collator for phi multipack * phi needs attention mask now for multipack * if the special token already exists in the tokenizer, don't require in lora modules to save * fix qlora yml for phi, fix phi test validation * test qlora too * make sure flash attention is enabled for the test * don't use remote code for phi anymore * reduce sequence len for sample packing phi	2024-01-23 12:54:36 -05:00
Wing Lian	fb7f9b9516	don't fail if can't cast weights due to offload when merging (#1172 ) [skip ci]	2024-01-23 09:17:08 -05:00
Wing Lian	7523d1f557	DPO cleanup (#1126 ) * cleanup dpo to be a little more extensible, add zephyr/nectar strategy * fix eos slash * support for eval split * fix kwargs * handle empty evals * don't load peft model for dpo * ensure dpo traning args gets bf16 for peft if applicable * fix duplicate kwargs for bf16 * make sure to respect the configured lr scheduler * supprt trainer callback to push config to wandb * set dataloader preload args * ensure that we are loading the lora when merging * Update src/axolotl/utils/data.py Co-authored-by: Agus <agustin.piqueres@gmail.com> * support local datasets for dpo Co-authored-by: Agus <agustin.piqueres@gmail.com> * chore: lint * dpo/kto/ipo smoke tests w lora, simplify dpo dataset type names * add split to dpo tests * fix rebase/merging error * handle edge case w logging * use accelerator for dpo datasets so it doesn't break the logger * missing args * validate checkpoint is an adapter for now * log warning when dataset strategy is not loadable --------- Co-authored-by: Agus <agustin.piqueres@gmail.com>	2024-01-23 00:40:37 -05:00
Casper	684038111e	Add desc to map/filter (#1162 ) * Add desc to map/filter * update descriptions --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-22 21:30:53 -05:00
Wing Lian	cda52dc32b	support for explicit test_dataset definition for evals (#786 )	2024-01-22 21:29:56 -05:00
Wing Lian	e799e08d3c	Falcon embeddings (#1149 ) [skip docker] * also fix multipack for falcon and add smoke tests * make sure to handle special tokens and added tokens for lora * fix reference to model_type * fix tests for falcon * fix stray typo * fixes for smoke tests	2024-01-22 21:01:42 -05:00
Wing Lian	32580c1ca7	Vram fix attempt (#1164 ) [skip ci] * revert order of filter/drop_long step and handle calc for max_input_len only during preprocessing * revert some changes to preparing for packing to allow more flexibility * prepare dataset for packing during pre-processing step * prepare dataset hash based on sample packing too * enclose none check * just cast straight to string for ds hash	2024-01-22 19:54:54 -05:00
Wing Lian	802f9667a2	improve vram use w gradient checkpointing (#1167 ) [skip ci]	2024-01-22 19:48:22 -05:00
JohanWork	b8e5603467	Add mlflow callback for pushing config to mlflow artifacts (#1125 ) * Update callbacks.py adding callback for mlflow * Update trainer_builder.py * clean up	2024-01-22 18:44:39 -05:00
Wing Lian	782b6a4216	set fp16 to false if bf16, update bf16: auto in example YAMLs (#1122 ) [skip ci] * set fp16 to false if bf16, update bf16: auto in example YAMLs * unset fp16 so that it fallsback properly if bf16 isn't available * Update README.md [skip-ci] Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * test that bf16 disables fp16 --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-01-22 18:44:01 -05:00
Wing Lian	eaaeefce55	jupyter lab fixes (#1139 ) [skip ci] * add a basic notebook for lab users in the root * update notebook and fix cors for jupyter * cell is code * fix eval batch size check * remove intro notebook	2024-01-22 18:42:40 -05:00
Wing Lian	f5a828aa20	Qwen2 (#1166 ) * qwen2 multipack support * fix qwen derived model check so it doesn't break qwen2 * fixes to ensure qwen2 packing works * bump requirements for qwen2 * requirements typo	2024-01-22 18:24:15 -05:00
Wing Lian	fccb542b47	make sure the model config loader respects the model_revision too (#1160 ) [skip-ci]	2024-01-22 13:23:14 -05:00
Wing Lian	2ce5c0d68a	Deprecate max packed sequence len (#1141 )	2024-01-20 05:11:50 -05:00
NanoCode012	3db5f2fd17	feat(dataset): add config to keep processed dataset in memory (#1152 )	2024-01-20 13:19:28 +09:00
Wing Lian	6910e6a8ca	Multipack simplify for Mixtral (#1142 )	2024-01-18 16:23:49 -05:00
Joe Cummings	1d70f24b50	Add shifted sparse attention (#973 ) [skip-ci] * Add s2_attn to hijack flash code * Refactor code to account for s2_attn * Add test for models utils * Add ``s2_attention`` option to llama configs * Add ``s2_attention`` option to README config * Format code to appease linter * chore: lint * Remove xpos and llama-landmark [bad merge] * add e2e smoke tests for shifted sparse attention * remove stray patch from merge * update yml with link to paper for s2_attention/longlora * fix assertion check for full fine tune * increase sequence len for tests and PR feedback updates * reduce context len to 16k for tests * reduce context len to 16k for tests * reduce batch size for larger context len and udpate test to check message * fix test for message --------- Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-18 10:16:07 -05:00
Wing Lian	317fa2555a	fix bf16 check when preprocessing data (#1140 )	2024-01-17 22:41:23 -05:00
NanoCode012	1e56b88cde	fix(preprocess): Make sure dataset not loaded from cache when using preprocess cli (#1136 )	2024-01-18 03:03:52 +09:00

599 Commits