Commit Graph

  • ead34c516a swap the data collator for evals if not using sample packing (#1076) Wing Lian 2024-01-09 22:16:24 -05:00
  • ec02b7cc4e Update FUNDING.yml [skip ci] Wing Lian 2024-01-09 22:15:27 -05:00
  • 3b4c646f87 Update FUNDING.yml with bitcoin (#1079) [skip ci] Wing Lian 2024-01-09 21:56:52 -05:00
  • 788649fe95 attempt to also run e2e tests that needs gpus (#1070) Wing Lian 2024-01-09 21:23:23 -05:00
  • 9be92d1448 Separate AutoGPTQ dep to pip install -e .[auto-gptq] (#1077) Casper 2024-01-09 23:39:25 +01:00
  • d7057ccd36 paired kto support (#1069) Wing Lian 2024-01-09 13:30:45 -05:00
  • 768d348f42 update peft to 0.7.0 (#1073) mtenenholtz 2024-01-09 11:22:14 -06:00
  • 090c24dcb0 Add: mlflow for experiment tracking (#1059) [skip ci] Johan Hansson 2024-01-09 15:34:09 +01:00
  • 651b7a31fc fix double eos token for chatml (#1054) [skip ci] Wing Lian 2024-01-09 09:33:38 -05:00
  • 04b978b428 Cosine learning rate schedule - minimum learning rate (#1062) Ricardo Dominguez-Olmedo 2024-01-09 15:29:56 +01:00
  • c3e8165f26 fix: torch_dtype mistral default to fp32 (#1050) NanoCode012 2024-01-09 21:48:15 +09:00
  • 7f381750d9 Update FUNDING.yml for Kofi link (#1067) Wing Lian 2024-01-08 19:26:51 -05:00
  • 14964417ee Sponsors (#1065) Wing Lian 2024-01-08 18:52:00 -05:00
  • 81d384598e Efficiently get the length of the tokenized docs (#1063) Ricardo Dominguez-Olmedo 2024-01-08 21:48:30 +01:00
  • 732851f105 Phi2 rewrite (#1058) Wing Lian 2024-01-08 14:04:22 -05:00
  • 7ecc3a408c Fix(debug): Use space delimiter for debug_text_only also (branch: NanoCode012-patch-1) NanoCode012 2024-01-07 12:45:19 +09:00
  • 9ca358b671 Simplify Docker Unit Test CI (#1055) [skip ci] Hamel Husain 2024-01-06 05:20:33 -08:00
  • 553c80f79a streaming multipack for pretraining dataset (#959) JinK 2024-01-06 12:13:21 +09:00
  • eb4c99431b Update tests-docker.yml (#1052) [skip ci] Hamel Husain 2024-01-05 11:26:18 -08:00
  • cbdbf9e6e5 feat: always push checkpoint to hub if set (#1049) [skip ci] NanoCode012 2024-01-06 03:09:42 +09:00
  • bdfefaf054 feature: better device mapping for large models (#918) kallewoof 2024-01-05 22:22:21 +09:00
  • 63fb3eb426 set default for merge (#1044) Hamel Husain 2024-01-04 18:14:20 -08:00
  • 31d23504a5 fix model card upload for PEFT models (#1043) Hamel Husain 2024-01-04 18:13:54 -08:00
  • f243c2186d RL/DPO (#935) Wing Lian 2024-01-04 18:21:25 -05:00
  • 59b2d302c8 Added chatglm3 conversation type for training models like TinyLLama (#1036) xaviviro 2024-01-04 13:03:04 +01:00
  • bcc78d8fa3 bump transformers and update attention class map name (#1023) Wing Lian 2024-01-03 15:11:04 -05:00
  • 74532ddc45 chore(config): clean up old log for Qwen (#1034) NanoCode012 2024-01-04 01:19:52 +09:00
  • 8ba27f3bde fix: lint (#1037) NanoCode012 2024-01-04 00:23:44 +09:00
  • a3e8783328 [Docs] delete unused cfg value lora_out_dir (#1029) Hamel Husain 2024-01-02 21:35:20 -08:00
  • b31038aae9 chore(readme): update instruction to set config to load from cache (#1030) NanoCode012 2024-01-03 11:56:19 +09:00
  • c75f916745 added tiny llama examples for lora and qlora (#1027) Tim Dolan 2024-01-02 20:00:37 -05:00
  • 4d2e842e46 use recommended setting for use_reentrant w gradient checkpointing (#1021) Wing Lian 2024-01-01 22:17:27 -05:00
  • 272bced137 cpu offloading yayi2 Mads Henrichsen 2023-12-31 22:17:43 +01:00
  • c371d6b546 cpu offloading Mads Henrichsen 2023-12-31 12:02:29 +01:00
  • d6273188f0 fft Mads Henrichsen 2023-12-31 07:42:46 +01:00
  • 72797b04a5 fix modules Mads Henrichsen 2023-12-31 07:40:33 +01:00
  • de47bb5eb0 better lr Mads Henrichsen 2023-12-30 22:36:50 +01:00
  • c04df54b4b new lr Mads Henrichsen 2023-12-30 21:36:01 +01:00
  • e3716db386 small batch size Mads Henrichsen 2023-12-30 13:20:45 +01:00
  • 97943d8fc4 model revision Mads Henrichsen 2023-12-30 12:55:17 +01:00
  • 9d3f80cd40 disable packing Mads Henrichsen 2023-12-30 12:51:03 +01:00
  • bfae79a634 trust Mads Henrichsen 2023-12-30 12:47:50 +01:00
  • 5a85ee16eb yayi2 Mads Henrichsen 2023-12-30 12:43:46 +01:00
  • 3678a6c41d Fix: bf16 support for inference (#981) Tazik Shahjahan 2023-12-29 14:15:53 -08:00
  • f8ae59b0a8 Adds chat templates (#1022) mhenrichsen 2023-12-29 22:44:23 +01:00
  • 4f4d638b84 [WandB] Push axolotl config to top level wandb files (#1014) Hamel Husain 2023-12-29 10:52:12 -08:00
  • ba043a361e add ultrachat prompt strategies (#996) Wing Lian 2023-12-29 12:23:29 -06:00
  • 41353d2ea0 feat: expose bnb kwargs (#1018) NanoCode012 2023-12-29 18:16:26 +09:00
  • f6ecf14dd4 feat: remove need to add load_in* during merge (#1017) NanoCode012 2023-12-29 18:15:30 +09:00
  • dec66d7c53 [Docs] Nit: Remind people to auth to wandb if they are going to use it (#1013) Hamel Husain 2023-12-28 18:00:16 -08:00
  • 76357dc5da Update README.md (#1012) Hamel Husain 2023-12-28 18:00:02 -08:00
  • 70b46ca4f4 remove landmark attn and xpos rope implementations (#1010) Wing Lian 2023-12-27 23:07:27 -06:00
  • 85dd4d525b add config to model card (#1005) Hamel Husain 2023-12-27 19:25:33 -08:00
  • 384b817dc0 Set eval_sample_packing to false in mistral config.yaml (#1003) Kevin Sydney 2023-12-27 16:11:55 -08:00
  • db9094df0f FEAT: add tagging support to axolotl (#1004) Younes Belkada 2023-12-27 23:25:20 +01:00
  • 6ef46f8dca Add an example config for finetuning a 34B model on a 24GB GPU (#1000) Evan Griffiths 2023-12-25 18:29:55 +00:00
  • 628b754824 set output_router_logits for mixtral config: (#995) Wing Lian 2023-12-22 12:57:02 -05:00
  • 37820f6540 support for cuda 12.1 (#989) Wing Lian 2023-12-22 11:08:22 -05:00
  • 7d4185ffcb chore: Update transformers to latest (#986) NanoCode012 2023-12-23 00:29:36 +09:00
  • 93ebec1ac5 change val size (#992) mhenrichsen 2023-12-22 16:18:16 +01:00
  • 2e61dc3180 Add tests to Docker (#993) Hamel Husain 2023-12-22 06:37:20 -08:00
  • 1ffa3866f2 Feat: Warns to add to modules_to_save when adding tokens or switching special_tokens (#787) NanoCode012 2023-12-22 21:49:07 +09:00
  • 62ba1609b6 bump actions versions Hamel Husain 2023-12-21 08:54:08 -08:00
  • 7bbaac98f7 fix mistral prompt assembly (#982) Hamel Husain 2023-12-21 08:00:55 -08:00
  • 161bcb6517 Dockerfile torch fix (#987) Wing Lian 2023-12-21 09:38:20 -05:00
  • 856f5f6115 Update README.md (branch: hamelsmu-patch-1) Hamel Husain 2023-12-19 17:15:58 -08:00
  • d25c34caa6 Update README.md (#966) Ikko Eltociear Ashimine 2023-12-17 23:51:25 +09:00
  • 13e938149d fix: add lr scheduler kwargs to Trainer (#972) NanoCode012 2023-12-17 18:48:28 +09:00
  • 85de004dd4 fix for build for nccl in dockerfile (#970) Wing Lian 2023-12-16 19:12:01 -05:00
  • 80ec7af358 update to latest nccl in docker image (#965) Wing Lian 2023-12-16 18:31:25 -05:00
  • f28e75513b update transformers to fix checkpoint saving (#963) dumpmemory 2023-12-16 10:03:17 +08:00
  • 5ada140ff0 Fix prompt assembly for llama (#952) Hamel Husain 2023-12-14 10:03:59 -08:00
  • 712fd27b3f Add docs (#947) Hamel Husain 2023-12-13 14:22:52 -08:00
  • ef24342538 fix: switch to using the HuggingFace Transformers NEFT implementation (#941) kallewoof 2023-12-14 07:15:34 +09:00
  • 5ea3aa31f0 Fix Deepspeed loading (#950) Wing Lian 2023-12-13 16:03:23 -05:00
  • f1f60cb5b2 Flash attn hotfix (#951) Wing Lian 2023-12-13 13:42:23 -05:00
  • 450e04d3c4 fix: remove excessive newlines in system prompt(s) for alpaca (#936) (branch: mixtral_optimized) kallewoof 2023-12-13 16:40:02 +09:00
  • b0cf397ecb More hints on what to do with CUDA Out of memory errors (#925) Juraj Bednar 2023-12-13 08:38:38 +01:00
  • 5bb4a782ce dataloader defaults (branch: 20231212-fixes) Wing Lian 2023-12-12 17:33:31 -05:00
  • 5f79b8242f new evals_per_epoch and saves_per_epoch to make things cleaner (#944) Wing Lian 2023-12-12 15:35:23 -05:00
  • f1de29dd1e Respect sequence_len in config for type: llama2_chat (#926) Hamel Husain 2023-12-12 09:39:22 -08:00
  • 7fabc4d95e Mixtral official (#942) Wing Lian 2023-12-11 23:44:33 -05:00
  • 9a5eb3990c Update requirements.txt (#940) Motoki Wu 2023-12-11 19:57:28 -08:00
  • a58a9e5f6c Only fuse if flash_attn_fuse_mlp is True (branch: mixtral_swiglu) Casper 2023-12-10 19:17:12 +01:00
  • 279a1401b5 Formatting again Casper 2023-12-10 17:23:06 +01:00
  • 083beb6425 Fix import Casper 2023-12-10 17:21:06 +01:00
  • 2ac1a72e4b Formatting Casper 2023-12-10 17:15:42 +01:00
  • 23103ac5ac Mixtral: Replace FeedForward with SwiGLU Casper 2023-12-10 17:10:04 +01:00
  • 86487c2e96 Mixtral: More correct MoE, lower loss (#932) Casper 2023-12-10 16:34:25 +01:00
  • 35f9b0f149 update to latest transformers for mixstral support (#929) Wing Lian 2023-12-10 10:32:27 -05:00
  • 68b227a7d8 Mixtral multipack (#928) Wing Lian 2023-12-09 21:26:30 -05:00
  • 03c6318ba3 fixing prompt template of chatml by removal of linebreak (#922) Timothy Lim 2023-12-10 02:07:44 +08:00
  • 40a6362c92 support for mamba (#915) Wing Lian 2023-12-09 12:10:41 -05:00
  • d339beb9d9 chore: clarify Readme on sharegpt system role NanoCode012 2023-12-08 11:35:53 +09:00
  • fde091cb12 fix(tokenizer): handle fast tokenizer properly for bos/eos (#914) NanoCode012 2023-12-08 11:31:13 +09:00
  • e12133613b Remove another unused import (branch: refactor-flash-attention) Casper 2023-12-07 21:58:29 +01:00
  • d9d97e3896 Remove unused imports Casper 2023-12-07 21:57:51 +01:00
  • 1cb7977026 Implement Mistral SwiGLU Casper 2023-12-07 19:59:29 +01:00
  • dfd06a0f88 Fix naming of rms_norm for Llama Casper 2023-12-07 19:53:20 +01:00
  • 40d231a91b Enable replacing xentropy, rmsnorm for Mistral Casper 2023-12-07 19:52:40 +01:00