Commit Graph

  • bd8cab49c9 update path to align with fsdp example mhenrichsen 2023-08-15 19:51:58 +02:00
  • c01015f33f Fix(config): Update handling of deepspeed config (#404) NanoCode012 2023-08-16 01:22:43 +09:00
  • 72fe3f8e3d Fix(docs): Update flash attn requirements (#409) NanoCode012 2023-08-15 22:40:52 +09:00
  • 47961fdb8b update docs for tokenizer_legacy (#401) Wing Lian 2023-08-15 09:34:42 -04:00
  • 7ad37cb6d7 Fix(template): Remove iPhone/android from Issue template (#407) NanoCode012 2023-08-15 22:32:51 +09:00
  • 29241cf1e4 Ax art (#405) Wing Lian 2023-08-15 08:34:30 -04:00
  • 31db0ecce4 add templates, CoC and contributing guide (#126) lightningRalf 2023-08-15 13:41:05 +02:00
  • da10af03e9 fix eval steps and strategy (#403) Wing Lian 2023-08-15 07:28:50 -04:00
  • 85cf4f8e2c better handling of empty input ids when tokenizing (#395) Wing Lian 2023-08-15 01:09:59 -04:00
  • 2e22404d2d add utils.data.prepare_dataset Aman Karmani 2023-08-15 04:15:55 +00:00
  • be294fd605 Feat(doc): Add how to save by epochs (#396) NanoCode012 2023-08-15 13:24:25 +09:00
  • fc2d6be96d use context manager to run things on rank0 before others (#397) Wing Lian 2023-08-15 00:10:47 -04:00
  • 31079cd5fd smart resize embeddings embeddings-resize Wing Lian 2023-08-07 10:15:10 -04:00
  • 1687be6a35 don't use mask expansion for inference (#392) Wing Lian 2023-08-14 20:52:54 -04:00
  • 41ecb451c2 Feat(doc): Add max_steps to readme (#389) NanoCode012 2023-08-15 00:34:22 +09:00
  • 3c2ad00d07 Feat(config): add max steps (#387) Gabriel Puliatti 2023-08-14 10:19:29 -05:00
  • 5d48a10548 Added "epoch" evaluation_strategy (#388) florian peyron 2023-08-14 16:59:23 +02:00
  • 73a0b6ead5 Feat(config): Add hub_strategy (#386) NanoCode012 2023-08-14 20:12:55 +09:00
  • 63fdb5a7fb Error msg for sharegpt if conv has less than 2 msg (#379) florian peyron 2023-08-14 10:40:40 +02:00
  • fdffef5940 new llama-2 default settings (#370) mhenrichsen 2023-08-14 10:39:09 +02:00
  • 919246fbc1 don't pass rope_scaling kwarg if it's None (#383) Wing Lian 2023-08-13 18:57:38 -04:00
  • ffac902c1b bump flash-attn to 2.0.4 for the base docker image (#382) Wing Lian 2023-08-13 17:55:04 -04:00
  • 15f6e57eaa Fix crash when running without CUDA Charles Goddard 2023-08-13 13:19:48 -07:00
  • 956a177678 speed up flash-attn inference feature/attn-patches Aman Karmani 2023-08-13 18:03:38 +00:00
  • 747e84d3bb update flash-attn patch for 70B/GQA and inference using helper from flash-attn tests Aman Karmani 2023-08-13 15:41:44 +00:00
  • c45a786039 sync xformers patch to follow shared format and be diffable Aman Karmani 2023-08-13 15:41:06 +00:00
  • 70e6c28121 split sdp attn into its own patch Aman Karmani 2023-08-13 15:40:43 +00:00
  • 729c299256 Feat(doc): Improve sharegpt doc (#378) NanoCode012 2023-08-14 00:36:00 +09:00
  • 86a91e260b save tokenizer before training starts (#380) Wing Lian 2023-08-13 11:28:58 -04:00
  • 094fc2c6e6 try to detect accelerate and only use device_map=None in that case (#373) Aman Gupta Karmani 2023-08-12 21:32:07 -07:00
  • 2dafa730ef Create FUNDING.yml Wing Lian 2023-08-13 00:30:34 -04:00
  • 343ac84e5a fix check for flash attn branching (#377) Wing Lian 2023-08-12 22:48:08 -04:00
  • 0c967279ce remove unnecessary local variable Aman Karmani 2023-08-13 01:58:39 +00:00
  • efb3b2c95e simplify load_tokenizer Aman Karmani 2023-08-13 01:33:38 +00:00
  • 7b55fe6419 improve GPU logging to break out pytorch cache and system mem Aman Karmani 2023-08-13 01:50:32 +00:00
  • e029ab34ea quiet noise from llama tokenizer by setting pad token earlier Aman Karmani 2023-08-13 01:30:54 +00:00
  • 8cec513447 extract module for working with cfg Aman Karmani 2023-08-13 01:22:20 +00:00
  • a13e45d548 fix DefaultDict.__or__ Aman Karmani 2023-08-10 03:56:50 +00:00
  • 1afbd8af2d Fix logic errors feature/relora-rebased Charles Goddard 2023-07-25 16:19:53 -07:00
  • b4f2eea2ed Remove redundant assert Charles Goddard 2023-07-24 23:23:24 -07:00
  • bbf88b02c1 Fix saving logic Charles Goddard 2023-07-24 22:14:16 -07:00
  • 64a8e04430 Remove local config Charles Goddard 2023-07-24 21:11:52 -07:00
  • c8f7213bc6 Add CPU offload Charles Goddard 2023-07-24 21:07:36 -07:00
  • b57238ecec Experimental ReLoRA (+qlora) implementation Charles Goddard 2023-07-24 09:53:27 -07:00
  • 918f1b0dfb revert previous change and build ax images w docker on gpu (#371) Wing Lian 2023-08-12 20:23:00 -04:00
  • c3fde36ada attempt to run non-base docker builds on regular cpu hosts (#369) Wing Lian 2023-08-12 19:07:38 -04:00
  • 2bb0b78975 Attention mask and position id fixes for packing (#285) Wing Lian 2023-08-12 15:14:56 -04:00
  • a276c9c88d Fix(save): Save as safetensors (#363) NanoCode012 2023-08-13 01:22:52 +09:00
  • 7019509daa Add wandb_entity to wandb options, update example configs, update README (#361) Morgan McGuire 2023-08-12 17:17:11 +01:00
  • 96bd6ae1c4 Fix(model loading): Warn when model revision is passed to gptq (#364) NanoCode012 2023-08-13 01:16:59 +09:00
  • e37d9358e6 Fix(message): Improve error message for bad format (#365) NanoCode012 2023-08-13 01:16:18 +09:00
  • b5212068ac Feat: Add rope scaling (#343) NanoCode012 2023-08-13 00:50:15 +09:00
  • 289d5c403d feat(merge): save tokenizer on merge (#362) NanoCode012 2023-08-13 00:18:10 +09:00
  • 35c8b90306 Merge pull request #355 from tmm1/bitsandbytes-fixes Aman Gupta Karmani 2023-08-11 15:15:38 -07:00
  • 64af21bcb2 set env vars trainer needs for FSDP packing-attn-limit-fa2-rebased Wing Lian 2023-08-11 08:37:33 -04:00
  • 6b5cf8b5ea optimize length reducer from 9m -> <5sec Wing Lian 2023-08-11 08:30:30 -04:00
  • fae6ed8092 Update README.md on pretraining_dataset (#360) NanoCode012 2023-08-11 12:17:07 +09:00
  • 94d03c8402 Clarify pre-tokenize before multigpu (#359) NanoCode012 2023-08-11 11:27:42 +09:00
  • 79500f358a need to pass total num tokens to trainer too Wing Lian 2023-08-10 19:08:23 -04:00
  • 7e977a9b68 optimization if total_num_tokens is already known Wing Lian 2023-08-10 19:02:28 -04:00
  • ac4b700daa optimization if total_num_tokens is already known Wing Lian 2023-08-10 19:01:17 -04:00
  • 2565c2f259 async batching for multipack Wing Lian 2023-08-10 18:28:15 -04:00
  • a07f432d9c calculate cum seq lens with pos_ids instead of mask, simplify packing params, fix distributed barrier Wing Lian 2023-08-10 17:16:01 -04:00
  • 11ddccb80f Merge pull request #356 from tmm1/load_model-args Aman Gupta Karmani 2023-08-09 18:24:34 -07:00
  • 964312199e Merge pull request #354 from tmm1/gpu-util Aman Gupta Karmani 2023-08-09 15:44:18 -07:00
  • 718102271f simplify load_model signature Aman Karmani 2023-08-09 22:35:33 +00:00
  • f5c11f8262 Merge pull request #350 from tmm1/group-len-false-examples Aman Gupta Karmani 2023-08-09 14:48:48 -07:00
  • fce40aab23 bump to latest bitsandbytes release with major bug fixes Aman Karmani 2023-08-09 21:47:11 +00:00
  • 9c314101d5 use newer pynvml package Aman Karmani 2023-08-09 21:06:28 +00:00
  • e303d64728 log GPU memory usage Aman Karmani 2023-08-09 08:10:37 +00:00
  • 57d9bf711c let's not cleanup the cached datasets Wing Lian 2023-08-08 21:27:55 -04:00
  • 26983a1974 fix sampler to prevent overfit w new epochs Wing Lian 2023-08-08 15:34:18 -04:00
  • 1b8747e319 use custom distributed checks Wing Lian 2023-08-08 13:35:04 -04:00
  • 035b3c760c add numba to requirements. Wing Lian 2023-08-08 10:55:29 -04:00
  • 17abbd59e1 previous accelerate is still most performant Wing Lian 2023-08-08 09:46:01 -04:00
  • 6ec76ddb4c fix steps calculation Wing Lian 2023-08-08 05:13:21 -04:00
  • 21d307b15b fix counts by accounting for num devices Wing Lian 2023-08-08 04:13:10 -04:00
  • 58e9dee204 fixes and go back to distributed sampler since batch sampler won't work Wing Lian 2023-08-08 03:49:29 -04:00
  • 4f7c04bae0 more fixes and optimizations Wing Lian 2023-08-08 03:16:00 -04:00
  • 1162b93b6b filter w multiple cpus Wing Lian 2023-08-08 00:50:56 -04:00
  • 21f445d763 more packing and dataset optimizations and fixes Wing Lian 2023-08-08 00:45:24 -04:00
  • b4d1d22782 note pattern when using groups Aman Karmani 2023-08-07 16:18:42 -07:00
  • 229b9165aa fix test and pylint checks Wing Lian 2023-08-07 09:36:29 -04:00
  • 394a65f11f add unit tests for cum seq lens, add ability to build cu_seq_lens from positional ids, fix prompt test Wing Lian 2023-08-06 17:33:40 -04:00
  • c70dae63cc add chatml Wing Lian 2023-08-05 22:41:47 -04:00
  • 7712955b35 fix chatml system prompt for openorca, legacy tokenizer opts Wing Lian 2023-08-04 13:57:17 -04:00
  • f93f0017cd fix flash-attn, xformers, packing, support chatml Wing Lian 2023-08-04 10:09:16 -04:00
  • 0b01da0713 properly calculate max len Wing Lian 2023-08-03 16:12:04 -04:00
  • b2f7bc7ccd use cumulative seq len with var len flash attn v2 w packing Wing Lian 2023-08-03 15:50:13 -04:00
  • b8905e2a91 sample_packing_seq_len_multiplier config Wing Lian 2023-08-03 08:24:33 -04:00
  • 7e1edc662a make sure the chunk size is an int Wing Lian 2023-08-03 00:27:33 -04:00
  • 98c9bc69de seq_len_multiple for packing Wing Lian 2023-08-02 23:20:19 -04:00
  • 8378335dc9 limit packing to sequences of max seq len Wing Lian 2023-08-02 22:07:40 -04:00
  • bdd34c7400 weighted CEL fixes Wing Lian 2023-08-02 21:36:39 -04:00
  • c6cc54c7d9 weighted CE losses Wing Lian 2023-08-02 15:57:00 -04:00
  • 83f7362480 don't split batches when packing Wing Lian 2023-08-02 08:26:49 -04:00
  • 958d423e7c only process eval dataset for packing if not None Wing Lian 2023-07-30 22:55:17 -04:00
  • e74eab6e73 add a test for the mask expansion for sequence packing Wing Lian 2023-07-28 12:10:15 -04:00
  • 487abfc769 pass sample packing efficiency to training args Wing Lian 2023-07-26 00:06:28 -04:00
  • 2bee646e85 fix step calc for packing Wing Lian 2023-07-25 23:52:34 -04:00