Commit Graph

  • 7402eb9dcb Fix setting correct repo id when pushing dataset to hub (#1657) ripes 2024-08-05 09:42:15 -07:00
  • 203816f7b4 Fix colab example notebook (#1805) [skip ci] Sri Kainkaryam 2024-08-04 12:24:26 -05:00
  • 0e3c88ad3c Built site for gh-pages Quarto GHA Workflow Runner 2024-07-31 00:59:21 +00:00
  • 78b42a3fe1 fix roles to train defaults and make logging less verbose (#1801) Wing Lian 2024-07-30 20:58:17 -04:00
  • d9dc70edfd Built site for gh-pages Quarto GHA Workflow Runner 2024-07-30 23:23:13 +00:00
  • 3ebf22464b qlora-fsdp ram efficient loading with hf trainer (#1791) Wing Lian 2024-07-30 19:21:38 -04:00
  • eb188acbd4 Add option chat_template_jinja to provide a jinja template Chirag Jain 2024-07-31 01:43:40 +05:30
  • 34ea51dcf3 Fix lint and bug post merge from main Chirag Jain 2024-07-30 23:59:38 +05:30
  • fd7538dca7 Merge branch 'main' into cj_tokenizer_default_prompt_template Chirag Jain 2024-07-30 23:48:43 +05:30
  • d44d6b003c Built site for gh-pages Quarto GHA Workflow Runner 2024-07-30 17:37:08 +00:00
  • dbf8fb549e publish axolotl images without extras in the tag name (#1798) Wing Lian 2024-07-30 13:36:19 -04:00
  • d4bceb4b7c Built site for gh-pages Quarto GHA Workflow Runner 2024-07-30 16:38:33 +00:00
  • 9a63884597 update test and main/nightly builds (#1797) Wing Lian 2024-07-30 12:37:40 -04:00
  • 84e31a4a47 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-30 12:51:15 +00:00
  • c5587b45ac use 12.4.1 instead of 12.4 [skip-ci] (#1796) Wing Lian 2024-07-30 08:50:23 -04:00
  • 16c3ced9a8 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-30 12:35:37 +00:00
  • d4f6a6b103 fix dockerfile and base builder (#1795) [skip-ci] Wing Lian 2024-07-30 08:34:37 -04:00
  • 1d8502f3b1 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-30 12:07:00 +00:00
  • d8d1788ffc move to supporting mostly 12.1 w 2.3.1 and add new 12.4 with 2.4.0 (#1793) Wing Lian 2024-07-30 08:06:11 -04:00
  • 1d3bdf4242 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-30 06:00:47 +00:00
  • 3bc8e64557 Update README.md (#1792) mhenrichsen 2024-07-30 07:59:53 +02:00
  • 751d50a6c6 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-29 01:49:48 +00:00
  • 55cc214c76 Add flexible configuration options for chat_template dataset training (#1756) Adam Brusselback 2024-07-28 21:48:57 -04:00
  • 098b8a0d6e Built site for gh-pages Quarto GHA Workflow Runner 2024-07-28 11:26:45 +00:00
  • 94ba93259f various batch of fixes (#1785) Wing Lian 2024-07-28 07:25:54 -04:00
  • 28435f4356 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-27 14:25:01 +00:00
  • 22680913f3 Bump deepspeed 20240727 (#1790) Wing Lian 2024-07-27 10:24:11 -04:00
  • 87261e4b30 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-24 01:23:19 +00:00
  • 6a9cfec222 add support for simpo via cpo trainer (#1772) Wing Lian 2024-07-23 21:22:16 -04:00
  • 498d160f43 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-23 23:55:21 +00:00
  • fe250ada78 fix fsdp loading of models, esp 70b (#1780) Wing Lian 2024-07-23 19:54:28 -04:00
  • e6b299dd79 bump flash attention to 2.6.2 (#1781) [skip ci] Wing Lian 2024-07-23 19:54:15 -04:00
  • e31f83bbf0 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-23 17:21:50 +00:00
  • 608a2f3180 bump transformers for updated llama 3.1 (#1778) Wing Lian 2024-07-23 13:21:03 -04:00
  • 99b3bc7fbd Merge branch 'main' into cj_tokenizer_default_prompt_template Chirag Jain 2024-07-23 17:16:49 +05:30
  • f39eba1d4c Built site for gh-pages Quarto GHA Workflow Runner 2024-07-23 05:42:03 +00:00
  • 87455e7f32 swaps to use newer sample packing for mistral (#1773) Wing Lian 2024-07-23 01:41:11 -04:00
  • 6834703e9e Built site for gh-pages Quarto GHA Workflow Runner 2024-07-21 13:11:34 +00:00
  • 985819d89b Add a chat_template prompt strategy for DPO (#1725) Keith Stevens 2024-07-21 06:10:42 -07:00
  • 043f5a86f4 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-19 16:22:30 +00:00
  • fa91b698e9 Fix untrained tokens (#1771) Wing Lian 2024-07-19 12:21:37 -04:00
  • 756e4ea5c6 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-19 04:48:10 +00:00
  • e4063d60a7 bump transformers and set roundup_power2_divisions for more VRAM improvements, low bit ao optimizers (#1769) Wing Lian 2024-07-19 00:47:07 -04:00
  • f22f960067 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-18 18:55:30 +00:00
  • 7830fe04b5 Unsloth rope (#1767) Wing Lian 2024-07-18 14:54:41 -04:00
  • 12af5e19b5 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-17 19:39:37 +00:00
  • c86c32a627 set the number of dataset processes on the DPO Config rather than the trainer (#1762) Wing Lian 2024-07-17 15:38:37 -04:00
  • 8731b95d04 re-enable PYTORCH_CUDA_ALLOC_CONF expandable_segments (#1765) [skip ci] Wing Lian 2024-07-17 15:38:26 -04:00
  • 8619b2d855 add torch_compile_mode options (#1763) [skip ci] Wing Lian 2024-07-17 15:38:07 -04:00
  • 53813c39ea Built site for gh-pages Quarto GHA Workflow Runner 2024-07-17 14:59:31 +00:00
  • e3122e8644 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-17 14:59:02 +00:00
  • 976f85195a fixes to accelerator so that iterable pretraining datasets work (#1759) Wing Lian 2024-07-17 10:58:38 -04:00
  • 152ab76623 fix num gpu check (#1760) Wing Lian 2024-07-17 10:58:14 -04:00
  • e86dd76154 attempt to set start method to spawn to prevent cuda issues for DPO (branch: dpo-spawn-fix) Wing Lian 2024-07-17 09:29:15 -04:00
  • ed7c8b206b Built site for gh-pages Quarto GHA Workflow Runner 2024-07-16 21:37:27 +00:00
  • 5f58555bd0 support for llama multipack using updated code/patches (#1754) Wing Lian 2024-07-16 17:36:29 -04:00
  • e839a733f5 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-16 20:01:17 +00:00
  • cfc533a7f7 torch compile and cuda alloc improvements (#1755) Wing Lian 2024-07-16 16:00:23 -04:00
  • f6925c788a Built site for gh-pages Quarto GHA Workflow Runner 2024-07-16 18:46:28 +00:00
  • e1725aef2b update modal package and don't cache pip install (#1757) Wing Lian 2024-07-16 14:45:38 -04:00
  • 20d0427ac9 update llama3 example base models to use nous (branch: update-examples-llama3-ez) Wing Lian 2024-07-15 17:19:00 -04:00
  • 105c65390e add q-galore optimizer (branch: q-galore) Wing Lian 2024-07-14 19:28:13 -04:00
  • cd94ab500e Built site for gh-pages Quarto GHA Workflow Runner 2024-07-14 23:13:45 +00:00
  • 78e12f8ca5 add basic support for the optimi adamw optimizer (#1727) Wing Lian 2024-07-14 19:12:57 -04:00
  • 7bf536f4b3 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-14 23:12:23 +00:00
  • 98af5388ba bump flash attention 2.5.8 -> 2.6.1 (#1738) (branch: fa-261) Wing Lian 2024-07-14 19:11:31 -04:00
  • 2680421081 bump deepspeed to latest 0.14.4 (branch: deepspeed_0_14_4) Wing Lian 2024-07-13 14:16:47 -04:00
  • 219cd0d3c5 Fix eval_sample_packing in llama-3 lora example (#1716) [skip ci] RodriMora 2024-07-13 20:34:44 +02:00
  • 634f384e06 Changed URL for dataset docs (#1744) David Meikle 2024-07-13 19:34:28 +01:00
  • ac5dba4624 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-13 18:05:19 +00:00
  • 4512738a73 bump xformers to 0.0.27 (#1740) Akshaya Shanbhogue 2024-07-13 11:04:31 -07:00
  • 1e57b4c562 update to pytorch 2.3.1 (#1746) [skip ci] Wing Lian 2024-07-13 13:28:17 -04:00
  • f614010392 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-13 13:54:17 +00:00
  • a4a5bf057f fixes to prevent vram spike when train starts (#1742) Wing Lian 2024-07-13 09:53:13 -04:00
  • 10ab89df39 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-13 13:42:36 +00:00
  • 137d84d1b4 add torch 2.3.1 base image (#1745) Wing Lian 2024-07-13 09:41:51 -04:00
  • 18abdb447a typo (#1685) [skip ci] Oliver Klingefjord 2024-07-13 03:24:01 +02:00
  • 4e38cea6b8 Add tests Chirag Jain 2024-07-12 09:04:59 +05:30
  • 5edaad5b8b Allow using tokenizer's default chat template with fallbacks Chirag Jain 2024-07-10 02:12:34 +05:30
  • 47e1916484 add tests so CI can catch updates where patches will break with unsloth (#1737) [skip ci] Wing Lian 2024-07-11 16:43:19 -04:00
  • 5ad84a9488 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-11 13:20:35 +00:00
  • 1194c2e0b1 github urls (#1734) mhenrichsen 2024-07-11 15:19:29 +02:00
  • d2a96e9b1d Built site for gh-pages Quarto GHA Workflow Runner 2024-07-10 15:17:23 +00:00
  • a159724e44 bump trl and accelerate for latest releases (#1730) Wing Lian 2024-07-10 11:15:44 -04:00
  • 5e14e1c6b6 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-05 13:24:57 +00:00
  • b3f680d305 sanity check ranges in freeze.py (#1686) Josh Bleecher Snyder 2024-07-05 06:24:07 -07:00
  • c13f4b1d7a Built site for gh-pages Quarto GHA Workflow Runner 2024-07-05 13:17:05 +00:00
  • c69b7eb2b5 full weights fsdp training seems broken with fsdp_cpu_ram_efficient_loading, disabling for now (#1726) Wing Lian 2024-07-05 09:15:36 -04:00
  • 1ad995ba47 Built site for gh-pages Quarto GHA Workflow Runner 2024-07-02 17:18:43 +00:00
  • c6d83a87c4 add support for .env files for env vars (#1724) Wing Lian 2024-07-02 13:17:40 -04:00
  • 43c266673d Built site for gh-pages Quarto GHA Workflow Runner 2024-06-29 05:39:41 +00:00
  • 5370cedf0c support for gemma2 w sample packing (#1718) Wing Lian 2024-06-29 01:38:55 -04:00
  • f2480a1d91 improve Pre-Tokenized Dataset docs (#1684) [skip ci] Josh Bleecher Snyder 2024-06-26 13:13:21 -07:00
  • 469e15607d basic llama multipack (branch: llama-multipack) Wing Lian 2024-06-20 14:39:55 -04:00
  • c1008cc28d Built site for gh-pages Quarto GHA Workflow Runner 2024-06-20 14:06:12 +00:00
  • 559562d790 Allow "weight: 0" in messages to mask them (#1703) DavidFarago 2024-06-20 16:05:16 +02:00
  • d95cdf3c1c Built site for gh-pages Quarto GHA Workflow Runner 2024-06-20 14:03:48 +00:00
  • 4de4b4089f add support for multipack for deepseek_v2 (#1712) Wing Lian 2024-06-20 10:02:55 -04:00
  • 79e65bbe81 Built site for gh-pages Quarto GHA Workflow Runner 2024-06-19 03:33:26 +00:00
  • 3f1f5e3312 drop length column for issues with eval without packing (#1711) Wing Lian 2024-06-18 23:32:29 -04:00