Commit Graph

  • 5d61169f7c fix dpo eval override to call grandparent instead of the broken super (#2628) [skip ci] Wing Lian 2025-05-06 11:18:25 -04:00
  • e1586f7919 make sure gc_steps is used for all trainers (#2638) Wing Lian 2025-05-06 11:18:00 -04:00
  • e4bf3ffb17 repop cache (#2639) Wing Lian 2025-05-06 11:09:07 -04:00
  • 30150fe1e1 Adds example for training a TTS model on top of a LLM. (#2614) mhenrichsen 2025-05-06 10:11:06 +02:00
  • 7f7d7ade2e Fix logging deprecation warnings (#2623) Emmanuel Ferdman 2025-05-04 15:22:45 +03:00
  • 776cf70fe4 include multipack support for qwen3 family (#2622) Wing Lian 2025-05-03 12:02:39 -04:00
  • 8730951aba setup hf transfer too and fix auto bf16 when fp16 enabled (#2620) [skip ci] Wing Lian 2025-05-03 12:02:26 -04:00
  • e72c11ad55 qwen3 and qwen3_moe support for liger kernels (#2612) Wing Lian 2025-05-02 09:29:55 -04:00
  • 1a7978b960 remove keys to incorporate changes for the trl update (#2616) aitechguy 2025-05-02 20:47:42 +08:00
  • 60b0d14f1d automatically set pad_to_sequence_len when use packing (#2607) Wing Lian 2025-05-01 13:24:38 -04:00
  • a7a40378f5 fix: run preview-docs only when md/qmd changes (#2606) NanoCode012 2025-05-02 00:21:28 +07:00
  • b50d35bec9 Logging config for colab (#2611) Wing Lian 2025-05-01 12:58:00 -04:00
  • bc6dfa6899 add missing __init__ for lr monkeypatch fix (#2609) Wing Lian 2025-05-01 09:41:32 -04:00
  • 9d6e8af622 Add num_completions_to_print for trl and grpo (#2604) Dhruv Mullick 2025-04-30 19:00:30 -06:00
  • 17b441248c use latest hf-xet and don't install vllm for torch 2.7.0 (#2603) Wing Lian 2025-04-30 18:27:39 -04:00
  • d49a4268b8 additional args for grpo config/trainer (#2598) Wing Lian 2025-04-30 13:11:12 -04:00
  • 1d6e931115 replace zero_only with simpler if statement (#2592) Wing Lian 2025-04-30 13:11:03 -04:00
  • ff106ace44 ensure we pass axolotl extras to the Dockerfile so vllm is included in shipped images (#2599) Wing Lian 2025-04-30 11:35:45 -04:00
  • 24907533d1 don't automatically enable lora kernels for RL training (#2600) Wing Lian 2025-04-30 11:06:50 -04:00
  • 0e9d816d2e only import vllm serve cli if its being called (#2597) [skip ci] Wing Lian 2025-04-30 09:11:25 -04:00
  • 72f142186a Handle other reasoning trace dataset formats (#2591) Wing Lian 2025-04-30 03:32:55 -04:00
  • 87726322bf upload the deepspeed json to wandb (#2593) [skip ci] Wing Lian 2025-04-30 03:32:44 -04:00
  • ae8ae7534c feat: add qwen3 moe block for ds3 (#2596) [skip ci] NanoCode012 2025-04-30 14:32:23 +07:00
  • ee00142cb5 patch to convert LR from tensor to float when using DS (#2595) [skip ci] Wing Lian 2025-04-30 03:31:57 -04:00
  • 097e7e3b5b Plugins create_lr_scheduler support (#2584) Aleksandr Dremov 2025-04-29 23:08:30 +02:00
  • c714958181 auto-enable lora kernels where possible (#2589) Dan Saunders 2025-04-29 16:18:49 -04:00
  • 4402c293dc fix(doc): key used to point to url in multimodal doc (#2575) [skip ci] NanoCode012 2025-04-30 02:10:59 +07:00
  • 0d71f787a3 bump vllm==0.8.5 for qwen3 support (#2583) [skip ci] Wing Lian 2025-04-29 15:10:40 -04:00
  • c337ca0872 support for qwen3 with lora kernels (#2588) Wing Lian 2025-04-29 15:02:49 -04:00
  • f04f7cf5ad Fix eval + add smoke test (#2586) Dan Saunders 2025-04-29 12:58:54 -04:00
  • c64a951bc9 set config on the PluginManager for callback access (#2587) Wing Lian 2025-04-29 12:05:44 -04:00
  • fc88cc56cb Post release fixes (#2581) Wing Lian 2025-04-29 10:01:38 -04:00
  • e85cbb8645 remove torch 2.4.1 CI as part of support deprecation (#2582) Wing Lian 2025-04-29 08:28:32 -04:00
  • 0c059d7c77 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-07 19:08:22 +00:00
  • 0f3587174d swap tinymodels that have safetensors for some ci tests (#2641) Wing Lian 2025-05-07 15:06:07 -04:00
  • ddad3501e4 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-07 14:34:01 +00:00
  • 25e6c5f9bd Add CAME Optimizer (#2385) xzuyn 2025-05-07 10:31:46 -04:00
  • 32f51bca35 fix(doc): clarify instruction to delinearize llama4 similar to cli doc (#2644) [skip ci] NanoCode012 2025-05-07 21:29:47 +07:00
  • 9daa04da90 Fix: improve error message on failed dataset load (#2637) [skip ci] NanoCode012 2025-05-07 21:29:05 +07:00
  • ef883b6960 chore: refactor normalize_attn to use mapping and loop attention_enum NanoCode012 2025-05-07 17:07:08 +07:00
  • d0c4930dd5 fix: set replit mpt model to use eager attention NanoCode012 2025-05-07 16:55:01 +07:00
  • 6ee7cb30fa fixes from PR feedback Wing Lian 2025-04-27 20:11:11 -04:00
  • ba47adc24b replace attention in the yaml config with an enum Wing Lian 2025-04-04 23:37:30 -04:00
  • f87bc9ac3e Built site for gh-pages Quarto GHA Workflow Runner 2025-05-07 03:42:57 +00:00
  • 0d71b0aa5f Configurable embeddings upcast (#2621) Wing Lian 2025-05-06 23:40:44 -04:00
  • 63aaccf85b Fix cut_cross_entropy plugin install (#2642) [skip ci] Eric Meier 2025-05-06 19:56:00 -07:00
  • 1c751ca478 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-07 02:51:31 +00:00
  • ff0fe767c8 xformers attention with packing (#2619) Wing Lian 2025-05-06 22:49:22 -04:00
  • 299b5a3e5e Built site for gh-pages Quarto GHA Workflow Runner 2025-05-07 00:10:22 +00:00
  • 8e4158cc0b Multipack parallel bin packing (#2631) Wing Lian 2025-05-06 20:08:08 -04:00
  • cd84325253 allow plugins to return their own dataset (#2617) [skip ci] Wing Lian 2025-05-06 20:05:51 -04:00
  • 0b140fef83 feat(doc): add split_thinking docs (#2613) [skip ci] NanoCode012 2025-05-07 07:05:32 +07:00
  • e4cfebe995 bump liger dep to 0.5.9 (#2640) [skip ci] Wing Lian 2025-05-06 20:05:19 -04:00
  • d790371b64 bump peft to 3.5.1 datasets-351 Wing Lian 2025-05-04 17:14:15 -04:00
  • a6cac5dd32 Update lr_scheduler options in config.qmd to include additional scheduling strategies for improved training flexibility. (#2636) [skip ci] mhenrichsen 2025-05-06 17:24:07 +02:00
  • 29394ec9f8 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-06 15:20:09 +00:00
  • b71c0e3447 Print axolotl art if train is called outside of cli: (#2627) [skip ci] Wing Lian 2025-05-06 11:18:45 -04:00
  • ddaebf8309 fix dpo eval override to call grandparent instead of the broken super (#2628) [skip ci] Wing Lian 2025-05-06 11:18:25 -04:00
  • 679743087a make sure gc_steps is used for all trainers (#2638) Wing Lian 2025-05-06 11:18:00 -04:00
  • f222926ba1 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-06 15:11:15 +00:00
  • f720b6e72d repop cache (#2639) Wing Lian 2025-05-06 11:09:07 -04:00
  • 373fd8c9e8 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-06 08:13:23 +00:00
  • a980618fd0 Adds example for training a TTS model on top of a LLM. (#2614) mhenrichsen 2025-05-06 10:11:06 +02:00
  • 7610a02881 fix ops activations Wing Lian 2025-05-06 01:00:02 -04:00
  • b0cd54bcb9 WIP for activation offloading using streams and custom policy fn for checkpointing Wing Lian 2025-05-06 00:39:21 -04:00
  • 1a229b0901 add colab callback to fix inference post train colab-misc-fixes Wing Lian 2025-05-05 16:39:33 -04:00
  • 985ee95f2d use uint8 dtype for qlora colab-misc-fixes-test Wing Lian 2025-05-05 08:27:34 -04:00
  • 72ece3dadf support for configurable group and bin size for sample packing Wing Lian 2025-05-05 06:57:21 -04:00
  • 2e74e1d289 fix xformers inference Wing Lian 2025-05-05 06:20:06 -04:00
  • 4f478083e7 fix batch size setter Wing Lian 2025-05-05 03:49:01 -04:00
  • 82453bab7e handle xformers patch for inference too Wing Lian 2025-05-05 03:22:02 -04:00
  • 5b2bd75aba parallel bin packing fix error with lambda and pickling Wing Lian 2025-05-04 18:12:09 -04:00
  • 03508c6816 improve readability of multipack sampler Wing Lian 2025-05-04 18:02:17 -04:00
  • 48b3e14a24 Print axolotl art if train is called outside of cli: Wing Lian 2025-05-04 17:03:48 -04:00
  • 21652d070c Built site for gh-pages Quarto GHA Workflow Runner 2025-05-04 12:25:01 +00:00
  • 54960d4de0 Fix logging deprecation warnings (#2623) Emmanuel Ferdman 2025-05-04 15:22:45 +03:00
  • 544b1212d8 use relative import Wing Lian 2025-05-04 07:35:29 -04:00
  • 695fc2f802 missing __init__ Wing Lian 2025-05-04 07:31:01 -04:00
  • c7f38ba96b fix seq lens calc to drop hanging sequences Wing Lian 2025-05-03 21:12:25 -04:00
  • 372fd08548 fix fp16 / bf16 reset when using fp16 with bf16 auto Wing Lian 2025-05-03 18:34:39 -04:00
  • 52cab2aa5b refactor so we can add test Wing Lian 2025-05-03 21:47:45 -04:00
  • bed8f354a5 reorder the packing check Wing Lian 2025-05-03 15:38:29 -04:00
  • f301a165c3 fix xformers + packing validation Wing Lian 2025-05-03 14:24:38 -04:00
  • 2b3a09aeae wire up the patch Wing Lian 2025-05-03 14:19:37 -04:00
  • 648780de51 xformers attention with packing Wing Lian 2025-05-03 02:06:17 -04:00
  • ecc2388274 chunked cross entropy loss Wing Lian 2025-05-03 13:23:02 -04:00
  • 65d6453b39 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-03 16:04:45 +00:00
  • ebf724a9d9 fix import Wing Lian 2025-05-03 03:07:29 -04:00
  • 99095573c3 add tabs back to code check Wing Lian 2025-05-03 02:46:50 -04:00
  • 140083a828 patch peft to not upcast everything Wing Lian 2025-05-03 02:32:47 -04:00
  • 37c27aedc1 fsdp embeddings should be float32 per comment Wing Lian 2025-05-03 01:56:09 -04:00
  • ed922796b7 include multipack support for qwen3 family (#2622) Wing Lian 2025-05-03 12:02:39 -04:00
  • 3dd9c3bf3f setup hf transfer too and fix auto bf16 when fp16 enabled (#2620) [skip ci] Wing Lian 2025-05-03 12:02:26 -04:00
  • 3474a9df88 fix(doc): update min torch version fix/vllm-version NanoCode012 2025-05-01 11:36:28 +07:00
  • f6151ce5cb feat: pin vllm to 0.8.5 for all torch NanoCode012 2025-05-01 11:36:17 +07:00
  • 64a47ebb69 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-02 13:32:03 +00:00
  • 0ba7d362fa qwen3 and qwen3_moe support for liger kernels (#2612) Wing Lian 2025-05-02 09:29:55 -04:00
  • a0bfda96d5 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-02 12:49:58 +00:00
  • e4f73bc98e remove keys to incorporate changes for the trl update (#2616) aitechguy 2025-05-02 20:47:42 +08:00
  • fc1900761b fix(trl): remove access to invalid property fix/dpo-labels NanoCode012 2025-05-02 15:41:53 +07:00