Commit Graph

  • d90463fff7 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-22 12:22:23 +00:00
  • aa0492c366 feat: do not find turn indices if turn is not trainable (#2696) NanoCode012 2025-05-22 19:19:59 +07:00
  • 798b5f5cfd fix(RL): address plugin rl overwriting trainer_cls (#2697) [skip ci] NanoCode012 2025-05-22 19:19:12 +07:00
  • 1c83a1a020 feat(doc): clarify minimum pytorch and cuda to use blackwell (#2704) [skip ci] NanoCode012 2025-05-22 19:18:27 +07:00
  • c71c6fe545 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-21 15:22:54 +00:00
  • 6aa41740df SP dataloader patching + removing custom sampler / dataloader logic (#2686) Dan Saunders 2025-05-21 11:20:20 -04:00
  • f9e5e22e6b User-agent on CI snapshot download axolotl-ci-hf Wing Lian 2025-05-20 08:52:33 -07:00
  • 348409c2ff fix: num_items_in_batch wrong type in kd trainer loss fix/kd-trainer-num-items NanoCode012 2025-05-20 16:56:24 +07:00
  • 9bdf4b1c23 improve handling and error if fa3 requested but not installed fa3-hopper Wing Lian 2025-05-19 10:11:14 -07:00
  • d6f64a3684 handle args to drop dropout Wing Lian 2025-05-18 13:11:56 -07:00
  • 0735454782 move fa3 tests to multigpu since we only run those on hopper Wing Lian 2025-05-18 10:08:08 -07:00
  • bb6464c4c6 use get_device_capability since CI setting in cfg is unreliable Wing Lian 2025-05-18 10:04:25 -07:00
  • 323a9cb153 handle return sig change for fa3 Wing Lian 2025-05-18 08:28:52 -07:00
  • b22150751f check for fa first Wing Lian 2025-05-18 07:04:48 -07:00
  • 8c4bc59bfc fa3 doesn't support dropout_p, fix unpatching Wing Lian 2025-05-18 06:26:08 -07:00
  • a064f1c9b4 ci for fa3 Wing Lian 2025-05-18 00:49:15 -07:00
  • fb5ef6d445 use updated package name for fa3 Wing Lian 2025-05-18 00:40:56 -07:00
  • 34b68ddaae curl with apt instead of pip Wing Lian 2025-05-17 23:58:13 -07:00
  • 9a3d0c919b make sure curl is installed Wing Lian 2025-05-17 20:41:54 -04:00
  • bd34d0b861 install for hopper from pre-built wheel Wing Lian 2025-05-17 09:49:26 -04:00
  • 37220ab90a install pybind11 for fa3 build Wing Lian 2025-05-17 08:26:31 -04:00
  • e1b74d710b update docker args to minimums used and use MAX_JOBS already set as arg Wing Lian 2025-05-17 08:12:25 -04:00
  • 79daf5b934 reduce max jobs for build of fa3 Wing Lian 2025-05-17 06:11:00 -04:00
  • ddd7c55576 build hopper w fa3 on torch 2.6 Wing Lian 2025-05-16 17:29:57 -04:00
  • 65c6c98a76 whitespace fix in dockerfile Wing Lian 2025-05-16 16:40:44 -04:00
  • 4ef2e8293f fix the bash in docker base Wing Lian 2025-05-16 15:45:52 -04:00
  • c126d5cd04 fix suffix for tag Wing Lian 2025-05-16 14:56:16 -04:00
  • 9b0be4f15c fix 12.8 image and add flash-attn v3 hopper base image Wing Lian 2025-05-16 14:54:24 -04:00
  • ccf6259c1b Built site for gh-pages Quarto GHA Workflow Runner 2025-05-16 19:49:24 +00:00
  • a27b909c5c GRPO fixes (peft) (#2676) Wing Lian 2025-05-16 15:47:03 -04:00
  • 6cb07b9d12 Fix for setting adam_beta3 and adam_epsilon2 for CAME Optimizer (#2654) [skip ci] xzuyn 2025-05-16 15:46:50 -04:00
  • 288653adb6 Fix: Make MLflow config artifact logging respect hf_mlflow_log_artifa… (#2675) [skip ci] C080 2025-05-16 21:46:31 +02:00
  • c25990fd4f additional RL trainers SP support rl-trainers-sp Dan Saunders 2025-05-14 02:09:20 +00:00
  • 16437d5e7b Built site for gh-pages Quarto GHA Workflow Runner 2025-05-16 17:10:05 +00:00
  • 3a5b495a74 Fix: improve doc on merge/inference cli visibility (#2674) NanoCode012 2025-05-17 00:07:40 +07:00
  • f661858fc4 Print dataset name (#2668) [skip ci] xzuyn 2025-05-16 13:06:58 -04:00
  • c837c4a424 Add missing init file to liger plugin (#2670) [skip ci] Eric Meier 2025-05-16 10:06:46 -07:00
  • c9797de6bb Add num_proc to fix data set slow processing issue (#2681) [skip ci] michelyang 2025-05-16 10:06:20 -07:00
  • 8f8a7afb05 Add ci and images for CUDA 12.8 for B200s (#2683) [skip ci] Wing Lian 2025-05-16 13:06:08 -04:00
  • 86472715da fix: remove doc string imports in monkeypatches (#2671) [skip ci] NanoCode012 2025-05-17 00:05:55 +07:00
  • fe12aa79c8 jagged lr restart scheduler jagged-restart-lr-scheduler-v3 Wing Lian 2025-05-15 20:20:59 -04:00
  • 459f407e69 avoid crash/oom on train end wait-distributed-close Wing Lian 2025-05-15 15:53:35 -04:00
  • e23a5c9fda 📝 Add docstrings to 775-option-to-drop-vs-truncate-on-rows-longer-than-context-length coderabbitai/docstrings/QVUilv72ojQNaYsCLVNpUpfo2rK1ZU5x90oPNXYz0ZfsWzWSHca36pjgaU5JOtZOA4gNjbjVYxShdRmkm7fGSlW coderabbitai[bot] 2025-05-15 11:02:45 +00:00
  • 5d7a61576d Refactor sequence length overflow handling in pretraining module mhenrhcsen 2025-05-15 12:55:09 +02:00
  • 5ecf22b54e Merge branch 'main' of github.com:axolotl-ai-cloud/axolotl into 775-option-to-drop-vs-truncate-on-rows-longer-than-context-length mhenrhcsen 2025-05-14 13:36:43 +02:00
  • 9c5b8da22f fix merge conflicts mhenrhcsen 2025-05-14 13:33:42 +02:00
  • 22684ec98f feat: add draft wizard cli feat/wizard NanoCode012 2025-05-13 20:40:52 +07:00
  • 6db60ac520 fix: add missing config to schema NanoCode012 2025-05-13 14:24:33 +07:00
  • 54bbc9bb72 set v0.9.2 version for tag v0.9.2 release-v0.9.x Wing Lian 2025-05-13 17:52:33 -04:00
  • 5aefebe1fe Activation checkpointing with offloading to disk with prefetch (#2663) Wing Lian 2025-05-13 16:39:39 -04:00
  • 5a36b6ff2d Atropos support (#2666) [skip ci] Wing Lian 2025-05-13 08:30:58 -04:00
  • 224da88fa2 fix: disable auto lora kernel if dropout nonzero (#2655) [skip ci] NanoCode012 2025-05-13 03:23:53 +07:00
  • 493eb8e5c6 update doc and use P2P=LOC for brittle grpo test (#2649) Wing Lian 2025-05-12 14:17:25 -04:00
  • 4780ac7c4d guard on deleting secrets from env (#2653) [skip ci] Wing Lian 2025-05-12 14:18:42 -04:00
  • cf69de2eb9 Various fixes for CI, save_only_model for RL, prevent packing multiprocessing deadlocks (#2661) Wing Lian 2025-05-12 10:51:18 -04:00
  • 71128160d5 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-13 20:42:00 +00:00
  • c0a0c7534c Activation checkpointing with offloading to disk with prefetch (#2663) Wing Lian 2025-05-13 16:39:39 -04:00
  • 7fa1089cea Atropos support (#2666) [skip ci] Wing Lian 2025-05-13 08:30:58 -04:00
  • fea6649518 increased test coverage mhenrhcsen 2025-05-13 08:58:34 +02:00
  • 124ad2b968 lint mhenrhcsen 2025-05-13 08:35:16 +02:00
  • 2350a33417 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-12 21:54:54 +00:00
  • 80304c26a7 SP GRPO support + batch SP fixes (#2643) Dan Saunders 2025-05-12 17:52:40 -04:00
  • 767c2340f1 docstring for tests mhenrhcsen 2025-05-12 22:57:43 +02:00
  • f6623c34cc Linting fix mhenrhcsen 2025-05-12 22:53:30 +02:00
  • 5dd8f0b2b8 Fixes comments from winglian mhenrhcsen 2025-05-12 22:43:15 +02:00
  • 103edc7211 refactor build() into smaller fns model-loader-refactor Dan Saunders 2025-05-12 20:36:52 +00:00
  • 67c4ea9c7c fix: disable auto lora kernel if dropout nonzero (#2655) [skip ci] NanoCode012 2025-05-13 03:23:53 +07:00
  • 411422098c Built site for gh-pages Quarto GHA Workflow Runner 2025-05-12 18:19:33 +00:00
  • 526ddb886d guard on deleting secrets from env (#2653) [skip ci] Wing Lian 2025-05-12 14:18:42 -04:00
  • f34eef546a update doc and use P2P=LOC for brittle grpo test (#2649) Wing Lian 2025-05-12 14:17:25 -04:00
  • d0d0ebd77c Built site for gh-pages Quarto GHA Workflow Runner 2025-05-12 14:53:25 +00:00
  • c7b6790614 Various fixes for CI, save_only_model for RL, prevent packing multiprocessing deadlocks (#2661) Wing Lian 2025-05-12 10:51:18 -04:00
  • be3c6bbd85 fix linting issues mhenrhcsen 2025-05-12 14:46:57 +02:00
  • f07db4f853 Refactor truncation logic in drop_long_rl_seq function mhenrhcsen 2025-05-12 14:40:10 +02:00
  • 17a5838d38 lint mhenrhcsen 2025-05-12 14:36:43 +02:00
  • 9f68918f13 Implement configurable handling of excess tokens in datasets mhenrhcsen 2025-05-12 14:08:43 +02:00
  • 6100baea0d offload activations to disk instead of CPU RAM offload-activations-disk Wing Lian 2025-05-11 14:19:49 -04:00
  • 27e3329273 .post1 version release for multipack fix v0.9.1.post1 Wing Lian 2025-05-09 21:54:04 -04:00
  • 27fec49083 don't sort multipack sampler (#2657) Dan Saunders 2025-05-09 20:28:58 -04:00
  • 9068e03109 Built site for gh-pages Quarto GHA Workflow Runner 2025-05-10 00:31:14 +00:00
  • 47e0e71bc8 don't sort multipack sampler (#2657) Dan Saunders 2025-05-09 20:28:58 -04:00
  • e910e3e164 Revert "Multipack parallel bin packing (#2631)" revert-multipack-changes Dan Saunders 2025-05-09 17:33:31 +00:00
  • 5e50d1e8f0 batch flattening with xformers too xformers-wo-packing Wing Lian 2025-05-08 18:23:25 -04:00
  • 7fb01f0461 also support xformers w/o packing Wing Lian 2025-05-08 15:22:48 -04:00
  • 2b59546c3e Create CNAME NanoCode012 2025-05-08 09:23:43 +07:00
  • 6cd75c840c Delete CNAME NanoCode012 2025-05-08 09:23:38 +07:00
  • 8cda9e93c1 set version for v0.9.1 v0.9.1 Wing Lian 2025-05-07 11:27:35 -04:00
  • 17d715c2b3 swap tinymodels that have safetensors for some ci tests (#2641) Wing Lian 2025-05-07 15:06:07 -04:00
  • f943306263 Add CAME Optimizer (#2385) xzuyn 2025-05-07 10:31:46 -04:00
  • 3c8b9b33d6 fix(doc): clarify instruction to delinearize llama4 similar to cli doc (#2644) [skip ci] NanoCode012 2025-05-07 21:29:47 +07:00
  • 8b0c2a71ad Fix: improve error message on failed dataset load (#2637) [skip ci] NanoCode012 2025-05-07 21:29:05 +07:00
  • 493910559a Configurable embeddings upcast (#2621) Wing Lian 2025-05-06 23:40:44 -04:00
  • c54534dbfa Fix cut_cross_entropy plugin install (#2642) [skip ci] Eric Meier 2025-05-06 19:56:00 -07:00
  • cae5cebb59 xformers attention with packing (#2619) Wing Lian 2025-05-06 22:49:22 -04:00
  • fcbd7477d0 Multipack parallel bin packing (#2631) Wing Lian 2025-05-06 20:08:08 -04:00
  • 038db85a40 allow plugins to return their own dataset (#2617) [skip ci] Wing Lian 2025-05-06 20:05:51 -04:00
  • 680dcc5a4d feat(doc): add split_thinking docs (#2613) [skip ci] NanoCode012 2025-05-07 07:05:32 +07:00
  • fed5ca8254 bump liger dep to 0.5.9 (#2640) [skip ci] Wing Lian 2025-05-06 20:05:19 -04:00
  • 7a2d017c88 Update lr_scheduler options in config.qmd to include additional scheduling strategies for improved training flexibility. (#2636) [skip ci] mhenrichsen 2025-05-06 17:24:07 +02:00
  • 8c0303aa5e Print axolotl art if train is called outside of cli: (#2627) [skip ci] Wing Lian 2025-05-06 11:18:45 -04:00