Commit Graph

1179 Commits

Author SHA1 Message Date
Wing Lian
81893c775c Accelerate 1.8.1 and BNB 0.46.0 update (#2815)
* update accelerate to v1.8.0

* update bnb also

* fix multigpu ci timeout

* fix test set size

* use latest accelerate 1.8.1

* disable default dtype
2025-06-28 15:29:19 -04:00
Wing Lian
a1a740608d add assertion for packing patch to _get_unpad_data (#2840) 2025-06-27 11:20:23 -04:00
kallewoof
ec15a7a691 Support --lora-on-cpu flag for DPO model merging (#2766) [skip ci]
* Support --lora-on-cpu flag for DPO model merging

* fix: use device=cpu in _convert_embedding_modules_dtype when lora_on_cpu is set
2025-06-27 11:19:24 -04:00
Wing Lian
0a7a216b60 allow for different sequence_len for evaluations (#2836) [skip ci]
* allow for different sequence_len for evaluations

* reversed 🤦

* add more information to filter msg
2025-06-27 11:02:51 -04:00
NanoCode012
d8280d45c1 feat: add chat_template kwargs (#2837) 2025-06-27 10:38:46 -04:00
Wing Lian
24f2887e87 don't fail during preprocess for sampling from iterable dataset (#2825) [skip ci] 2025-06-27 10:37:53 -04:00
Wing Lian
a24957fa04 fix for iterable datasets and pickling (#2831) [skip ci]
* fix for iterable datasets and pickling

* more fixes for pretraining

* can't pickle mock generator dataset
2025-06-27 10:35:23 -04:00
Wing Lian
d8cf66edbd use fork for multiprocess start method for packing in parallel (#2830) 2025-06-25 13:17:33 -04:00
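The commit above switches the parallel packing workers to the "fork" start method. A minimal sketch of that general pattern with the standard library (names and workload here are hypothetical, not axolotl's code):

```python
# Illustrative sketch only: run workers under an explicit "fork" start method,
# as in the commit above. "fork" is only available on Unix-like systems.
import multiprocessing


def pack_chunk(chunk):
    # placeholder for per-worker packing logic
    return sum(chunk)


if __name__ == "__main__":
    ctx = multiprocessing.get_context("fork")  # children inherit parent memory cheaply
    chunks = [[1, 2], [3, 4], [5, 6]]
    with ctx.Pool(processes=3) as pool:
        results = pool.map(pack_chunk, chunks)
    print(results)  # [3, 7, 11]
```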
NanoCode012
181cc3106b fix: catch httperror from ratelimiting hf when checking user token (#2827) 2025-06-25 09:50:13 -04:00
NanoCode012
20106116da fix: 'NoneType' object has no attribute 'column_names' (#2822) [skip ci]
* fix: 'NoneType' object has no attribute 'column_names'

* chore: typing
2025-06-25 09:49:55 -04:00
Younes B
a27c4f8771 feat: add falcon-h1 into axolotl (#2811) [skip ci]
* feat: add falcon-h1 into axolotl

* fix pre-commit

* review

* fix: remove packing
2025-06-25 09:49:42 -04:00
NanoCode012
bb1109b81d feat: update CCE to use axolotl's fork (#2813) [skip ci]
* feat: update CCE to use axolotl's fork

* chore: improve error message

* feat: add eot token for gemma3 configs

* fix: only warn on more than 1 image

* fix: re-add gemma3 patch

* Revert "fix: re-add gemma3 patch"

This reverts commit f04db5e873.

* feat: add qwen25 vl example

* feat: point to upstream fork cce package

* feat: update cce commit
2025-06-25 09:49:22 -04:00
Dan Saunders
8c69ec3a1e gating _gather_outputs (causes increased vram usage) (#2829)
* SP vram fix

* gating _gather_outputs (causes increased vram usage)

* reverting unneeded change
2025-06-25 08:33:55 -04:00
Dan Saunders
46675496a3 log config (#2819)
* log config

* moving text art; adding sensitive value redaction + sorting

* revert pre-commit changes

* remove none-valued config before dumping

* just redact api keys
2025-06-24 14:59:30 -04:00
NanoCode012
c6b5d35e5d fix: re-add gemma3 patch (#2817) 2025-06-24 10:51:30 +07:00
Wing Lian
12c826816d chunked cross entropy loss (#2625)
* chunked cross entropy loss

* refactor so we can add test

* use relative import

* update schema description
2025-06-23 23:08:46 -04:00
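Chunked cross-entropy computes the loss over slices of the flattened sequence so the full tokens-by-vocab loss buffer never has to materialize at once. A generic PyTorch sketch of the idea (illustrative only, not the implementation added in #2625):

```python
# Illustrative sketch of chunked cross-entropy: process the sequence in slices so
# intermediate float32 logits/losses exist only chunk by chunk. Not the #2625 code.
import torch
import torch.nn.functional as F


def chunked_cross_entropy(logits, labels, chunk_size=1024, ignore_index=-100):
    # logits: [batch * seq, vocab], labels: [batch * seq]
    total_loss = logits.new_zeros(())
    total_tokens = 0
    for start in range(0, labels.size(0), chunk_size):
        end = start + chunk_size
        chunk_labels = labels[start:end]
        total_loss = total_loss + F.cross_entropy(
            logits[start:end].float(),
            chunk_labels,
            ignore_index=ignore_index,
            reduction="sum",
        )
        total_tokens += (chunk_labels != ignore_index).sum().item()
    return total_loss / max(total_tokens, 1)
```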
Dan Saunders
1d8f500709 deepspeed fix (#2820) 2025-06-23 09:07:57 -04:00
Dan Saunders
45adf1bfb9 get_logger use_environ fix (#2808)
* get_logger use_environ fix

* rethinking

* replacing old logger imports

* simplify

* fix boolean cond
2025-06-19 11:16:52 -04:00
Carsten Kragelund Jørgensen
eb3a57eb17 Ignore generation/endgeneration tags when analyzing Jinja chat template (#2787)
* ignore generation/endgeneration tags

Axolotl calculates the mask for assistant turns on its own, so these tags are not needed; currently, however, the analyzer does not recognize them at all and throws an error.

* feat: add phi4 tokenizer test and unblock gemma2

* fix: improve template

* chore: refactor

* chore: lint

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-18 15:59:07 -04:00
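For context, {% generation %} / {% endgeneration %} are Jinja tags some chat templates use to mark the assistant span for token masking; since axolotl computes that mask itself, the analyzer now skips them instead of erroring. A hedged illustration of such a template (not one of axolotl's shipped templates):

```python
# Hedged illustration of a chat template containing generation tags; axolotl's
# template analyzer previously errored on these and now ignores them.
chat_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'assistant' %}"
    "{% generation %}{{ message['content'] }}{% endgeneration %}"
    "{% else %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endif %}"
    "{% endfor %}"
)
```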
Wing Lian
34da391391 Set dev version (#2807) [skip ci] 2025-06-18 15:49:05 -04:00
NanoCode012
0bb9077553 Fix: logging on py310 (#2802)
* feat: encourage py311

* fix: logging import on py310

* fix: do upper and simplify handling
2025-06-18 15:46:27 -04:00
Wing Lian
a85efffbef bump transformers==4.52.4 (#2800) [skip ci]
* bump transformers==4.52.4

* don't use hf offline for qwen tokenizer

* increase timeout

* don't use methodtype

* increase timeout

* better assertion logging

* upgrade deepspeed version too
2025-06-18 15:46:14 -04:00
Dan Saunders
9d5bfc127e Config doc autogen (#2718)
* config reference doc autogen

* improvements

* cleanup; still ugly but working

* reformat

* remove autogen config ref from git

* factor out validations

* rewrite

* rewrite

* cleanup

* progress

* progress

* progress

* lint and minifying somewhat

* remove unneeded

* coderabbit

* coderabbit

* update preview-docs workflow triggers

* installing with deps

* coderabbit

* update refs

* overwrote file accidentally
2025-06-18 15:36:53 -04:00
Wing Lian
88c0e8d048 release tag (#2799)
2025-06-17 12:13:27 -04:00
NanoCode012
d8e8cd8558 feat: remove evalfirst callback with built-in trainer arg (#2797) 2025-06-17 12:09:33 -04:00
Wing Lian
ccc94da8ad KD fix w/ online distillation (#2700) [skip ci]
* kd fixes

* fix collator setup

* fix input args

* better handling to drop string fields for kd with raw dataset

* kd trainer has kd temp as part of the init

* drop top_k before softmax

* simplify and remove zscore

* WIP chunked KD loss with autograd wrapper

* more fixes and liger-type chunked loss

* collator cls for plugins

* remove debugging

* additional plugin collator kwargs, don't scale up kd loss by t^2

* don't need temp arg to distill method

* online kd wip

* add close to comment block

* support sampling params/max new tokens

* handle when no custom collator is used in plugins

* logsumexp trick

* fix check

* shift off the first empty token

* fix length of padding

* use max not min

* temp scale kd loss at end

* support for dynamic plugin training args mixins and symmetric kl

* chore: lint

* fix trainer callback base class

* Fix decay

* accept compressed responses for smaller wire payload

* post-rebase lint

* more KD updates

* increase hyperparams_count for gradients for added normalize_topk

* fix to remove attention_mask

* rename vars for consistency

* fix rebase issues

* default to dropping last batch in multipack batch sampler

* improve handling of train len

* init collator_cls_and_kwargs

* explicit drop_last=False when checking for multipack completeness

* use separate v2 loader for kd

* fix kd tests to use subprocess so it picks up kd training args

* default value for kd_beta arg

* use updated dataset for ci

* longer timeout for e2e
2025-06-17 12:09:13 -04:00
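The distillation changes above revolve around a temperature-scaled KL loss between teacher and student logits. A generic sketch of that loss follows; the kd_beta mixing and where the temperature factor is applied are assumptions for illustration, not the exact #2700 implementation:

```python
# Generic knowledge-distillation loss sketch: forward KL between temperature-
# softened teacher and student distributions. The kd_beta mixing and the final
# temperature scaling are illustrative assumptions, not axolotl's code.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, ce_loss, temperature=2.0, kd_beta=0.5):
    # student_logits, teacher_logits: [tokens, vocab]; ce_loss: scalar hard-label loss
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_p = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(student_logp, teacher_p, reduction="batchmean")
    kl = kl * (temperature**2)  # rescale so gradients are comparable to the hard loss
    return kd_beta * kl + (1.0 - kd_beta) * ce_loss
```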
NanoCode012
21388cf615 Fix: lora kernel pre-patch applied despite post-patch not applied (#2772)
* fix: do not pre-patch self attention if lora dropout non-zero

* fix: add test to check patch not applied

* fix: test

* fix: test config check

* fix where we check so that tests don't break

* fix: test

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-14 11:54:06 -07:00
NanoCode012
80d5b066ec Fix: adding magistral fsdp config, fixing not eval with test_datasets, handle mllama attention (#2789) [skip ci]
* feat: add fsdp config for magistral

* fix: add mllama self attention handling for lora kernels

* fix: no eval if val_set_size 0 despite having test_datasets

* fix: add note for cce for vlm in newer model
2025-06-14 11:53:43 -07:00
Wing Lian
b2274d430b support for QAT w RL (DPO) (#2776) 2025-06-13 10:00:35 -04:00
NanoCode012
eac4a61f55 Feat: Add Magistral and mistral-common tokenizer support (#2780) 2025-06-12 19:18:33 -04:00
JZacaroli
f5fbc82f2b Fix logging import in evaluate.py (#2782) (#2783)
* Fix logging import in evaluate.py (#2782)

* chore: lint

---------

Co-authored-by: Joe Zacaroli <jaz@cyberscience.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-12 13:23:31 -04:00
Wing Lian
468580d18e limit multipack sampler processes (#2771) [skip ci]
* limit to 16 packing processes

* make num_processes properly reflect configured dataset_processes
2025-06-12 13:22:58 -04:00
Dan Saunders
00cda8cc70 Data loader refactor (#2707)
* data loading refactor (wip)

* updates

* progress

* pytest

* pytest fix

* lint

* zero_first -> filelock, more simplifications

* small simplification

* import change

* nit

* lint

* simplify dedup

* couldn't resist

* review comments WIP

* continued wip

* minor changes

* fix; remove contrived test

* further refactor

* set default seed in pydantic config

* lint

* continued simplication

* lint

* renaming and nits

* filelock tests

* fix

* fix

* lint

* remove nullable arg

* remove unnecessary code

* moving dataset save fn to shared module

* remove debug print

* matching var naming

* fn name change

* coderabbit comments

* naming nit

* fix test
2025-06-10 19:53:07 -04:00
NanoCode012
83632f71d8 Feat: add tool calling support via tools column (#2774)
* feat: add tool_calling field support

* fix: add tests
2025-06-09 21:42:05 -07:00
Qingyang Wu
92afa4fa27 Fix the bug of position ids padding (#2739) [skip ci]
* Update batching.py: fix the bug of position ids padding

if position_ids are padded with a long sequence of zeros, flash attention will crash

* use alternate calculation for padding position_ids with a range

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-09 21:26:36 -07:00
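The fix in #2739 replaces zero-padding of position_ids with a range, since a long run of repeated zeros looks like one giant extra sequence to flash attention's packed-sequence handling. A minimal sketch of that padding change (illustrative, assuming the pad range restarts at zero; not the exact batching.py code):

```python
# Illustrative sketch: pad position_ids with an increasing range rather than zeros,
# as described in #2739. Not the exact batching.py implementation.
import torch


def pad_position_ids(position_ids, max_len):
    # position_ids: 1-D tensor for one packed example, e.g. [0, 1, 2, 0, 1]
    pad_len = max_len - position_ids.size(0)
    if pad_len <= 0:
        return position_ids[:max_len]
    # zero padding would create a long run of identical positions; a fresh
    # increasing range keeps the padded tail well-formed instead
    pad = torch.arange(pad_len, dtype=position_ids.dtype)
    return torch.cat([position_ids, pad])


print(pad_position_ids(torch.tensor([0, 1, 2, 0, 1]), 8))
# tensor([0, 1, 2, 0, 1, 0, 1, 2])
```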
Wing Lian
dd660c2ed0 handle when unable to save optimizer state when using ao optimizer with FSDP (#2773) [skip ci]
* handle when unable to save optimizer state when using ao optimizer with FSDP1

* improve messaging

Co-authored-by: salman <salman.mohammadi@outlook.com>

---------

Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-06-09 21:26:14 -07:00
Wing Lian
09c685fd2c fix worker_init_fn signature handling (#2769) 2025-06-08 23:14:10 -07:00
Timofey Klyubin
4440b4a1ce remove unused field for chat_template.default for DPO training (#2755) [skip ci]
* remove unused field for chat_template.default

"messages" field present in final dataset causes issues with DPO
training otherwise

* lint and fix tests for new return value

* fix for updated expected fields for dpo

* fix test still expecting "messages" field

* chore: lint

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-05 07:22:58 -07:00
Wing Lian
c67910fa6f bump hf deps (#2735) [skip ci]
* bump hf deps

* upgrade liger-kernel too

* install cce from fork for transformers fix

* fix reference to vocab size in gemma3 patch

* use padding_idx instead of pad_token_id

* remove fixed gemma3 patch

* use updated cce fork

* fix local mllama cce patches w docstring

* add test for multipack with trainer setup and fix trainer for trainer refactor upstream

* bump modal version

* guard for iterable datasets

* mllama model arch layout changed in latest transformers

* fix batch sampler with drop_last

* fix: address upstream vlm changes for lora

* fix: update references to old lora target path

* fix: remove mllama fa2 patch due to upstream fix

* fix: lora kernel patch path for multimodal models

* fix: removed mllama from quarto

* run test for came optim on 2.6.0+

* fix fsdp2 patch and remove deprecated patch

* make sure to set sequence_parallel_degree for grpo

* Add SP test for GRPO

* add sp to grpo config for trainer

* use reward_funcs as kwarg to grpo trainer

* fix the comprehension for reward funcs

* reward funcs already passed in as args

* init sp_group right before training

* fix check for adding models to SP context

* make sure to pass args to super

* upgrade deepspeed

* use updated trl and add reasoning flags for vllm

* patch the worker

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-06-05 07:20:33 -07:00
NanoCode012
787880215b fix(deepspeed): deepspeed config not being set for z3 (#2754)
* fix(deepspeed): deepspeed config not being set for z3

* fix: comments
2025-06-03 14:27:09 -07:00
NanoCode012
4b1a29c694 feat(modal): update docker tag to use torch2.6 from torch2.5 (#2749) [skip ci] 2025-06-03 14:26:07 -07:00
NanoCode012
d7fa60662e feat: add chat_template kwargs (#2694) [skip ci] 2025-06-03 14:25:26 -07:00
Dan Saunders
1d91d905c9 remove deprecated wandb env var (#2751)
* remove deprecated wandb env var

* remove os.environ wandb setting; unused loggers

* remove os.environ wandb setting; unused loggers
2025-06-03 14:04:15 -07:00
github-actions[bot]
94219f6ee8 chore: update pre-commit hooks (#2745)
* chore: update pre-commit hooks

* trigger linter when pre commit hooks are updated

* fix type checks from upgraded pre-commit

---------

Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-02 15:54:29 -07:00
NanoCode012
d5d0dc5938 fix: suppress non-axolotl logs unless it's warning or higher (#2724)
* fix: increase log level for root loggers and axolotl's

* fix: BasePlugin using wrong logger

* fix: update logger to take name from module

* feat: change logger class to AxolotlLogger to filter non-axolotl logs at INFO level or below

* fix: change behavior to not disable existing loggers

* fix: update logging to respect correct env

* chore: fix comment

* fix: suppress accelerate log to LOG_LEVEL if not set

---------

Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-05-31 12:13:43 +07:00
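The logging change above suppresses INFO-and-below records from non-axolotl loggers while leaving axolotl's own output and all warnings intact. A minimal sketch of that behaviour using a stock logging.Filter (the AxolotlLogger subclass in #2724 may differ):

```python
# Minimal sketch of the behaviour in #2724: drop INFO-and-below records from
# non-axolotl loggers, keep axolotl records and all WARNING+ records. The real
# change uses a custom logger class; this stock Filter is only an analogy.
import logging


class NonAxolotlInfoFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        if record.name.startswith("axolotl"):
            return True
        return record.levelno >= logging.WARNING


handler = logging.StreamHandler()
handler.addFilter(NonAxolotlInfoFilter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logging.getLogger("axolotl.train").info("kept")        # axolotl INFO passes
logging.getLogger("transformers").info("suppressed")   # third-party INFO dropped
logging.getLogger("transformers").warning("kept")      # third-party WARNING passes
```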
NanoCode012
5e86c35322 fix(log): remove duplicate merge_lora param (#2742) [skip ci] 2025-05-31 12:13:31 +07:00
NanoCode012
6778856804 Fix: RL base feature parity (#2133)
* feat: add num_proc and load from cache for rl mapping

* fix: refactor sft and rl trainer to set same base args

* feat: add report_to to set run name

* fix: consolidate handling of fp16, bf16, tf32 kwarg

* chore: consolidate eval_strat, loraplus, lr sched, max_length

* fix: deprecate old types

* fix: adding missing Any

* fix: max_steps incorrectly set

* fix: remove unnecessary datacollator kwarg insert and pop

* fix: update default max_steps

* fix: add missing weight_decay handling

* fix: ignore max_length for grpo

* feat: update CI on trainer_builder

* fix: comments

* improve handling of warmup/logging steps

* use transformers default for logging steps, not None

* fix: remove redundant override

* fix: lint

* feat: allow custom optim for rl methods

* fix: duplicate optim setting

* fix(test): set sequence_parallel_degree default in base cfg

* feat: add handling for seed and SP/ring-attn config

* chore: add back return typing from rebase

* fix(test): use RLType directly to skip needing to validate

* feat: split training builder into sub modules

* fix: remove deprecated clause

* chore: add missing config to doc

* fix: update quarto autodoc

* fix: import path for trainer builder and submodules

* fix: remove redundant configs from rebase mistake

* chore: simplify dynamo check

* fix: optimizer_cls_and_kwargs to be passed into trainer_kwargs

* fix: add missing rex from rebase

* fix: move pop optimizer_cls_and_kwargs

* fix: pop optimizer cls in rl too

* fix: leftover bug from rebase

* fix: update handling of trainer_cls in RL

* fix: address pr feedback

* feat: call hook_pre_create_trainer for rl

* chore: lint

* fix: return notimplemented for ppo

* feat: moved torch compile to base and refactor collator setting

* chore: remove unused importlib.util import

* fix: optimizer cls not being popped

* feat: move epoch setting to base

* fix: catch unhandled custom optimizer

* fix: remove duplicate lora plus setting

* chore: refactor if condition

* chore: refactor set_base_training_args into smaller modules

* fix: address TrainerBuilderBase class variables to instance var

* fix: add handling for beta3 and epsilon2

* fix: change to pass dict via arg instead of updating dict

* chore: simplify if condition

* fix: force access to lr & weight decay so missing values error early

* fix: remove log sweep

* chore: refactor if condition

* fix: address renamed cfg

* fix: improve handling of cosine hyp

* fix: remove unused params

* chore: refactor

* chore: clarify doc safetensors

* fix: update import path to be unified following comments

* fix: duplicate kwargs passed

* feat: return separate trainer_kwargs

* chore: refactor

* chore: refactor based on comments

* chore: refactor based on comments

* fix: move gpustats callback to base

* chore: create trainer_cls_args first based on comments

* fix: ipo label smoothing passed incorrectly

* feat: add optimizer parity for RL methods with test

* feat: add parity for optimizer in RM/PRM and add test

* fix: remove redundant function override for orpo/cpo batch metrics

* fix: improve handling of dpo_label_smoothing and merge issue

* fix: test fixture returning wrong field

* fix: address avoid direct modify fixture

* chore: minor refactor

* Revert "chore: refactor"

This reverts commit 99c8859eb0.

* feat: rename trainer_builder to builders

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-05-30 11:21:47 +07:00
Wing Lian
ec4ebfd997 Add a few items to faq (#2734)
* Add a few items to faq

* formatting

* chore: lint
2025-05-28 16:20:19 -04:00
Dan Saunders
bde8b5b6bd fix dist state init before deepspeed setup (#2737) 2025-05-28 14:59:57 -04:00
Dan Saunders
2962a398b7 Lora kernels fix (#2732)
* fix lora kernel patching and improve test

* simplification
2025-05-28 10:03:43 -04:00