Wing Lian
a85efffbef
bump transformers==4.52.4 ( #2800 ) [skip ci]
* bump transformers==4.52.4
* don't use hf offline for qwen tokenizer
* increase timeout
* don't use methodtype
* increase timeout
* better assertion logging
* upgrade deepspeed version too
2025-06-18 15:46:14 -04:00
Dan Saunders
06a648263b
Config doc autogen: follow-up fix docs build ( #2806 )
* config reference doc autogen
* improvements
* cleanup; still ugly but working
* reformat
* remove autogen config ref from git
* factor out validations
* rewrite
* rewrite
* cleanup
* progress
* progress
* progress
* lint and minifying somewhat
* remove unneeded
* coderabbit
* coderabbit
* update preview-docs workflow triggers
* installing with deps
* coderabbit
* update refs
* overwrote file accidentally
* docs install deps
2025-06-18 15:42:54 -04:00
Dan Saunders
9d5bfc127e
Config doc autogen ( #2718 )
* config reference doc autogen
* improvements
* cleanup; still ugly but working
* reformat
* remove autogen config ref from git
* factor out validations
* rewrite
* rewrite
* cleanup
* progress
* progress
* progress
* lint and minifying somewhat
* remove unneeded
* coderabbit
* coderabbit
* update preview-docs workflow triggers
* installing with deps
* coderabbit
* update refs
* overwrote file accidentally
2025-06-18 15:36:53 -04:00
Wing Lian
da8f6c32b9
update favicon ( #2801 )
* update favicon
* correct size favicon
2025-06-17 18:09:24 -04:00
Wing Lian
88c0e8d048
release tag ( #2799 )
v0.10.0
2025-06-17 12:13:27 -04:00
NanoCode012
d8e8cd8558
feat: remove evalfirst callback with built-in trainer arg ( #2797 )
2025-06-17 12:09:33 -04:00
Wing Lian
ccc94da8ad
KD fix w/ online distillation ( #2700 ) [skip ci]
* kd fixes
* fix collator setup
* fix input args
* better handling to drop string fields for kd with raw dataset
* kd trainer has kd temp as part of the init
* drop top_k before softmax
* simplify and remove zscore
* WIP chunked KD loss with autograd wrapper
* more fixes and liger-type chunked loss
* collator cls for plugins
* remove debugging
* additional plugin collator kwargs, don't scale up kd loss by t^2
* don't need temp arg to distill method
* online kd wip
* add close to comment block
* support sampling params/max new tokens
* handle when no custom collator is used in plugins
* logsumexp trick
* fix check
* shift off the first empty token
* fix length of padding
* use max not min
* temp scale kd loss at end
* support for dynamic plugin training args mixins and symmetric kl
* chore: lint
* fix trainer callback base class
* Fix decay
* accept compressed responses for smaller wire payload
* post-rebase lint
* more KD updates
* increase hyperparams_count for gradients for added normalize_topk
* fix to remove attention_mask
* rename vars for consistency
* fix rebase issues
* default to dropping last batch in multipack batch sampler
* improve handling of train len
* init collator_cls_and_kwargs
* explicit drop_last=False when checking for multipack completeness
* use separate v2 loader for kd
* fix kd tests to use subprocess so it picks up kd training args
* default value for kd_beta arg
* use updated dataset for ci
* longer timeout for e2e
2025-06-17 12:09:13 -04:00
Matt Cummins
ba62aa65ee
fixed the lora_target_modules syntax ( #2793 )
2025-06-15 16:47:02 -04:00
NanoCode012
21388cf615
Fix: lora kernel pre-patch applied despite post-patch not applied ( #2772 )
* fix: do not pre-patch self attention if lora dropout non-zero
* fix: add test to check patch not applied
* fix: test
* fix: test config check
* fix where we check so that tests don't break
* fix: test
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-14 11:54:06 -07:00
NanoCode012
80d5b066ec
Fix: adding magistral fsdp config, fixing not eval with test_datasets, handle mllama attention ( #2789 ) [skip ci]
* feat: add fsdp config for magistral
* fix: add mllama self attention handling for lora kernels
* fix: no eval if val_set_size 0 despite having test_datasets
* fix: add note for cce for vlm in newer model
2025-06-14 11:53:43 -07:00
NanoCode012
a3c82e8cbb
fix: grpo doc link ( #2788 ) [skip ci]
2025-06-13 12:03:47 -07:00
Wing Lian
b2274d430b
support for QAT w RL (DPO) ( #2776 )
2025-06-13 10:00:35 -04:00
NanoCode012
eac4a61f55
Feat: Add Magistral and mistral-common tokenizer support ( #2780 )
2025-06-12 19:18:33 -04:00
Wing Lian
ace9287c96
update loss value for flakey e2e test ( #2786 ) [skip ci]
* update loss value for flakey e2e test
* use pytest skip
* parametrize combinations
2025-06-12 18:06:14 -04:00
JZacaroli
f5fbc82f2b
Fix logging import in evaluate.py ( #2782 ) ( #2783 )
* Fix logging import in evaluate.py (#2782)
* chore: lint
---------
Co-authored-by: Joe Zacaroli <jaz@cyberscience.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-12 13:23:31 -04:00
NanoCode012
706c677cad
feat(doc): update readme to include changelog and remove matrix ( #2775 ) [skip ci]
* feat(doc): update readme to include changelog and remove matrix
* chore: improve wording
* chore: wording
* Update README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* chore: address comment remove muon
* chore: address comments
* fix: address final comments
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-06-12 13:23:18 -04:00
Wing Lian
468580d18e
limit multipack sampler processes ( #2771 ) [skip ci]
* limit to 16 packing processes
* make num_processes properly reflect configured dataset_processes
2025-06-12 13:22:58 -04:00
salman
3634d8ff9d
QAT docfix ( #2778 ) [skip ci]
* nits
* Update docs/qat.qmd
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-06-12 13:22:40 -04:00
Wing Lian
bcc108efc1
build 2.7.1 images too ( #2784 ) [skip ci]
2025-06-12 13:22:20 -04:00
Wing Lian
581dd324cc
build base images for torch 2.7.1 ( #2764 )
* build base images for torch 2.7.1
* fix: update base docker to use torch 2.7.1
* fix: update doc for main base to use 2.7.1
* make sure to install fa2 in base uv too
* use no build isolation for uv+flashattn
* install psutil also for fa2
* longer timeout for flash attn build
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-06-11 17:11:06 -04:00
Dan Saunders
00cda8cc70
Data loader refactor ( #2707 )
* data loading refactor (wip)
* updates
* progress
* pytest
* pytest fix
* lint
* zero_first -> filelock, more simplifications
* small simplification
* import change
* nit
* lint
* simplify dedup
* couldn't resist
* review comments WIP
* continued wip
* minor changes
* fix; remove contrived test
* further refactor
* set default seed in pydantic config
* lint
* continued simplification
* lint
* renaming and nits
* filelock tests
* fix
* fix
* lint
* remove nullable arg
* remove unnecessary code
* moving dataset save fn to shared module
* remove debug print
* matching var naming
* fn name change
* coderabbit comments
* naming nit
* fix test
2025-06-10 19:53:07 -04:00
Dan Saunders
52a0452acb
magistral small placeholder ( #2777 )
2025-06-10 13:03:41 -04:00
NanoCode012
83632f71d8
Feat: add tool calling support via tools column ( #2774 )
* feat: add tool_calling field support
* fix: add tests
2025-06-09 21:42:05 -07:00
Qingyang Wu
92afa4fa27
Fix the bug of position ids padding ( #2739 ) [skip ci]
* Update batching.py: fix the bug of position ids padding; if position ids are padded with a long sequence of zeros, flash attention crashes
* use alternate calculation for padding position_ids with a range
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-09 21:26:36 -07:00
Wing Lian
dd660c2ed0
handle when unable to save optimizer state when using ao optimizer with FSDP ( #2773 ) [skip ci]
* handle when unable to save optimizer state when using ao optimizer with FSDP1
* improve messaging
Co-authored-by: salman <salman.mohammadi@outlook.com>
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-06-09 21:26:14 -07:00
Wing Lian
09c685fd2c
fix worker_init_fn signature handling ( #2769 )
2025-06-08 23:14:10 -07:00
Dan Saunders
345a159796
coderabbit comments
2025-06-07 04:50:29 +00:00
Dan Saunders
657bffd85f
update posthog dep
2025-06-05 23:46:20 +00:00
Dan Saunders
f0dde8e2d5
lint
2025-06-05 23:41:46 +00:00
Dan Saunders
25fa4df70f
fix
2025-06-05 23:33:46 +00:00
Dan Saunders
e735f4270b
slight changes
2025-06-05 23:33:46 +00:00
Dan Saunders
035e7a2f4c
simplifying
2025-06-05 23:33:46 +00:00
Dan Saunders
2d36c11264
minor fixes
2025-06-05 23:33:46 +00:00
Dan Saunders
b8ec5bdccf
doc update
2025-06-05 23:33:44 +00:00
Dan Saunders
249405b46e
docs fix
2025-06-05 23:31:44 +00:00
Dan Saunders
d3be84fec2
enable / disable logic update
2025-06-05 23:31:44 +00:00
Dan Saunders
1c74ab175f
opt-in version of telemetry
2025-06-05 23:31:44 +00:00
Dan Saunders
b2f1fc109a
distributed fix
2025-06-05 23:31:44 +00:00
Dan Saunders
5a2a80cc48
fix issue with tests in ci
2025-06-05 23:31:44 +00:00
Dan Saunders
4033fe74f8
fixes
2025-06-05 23:31:44 +00:00
Dan Saunders
e9df4444be
remove duplicate info
2025-06-05 23:31:44 +00:00
Dan Saunders
ffd2985750
adding runtime metrics / system info, additional accelerator support, etc.
2025-06-05 23:31:44 +00:00
Dan Saunders
17310f9acc
adding runtime metrics / system info, additional accelerator support, etc.
2025-06-05 23:31:44 +00:00
Dan Saunders
71ae6f9f87
improved redaction, send system info during model config load telemetry, etc.
2025-06-05 23:31:08 +00:00
Dan Saunders
9dd1092f8f
doc update
2025-06-05 23:27:29 +00:00
Dan Saunders
2c2f2647a9
fix
2025-06-05 23:27:29 +00:00
Dan Saunders
98313a6b3f
adding back in base_model redaction w/ whitelist
2025-06-05 23:27:29 +00:00
Dan Saunders
8b75205d3b
sleep on all ranks in distributed setting
2025-06-05 23:27:29 +00:00
Dan Saunders
ef4990f304
simplifying path redaction
2025-06-05 23:27:29 +00:00
Dan Saunders
db3297b090
small update / fix
2025-06-05 23:27:27 +00:00