tgoab
530a0c0bf0
Changes from dataset_processes to dataset_num_proc ( #3352 ) [skip ci]
...
* changes from dataset_processes to dataset_num_proc
* deprecation message improved
---------
Co-authored-by: Juliana Nieto Cárdenas <jnietoca@purdue.edu >
2026-02-10 17:44:17 +07:00
VED
0343a72cc9
add glm support + patch ( #3329 ) [skip ci]
...
* add glm support + patch
* lint
* lint
* Update examples/glm4/glm-4-6v-flash-qlora.yaml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update examples/glm4/glm-4-6v-flash-qlora.yaml
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update src/axolotl/processing_strategies.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* patch removed
* lint
* lint2
* docs + rename
* rmv moe
* docs
* removed processor
* sdpa T_T"
* ddp_find_unused_parameters: true
* multi gpu yaml tested both
* multi gpu yaml tested both
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update examples/glm46v/README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* rmv text only section + v5 comments
* rename
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2026-02-10 17:43:53 +07:00
Wing Lian
3738978394
Add support for batched_mm, grouped_mm and scattermoe for MoE models ( #3377 )
...
* kernels plugin for moe for v5
* add support for native batched_mm or grouped_mm
2026-01-29 14:25:47 -05:00
Wing Lian
6132a30cda
handle warnings from v5 upgrade ( #3376 )
2026-01-28 06:45:01 -05:00
NanoCode012
3dd86d35b8
feat: add new cce support for glm series and exaone4 ( #3373 ) [skip ci]
2026-01-28 06:44:44 -05:00
salman
dd9ebaeba1
EAFT ( #3366 ) [skip ci]
...
* wip eaft
* fix eaft loss fn
* adding ref
---------
Co-authored-by: Salman Mohammadi <“salman.mohammadi@outlook.com ”>
2026-01-28 06:44:15 -05:00
Wing Lian
fc4e37920b
transformers v5 upgrade ( #3272 )
...
* Prepare for transformers v5 upgrade
* fix hf cli
* update for hf hub changes
* fix tokenizer apply_chat_template args
* remap include_tokens_per_second
* fix tps
* handle migration for warmup
* use latest hf hub
* Fix scan -> ls
* fix import
* fix for renaming of mistral common tokenizer -> backend
* update for fixed tokenization for llama
* Skip phi35 tests for now
* remove mistral patch fixed upstream in huggingface/transformers#41439
* use namespacing for patch
* don't rely on sdist for e2e tests for now
* run modal ci without waiting too
* Fix dep for ci
* fix imports
* Fix fp8 check
* fsdp2 fixes
* fix version handling
* update fsdp version tests for new v5 behavior
* Fail multigpu tests after 3 failures
* skip known v5 broken tests for now and cleanup
* bump deps
* unmark skipped test
* re-enable test_fsdp_qlora_prequant_packed test
* increase multigpu ci timeout
* skip broken gemma3 test
* reduce timeout back to original 120min now that the hanging test is skipped
* fix for un-necessary collator for pretraining with bsz=1
* fix: safe_serialization deprecated in transformers v5 rc01 (#3318 )
* torch_dtype deprecated
* load model in float32 for consistency with tests
* revert some test fixtures back
* use hf cache ls instead of scan
* don't strip fsdp_version
more fsdp_version fixes for v5
fix version in fsdp_config
fix aliasing
fix fsdp_version check
check fsdp_version is 2 in both places
* Transformers v5 rc2 (#3347 )
* bump dep
* use latest fbgemm, grab model config as part of fixture, un-skip test
* import AutoConfig
* don't need more problematic autoconfig when specifying config.json manually
* add fixtures for argilla ultrafeedback datasets
* download phi4-reasoning
* fix arg
* update tests for phi fast tokenizer changes
* use explicit model types for gemma3
---------
Co-authored-by: Wing Lian <wing@axolotl.ai >
* fix: AutoModelForVision2Seq -> AutoModelForImageTextToText
* chore: remove duplicate
* fix: attempt fix gemma3 text mode
* chore: lint
* ga release of v5
* need property setter for name_or_path for mistral tokenizer
* vllm not compatible with transformers v5
* setter for chat_template w mistral too
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
Co-authored-by: salman <salman.mohammadi@outlook.com >
2026-01-27 17:08:24 -05:00
VED
d0d26d5064
feat: Add GDPO Support ( #3353 )
...
* gdpo support - test left
* lint
* fixes for vllm serve
* test advantages
* docs
* lint
* lint =
* gdpo simple + lint
* lint nit
* example
* lint
* trl 0.27.0
* blocklist
* test assert rmv
* add validation check for GDPO + sum_then_normalize
---------
Co-authored-by: Wing Lian <wing@axolotl.ai >
2026-01-21 17:22:45 -05:00
Wing Lian
8ab9d9ea88
Version dev ( #3365 )
2026-01-20 22:58:29 -05:00
Wing Lian
6e42def14b
set version to v0.13.1 ( #3363 )
2026-01-20 08:58:32 -05:00
NanoCode012
359b7ad85e
fix: gemma3_text model loading vision config ( #3354 )
...
* fix: gemma3-text mode loading vision config
* fix: improve defaults to use lora kernels
2026-01-13 09:49:23 -05:00
VED
258ce8d4fa
feat : scaled softmax support ( #3338 )
...
* scaled softmax
* comment
* lint
* remove eager
* validation for flash
* lint
* val improve + neet
* fix correct softmax scale val(learned)
* learned scale val 4 ssm
* lint
* fix model_type rmv
* sdpa_atten
* test fix + lint
* test fix
* sdp_a val rmv
* flex fix
* main flash
* lint
* flex attn
* lint comment
* fix score_mod
* Update src/axolotl/utils/schemas/validation.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2026-01-13 14:33:11 +07:00
VED
7bf6f70e96
fix total/trainable tokens log ( #3344 )
...
* fix total/trainable tokens log
* fix total/trainable tokens log
2026-01-06 09:25:17 -05:00
PraMamba
8aab807e67
feat: Add SwanLab integration for experiment tracking ( #3334 )
...
* feat(swanlab): add SwanLab integration for experiment tracking
SwanLab integration provides comprehensive experiment tracking and monitoring for Axolotl training.
Features:
- Hyperparameter logging
- Training metrics tracking
- RLHF completion logging
- Performance profiling
- Configuration validation and conflict detection
Includes:
- Plugin in src/axolotl/integrations/swanlab/
- Callback in src/axolotl/utils/callbacks/swanlab.py
- Tests in tests/integrations/test_swanlab.py
- Examples in examples/swanlab/
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com >
* fix(swanlab): address PR #3334 review feedback from winglian and CodeRabbit
- Change use_swanlab default to True (winglian)
- Clear buffer after periodic logging to prevent duplicates (CodeRabbit Major)
- Add safe exception handling in config fallback (CodeRabbit)
- Use context managers for file operations (CodeRabbit)
- Replace LOG.error with LOG.exception for better debugging (CodeRabbit)
- Sort __all__ alphabetically (CodeRabbit)
- Add language specifiers to README code blocks (CodeRabbit)
- Fix end-of-file newline in README (pre-commit)
Resolves actionable comments and nitpicks from CodeRabbit review.
Addresses reviewer feedback from @winglian.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com >
* only run swanlab integration tests if package is available
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2026-01-06 09:19:18 -05:00
Wing Lian
afe18ace35
deprecate torch 2.7.1 ( #3339 )
2026-01-01 06:52:45 -05:00
Wing Lian
11c0b5b256
batch upgrade dependencies ( #3299 )
...
* upgrade dependencies
* don't use reset sessions
* downgrade transformers, upgrade other deps
* upgrade bnb to 0.49.0
* restore s3 cache
* explicit use local files w hub
* decompress and strip top level dir
* use 2 levels for strip components
* try to preserve permissions for symlinks
* use updated tar
* fix #3293 for distributed
* downgrade bnb
* fast fail after 4
* fix total tokens device
* patch accelerate CP/SP (#3309 )
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-12-30 09:02:49 -05:00
VED
a6080df73c
compute loss only if training and update token metric naming ( #3293 ) [skip ci]
...
* compute loss only if training
* save total_tokens for checkpoint
* check if string
* refactor total_tokens/ num_tokens
* refactor 2
* rplc trainable_step/trian_per_sec_per_gpu
* lint + log trainable/tokens
* consolidate it in the callback.
* test for total_tokens after resume
* check if tokenstate exist after ckpt
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-12-25 18:38:17 +07:00
NanoCode012
418933f0d1
feat: add internvl3_5 ( #3141 ) [skip-ci]
...
* feat: add internvl3_5
* fix: add timm instructions
* chore: add kimi-linear to cce doc
* feat: update internvl example
* chore: pin revision
* chore: remove from multipack
* fix: add to multimodal array
* fix: internvl use hf version
* feat: update cce
* chore: lint
* fix: list for image_size
* chore: add docs vram usage
* feat: enable cce
* fix: no need trust remote code
* fix: inconsistent timm version
2025-12-25 18:07:59 +07:00
NanoCode012
372f664c63
feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc ( #3330 ) [skip-ci]
...
* feat: add pos id to flex attention for packing part 1
* feat: update to include sliding window mask patch
* fix: suppress MatMul8bitLt: inputs will be cast from warnings
* fix: remove redundant flex attention patch
* chore: update olmo docs
* feat: add validator patch for cross entropy
2025-12-25 17:56:20 +07:00
NanoCode012
97f1b1758d
Feat: add kimi linear support ( #3257 )
...
* feat: add custom kimi linear patch [skip ci]
* feat: add configuration file and fix import [skip ci]
* fix: hijack tokenizer temporarily [skip ci]
* chore: remove accidental commit
* fix: attempt patch kimi remote
* fix: kwargs passed
* fix: device for tensor
* fix: aux loss calculation
* feat: cleaned up patches order
* fix: remove duplicate tokenizer patch
* chore: add debug logs
* chore: add debug logs
* chore: debug
* Revert "chore: add debug logs"
This reverts commit da372a5f67 .
* Revert "chore: add debug logs"
This reverts commit 97d1de1d7c .
* fix: KeyError: 'tokenization_kimi'
* fix: support remote_model_id in cce patch
* feat: add config preload patch
* fix: use standard aux loss calc and updated modeling
* fix: import
* feat: add kimi-linear docs and example
* chore: add note about moe kernels
* feat: update cce to include kimi-linear
* chore: lint
* chore: update main readme
* fix: patch mechanism to address comments
* chore: lint
* fix: tests
* chore: cleanup comment
2025-12-25 17:53:52 +07:00
Abubakar Abid
f2155eaf79
feat: add trackio as experiment tracking integration ( #3253 )
...
* feat: add trackio as experiment tracking integration
- Add TrackioConfig to integrations schema with project_name, run_name, and space_id
- Create trackio_.py module for environment setup
- Add is_trackio_available() utility function
- Integrate trackio with report_to in trainer builder
- Add trackio callback for experiment tracking
- Add trackio config keys to gpt-oss example YAMLs
- Trackio runs locally by default, syncs to HF Space if space_id provided
* changes
* changes
* changes
* changes
* changes
* changes
* changes
* Update requirements.txt
* don't allow pydantic 2.12 for now
---------
Co-authored-by: Abubakar Abid <aaabid93@gmail.com >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-23 08:49:07 -05:00
kallewoof
92ee4256f7
feature: raise on long sequence drop ( #3321 )
...
* feature: raise on long sequence drop
It is sometimes not desired that sequences are silently dropped from the dataset, especially when the dataset has been carefully crafted and pre-fitted for the training context. This would then suggest that an error occurred somewhere in the process. This feature adds a third value for excess_length_strategy called 'raise', which will raise a ValueError if a sequence is encountered that is too long and would have normally been dropped/truncated.
* tests: add excess_length_strategy tests
* doc: updated return value description for drop_long_seq_in_dataset
* add @enable_hf_offline
* fixed cfg modified after validate_config called
* hf offline fix
* fix tqdm desc when raise is used
* test: added test for non-batched case
* accidental code change revert
* test: use pytest.raises
* test: simplified drop_seq_len tests
* test: moved excess_length_strat test to test_data.py
---------
Co-authored-by: salman <salman.mohammadi@outlook.com >
2025-12-22 13:59:49 -05:00
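Note on #3321: the 'raise' behaviour described in the commit body can be sketched roughly as below. This is a minimal illustration, not axolotl's code: the helper name `apply_excess_length_strategy`, its signature, and the assumption that the two pre-existing strategy values are 'drop' and 'truncate' are all hypothetical.

```python
from typing import List


def apply_excess_length_strategy(
    sequences: List[List[int]], max_len: int, strategy: str = "drop"
) -> List[List[int]]:
    """Hypothetical helper: handle sequences longer than max_len.

    'drop' and 'truncate' are assumed names for the existing behaviours;
    'raise' is the value the commit message describes.
    """
    kept = []
    for index, seq in enumerate(sequences):
        if len(seq) <= max_len:
            kept.append(seq)
        elif strategy == "drop":
            # Silently skip the over-long sequence.
            continue
        elif strategy == "truncate":
            kept.append(seq[:max_len])
        elif strategy == "raise":
            # Fail loudly instead of dropping or truncating.
            raise ValueError(
                f"sequence {index} has length {len(seq)}, "
                f"which exceeds the limit of {max_len}"
            )
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return kept
```

With a pre-fitted dataset, 'raise' turns an unexpected over-long sample into an immediate error rather than a silently smaller training set.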
Wing Lian
efeb5a4e41
fix check for fp8 capability ( #3324 )
...
* fix check for fp8 capability
* handle non-cuda compute
* reduce concurrency of tests
2025-12-22 13:58:25 -05:00
VED
faaff6c792
allow users to set ndigits for rounding of metrics when logging ( #3325 )
...
* METRIC_PRECISION-> 8
* use ndigits and move env getter to top of log function
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-22 08:54:43 -05:00
Alexander Kozhevnikov
43cef27458
Fix typo in densemixer RuntimeError ( #3327 ) [skip ci]
...
It offers installing densemizer while it should be densemixer
2025-12-22 08:53:58 -05:00
salman
bbd3486f57
Distributed Muon Optimizer ( #3264 )
...
* init
* working
* updating configs
* removing unneeded files
* lint
* comments
* lint
* fix regex match
* bump contribs version
* comments
* fixing tests and imports
* muon imports in test v2
* test cleanup
* bump contribs version
---------
Co-authored-by: Salman Mohammadi <“salman.mohammadi@outlook.com ”>
2025-12-19 10:43:47 -05:00
VED
3750d7dd64
add liger kernel support for dpo ( #3302 )
...
* add liger kernel 4 dpo
* revert grpo changes,add support in dpo
* revert grpo changes,add support in dpo
* dpo_use_liger_kernal
* fix liger_dpo
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-12-18 11:11:06 -05:00
xzuyn
2197b0bf89
feat: cheap ppl metric ( #3317 )
...
* Import math and compute perplexity from loss values
* lint
* coderabbit changes
* lint
* fix: add rounding to ppl
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-12-18 09:02:41 -05:00
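Note on #3317: perplexity is the exponential of the mean cross-entropy loss, so it can be derived from an already-logged loss value at almost no cost. The sketch below is an illustration only; the function name `cheap_perplexity`, the default `ndigits`, and the overflow handling are assumptions, not the code from the PR.

```python
import math


def cheap_perplexity(loss: float, ndigits: int = 4) -> float:
    """Hypothetical helper: perplexity from a mean cross-entropy loss (in nats)."""
    try:
        return round(math.exp(loss), ndigits)
    except OverflowError:
        # math.exp overflows for very large losses; report infinity instead.
        return float("inf")
```

A loss of 0.0 corresponds to a perplexity of 1.0, and the value grows exponentially with the loss, which is why the result is rounded before logging.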
Seung Hyun Cho
3e51a680c2
fix: Fix evaluation loss in KD trainer ( #3271 )
...
* fix: Fix evaluation loss in KD trainer
* Fix v2 strategy super() call
* fix: Add safety check for total_tokens in log method
* fix: simplified num items and outputs return handling
* fix: add missing model forward pass in compute_loss
* refactor: Use Template Method pattern for chat template strategies
* refactor: use pop(None) and remove v2 override
* chore: lint
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-17 13:40:36 -05:00
xzuyn
2cf254b4af
Add peft_autocast_adapter_dtype config option ( #3311 ) [skip ci]
...
* Add `peft_autocast_adapter_dtype` field to schema
* Add `autocast_adapter_dtype` to `model_kwargs`
* chore: docs
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-12-17 10:09:39 -05:00
NanoCode012
a1d07f42e4
Fix(misc): address PYTORCH_CUDA_ALLOC_CONF deprecation ( #3313 )
...
* fix: leftover ministral docs changes
* fix: pytorch_cuda_alloc_conf deprecation
* fix: set old PYTORCH_CUDA_ALLOC_CONF env too
* handle 2.9 separately
---------
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-12-17 09:12:18 -05:00
Wing Lian
2a664dc8ad
support for xformers wheels for torch 2.9 ( #3308 )
...
* support for xformers wheels for torch 2.9
* fix hf cache?
* don't use hf cache from s3
* show disk free space in ci
2025-12-11 11:56:40 -05:00
NanoCode012
4ac78aa562
fix: update qwen3 jinja tokenization off a few tokens ( #3295 )
...
* fix: update qwen3 jinja tokenization off a few tokens
* fix: add note on tokenization issue
* fix: pop last index for mistral tokenizer
2025-12-09 14:31:03 +07:00
VED
b3f4aa149f
fix bin size ( #3307 )
...
* fix bin size
* lint
---------
Co-authored-by: Ved <ved.work2024@gmail.com >
2025-12-08 09:16:18 -05:00
salman
75b20fb66f
Save processor in quantizer CLI ( #3290 )
2025-12-06 16:27:18 +00:00
NanoCode012
2b66ee189c
Feat: add ministral3 ( #3297 )
...
* feat: add ministral and mistral3
* chore: lint
* feat: update cce for ministral
* fix: add vram usage
* feat: update for release
* fix: save_pretrained issue in v5
* fix: add instructions to use v5 branch
* fix: add to multipack
* fix: improve instructions
* fix: add model to readme
2025-12-04 08:32:08 -05:00
NanoCode012
86d8cca149
Feat: add trinity by ArceeAI ( #3292 )
2025-12-02 13:12:55 -05:00
Yohan Na
c6ddcdd06a
feat: add exaone4 chat template and update enums ( #3279 )
...
* feat: add exaone4 chat template and update enums
* fix: handle first message as system or tools in exaone4 chat template
* Update src/axolotl/utils/chat_templates/templates/exaone4.jinja
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* fix: lint
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-12-01 15:52:45 +07:00
NanoCode012
b234532d9f
Feat: add peft_ensure_weight_tying ( #3278 )
...
* feat: upgrade peft to 0.18.0
* feat: add peft_ensure_weight_tying
* fix: default
* chore: adjust kwarg per feedback
2025-11-28 18:54:48 +07:00
NanoCode012
006f226270
Feat: add Olmo3 (BC with Olmo and Olmo2) ( #3275 )
...
* feat: update cce to include olmo family
* chore: update docs following feedback
* feat: add olmo3 config
* fix: clarify 3 methods
* chore: add olmo to readme
2025-11-24 10:21:31 +07:00
NanoCode012
4e55871112
feat: Add opt-out Telemetry ( #3237 )
...
* initial telemetry manager impl
* adding todo
* updates
* updates
* progress on telemetry: config load, process, model load, train start / end, error tracking
* update error file path sanitization function; adding more error tracking
* updated sanitization logic, tests
* adding runtime metrics (cpu + gpu memory, steps/s, etc.)
* tests for runtime metrics telemetry and assoc. callback
* small update / fix
* simplifying path redaction
* sleep on all ranks in distributed setting
* adding back in base_model redaction w/ whitelist
* fix
* doc update
* improved redaction, send system info during model config load telemetry, etc.
* adding runtime metrics / system info additional accelerator support, etc.
* adding runtime metrics / system info additional accelerator support, etc.
* remove duplicate info
* fixes
* fix issue with tests in ci
* distributed fix
* opt-in version of telemetry
* enable / disable logic update
* docs fix
* doc update
* minor fixes
* simplifying
* slight changes
* fix
* lint
* update posthog dep
* coderabbit comments
* fix: opt-in model
* fix: increase time since last
* fix: increase whitelist orgs
* fix: posthog init and shutdown
* fix: imports
* fix: also check grad norm
* fix: duplicate plugin_manager calls
* fix: bad merge
* chore: update docs
* fix: cache process per comment
* fix: error handling
* fix: tests
* Revert "fix: error handling"
This reverts commit 22d1ea5755 .
* fix: test telemetry error_handled bool
* fix: revert test
* chore: final doc fixes
---------
Co-authored-by: Dan Saunders <danjsaund@gmail.com >
Co-authored-by: Dan Saunders <dan@axolotl.ai >
2025-11-18 11:35:25 +07:00
VED
dcf24fd24e
feat: save checkpoint after training started ( #3233 )
...
* add:config parameters for checkpoint
* callback main
* test file_type fix
* lint
* unit
* simplify dict/obj handling
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Delete tests/e2e/integrations/__init__.py
* remove hard code path in test
* device check
* lint
* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update src/axolotl/utils/callbacks/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
* lint-2
* remove: signal based checkpoints
* lint
* remove signal tests
* add:is_main_process
* lint
* add:is_distributed() for tests
* remove nested is_main_process
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* Update src/axolotl/utils/schemas/dynamic_checkpoint.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* add user_defined_filename
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2025-11-13 10:21:05 -05:00
NanoCode012
9901ee5602
fix: voxtralprocessor broken ( #3255 ) [skip ci]
...
* fix: voxtralprocessor broken
* chore: add todo
* chore: wording
2025-11-13 10:18:42 -05:00
xzuyn
dd78f2e0cc
Fix: warmup_steps: 0 & warmup_ratio: 0 not disabling warmup ( #3254 )
...
* fix unintentional falsy checks
* chore: lint
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-11-11 10:32:06 +07:00
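Note on #3254: the "unintentional falsy checks" bug class arises when a check such as `if warmup_steps:` treats an explicit 0 the same as an unset value. A minimal sketch of the safer pattern follows; the function `resolve_warmup_steps`, its parameters, and the fallback default are hypothetical and only illustrate comparing against None.

```python
from typing import Optional


def resolve_warmup_steps(
    warmup_steps: Optional[int],
    warmup_ratio: Optional[float],
    total_steps: int,
    default_steps: int = 100,  # assumed fallback, for illustration only
) -> int:
    """Hypothetical helper: pick the warmup length without falsy checks."""
    # `if warmup_steps:` would skip an explicit 0; compare with None instead.
    if warmup_steps is not None:
        return warmup_steps
    if warmup_ratio is not None:
        return int(warmup_ratio * total_steps)
    return default_steps
```

With this shape, `warmup_steps: 0` and `warmup_ratio: 0` both yield zero warmup steps instead of falling through to the default.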
Eduard Zl
b54f9c942b
_get_tools in ChatTemplateStrategy : function "parameters" can be dict or string ( #3238 )
...
* When training on function calls, the "tools" elements of a dataset can contain the same parameter name with different types, and Datasets fails to load such a training set. This fix allows the "parameters" element of a function definition to be a string (produced by running "json.dumps" while preparing the training data set). The _get_tools function iterates over the tool definitions: if the "parameters" element is a dict, it is kept that way; if it is a string, it is converted to a dict by calling "json.loads" on the string value.
* feat: add doc on tool parameters json loading
* feat: add tests for parameters json string
---------
Co-authored-by: ezlotnik <eduard_zlotnik@intuit.com >
Co-authored-by: NanoCode012 <nano@axolotl.ai >
2025-11-11 09:04:28 +07:00
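Note on #3238: the dict-or-string handling of "parameters" described in the commit body can be sketched as below. This is an illustration under assumptions, not the actual `_get_tools` implementation: the function name `normalize_tool_parameters` and the tool layout (a "function" dict holding "parameters") are hypothetical.

```python
import json
from typing import Any, Dict, List


def normalize_tool_parameters(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: decode "parameters" stored as JSON strings."""
    normalized = []
    for tool in tools:
        tool = dict(tool)  # shallow copy so the input is not mutated
        function = tool.get("function")
        if isinstance(function, dict):
            function = dict(function)
            params = function.get("parameters")
            if isinstance(params, str):
                # Stored via json.dumps during data preparation; decode it.
                function["parameters"] = json.loads(params)
            tool["function"] = function
        normalized.append(tool)
    return normalized
```

Storing "parameters" as a JSON string gives every row the same column type, which sidesteps the schema conflict; the decode step restores the dict before the chat template sees it.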
NanoCode012
11eb36585a
feat: add arg to enable dft in liger ( #3125 )
...
* feat: add arg to enable dft in liger
* feat: add tests use_token_scaling
* fix: test
* fix: move check to args
2025-11-10 21:37:47 +07:00
NanoCode012
d0c846fc5e
feat: add granitemoeshared and granitemoehybrid ( #3158 )
2025-11-10 21:35:45 +07:00
Wing Lian
b5fcc2f14b
log cumulative total trained tokens ( #3252 )
...
* log cumulative total trained tokens
* use is_distributed helper
2025-11-07 16:04:00 -05:00
VED
ed2e8cacd6
feat:openenv rollout_func ( #3239 ) [skip ci]
...
* feat:openenv rollout_func
* chore lint
* docs
* add:docs processing_class
* tests
* lint
2025-11-07 08:51:40 -05:00
Lê Nam Khánh
80270a92fa
Fix typos in some files ( #3250 ) [skip ci]
2025-11-07 08:21:20 -05:00