axolotl

Author	SHA1	Message	Date
Sunny Liu	7d479348ee	custom reward function loading, proeprly done	2025-04-17 09:47:14 -04:00
bursteratom	ce0259db13	add outputdir	2025-04-17 09:47:14 -04:00
Sung Ching Liu	2798817cf9	Update tests/e2e/solo/test_grpo.py Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-04-17 09:47:14 -04:00
Sunny Liu	0e1b081e49	add unit test	2025-04-17 09:47:14 -04:00
Wing Lian	69eda209a6	re-enable DS zero3 ci with updated transformers (#2533 )	2025-04-16 14:48:40 -07:00
Dan Saunders	b8c633aa97	batch api HF adapter for ring-flash-attn; cleanup and improvements (#2520 ) * batch api HF adapter for ring-flash-attn; cleanup and improvements * update * adding all batch ring-flash-attn methods via single adapter * removing pad_to_sequence_len=False for now * fix * updating docs to include batch SP * review comments * fixes for batch API funcs, simplify * fixes * fix * updates * add batch_zigzag smoke test	2025-04-16 13:50:48 -04:00
NanoCode012	682a9cf79b	Fix: add delinearization and make qlora work with fsdp2 (#2515 ) * fixes for delinearization, and make qlora work with fsdp2 * Add back mistakenly removed lm_eval * typo [skip ci] * patch evals for torch.compile + fsdp2 * also check torch_compile w fsdp2 * lots of fixes for flex attn with llama4 * fix patch check and patch llama4 too * attempt to make the patches stick * use transformers 4.51.2 * update configs and README for llama4 * remove torch.compile for CI test * cleanup any existing singletons * set singleton cache to None instead of deleting * use importlib reload with monkeypatch * don't worry about transformers version, mark inputs with grads, fix regex * make sure embeds aren't on cpu * logging and mem improvements * vllm version and add to docker, make sure to save processor on conversion * fix ambiguous tensor bool check * fix vllm to not use v1, upgrade hf transformers * fix tests * make flex_attn_compile_kwargs configurable, since this depends on model params --------- Co-authored-by: Wing Lian <wing@axolotl.ai> Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>	2025-04-15 23:31:39 -07:00
Wing Lian	198d775d6d	make sure the all of the model is on the same device, so this test will pass on multigpu (#2524 ) [skip ci]	2025-04-15 22:15:42 -07:00
Wing Lian	de8a625dd7	make e2e tests a bit faster by reducing test split size (#2522 ) [skip ci] * [ci] make e2e tests a bit faster by reducing test split size * use 10% split of alpaca dataset to speed up dataset loading/tokenization * reduce gas 4->2 for most e2e tests * increase val set size for packing	2025-04-12 07:24:43 -07:00
Wing Lian	7e7180fa10	add mocks for loading datasets in cli train tests (#2497 ) [skip ci] * add mocks for loading datasets in cli train tests * Apply suggestions from code review to fix patched module for preprocess Co-authored-by: NanoCode012 <nano@axolotl.ai> --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-04-11 09:51:59 -04:00
Dan Saunders	59cd472504	SP cu_seqlens fix, refactor (#2495 ) * working on masking fix * refactor and fix multipack seqlens * pre-commit fix * adding smoke test * using existing packed seqlens util * log warning re: logged losses / gradient scaling per rank	2025-04-07 14:47:57 -04:00
Wing Lian	8bbad21bfd	llama4 support (#2493 ) * llama4 support * add xet support [skip ci] * be flexible on transformers version and skip test on version * don't use deepspeed for the fix_untrained_tokens test * reordering to trigger torch 2.6.0 tests first * slightly smaller train set * use 4.51.0 for now * remove stray print, add llama4 chat template to schema, bump peft to 0.15.1 * patches to make llama4 performant * add preliminary fp8 support	2025-04-07 10:49:15 -04:00
Wing Lian	5f4af3665d	FSDP2 support (#2469 ) * fsdp2 support * use accelerate release 1.6.0 * allow 8bit optims with fsdp2 * liger + torch compile fix * add fsdp2 e2e tests * use transformers commit with fsdp2 support * skip zero3 tests for this PR for now * fix fsdp2 config for ci * make sure both flex and flash attn work with fsdp2, skip fix untrained tokens * okay, actually use fdsp2... * more fixes to flex for fsdp2 * make sure to patch all the loaded models * additional validation for fsdp2, bump dep versions	2025-04-06 17:08:01 -04:00
Sung Ching Liu	a8f38c367c	Flex Attention + Packing with BlockMask support (#2363 )	2025-04-05 18:02:57 -04:00
Wing Lian	e7e0cd97ce	Update dependencies and show slow tests in CI (#2492 ) * use latest torchao, gradio, schedule-free * get info on slow tests * speed up tests by avoiding gradient checkpointing and reducing eval size	2025-04-05 17:41:31 -04:00
Wing Lian	949471039f	fix tokenizer overrides w gemma3 (#2488 ) * fix tokenizer overrides w gemma3 * fix offline wrapping	2025-04-05 01:25:44 -04:00
Wing Lian	dd66fb163c	check if fixture exists in the cache already (#2485 ) * check if fixture exists in the cache already * add docstring explaining what is going on	2025-04-04 13:47:01 -04:00
Wing Lian	df119e3724	Validation for Muon optimizer with DS/FSDP (#2464 )	2025-04-01 09:39:12 -04:00
Wing Lian	e0aba74dd0	Release update 20250331 (#2460 ) [skip ci] * make torch 2.6.0 the default image * fix tests against upstream main * fix attribute access * use fixture dataset * fix dataset load * correct the fixtures + tests * more fixtures * add accidentally removed shakespeare fixture * fix conversion from unittest to pytest class * nightly main ci caches * build 12.6.3 cuda base image * override for fix from huggingface/transformers#37162 * address PR feedback	2025-04-01 08:47:50 -04:00
Wing Lian	328d598114	gemma3 packing fixes (#2449 ) * make gemma3 work with packing * multi-gpu e2e for ci * update gemma3 model namespace to use mirror * add gradient checkpointing to multigpu e2e ci * update gemma3 examples for use_reentrant and fix ddp find unused params * fix tests for gemma3 * fix import for test utils * set correct train loss for gemma3 e2e	2025-03-31 17:15:23 -04:00
DreamGenX	4d36ecc724	Sequential sample packing (#2404 ) [skip ci] * add sequential sample packing * chore: lint --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-03-31 15:48:20 -04:00
Wing Lian	b6fc46ada8	Updates for trl 0.16.0 - mostly for GRPO (#2437 ) [skip ci] * add grpo scale_rewards config for trl#3135 * options to connect to vllm server directly w grpo trl#3094 * temperature support trl#3029 * sampling/generation kwargs for grpo trl#2989 * make vllm_enable_prefix_caching a config param trl#2900 * grpo multi-step optimizeations trl#2899 * remove overrides for grpo trainer * bump trl to 0.16.0 * add cli to start vllm-serve via trl * call the python module directly * update to use vllm with 2.6.0 too now and call trl vllm serve from module * vllm 0.8.1 * use python3 * use sys.executable * remove context and wait for start * fixes to make it actually work * fixes so the grpo tests pass with new vllm paradigm * explicit host/port and check in start vllm * make sure that vllm doesn't hang by setting quiet so outouts go to dev null * also bump bnb to latest release * add option for wait from cli and nccl debugging for ci * grpo + vllm test on separate devices for now * make sure grpo + vllm tests runs single worker since pynccl comms would conflict * fix cli * remove wait and add caching for argilla dataset * refactoring configs * chore: lint * add vllm config * fixup vllm grpo args * fix one more incorrect schema/config path * fix another vlllm reference and increase timeout * make the tests run a bit faster * change mbsz back so it is correct for grpo * another change mbsz back so it is correct for grpo * fixing cli args * nits * adding docs * docs * include tensor parallel size for vllm in pydantic schema * moving start_vllm, more docs * limit output len for grpo vllm * vllm enable_prefix_caching isn't a bool cli arg * fix env ordering in tests and also use pid check when looking for vllm --------- Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>	2025-03-31 15:47:11 -04:00
Dan Saunders	5410195e0b	Sequence parallelism quick follow-ups; remove ModelCallback (#2450 ) * guard return if ring attn alrady registered * add docs link, bits in multi-gpu docs, remove save model callback (subsumed by HF trainers) * configurable heads_k_stride from ring-flash-attn hf adapter	2025-03-31 09:13:42 -04:00
NanoCode012	cf0c79d52e	fix: minor patches for multimodal (#2441 ) * fix: update chat_template * fix: handle gemma3 showing a lot of no content for turn 0 * fix: remove unknown config from examples * fix: test * fix: temporary disable gemma2 test * fix: stop overwriting config.text_config unnecessarily * fix: handling of set cache to the text_config section * feat: add liger gemma support and bump liger to 0.5.5 * fix: add double use_cache setting * fix: add support for final_logit_softcap in CCE for gemma2/3 * fix: set use_cache before model load * feat: add missing layernorm override * fix: handle gemma3 rmsnorm * fix: use wrapper to pass dim as hidden_size * fix: change dim to positional * fix: patch with wrong mlp * chore: refactor use_cache handling * fix import issues * fix tests.e2e.utils import --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-03-31 13:40:12 +07:00
Wing Lian	4ba80a0e5a	fix streaming packing test (#2454 ) * fix streaming packing test * constrain amount of text generated	2025-03-29 08:30:06 -04:00
Wing Lian	c49682132b	use offline for precached stream dataset (#2453 )	2025-03-28 23:39:09 -04:00
Wing Lian	05f03b541a	hf offline decorator for tests to workaround rate limits (#2452 ) [skip ci] * hf offline decorator for tests to workaround rate limits * fail quicker so we can see logs * try new cache name * limit files downloaded * phi mini predownload * offline decorator for phi tokenizer * handle meta llama 8b offline too * make sure to return fixtures if they are wrapped too * more fixes * more things offline * more offline things * fix the env var * fix the model name * handle gemma also * force reload of modules to recheck offline status * prefetch mistral too * use reset_sessions so hub picks up offline mode * more fixes * rename so it doesn't seem like a context manager * fix backoff * switch out tinyshakespeare dataset since it runs a py script to fetch data and doesn't work offline * include additional dataset * more fixes * more fixes * replace tiny shakespeaere dataset * skip some tests for now * use more robust check using snapshot download to determine if a dataset name is on the hub * typo for skip reason * use local_files_only * more fixtures * remove local only * use tiny shakespeare as pretrain dataset and streaming can't be offline even if precached * make sure fixtures aren't offline improve the offline reset try bumping version of datasets reorder reloading and setting prime a new cache run the tests now with fresh cache try with a static cache * now run all the ci again with hopefully a correct cache * skip wonky tests for now * skip wonky tests for now * handle offline mode for model card creation	2025-03-28 19:20:46 -04:00
NanoCode012	2c34a4634e	feat: add CCE for gemma3, cohere, and cohere2 (#2443 ) * feat: add CCE for gemma3 and cohere1/2 * fix: change from relative import to absolute * feat: add multipack for cohere&cohere2 * chore: improve comments * fix: add gemma3_text * feat: add cohere2 example * fix: cohere forward * fix: patch for cohere2 * feat: add command r v01 qlora sample * chore: lint * feat: upgrade gemma3 and gemma2 patch to use logits_to_keep * chore: lint * fix: add deprecate_kwarg decorator * fix: add cce for gemma3 conditionalgeneration * fix: gemma3 patch to defer logits calculation * fix: patch gemma3 if given as model * fix: remove not working config * fix: update comments to clarify changes * feat(doc): add supported models to readme * fix: address difference in our cohere patch * feat: add mistral3 * feat: add gemma * feat(doc): update README to include gemma and mistral3 in supported models * fix: gemma patch * fix: import * fix: gemma patch to be standalone * fix: gemma3 warn about not support final_logit_softcapping * feat: add mllama CCE * chore: add abbireviation to doc * fix: remove unneeded gemma3 eager warning * fix: save processor if available * fix: enable save processor on merge * fix: wrong env meaning	2025-03-26 18:13:51 -04:00
NanoCode012	a9b0733f2c	Feat: Rework multimodal support (mllama, llava, pixtral, qwen2, qwen25, gemma3, mistral3) (#2435 )	2025-03-23 11:08:51 -04:00
NanoCode012	9f00465a5c	Feat: Add support for gemma3_text and add e2e for gemma2 (#2406 )	2025-03-22 20:33:21 -04:00
Dan Saunders	86bac48d14	cleanup for failing test (#2436 )	2025-03-22 17:53:29 -04:00
Dan Saunders	23f0c51d88	Sequence parallelism (#2412 ) * adding easy_context as integration for now * progress on ring attn impl * progress on ring attn impl * cleanup * remove errant file * fix req * removing unused code * updates * pytest * update * updates * fixes * precommit fixes * working multi-group SP * fixing sample packing * remove debug logs and simplify * eval dataloader and sampler changes * removing some obvious comments * update config.qmd and rename option * scoping down problematic import * another import scoping change * pernicious Fire CLI bugfix * isolate cli tests * actually isolate CLI tests * gracefully handle no ring-flash-attn * fix * fix * move ring flash attn to extras with flash-attn (#2414) * removing flash-attn from requirements.txt (in setup.py extras already) * rename file, delete another * using field validator instead of model validator * test fix * sampler / dataloader refactor * non-seq2se1 collator fix * removing print statement * bugfix * add SP doc, review comments * small changes * review comments, docstrings * refactors, SP mixin * small updates * fix tests * precommit * precommit --------- Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: Dan Saunders <dan@axolotl.ai>	2025-03-21 12:43:55 -04:00
Dan Saunders	113e9cd193	Autodoc generation with quartodoc (#2419 ) * quartodoc integration * quartodoc progress * deletions * Update docs/.gitignore to exclude auto-generated API documentation files * Fix * more autodoc progress * moving reference up near the top of the sidebar * fix broken link * update to reflect recent changes * pydantic models refactor + add to autodoc + fixes * fix * shrinking header sizes * fix accidental change * include quartodoc build step * update pre-commit version * update pylint * pre-commit --------- Co-authored-by: Dan Saunders <dan@axolotl.ai>	2025-03-21 12:26:47 -04:00
Dan Saunders	c907ac173e	adding pre-commit auto-update GH action and bumping plugin versions (#2428 ) * adding pre-commit auto-update GH action and bumping plugin versions * running updated pre-commit plugins * sorry to revert, but pylint complained * Update .pre-commit-config.yaml Co-authored-by: Wing Lian <wing.lian@gmail.com> --------- Co-authored-by: Dan Saunders <dan@axolotl.ai> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2025-03-21 11:02:43 -04:00
Wing Lian	38df5a36ea	bump HF versions except for trl (#2427 )	2025-03-20 10:22:05 -04:00
Wing Lian	59899b9817	pass additional info for fix untrained tokens when using distributed + offloading (#2388 ) * pass additional info for fix untrained tokens when using distributed + offloading * use latest version of vendored lib * use v0.0.5 of contribs lgpl * fix for no bad tokens and add tests * use release * add multigpu test too * make sure the multigpu zero3 test actually uses zero3	2025-03-11 12:02:43 -04:00
xzuyn	60a11a6410	Use Latest Cut Cross Entropy (#2392 ) * Update __init__.py * Update README.md * Update cutcrossentropy_install.py * add test	2025-03-10 16:26:40 +07:00
Wing Lian	ae66374156	Optimizer refactor and add Muon support (#2367 ) * add muon optimizer optimizer_cls_and_kwargs is on trainer_kwargs only add adamw_kwargs if they're non-null fix mocks better handling of override and check the optimizer unwrap optimizer * fix import	2025-03-06 11:49:19 -05:00
mhenrichsen	575e5f28ec	Update Tokenizer Overrides Handling in models.py (#1549 ) * override special tokens mock code * fix(doc): remove duplicate config * feat: replace added_tokens in tokenizer and add test * make sure to run tokenizer modification on rank 0 only * use is local main process instead * feat: rename config --------- Co-authored-by: NanoCode012 <nano@axolotl.ai> Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-03-05 11:15:12 -05:00
xzuyn	0134093acc	Add REX LR Scheduler (#2380 ) * Update trainer_builder.py * Update base.py * Update __init__.py * Update base.py * Update base.py * Update config.qmd * Update base.py * Update base.py * Update base.py * Update base.py * Update base.py * Update base.py * Update base.py * lint * lint * lint * lint * lint * lint * Update base.py * Update base.py * lint * Update base.py * Update base.py * Move RexLR to `schedulers.py` * Remove RexLR from `base.py` * Fix tooltip formatting * lint * Create test_schedulers.py * Use a default optimizer in test * lint * lint * Add `warmup_steps` and `cosine_min_lr_ratio` to test * lint	2025-03-05 10:26:11 -05:00
NanoCode012	1110a37e21	feat: add deepseek_v3 sample packing (#2230 )	2025-02-24 15:03:15 -05:00
Tobias	8dfadc2b3c	Fix sample packing producing longer sequences than specified by `sequence_len` (#2332 ) * Extend MultiPackBatchSampler test to include shorter sequence length and drop long sequences filter * Fix get_dataset_lengths for datasets that were previously filtered (e.g., with drop_long_seq_in_dataset) * Update src/axolotl/utils/samplers/utils.py Fix get_dataset_lengths for datasets that do not have position_ids or length attributes Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2025-02-19 12:02:35 +07:00
Wing Lian	23a9fcb0a7	make sure chatml dpo dataset loading works (#2333 )	2025-02-18 16:08:40 -05:00
NJordan72	b194e17c28	feat: add config for optional parameters in a chat message (#2260 ) * feat: add config for optional parameters in a chat message * chore: cleanup * chore: fix nits and add light docs * docs: update docs/dataset-formats/conversation.qmd Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * feat: configurable message mappings, jinja template analyzer * chore: handle bradley terry * docs: update docs * refactor: change order of mappings, improve message transform * refactor: make chat awware of property mappings * chore: remove .python-version * chore: revert change * chore: add dataset validation to tests where appropriate * chore: add dataset validation to tests where appropriate * chore: clean up handling of ds_cfg * chore: recursively serialize config * make sure to use the return value from validate_config * DefaultDict pickle/unpickle fix * fix super call for override * refactor: message fields * chore: empty commit * tests: validate config before using * chore: add config validation to all e2e tests * chore: add unneeded logging * chore: add missed config validation * chore: pass field_messages to prompter * test: fix borked test * chore: remove uninteded file * chore: add deprecation warning and update chat_datasets script * chore: lint * refactor: message fields * feat: update axolotlinputconfig and test_models - add configdict import in axolotl/utils/config/models/input/v0_4_1/__init__.py - remove unnecessary line breaks in sftdataset, dpodataset, ktodataset, stepwisesuperviseddataset classes - update model_dump method in axolotlinputconfig to exclude none values - correct typo in test_models.py comment * feat: simplify dpodataset and ktodataset classes in config models removed several optional fields from dpodataset and ktodataset classes in axolotl/utils/config/models/input/v0_4_1. this simplifies the configuration subsets for these datasets. * feat: improve readability and structure in dataset configuration models this commit enhances the readability and structure of the dataset configuration models in the `axolotl/utils/config/models/input/v0_4_1` module. it removes unused `configdict` import and adds line breaks to separate class definitions for better clarity. additionally, a minor documentation fix is included to ensure a newline at the end of the `stepwise_supervised.qmd` file. * feat: change log level from info to debug in chattemplatestrategy * feat(prompt_strategies): refactor chattemplateprompter and chattemplatestrategy - Make `chat_template` a required parameter in `ChatTemplatePrompter` constructor - Add default value for `message_property_mappings` in `ChatTemplatePrompter` constructor - Add `messages_array_name` property to `ChatTemplatePrompter` - Change `processor` type to Optional in `ChatTemplatePrompter` - Add TypeError check for `processor` in `ChatTemplatePrompter.build_prompt` - Remove `_messages` property from `ChatTemplateStrategy` - Make `prompter` a required parameter and add type hint in `ChatTemplateStrategy` constructor - Remove `messages` getter and setter from `ChatTemplateStrategy` - Use `prompter.messages_array_name` in `ChatTemplateStrategy.get_conversation_thread` - Remove condition to set `messages` field in `load` function * feat(tests/utils): ignore type check in load_model call in test_models.py * feat: improve type handling and test structure in chat templates - Add return type hint for `get_chat_template` function in `chat_templates.py` - Remove unnecessary assignment of `strategy.messages` in several test cases - Add `messages_array_name` parameter to various test configurations in `test_chat_templates.py` and `test_chat_templates_advanced.py` - Remove redundant `strategy.messages` assignment in `test_chat_templates_advanced.py` * feat(axolotl): enhance chat strategy with datasetconfig support This commit introduces support for DatasetConfig in the ChatTemplateStrategy. It also refines the strategy loader to handle different types of ds_cfg inputs and improves the clarity of the code by formatting and reordering. The key changes include: - Importing Union from typing and BaseModel from pydantic. - Adding DatasetConfig as an optional type for ds_cfg in StrategyLoader. - Adjusting the handling of ds_cfg in StrategyLoader to account for BaseModel instances. - Refactoring the prompter_params and strategy_params for better readability. - Changing the reference from prompt[self.messages] to prompt[self.prompter.messages_array_name] in the is_prompt_batched method. * feat: update message handling in btchattemplatestrategy * Replace `self.messages` with direct string references to "chosen_messages" and "rejected_messages" * Append system, user, and assistant content directly to "chosen_messages" and "rejected_messages" * Add a new attribute "messages_array_name" to the `load` function parameters * Remove the conditional attribute assignment for "field_messages" in the `load` function * feat: add config validation in test_kd.py - Import `validate_config` from `axolotl.utils.config` - Validate the configuration in `test_llama_kd` and another function in `TestKnowledgeDistillation` class * feat: enhance config validation and capabilities handling * Import `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals` * Update `validate_config` function to create `KTODataset` and `SFTDataset` instances using `dict(ds_cfg)` * Replace `capabilities` and `env_capabilities` with instances of `GPUCapabilities` and `EnvCapabilities` respectively in `AxolotlConfigWCapabilities` model dump * feat: update config validation in axolotl utils - Remove import of `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals` - Update `validate_config` function to use `capabilities` and `env_capabilities` directly instead of creating new instances of `GPUCapabilities` and `EnvCapabilities` * feat: refactor strategyloader in chat_template.py - Extracted the creation of strategy parameters into a separate function, `_get_strategy_params(cfg, dataset_config)` - Created a new function, `_get_strategy_cls()`, to obtain the strategy class - Replaced `ChatTemplateStrategy` with `strategy_cls` for strategy instantiation * trigger CI * chore: revert dataset config changes for kto/dpo * subject: refactor: rename 'messages_array_name' to 'field_messages' Body: - Renamed 'messages_array_name' to 'field_messages' in 'ChatTemplatePrompter' class and its usages in 'chat_template.py' - Updated 'load' function in 'bradley_terry/chat_template.py' to reflect the change - Adjusted 'get_chat_template_msg_variables' and 'get_message_vars' methods in 'jinja_template_analyzer.py' to use the new variable name - Modified 'StrategyLoader' in 'chat_template.py' to use 'field_messages' - Updated tests in 'test_chat_templates.py' and 'test_chat_templates_advanced.py' to use 'field_messages' instead of 'messages_array_name' * feat: refactor prompt strategies and update config models * Remove redundant 'return None' in `axolotl/prompt_strategies/__init__.py` * Simplify message handling in `axolotl/prompt_strategies/bradley_terry/chat_template.py` by using a single 'messages' list instead of separate 'chosen_messages' and 'rejected_messages' lists * Update default 'message_property_mappings' in `axolotl/prompt_strategies/bradley_terry/chat_template.py` * Add 'field_messages' field to `axolotl/utils/config/models/input/v0_4_1/__init__.py` configuration model * chore: remove unused input * chore: remove redundant type ignore * fix: remove old configs and update examples * fix: type check * fix: remove loading old config in ChatMessage * fix: update faq with potential new undefinederror * fix: add debug if property mapped is not found * chore: improve explanation for unmapped properties * fix: update docs with new config * chore: add note for deprecation config and del old config from dict --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> Co-authored-by: Wing Lian <wing@axolotl.ai> Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-02-18 09:59:27 +07:00
Dan Saunders	3d8425fa91	Activation function Triton kernels, LoRA custom autograd functions (#2324 ) * LoRA + activation fn Triton kernels: initial commit * implementing optims * finalizing MLP LoRA kernels and progress on QKV / W kernels * updates * O projection optim * adding monkey patching logic * doc strings, typing, pre-commit fixes * updates * adding lora 8b kernels example * working on fsdp support * tests and fixes * small fixes, getting tests to pass, adding doc strings * integration tests for LoRA patching * config.qmd * remove unneeded pytest fixture * fix * review comments first pass * improving tests, attention class agnostic patching * adding support for more archs * wip SiLU / GELU impls * improved testing, small updates, etc. * slightly updating docs * rebase * fixing test_attention_patching_integration * additional review comments, fixing test in CI (hopefully) * isolating problematic patching test * relaxing allclose threshold to reduce flakiness * fixing accidental change * adding model arch agnostic attention class fetching * removing unused activations	2025-02-17 14:23:15 -05:00
Wing Lian	ffae8d6a95	GRPO (#2307 )	2025-02-13 16:01:01 -05:00
Wing Lian	30046315d9	disable ray tests for latest torch release (#2328 ) * disable ray tests for latest torch release * move decorator from class to method	2025-02-12 18:29:02 -05:00
NanoCode012	a620d481e2	fix: drop long seq even if not sample packing (#2211 ) * fix: drop long seq even if not sample packing * fix: logging import * fix: cfg passed being none * fix: try to fix logging * fix: refactor call to not use accelerate log * fix: try to fix circular import issue * fix: don't drop when skip prepare * chore: remove duplicate line * fix: update warning to mention that sequences will be trimmed * fix: do not drop seq if input_ids don't exist * fix: increase RM unittest sequence length to reduce trim warnings * fix: solve conflicts * fix: default min_seq_len in case of None	2025-02-04 09:43:35 -05:00
Wing Lian	158330ab60	[feature] sweeps (#2171 )	2025-02-01 21:11:18 -05:00
Wing Lian	78ce268848	KD Trainer w logprobs (#2303 ) * refactor trainer to prevent circular dependencies later fix loader default KD dataset loading and KD with logprobs filter bad rows make batch smaller handle padding/collation for KD datasets make it work flipped the slice cross entropy loss coefficient during KD make sure to multiply against the correct loss chore: lint triton wip no where support v2 trial no torch.exp inside triton kernel no log etc no torch.tensor v3 fix kwarg don't use triton for now better rescaling for temperatures hash for temperature too use kd_alpha in the correct loss method fix kd loss so it's causal (fixes repeating tokens) var naming and add todo chore: lint refactor so we can easily add new loss functions add license block remove references to triton kd for now handle token/logprob shifting support for custom trainer classes from plugins refactor kd chat template loader move more things to kd plugin remove moved class from import make plugin setup concise increase logging around loading plugins add copyrights remove duplicate code more info on preprocess for kd and fix import be a bit pickier about loading dynamic prompt strategies kd sample packing make loss torch script compat support streaming for processing sft datasts? improve iterable support ensure that batch vs single is done properly tweak check for batched prompt data reward can use same batch check fix reward trainer calls for tokenization improve check for batched reward model doesn't work well with batched add kd trainer e2e test linting rename test files so it gets picked up make the kd e2e fit in vram for ci and add lora version set lora_dropout explicitly lower lr make sure to set tokenizer from l3 70b and save safetensors make sure to use the correct tokenizer fix adapter model check make sure to use tensorboard to capture loss for checks chore: lint chore: lint improve logprob masking and shift in trainer more fixes try tests for kd on l40s don't shift student logits for kd no batching for kd chat templates make sure to truncate logprobs if there are more than top_k change up logic so we always truncate to top_k use iter instead of tuple fix finding the top-k rather than assuming first position has the correct val apply z-score scaling to kd kd loss needs to be calculated in full precision Always re-normalize teacher distribution various fixes * support for configurable top-k/softmax ordering * add attribute check for filter rows and lint * fix logic * handle none case for conversion to int * fix student logit off by one * set kd_temp to 1.0 for test loss * address PR feedback	2025-01-31 20:18:52 -05:00

1 2 3 4 5 ...

281 Commits