axolotl

Author	SHA1	Message	Date
Sung Ching Liu	328bb0466b	Merge branch 'main' into flx_attn_support	2025-02-21 11:27:25 -05:00
Sunny Liu	e792b54bab	remove unnecessary components	2025-02-21 11:23:21 -05:00
NanoCode012	bf842730a5	fix(doc): add missing auto_find_batch_size (#2339 ) [skip ci]	2025-02-21 11:56:38 +07:00
Wing Lian	1db6ad60a7	support for passing init_lora_weights to lora_config (#2352 )	2025-02-20 22:56:34 -05:00
salman	29b366b2e1	Bumping 0.15.1 TRL version for GRPO+PEFT fix (#2344 ) * bumping TRL version * apply upstream fixes to our custom fix --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-02-20 22:56:04 -05:00
NanoCode012	b53a41372f	feat: update transformers version to 4.49.0 (#2340 )	2025-02-20 21:12:06 -05:00
Wing Lian	02f45e94be	calculate sample length fixes and SFT splitting fixes (#2351 ) * fix chat template splitting long samples across multiple rows * make the preprocessing faster	2025-02-20 14:29:58 -05:00
Dan Saunders	954e192f38	quick formatting fix for LoRA optims doc (#2349 )	2025-02-19 09:23:31 -05:00
Tobias	8dfadc2b3c	Fix sample packing producing longer sequences than specified by `sequence_len` (#2332 ) * Extend MultiPackBatchSampler test to include shorter sequence length and drop long sequences filter * Fix get_dataset_lengths for datasets that were previously filtered (e.g., with drop_long_seq_in_dataset) * Update src/axolotl/utils/samplers/utils.py Fix get_dataset_lengths for datasets that do not have position_ids or length attributes Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2025-02-19 12:02:35 +07:00
Wing Lian	23a9fcb0a7	make sure chatml dpo dataset loading works (#2333 )	2025-02-18 16:08:40 -05:00
Dan Saunders	c3d4f6e295	Doc fix: TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL not necessary to use Triton kernel patches (#2343 ) * removing note about TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL * suggest using TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL for memory efficient attn	2025-02-18 10:06:31 -05:00
Wing Lian	7fa690fac8	bump dev version (#2342 )	2025-02-18 04:30:59 -05:00
Wing Lian	3c743c4bfb	v0.7.0 for release (#2341 )	2025-02-18 04:26:21 -05:00
NJordan72	91bb95685a	chore: cleanup deprecated config elements (#2309 ) * feat: update metadata fields and refactor config class in axolotlinputconfig - Replace `metadata` fields with `json_schema_extra` in RayConfig class. - Replace `Config` class with `ConfigDict` in AxolotlInputConfig. - Set `populate_by_name` to `True` directly in `ConfigDict` instance. * feat: update axolotlinputconfig in utils * Replace `conlist` with `Annotated` for `datasets`, `test_datasets`, and `pretraining_dataset` fields * Change default values for `lr_scheduler` and `optimizer` fields in `HyperparametersConfig` class * Remove unnecessary Union from `evals_per_epoch` field in `AxolotlInputConfig` class * Import `MinLen` from `annotated_types` module * Remove import of `conlist` from `pydantic` module * feat: update modelinputconfig and axolotlinputconfig in v0_4_1 - Removed ConfigDict import from pydantic in `src/axolotl/utils/config/models/input/v0_4_1/__init__.py` - Added `model_config` with `protected_namespaces` to ModelInputConfig - Replaced `config: ConfigDict` with `model_config` in AxolotlInputConfig - Set `populate_by_name` to True in `model_config` for AxolotlInputConfig * chore: get rid of unused import	2025-02-18 15:39:24 +07:00
NJordan72	b194e17c28	feat: add config for optional parameters in a chat message (#2260 ) * feat: add config for optional parameters in a chat message * chore: cleanup * chore: fix nits and add light docs * docs: update docs/dataset-formats/conversation.qmd Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * feat: configurable message mappings, jinja template analyzer * chore: handle bradley terry * docs: update docs * refactor: change order of mappings, improve message transform * refactor: make chat aware of property mappings * chore: remove .python-version * chore: revert change * chore: add dataset validation to tests where appropriate * chore: add dataset validation to tests where appropriate * chore: clean up handling of ds_cfg * chore: recursively serialize config * make sure to use the return value from validate_config * DefaultDict pickle/unpickle fix * fix super call for override * refactor: message fields * chore: empty commit * tests: validate config before using * chore: add config validation to all e2e tests * chore: add unneeded logging * chore: add missed config validation * chore: pass field_messages to prompter * test: fix borked test * chore: remove unintended file * chore: add deprecation warning and update chat_datasets script * chore: lint * refactor: message fields * feat: update axolotlinputconfig and test_models - add configdict import in axolotl/utils/config/models/input/v0_4_1/__init__.py - remove unnecessary line breaks in sftdataset, dpodataset, ktodataset, stepwisesuperviseddataset classes - update model_dump method in axolotlinputconfig to exclude none values - correct typo in test_models.py comment * feat: simplify dpodataset and ktodataset classes in config models removed several optional fields from dpodataset and ktodataset classes in axolotl/utils/config/models/input/v0_4_1. this simplifies the configuration subsets for these datasets. * feat: improve readability and structure in dataset configuration models this commit enhances the readability and structure of the dataset configuration models in the `axolotl/utils/config/models/input/v0_4_1` module. it removes unused `configdict` import and adds line breaks to separate class definitions for better clarity. additionally, a minor documentation fix is included to ensure a newline at the end of the `stepwise_supervised.qmd` file. * feat: change log level from info to debug in chattemplatestrategy * feat(prompt_strategies): refactor chattemplateprompter and chattemplatestrategy - Make `chat_template` a required parameter in `ChatTemplatePrompter` constructor - Add default value for `message_property_mappings` in `ChatTemplatePrompter` constructor - Add `messages_array_name` property to `ChatTemplatePrompter` - Change `processor` type to Optional in `ChatTemplatePrompter` - Add TypeError check for `processor` in `ChatTemplatePrompter.build_prompt` - Remove `_messages` property from `ChatTemplateStrategy` - Make `prompter` a required parameter and add type hint in `ChatTemplateStrategy` constructor - Remove `messages` getter and setter from `ChatTemplateStrategy` - Use `prompter.messages_array_name` in `ChatTemplateStrategy.get_conversation_thread` - Remove condition to set `messages` field in `load` function * feat(tests/utils): ignore type check in load_model call in test_models.py * feat: improve type handling and test structure in chat templates - Add return type hint for `get_chat_template` function in `chat_templates.py` - Remove unnecessary assignment of `strategy.messages` in several test cases - Add `messages_array_name` parameter to various test configurations in `test_chat_templates.py` and `test_chat_templates_advanced.py` - Remove redundant `strategy.messages` assignment in `test_chat_templates_advanced.py` * feat(axolotl): enhance chat strategy with datasetconfig support This commit introduces support for DatasetConfig in the ChatTemplateStrategy. It also refines the strategy loader to handle different types of ds_cfg inputs and improves the clarity of the code by formatting and reordering. The key changes include: - Importing Union from typing and BaseModel from pydantic. - Adding DatasetConfig as an optional type for ds_cfg in StrategyLoader. - Adjusting the handling of ds_cfg in StrategyLoader to account for BaseModel instances. - Refactoring the prompter_params and strategy_params for better readability. - Changing the reference from prompt[self.messages] to prompt[self.prompter.messages_array_name] in the is_prompt_batched method. * feat: update message handling in btchattemplatestrategy * Replace `self.messages` with direct string references to "chosen_messages" and "rejected_messages" * Append system, user, and assistant content directly to "chosen_messages" and "rejected_messages" * Add a new attribute "messages_array_name" to the `load` function parameters * Remove the conditional attribute assignment for "field_messages" in the `load` function * feat: add config validation in test_kd.py - Import `validate_config` from `axolotl.utils.config` - Validate the configuration in `test_llama_kd` and another function in `TestKnowledgeDistillation` class * feat: enhance config validation and capabilities handling * Import `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals` * Update `validate_config` function to create `KTODataset` and `SFTDataset` instances using `dict(ds_cfg)` * Replace `capabilities` and `env_capabilities` with instances of `GPUCapabilities` and `EnvCapabilities` respectively in `AxolotlConfigWCapabilities` model dump * feat: update config validation in axolotl utils - Remove import of `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals` - Update `validate_config` function to use `capabilities` and `env_capabilities` directly instead of creating new instances of `GPUCapabilities` and `EnvCapabilities` * feat: refactor strategyloader in chat_template.py - Extracted the creation of strategy parameters into a separate function, `_get_strategy_params(cfg, dataset_config)` - Created a new function, `_get_strategy_cls()`, to obtain the strategy class - Replaced `ChatTemplateStrategy` with `strategy_cls` for strategy instantiation * trigger CI * chore: revert dataset config changes for kto/dpo * subject: refactor: rename 'messages_array_name' to 'field_messages' Body: - Renamed 'messages_array_name' to 'field_messages' in 'ChatTemplatePrompter' class and its usages in 'chat_template.py' - Updated 'load' function in 'bradley_terry/chat_template.py' to reflect the change - Adjusted 'get_chat_template_msg_variables' and 'get_message_vars' methods in 'jinja_template_analyzer.py' to use the new variable name - Modified 'StrategyLoader' in 'chat_template.py' to use 'field_messages' - Updated tests in 'test_chat_templates.py' and 'test_chat_templates_advanced.py' to use 'field_messages' instead of 'messages_array_name' * feat: refactor prompt strategies and update config models * Remove redundant 'return None' in `axolotl/prompt_strategies/__init__.py` * Simplify message handling in `axolotl/prompt_strategies/bradley_terry/chat_template.py` by using a single 'messages' list instead of separate 'chosen_messages' and 'rejected_messages' lists * Update default 'message_property_mappings' in `axolotl/prompt_strategies/bradley_terry/chat_template.py` * Add 'field_messages' field to `axolotl/utils/config/models/input/v0_4_1/__init__.py` configuration model * chore: remove unused input * chore: remove redundant type ignore * fix: remove old configs and update examples * fix: type check * fix: remove loading old config in ChatMessage * fix: update faq with potential new undefinederror * fix: add debug if property mapped is not found * chore: improve explanation for unmapped properties * fix: update docs with new config * chore: add note for deprecation config and del old config from dict --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> Co-authored-by: Wing Lian <wing@axolotl.ai> Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-02-18 09:59:27 +07:00
Dan Saunders	3aac3b1da9	Move sweeps code to another module (#2338 )	2025-02-17 15:46:04 -05:00
Dan Saunders	3d8425fa91	Activation function Triton kernels, LoRA custom autograd functions (#2324 ) * LoRA + activation fn Triton kernels: initial commit * implementing optims * finalizing MLP LoRA kernels and progress on QKV / W kernels * updates * O projection optim * adding monkey patching logic * doc strings, typing, pre-commit fixes * updates * adding lora 8b kernels example * working on fsdp support * tests and fixes * small fixes, getting tests to pass, adding doc strings * integration tests for LoRA patching * config.qmd * remove unneeded pytest fixture * fix * review comments first pass * improving tests, attention class agnostic patching * adding support for more archs * wip SiLU / GELU impls * improved testing, small updates, etc. * slightly updating docs * rebase * fixing test_attention_patching_integration * additional review comments, fixing test in CI (hopefully) * isolating problematic patching test * relaxing allclose threshold to reduce flakiness * fixing accidental change * adding model arch agnostic attention class fetching * removing unused activations	2025-02-17 14:23:15 -05:00
Seungduk Kim	97a2fa2781	Select input_ids explicitly after pandas conversion (#2335 ) Without selecting the column, applying `len` counts each whole row as 1, which yields the total number of samples instead of the token count.	2025-02-17 00:07:27 -05:00
Wing Lian	a98526ef78	add support for include_tokens_per_second in training args (#2269 ) * add support for include_tokens_per_second in training args * Update docs/config.qmd Co-authored-by: NanoCode012 <nano@axolotl.ai> * Update src/axolotl/core/trainer_builder.py Co-authored-by: NanoCode012 <nano@axolotl.ai> --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-02-13 17:39:19 -05:00
NanoCode012	2e57391bf8	fix: add missing shards_idx, preprocess_shards to docs and validator (#2331 )	2025-02-13 17:28:21 -05:00
minpeter	aa45fed451	Add `bos_token` and `add_generation_prompt` to the alpaca chat template (#2322 ) * fix alpaca add_generation_prompt * Alpaca template considering multi-turn Co-authored-by: xzuyn <xzuyn@users.noreply.github.com> --------- Co-authored-by: xzuyn <xzuyn@users.noreply.github.com>	2025-02-13 17:27:55 -05:00
NanoCode012	a09a5cfd1c	feat(doc): add tensorboard config to docs (#2329 )	2025-02-13 16:02:16 -05:00
NanoCode012	40362d60e0	feat(doc): Improve guide to dataset types with better examples (#2286 )	2025-02-13 16:01:41 -05:00
Wing Lian	ffae8d6a95	GRPO (#2307 )	2025-02-13 16:01:01 -05:00
Lee Park	fdbb1a207c	[Fixing #2149 ] load_from_disk for RL-type training (#2193 ) * Update rl.py * Update rl.py * Update rl.py * refactor pref dataset loading to reuse load_dataset_w_config * refactor again after rebase from main * chore: add docstring and types --------- Co-authored-by: Wing Lian <wing@axolotl.ai> Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-02-13 08:31:07 -05:00
bursteratom	82d04ea060	test v2batch w/ flex attn	2025-02-13 00:11:45 -05:00
Wing Lian	30046315d9	disable ray tests for latest torch release (#2328 ) * disable ray tests for latest torch release * move decorator from class to method	2025-02-12 18:29:02 -05:00
Wing Lian	e37a4a536a	lint docs (#2327 )	2025-02-12 10:04:26 -05:00
Sung Ching Liu	0ef1f011fe	Merge branch 'main' into flx_attn_support	2025-02-11 23:31:56 -05:00
Sung Ching Liu	44f64ab627	Update faq.qmd (#2319 ) * Update faq.qmd Added Q&A for being stuck on saving preprocessed datasets * Update faq.qmd added details on preprocessing on cpu * Update faq.qmd * Update faq.qmd	2025-02-11 13:18:31 -05:00
NanoCode012	826f1b1494	feat(doc): Add multi-node torchrun info (#2304 )	2025-02-08 06:02:02 -05:00
NanoCode012	526e5ee8b8	fix(config): missing config not being documented and fix model_ override (#2317 ) * fix(config): missing config not being documented and fix model_ space override * fix: delete redundant field	2025-02-08 06:01:48 -05:00
NanoCode012	fd8cb32547	chore: remove redundant py310 from tests (#2316 )	2025-02-07 21:34:16 -05:00
NanoCode012	e48e2df4dd	feat: update FA to 2.7.4.post1 which includes torch2.6 binary (#2315 )	2025-02-07 21:34:01 -05:00
Wing Lian	b7616022ab	bump transformers to 4.48.3 (#2318 )	2025-02-07 21:33:44 -05:00
Wing Lian	1faf1a5c5a	batch add of spectrum snr results (#2320 )	2025-02-07 21:33:14 -05:00
Sunny Liu	c0a1d205c7	packed doc mask starts at 1, 0 means masked out	2025-02-07 14:44:52 -05:00
NanoCode012	5bbad5ef93	feat: add torch2.6 to ci (#2311 )	2025-02-07 07:28:54 -05:00
Wing Lian	a971eb4ce6	Torch 2.6 support for base docker image (#2312 )	2025-02-05 09:24:02 -05:00
Sunny Liu	d0e739da24	attempt at getting around bf16 error	2025-02-04 21:57:21 -05:00
Sunny Liu	3f6be519d5	stack	2025-02-04 21:25:13 -05:00
Sunny Liu	adcbc7459b	misc	2025-02-04 21:17:50 -05:00
Sunny Liu	470ba65c44	make doc mask instead of the whole block mask in collator	2025-02-04 20:27:39 -05:00
NanoCode012	a620d481e2	fix: drop long seq even if not sample packing (#2211 ) * fix: drop long seq even if not sample packing * fix: logging import * fix: cfg passed being none * fix: try to fix logging * fix: refactor call to not use accelerate log * fix: try to fix circular import issue * fix: don't drop when skip prepare * chore: remove duplicate line * fix: update warning to mention that sequences will be trimmed * fix: do not drop seq if input_ids don't exist * fix: increase RM unittest sequence length to reduce trim warnings * fix: solve conflicts * fix: default min_seq_len in case of None	2025-02-04 09:43:35 -05:00
Sunny Liu	8e1adc154d	stuff	2025-02-02 20:36:14 -05:00
Sunny Liu	e5b36900e4	misc	2025-02-02 20:32:03 -05:00
Sunny Liu	9f6c89b12b	undo my stupidity	2025-02-02 20:25:53 -05:00
Sunny Liu	b0871c8d3b	attempt - mask padding	2025-02-02 20:18:49 -05:00
bursteratom	d3ea379a23	figure out slight diff from flash result	2025-02-02 01:45:54 -05:00
bursteratom	0ebab63309	test	2025-02-02 01:27:15 -05:00


1950 Commits