Commit Graph

84 Commits

Author SHA1 Message Date
Dan Saunders
954e192f38 quick formatting fix for LoRA optims doc (#2349) 2025-02-19 09:23:31 -05:00
Dan Saunders
c3d4f6e295 Doc fix: TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL not necessary to use Triton kernel patches (#2343)
* removing note about TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL

* suggest using TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL for memory efficient attn
2025-02-18 10:06:31 -05:00
NJordan72
b194e17c28 feat: add config for optional parameters in a chat message (#2260)
* feat: add config for optional parameters in a chat message

* chore: cleanup

* chore: fix nits and add light docs

* docs: update docs/dataset-formats/conversation.qmd

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* feat: configurable message mappings, jinja template analyzer

* chore: handle bradley terry

* docs: update docs

* refactor: change order of mappings, improve message transform

* refactor: make chat awware of property mappings

* chore: remove .python-version

* chore: revert change

* chore: add dataset validation to tests where appropriate

* chore: add dataset validation to tests where appropriate

* chore: clean up handling of ds_cfg

* chore: recursively serialize config

* make sure to use the return value from validate_config

* DefaultDict pickle/unpickle fix

* fix super call for override

* refactor: message fields

* chore: empty commit

* tests: validate config before using

* chore: add config validation to all e2e tests

* chore: add unneeded logging

* chore: add missed config validation

* chore: pass field_messages to prompter

* test: fix borked test

* chore: remove uninteded file

* chore: add deprecation warning and update chat_datasets script

* chore: lint

* refactor: message fields

* feat: update axolotlinputconfig and test_models

- add configdict import in axolotl/utils/config/models/input/v0_4_1/__init__.py
- remove unnecessary line breaks in sftdataset, dpodataset, ktodataset, stepwisesuperviseddataset classes
- update model_dump method in axolotlinputconfig to exclude none values
- correct typo in test_models.py comment

* feat: simplify dpodataset and ktodataset classes in config models

removed several optional fields from dpodataset and ktodataset classes in axolotl/utils/config/models/input/v0_4_1. this simplifies the configuration subsets for these datasets.

* feat: improve readability and structure in dataset configuration models

this commit enhances the readability and structure of the dataset configuration models in the `axolotl/utils/config/models/input/v0_4_1` module. it removes unused `configdict` import and adds line breaks to separate class definitions for better clarity. additionally, a minor documentation fix is included to ensure a newline at the end of the `stepwise_supervised.qmd` file.

* feat: change log level from info to debug in chattemplatestrategy

* feat(prompt_strategies): refactor chattemplateprompter and chattemplatestrategy

- Make `chat_template` a required parameter in `ChatTemplatePrompter` constructor
- Add default value for `message_property_mappings` in `ChatTemplatePrompter` constructor
- Add `messages_array_name` property to `ChatTemplatePrompter`
- Change `processor` type to Optional in `ChatTemplatePrompter`
- Add TypeError check for `processor` in `ChatTemplatePrompter.build_prompt`
- Remove `_messages` property from `ChatTemplateStrategy`
- Make `prompter` a required parameter and add type hint in `ChatTemplateStrategy` constructor
- Remove `messages` getter and setter from `ChatTemplateStrategy`
- Use `prompter.messages_array_name` in `ChatTemplateStrategy.get_conversation_thread`
- Remove condition to set `messages` field in `load` function

* feat(tests/utils): ignore type check in load_model call in test_models.py

* feat: improve type handling and test structure in chat templates

- Add return type hint for `get_chat_template` function in `chat_templates.py`
- Remove unnecessary assignment of `strategy.messages` in several test cases
- Add `messages_array_name` parameter to various test configurations in `test_chat_templates.py` and `test_chat_templates_advanced.py`
- Remove redundant `strategy.messages` assignment in `test_chat_templates_advanced.py`

* feat(axolotl): enhance chat strategy with datasetconfig support

This commit introduces support for DatasetConfig in the ChatTemplateStrategy. It also refines the strategy loader to handle different types of ds_cfg inputs and improves the clarity of the code by formatting and reordering. The key changes include:

- Importing Union from typing and BaseModel from pydantic.
- Adding DatasetConfig as an optional type for ds_cfg in StrategyLoader.
- Adjusting the handling of ds_cfg in StrategyLoader to account for BaseModel instances.
- Refactoring the prompter_params and strategy_params for better readability.
- Changing the reference from prompt[self.messages] to prompt[self.prompter.messages_array_name] in the is_prompt_batched method.

* feat: update message handling in btchattemplatestrategy

* Replace `self.messages` with direct string references to "chosen_messages" and "rejected_messages"
* Append system, user, and assistant content directly to "chosen_messages" and "rejected_messages"
* Add a new attribute "messages_array_name" to the `load` function parameters
* Remove the conditional attribute assignment for "field_messages" in the `load` function

* feat: add config validation in test_kd.py

- Import `validate_config` from `axolotl.utils.config`
- Validate the configuration in `test_llama_kd` and another function in `TestKnowledgeDistillation` class

* feat: enhance config validation and capabilities handling

* Import `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals`
* Update `validate_config` function to create `KTODataset` and `SFTDataset` instances using `dict(ds_cfg)`
* Replace `capabilities` and `env_capabilities` with instances of `GPUCapabilities` and `EnvCapabilities` respectively in `AxolotlConfigWCapabilities` model dump

* feat: update config validation in axolotl utils

- Remove import of `EnvCapabilities` and `GPUCapabilities` from `axolotl.utils.config.models.internals`
- Update `validate_config` function to use `capabilities` and `env_capabilities` directly instead of creating new instances of `GPUCapabilities` and `EnvCapabilities`

* feat: refactor strategyloader in chat_template.py

- Extracted the creation of strategy parameters into a separate function, `_get_strategy_params(cfg, dataset_config)`
- Created a new function, `_get_strategy_cls()`, to obtain the strategy class
- Replaced `ChatTemplateStrategy` with `strategy_cls` for strategy instantiation

* trigger CI

* chore: revert dataset config changes for kto/dpo

* subject: refactor: rename 'messages_array_name' to 'field_messages'

Body:
- Renamed 'messages_array_name' to 'field_messages' in 'ChatTemplatePrompter' class and its usages in 'chat_template.py'
- Updated 'load' function in 'bradley_terry/chat_template.py' to reflect the change
- Adjusted 'get_chat_template_msg_variables' and 'get_message_vars' methods in 'jinja_template_analyzer.py' to use the new variable name
- Modified 'StrategyLoader' in 'chat_template.py' to use 'field_messages'
- Updated tests in 'test_chat_templates.py' and 'test_chat_templates_advanced.py' to use 'field_messages' instead of 'messages_array_name'

* feat: refactor prompt strategies and update config models

* Remove redundant 'return None' in `axolotl/prompt_strategies/__init__.py`
* Simplify message handling in `axolotl/prompt_strategies/bradley_terry/chat_template.py` by using a single 'messages' list instead of separate 'chosen_messages' and 'rejected_messages' lists
* Update default 'message_property_mappings' in `axolotl/prompt_strategies/bradley_terry/chat_template.py`
* Add 'field_messages' field to `axolotl/utils/config/models/input/v0_4_1/__init__.py` configuration model

* chore: remove unused input

* chore: remove redundant type ignore

* fix: remove old configs and update examples

* fix: type check

* fix: remove loading old config in ChatMessage

* fix: update faq with potential new undefinederror

* fix: add debug if property mapped is not found

* chore: improve explanation for unmapped properties

* fix: update docs with new config

* chore: add note for deprecation config and del old config from dict

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-02-18 09:59:27 +07:00
Dan Saunders
3d8425fa91 Activation function Triton kernels, LoRA custom autograd functions (#2324)
* LoRA + activation fn Triton kernels: initial commit

* implementing optims

* finalizing MLP LoRA kernels and progress on QKV / W kernels

* updates

* O projection optim

* adding monkey patching logic

* doc strings, typing, pre-commit fixes

* updates

* adding lora 8b kernels example

* working on fsdp support

* tests and fixes

* small fixes, getting tests to pass, adding doc strings

* integration tests for LoRA patching

* config.qmd

* remove unneeded pytest fixture

* fix

* review comments first pass

* improving tests, attention class agnostic patching

* adding support for more archs

* wip SiLU / GELU impls

* improved testing, small updates, etc.

* slightly updating docs

* rebase

* fixing test_attention_patching_integration

* additional review comments, fixing test in CI (hopefully)

* isolating problematic patching test

* relaxing allclose threshold to reduce flakiness

* fixing accidental change

* adding model arch agnostic attention class fetching

* removing unused activations
2025-02-17 14:23:15 -05:00
Wing Lian
a98526ef78 add support for include_tokens_per_second in training args (#2269)
* add support for include_tokens_per_second in training args

* Update docs/config.qmd

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* Update src/axolotl/core/trainer_builder.py

Co-authored-by: NanoCode012 <nano@axolotl.ai>

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-02-13 17:39:19 -05:00
NanoCode012
2e57391bf8 fix: add missing shards_idx, preprocess_shards to docs and validator (#2331) 2025-02-13 17:28:21 -05:00
NanoCode012
a09a5cfd1c feat(doc): add tensorboard config to docs (#2329) 2025-02-13 16:02:16 -05:00
NanoCode012
40362d60e0 feat(doc): Improve guide to dataset types with better examples (#2286) 2025-02-13 16:01:41 -05:00
Wing Lian
e37a4a536a lint docs (#2327) 2025-02-12 10:04:26 -05:00
Sung Ching Liu
44f64ab627 Update faq.qmd (#2319)
* Update faq.qmd

Added Q&A for being stuck on saving preprocessed datasets

* Update faq.qmd

added details on preprocessing on cpu

* Update faq.qmd

* Update faq.qmd
2025-02-11 13:18:31 -05:00
NanoCode012
826f1b1494 feat(doc): Add multi-node torchrun info (#2304) 2025-02-08 06:02:02 -05:00
NanoCode012
526e5ee8b8 fix(config): missing config not being documented and fix model_ override (#2317)
* fix(config): missing config not being documented and fix model_ space override

* fix: delete redundant field
2025-02-08 06:01:48 -05:00
Wing Lian
78ce268848 KD Trainer w logprobs (#2303)
* refactor trainer to prevent circular dependencies later

fix loader default
KD dataset loading and KD with logprobs
filter bad rows
make batch smaller
handle padding/collation for KD datasets
make it work
flipped the slice
cross entropy loss coefficient during KD
make sure to multiply against the correct loss
chore: lint
triton wip
no where support
v2 trial
no torch.exp inside triton kernel
no log etc
no torch.tensor
v3
fix kwarg
don't use triton for now
better rescaling for temperatures
hash for temperature too
use kd_alpha in the correct loss method
fix kd loss so it's causal (fixes repeating tokens)
var naming and add todo
chore: lint
refactor so we can easily add new loss functions
add license block
remove references to triton kd for now
handle token/logprob shifting
support for custom trainer classes from plugins
refactor kd chat template loader
move more things to kd plugin
remove moved class from import
make plugin setup concise
increase logging around loading plugins
add copyrights
remove duplicate code
more info on preprocess for kd and fix import
be a bit pickier about loading dynamic prompt strategies
kd sample packing
make loss torch script compat
support streaming for processing sft datasts?
improve iterable support
ensure that batch vs single is done properly
tweak check for batched prompt data
reward can use same batch check
fix reward trainer calls for tokenization
improve check for batched
reward model doesn't work well with batched
add kd trainer e2e test
linting
rename test files so it gets picked up
make the kd e2e fit in vram for ci and add lora version
set lora_dropout explicitly
lower lr
make sure to set tokenizer from l3 70b and save safetensors
make sure to use the correct tokenizer
fix adapter model check
make sure to use tensorboard to capture loss for checks
chore: lint
chore: lint
improve logprob masking and shift in trainer
more fixes
try tests for kd on l40s
don't shift student logits for kd
no batching for kd chat templates
make sure to truncate logprobs if there are more than top_k
change up logic so we always truncate to top_k
use iter instead of tuple
fix finding the top-k rather than assuming first position has the correct val
apply z-score scaling to kd
kd loss needs to be calculated in full precision
Always re-normalize teacher distribution
various fixes

* support for configurable top-k/softmax ordering

* add attribute check for filter rows and lint

* fix logic

* handle none case for conversion to int

* fix student logit off by one

* set kd_temp to 1.0 for test loss

* address PR feedback
2025-01-31 20:18:52 -05:00
Wing Lian
cf17649ef3 Misc fixes 20250130 (#2301)
* misc fixes for garbage collection and L40S w NCCL P2P

* patch bnb fix for triton check

* chore: lint

* change up import

* try patching differently

* remove patch for bnb fix for now

* more verbose checks and tweak train loss threshold
2025-01-31 08:58:04 -05:00
Dan Saunders
6f294c3d8d refactor README; hardcode links to quarto docs; add additional quarto doc pages (#2295)
* refactor README; hardcode links to quarto docs; add additional quarto doc pages

* updates

* review comments

* update

---------

Co-authored-by: Dan Saunders <dan@axolotl.ai>
2025-01-30 12:49:21 -05:00
Wing Lian
8779997ba5 native support for modal cloud from CLI (#2237)
* native support for modal cloud from CLI

* do lm_eval in cloud too

* Fix the sub call to lm-eval

* lm_eval option to not post eval, and append not extend

* cache bust when using branch, grab sha of latest image tag, update lm-eval dep

* allow minimal yaml for lm eval

* include modal in requirements

* update link in README to include utm

* pr feedback

* use chat template

* revision support

* apply chat template as arg

* add wandb name support, allow explicit a100-40gb

* cloud is optional

* handle accidental setting of tasks with a single task str

* document the modal cloud yaml for clarity [skip ci]

* cli docs

* support spawn vs remote for lm-eval

* Add support for additional docker commands in modal image build

* cloud config shouldn't be a dir

* Update README.md

Co-authored-by: Charles Frye <cfrye59@gmail.com>

* fix annotation args

---------

Co-authored-by: Charles Frye <cfrye59@gmail.com>
2025-01-30 11:34:02 -05:00
Eric Tang
268543a3be Ray Train Axolotl Integration (#2251)
* current

not clean working version
move torch trainer to do_cli
update code with config changes and clean up
edit config
cleanup
add run name to trainer

* address comments

* use axolotl train in multigpu tests and add ray tests for multi-gpu

* accelerate uses underscores for main_process_port arg

* chore: lint

* fix order of accelerate args

* include ray train in docker images

* current

not clean working version
move torch trainer to do_cli
update code with config changes and clean up
edit config
cleanup
add run name to trainer

* address comments

* use axolotl train in multigpu tests and add ray tests for multi-gpu

* accelerate uses underscores for main_process_port arg

* chore: lint

* fix order of accelerate args

* include ray train in docker images

* fix bf16 resolution behavior

* move dtype logic

* x

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

* rename

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

* add to sidebar

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

* Apply suggestions from code review

Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>

* Update docs/ray-integration.qmd

Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>

* pre-commit fixes

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

* use output_dir instead of hardcoded saves path

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* bugfix storage dir

* change type\ for resources_per_worker

---------

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2025-01-29 00:10:19 -05:00
salman
54dd7abfc1 Process reward models (#2241)
* adding model_cfg to set num_labels

* using a num_labels field instead

* linting

* WIP stepwise prompt tokenizer

* this should work?

* trainer working?

* pushing to runpod

* fixing saving

* updating conf

* updating config, adding docs

* adding stepwise supervision docpage

* updating tests

* adding test for dataset

* fixing tests

* linting

* addressing some comments

* adding additional cfg fields support

* updating tests, fixing cfg

* fixing tests

* updating loss

* Update test_process_reward_model_smollm2.py

* updating loss values and seed

* dumb pre-commit
2025-01-29 00:08:33 -05:00
Wing Lian
887513285d support for custom lr groups for non-embedding modules (#2213)
* support for custom lr groups for non-embedding modules

invert name check for group modules
include lr_groups in training args
additional conditional for creating optimizer
fix regular params as w weight decay
fix lookup and add docs

* address pr feedback
2025-01-24 12:56:28 -05:00
NanoCode012
6086162488 chore(doc): improve explanation for *_steps and *_strategy (#2270) 2025-01-24 10:07:02 -05:00
Wing Lian
af727eedf7 option to not concatenate during pretraining (#2263)
* option to not concatenate during pretraining

* simplify conditional and add doc to config.qmd
2025-01-20 14:07:34 -05:00
Wing Lian
f89e962119 skip over rows in pretraining dataset (#2223)
* skip over rows in pretraining dataset

* update docs
2025-01-13 10:44:45 -05:00
Wing Lian
bd2a594b89 use DataCollatorWithFlattening when not sample packing (#2167) 2024-12-17 17:46:44 -05:00
Wing Lian
3798229d85 handle torch_compile set to auto (#2172) [skip ci]
* handle torch_compile set to auto

* update docs [skip ci]

* add tests
2024-12-17 16:42:41 -05:00
NanoCode012
10cfecf02e fix: use apply_chat_template to find turn boundaries and allow tool_calling field (#2179) [skip ci]
* fix: use apply_chat_template to find turn boundaries and allow tool_calling field

* fix: keys to include in turn

* feat(doc): explicitly recommend setting train_on_eos and roles_to_train

* fix: eos not being masked for tool due to template padding

* chore: clear up docs

* fix: default messages format, train_on_eos: turn, and train on all assistant msg

* fix: properly warn if empty content

* feat: parametrize chat_template tests to test different tokenizers

* fix: set proper default for message key

* fix: update defaults to match load function

* fix: change defaults to use new

* feat: add tool_calling dataset

* feat: add tool_calling test

* fix: add handling of edge case of mistral tokenizer with only system prompt

* feat: refactor all test to follow source code

* fix: remove unnecessary eos_token from phi35

* fix test for phi3.5 since eos was dropped from chat_template

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-12-17 16:42:21 -05:00
Wing Lian
33090486d7 [feature] add pytorch profiling (#2182)
* add pytorch profiling

* kick off the profiler asap since things may get allcoated before train start

* document feature

* add url for visualizer [skip ci]
2024-12-16 12:38:43 -05:00
Wing Lian
d009ead101 fix build w pyproject to respect insalled torch version (#2168)
* fix build w pyproject to respect insalled torch version

* include in manifest

* disable duplicate code check for now

* move parser so it can be found

* add checks for correct pytorch version so this doesn't slip by again
2024-12-10 16:25:25 -05:00
NanoCode012
c78de6f214 feat: add kto example (#2158) [skip ci] 2024-12-09 08:17:27 -05:00
Sunny Liu
d5f58b6509 Check torch version for ADOPT optimizer + integrating new ADOPT updates (#2104)
* added torch check for adopt, wip

* lint

* gonna put torch version checking somewhere else

* added ENVcapabilities class for torch version checking

* lint + pydantic

* ENVCapabilities -> EnvCapabilities

* forgot to git add v0_4_1/__init__.py

* removed redundancy

* add check if env_capabilities not specified

* make env_capabilities compulsory [skip e2e]

* fixup env_capabilities

* modified test_validation.py to accomodate env_capabilities

* adopt torch version test [skip e2e]

* raise error

* test correct torch version

* test torch version above requirement

* Update src/axolotl/utils/config/models/input/v0_4_1/__init__.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* removed unused is_totch_min

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-12-02 20:15:39 -05:00
Oliver Molenschot
b620ed94d0 Add Exact Deduplication Feature to Preprocessing Pipeline (#2072)
* Add example YAML file for training Mistral using DPO

* added deduplication code

* Add exact deduplication feature and update examples

* Improve deduplication for train/eval overlap

Changed the deduplication function to use a more memory-efficient hashing method. Applied Git suggestions to improve clarity and maintainability.\n\nThe deduplication now handles cases where train and eval datasets have overlapping elements.

* Improve deduplication for train/eval overlap

Changed the deduplication function to use a more memory-efficient hashing method. Applied Git suggestions to improve clarity and maintainability.\n\nThe deduplication now handles cases where train and eval datasets have overlapping elements.

* Apply suggestions from code review

To handle the original case where we do not do deduplication

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* Improve false collision detection to ensure dataset integrity

- Added test cases to simulate and verify handling of forced hash collisions between datasets.
- Ensured that datasets with identical hashes but different content are correctly identified, preventing incorrect deduplication.
- Updated unit tests to include scenarios where collisions occur across both training and evaluation datasets, as well as within a single dataset.

* Moved the constants file to the tests folder

- Relocated `constants.py` to the `tests` folder to improve modularity and maintain a clear separation between source and test files.
- Renamed `cicd/tests.py` to `cicd/cicd_tests.py` to resolve a conflict with `tests/__init__.py`, which caused Mypy to fail due to duplicate module names.
- Updated all references to `cicd.tests` in the codebase to `cicd.cicd_tests` to reflect the renaming and ensure compatibility.
- These changes ensure Mypy passes the pre-commit hook and maintain alignment with the project's structure.

* revert some changes from previous commit and fix relative import

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-12-02 08:47:10 -05:00
Wing Lian
2e99bb303e fix inference when no chat_template is set, fix unsloth dora check (#2092)
* fix inference when no chat_template is set, fix unsloth dora check

* remove old unsloth version check

* update docs on installing unsloth
2024-11-20 14:07:54 -05:00
Wing Lian
15f1462ccd support passing trust_remote_code to dataset loading (#2050) [skip ci]
* support passing trust_remote_code to dataset loading

* add doc for trust_remote_code in dataset config
2024-11-15 19:09:48 -05:00
Sunny Liu
1d7aee0ad2 ADOPT optimizer integration (#2032) [skip ci]
* adopt integration

* stuff

* doc and test for ADOPT

* rearrangement

* fixed formatting

* hacking pre-commit

* chore: lint

* update module doc for adopt optimizer

* remove un-necessary example yaml for adopt optimizer

* skip test adopt if torch<2.5.1

* formatting

* use version.parse

* specifies required torch version for adopt_adamw

---------

Co-authored-by: sunny <sunnyliu19981005@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-11-13 17:10:17 -05:00
Wing Lian
234e94e9dd replace references to personal docker hub to org docker hub (#2036) [skip ci] 2024-11-11 15:09:29 -05:00
Wing Lian
fd3b80716a remove fastchat and sharegpt (#2021)
* remove fastchat and sharegpt

* remove imports

* remove more fastchat imports

* chore: remove unused functions

* feat: remove sharegpt and deprecate from docs

* chore: remove unused sharegpt checks

* fix: remove sharegpt type from tests

* feat: add sharegpt deprecation error

* feat: update readme

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2024-11-08 13:45:49 -05:00
Sunny Liu
3265b7095e Add weighted optimisation support for trl DPO trainer integration (#2016)
* trlv0.12.0  integration

* update trl version requirements

* linting

* commenting out

* trl version requirement
2024-11-08 11:29:11 -05:00
NanoCode012
8c3a727f9d feat: update yml chat_template to specify dataset field (#2001) [skip ci]
* feat: update yml chat_template to specify dataset field

* feat: replace sharegpt references with chat_template
2024-10-29 10:26:03 -04:00
NanoCode012
bfc77b0f36 Feat: Add support for tokenizer’s or custom jinja chat_template (#1970)
* Allow using tokenizer's default chat template with fallbacks

Summary of changes:

1. Adds `tokenizer_default` as option for `chat_template` in
   `chat_template` prompt strategy that allows using the chat template
   from tokenizer's config.json
2. Allows falling back to chat templates available in axolotl if
   tokenizer does not have a chat template
3. Adds a mistral chat template which supports system message - taken
   from https://github.com/chujiezheng/chat_templates/blob/main/chat_templates/mistral-instruct.jinja

---

Why?

Many popular models are not trained with chatml format. As a result for
the model to correctly learn chatml we have to turn on train_on_inputs
which requires more compute and time. If we can use the model's already
learned chat template we can just learn the output tokens

---

Todo:

- Write tests

* Add tests

* Fix lint and bug post merge from main

* Add option `chat_template_jinja` to provide a jinja template

* remove custom mistral template

* Address review comments and add docs

* Update docs/dataset-formats/conversation.qmd

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* fix: set default to tokenizer template

* Merge branch 'main' into cj_tokenizer_default_prompt_template

* chore: remove redundant function

* fix: re-arrange enum declaration position

* fix: refactor artifact left from main merge

* feat(doc): updated config with chat template options and clarified examples

* chore: clarify doc

* chore: added example for non-default template

* chore: refactor

* fix: test

* fix: config being dropped and unittest to catch that

* chore: lint

* chore: skip duplicate

* fix: rename var after merge

* feat: add test for levy's dpo case

* fix: remove default setting on edge case where chat template overriden in dataset section

* feat: handle sharegpt deprecation better in docs

* feat: add example using fallback

* feat: handles chat_template requiring specific user/assistant order

* fix: update test based on new defaults

* fix: imported name incorrectly updated on merge

* chore: lint

* fix: update dummy message to prevent potential overlap with real content

* fix(doc): formatting

* fix: update bradleyterry to use new chat_template

---------

Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
2024-10-29 10:14:51 +07:00
NanoCode012
ac128b7b1d fix: update eval causal lm metrics to add perplexity (#1951) [skip ci] 2024-10-12 21:41:13 -04:00
Adam Hazell
922db77521 Add MLFlow run name option in config (#1961)
Co-authored-by: Adam Hazell <adam.hazell@mindfoundry.ai>
2024-10-11 13:33:06 -04:00
Thomas Cleberg
e73b8dff8d Add Support for revision Dataset Parameter to specify reading from Huggingface Dataset Revision (#1912)
* Add support for `revision` dataset parameter

* only use revision on hf hub backed datasets

* use revision tied to head

* set download to use revision

* feat: add config to model validator class

* feat: add revision config to RL and tests for it

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2024-10-11 13:32:50 -04:00
Boris Feld
6d3caadf90 Comet integration (#1939)
* Add first version of a Comet integration

* Remove debug prints

* Add test for Comet Configuration transformation to env variables

* Fix last lint warning

* Update Readme for Comet logging documentation

* Update Comet integration to be optional, update code and tests

* Add documentation for Comet configuration

* Add missing check
2024-10-09 16:03:37 -04:00
Wing Lian
e1915f5625 Multimodal Vision Llama - rudimentary support (#1940)
---------

Co-authored-by: Sunny <sunny@Sunnys-MacBook-Air.local>
Co-authored-by: sunny <sunnyliu19981005@gmail.com>
2024-10-02 21:02:48 -04:00
Alpay Ariyak
ab461d83c4 Fix documentation for pre-tokenized dataset (#1894)
It's currently asking to not add BOS and EOS, stating that Axolotl adds them, but this is not true
2024-09-05 23:11:31 +09:00
Wing Lian
93b769a979 lint fix and update gha regex (#1899) 2024-09-05 09:58:21 -04:00
Tijmen de Haan
f18f4268b5 Docs for AMD-based HPC systems (#1891)
* Add documentation for installing on AMD-based HPC systems.

* Accept suggestion to add note about deepspeed

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update _quarto.yml with amd_hpc doc

---------

Co-authored-by: Tijmen de Haan <tijmen.dehaan@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-09-05 18:33:19 +09:00
Aman Gupta Karmani
de4ea2d1f2 docs: minor syntax highlight fix (#1839) 2024-08-22 11:47:34 -04:00
Ben Feuer
b7665c26c8 Update conversation.qmd (#1788) [skip ci] 2024-08-05 12:44:26 -04:00
Wing Lian
e4063d60a7 bump transformers and set roundup_power2_divisions for more VRAM improvements, low bit ao optimizers (#1769)
* bump transformers and set roundup_power2_divisions for more VRAM improvements

* support for low bit optimizers from torch ao

* fix check for alternate optimizers and use nous models on hf for llama3

* add missing check for ao_adamw_fp8

* fix check when using custom optimizers w adamw
2024-07-19 00:47:07 -04:00
Wing Lian
7830fe04b5 Unsloth rope (#1767)
* Add unsloth rope embeddings support

* support for models weights in 4bit and do some memory gc

* use accelerate logger

* add unsloth llama rms norm optims

* update docs for unsloth

* more docs info
2024-07-18 14:54:41 -04:00