* Pass weakref to model in the SIGINT handler to free up model post train()
* Fix lint issues
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
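The weakref change above can be sketched roughly as follows. This is a minimal illustration, not the project's actual handler: `DummyModel`, `make_sigint_handler`, and `register` are hypothetical names, and the `save()` call stands in for whatever the real handler does on interrupt.

```python
import signal
import weakref


class DummyModel:
    """Hypothetical stand-in for a large model object."""

    def __init__(self):
        self.saved = False

    def save(self):
        self.saved = True


def make_sigint_handler(model):
    """Build a SIGINT handler that holds only a weak reference to `model`."""
    model_ref = weakref.ref(model)

    def handler(signum, frame):
        live_model = model_ref()  # None once the model has been collected
        if live_model is None:
            return False
        live_model.save()
        return True

    return handler


def register(model):
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGINT, make_sigint_handler(model))
```

Because the closure captures only `model_ref`, the installed handler does not keep the model alive, so it can be freed once the caller drops its own reference after training.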
* FIX: TRL trainer preprocessing step was running in one process
* FIX: Changed so that dataset_num_proc is sent to the CPO, KTO and ORPO trainer args, and directly to the trainer when using DPO
* FIX: Changed back to only support ORPO for now, since KTO is handled in another way
---------
Co-authored-by: Ali Mosavian <ali.mosavian@kry.se>
* PoSE wip
* fixes for pose splitting
* set pose context len so we can pick that up separately from the usable training context len
* support min sample len and define num chunks
* fix chunk splitting
* support for curriculum/ordered learning with pose
* fix sequence len sort
* add curriculum_sampling to pydantic
* add example for mistral orpo
* sample_packing: false for orpo
* go to load_dataset (since load_rl_datasets requires a transform_fn, which only dpo uses currently)
* Add support for Gemma chat template
* Update fschat version to include its newest support for Gemma chat style
* pin fastchat to current HEAD
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* wrap prepared_ds_path in str() to avoid TypeError in fsspec package
`fsspec` calls `if "::" in path` on `prepared_ds_path`, which will throw an error if it is a `PosixPath` object.
* update test too
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
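The `str()` fix above can be reproduced without `fsspec` itself. In this small sketch, `contains_chain_separator` is a hypothetical helper that mimics only the `"::" in path` check quoted in the commit message, and the path value is made up for illustration.

```python
from pathlib import PurePosixPath

# Made-up path; a PurePosixPath behaves like PosixPath for this check.
prepared_ds_path = PurePosixPath("last_run_prepared") / "abc123"


def contains_chain_separator(path):
    # Mimics the membership test described in the commit message.
    return "::" in path


try:
    contains_chain_separator(prepared_ds_path)
    raised = False
except TypeError:
    # Path objects are not iterable and define no __contains__.
    raised = True

# Wrapping the path in str() makes the same check well-defined.
as_str_ok = contains_chain_separator(str(prepared_ds_path))
```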
* WIP use trl ORPOTrainer
* fixes to make orpo work with trl
* fix the chat template loading
* make sure to handle the special tokens and add_generation for assistant turn too
* wip for dbrx finetuning
* add fastcore for parallel loading of sharded weights
* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback
* update to use v2 of the converted model
* more fixes for dbrx loras
* make sure to enable fsdp activation checkpointing
* fix support for 8bit loras too for dbrx
* apply z3 leaf moe fix for DBRX with deepspeed
* don't raise value error since child module searches could fail and be ok
* revert a previous change to fix fsdp
* update mistral/mistral qlora+fsdp yamls
* fix qlora+fsdp quant storage type
* more edge cases for qlora-fsdp
* fixes for fsdp+qlora w optimizer in 8bit
* add bigstral z3 config and make sure to use full_state_dict for fsdp
* WIP: Support table logging for mlflow, too
Create a `LogPredictionCallback` for both "wandb" and "mlflow" if
specified.
In `log_prediction_callback_factory`, create a generic table and make it
specific only if the newly added `logger` argument is set to "wandb"
resp. "mlflow".
See https://github.com/OpenAccess-AI-Collective/axolotl/issues/1505
* chore: lint
* add additional clause for mlflow as it's optional
* Fix circular imports
---------
Co-authored-by: Dave Farago <dfarago@innoopract.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Correctly handle splits for datasets.arrow_dataset.Dataset objects
The `load_tokenized_prepared_datasets` function currently has logic for loading a dataset from a local path that always checks whether a split is in the dataset. The problem is that if the dataset is loaded using `load_from_disk` and it is an Arrow-based dataset, *there is no* split information. Instead, calling `split in ds` presumably searches through all the rows and columns of the Arrow dataset object to find, e.g., 'train' (assuming `split == 'train'`). This causes the program to hang.
See https://chat.openai.com/share/0d567dbd-d60b-4079-9040-e1de58a4dff3 for context.
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
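If the hang comes from the membership test falling back to iteration, as the commit message supposes, the behavior can be illustrated with plain Python. `RowDataset` and `select_split` are hypothetical stand-ins, not the `datasets` API or the project's actual fix: when an object defines `__iter__` but no `__contains__`, `x in obj` walks every element.

```python
from collections.abc import Mapping


class RowDataset:
    """Hypothetical row-oriented dataset with __iter__ but no __contains__."""

    def __init__(self, rows):
        self._rows = rows
        self.rows_visited = 0

    def __iter__(self):
        for row in self._rows:
            self.rows_visited += 1
            yield row


def select_split(ds, split):
    # Only mapping-like containers carry split names; a bare dataset is
    # returned as-is without running a membership test over its rows.
    if isinstance(ds, Mapping) and split in ds:
        return ds[split]
    return ds


ds = RowDataset([{"text": str(i)} for i in range(1000)])
found = "train" in ds  # falls back to iterating over every row
visited_by_membership_test = ds.rows_visited
```

Checking the container type before the membership test, as `select_split` does, avoids the full scan for datasets that have no split information.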
* can configure name of split of pretraining dataset
* streaming data and dataset map
* text column customized
* allow text_column to be set in pretrain
* pretrain type
* load a bit of the dataset
* fix dataset where splits have separate configs
* ok name param here is the config
* whitespace
* add lisa support
* fix default and fix attribute traversal for layers
* improve lisa callback logging
* fix LISA by ensuring params are not frozen during __init__
* example config for lisa
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>