Commit Graph

703 Commits

Author SHA1 Message Date
Wing Lian
2147cf6837 Llama3 dpo (#1610)
* add dpo llama3

* fix dpo bos and eos

* bos token gets added automatically by the tokenizer

* explicit <|end_of_text|> not needed, as eot_id is sufficient

---------

Co-authored-by: Nero10578 <owenarliawan@gmail.com>
2024-05-11 18:29:03 -04:00
Ram
50421c8b1d feat: Add LLaMA-3 instruct prompt strategies for fine-tuning (#1553)
* Add prompt strategies

* Update modified URL

* Update modified URL

* Update fastchat_conversation_turns.py

* Update register function

* Remove extra /n for system prompt

* Fix return

* Fix BOS

* Update requirements, pylint

* Linting

* Linting

* fix tuples, make sure to set system message in template

* tests for llama3 tokenization

* fix conditionals for loading chat template

---------

Co-authored-by: Ram <ram@Rams-MacBook-Pro.local>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-05-11 00:08:04 -04:00
Antoni-Joan Solergibert
b32c08f8cc adding llama3 fastchat conversation monkeypatch (#1539)
* adding llama3 fastchat conversation monkeypatch

* Updated conversation turns to work with PR3259 of FastChat

* fixed bos token

* bump fastchat version

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-05-10 10:40:05 -04:00
Wing Lian
fff06af8d0 ignore the fsdp_config section too (#1606) [skip ci] 2024-05-09 13:30:39 -04:00
Wing Lian
796a085b2f make sure to save the lora adapter at the end of RL/dpo training (#1573) 2024-05-08 10:39:33 -04:00
Wing Lian
cb78a36374 improve tool handling roles (#1587) 2024-05-07 11:30:40 -04:00
NanoCode012
8b9c15b17f feat: exclude mamba blocks for jamba (#1578) 2024-05-07 22:52:57 +09:00
Chirag Jain
9e1480e9ca Pass deepspeed and fsdp as None explicitly when merging adapters to allow custom device_map (#1575) 2024-05-07 22:47:55 +09:00
marijnfs
3367fca732 Gradio configuration parameters (#1591)
* Gradio Configuration Settings

* Making various Gradio variables configurable instead of hardcoded

* Remove overwriting behavour of 'default tokens' that breaks tokenizer for llama3

* Fix type of gradio_temperature

* revert un-necessary change and lint

---------

Co-authored-by: Marijn Stollenga <stollenga@imfusion.de>
Co-authored-by: Marijn Stollenga <stollenga@imfusion.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-05-06 15:43:42 -04:00
Wing Lian
29cf15a28c improve save callbacks (#1592) 2024-05-04 23:19:18 -04:00
Chirag Jain
dde02fcb94 Pass weakref to model in the SIGINT handler to free up model post train function (#1581)
* Pass weakref to model in the SIGINT handler to free up model post train()

* Fix lint issues

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-05-03 11:05:28 -04:00
Ali Mosavian
b9bb169602 FIX: TRL trainer preprocessing step was running in one process (#1583)
* FIX: TRL trainer preprocessing step was running in one process

* FIX: Changed so that dataset_num_proc is sent to CPO, KTO and ORPO trainer args and directly to the trainer when DPO

* FIX: Changed back to only support ORPO for now, since KTO is handled in another way

---------

Co-authored-by: Ali Mosavian <ali.mosavian@kry.se>
2024-05-03 11:02:59 -04:00
JohanWork
601c08b4c2 ADD: warning hub model (#1301)
* update warning for save_strategy

* update

* clean up

* update

* Update test_validation.py

* fix validation step

* update

* test_validation

* update

* fix

* fix

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-05-01 01:05:12 +09:00
Abhinand
cc5d31e0d9 Add debug option for RL dataset preprocessing (#1404)
* adding debug option for RL dataset preprocessing

* Refine formatting of debugging code in RL dataset preprocessing

* Update __init__.py

* chore: fix lint

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-05-01 00:36:04 +09:00
Wing Lian
5294653a2d PoSE context length ext (#1567)
* PoSE wip

* fixes for pose splitting

* set pose context len so we can pick that up seperately from the usable training context len

* support min sample len and define num chunks

* fix chunk splitting

* support for curriculum/ordered learning with pose

* fix sequence len sort

* add curriculum_sampling to pydantic
2024-04-27 12:28:20 -04:00
Wing Lian
68601ec6ad make sure everything stays in the same dtype when using dpo + FSDP (#1559) 2024-04-22 16:00:05 -04:00
Haoxiang Wang
60f5ce0569 Add support for Gemma chat template (#1530)
* Add support for Gemma chat template

* Update fschat version to include its newest support for Gemma chat style

* pin fastchat to current HEAD

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-21 19:55:40 -04:00
Frank Ruis
7477a53287 wrap prepared_ds_path in str() to avoid TypeError in fsspec package (#1548)
* wrap prepared_ds_path in str() to avoid TypeError in fsspec package

`fsspec` calls `if "::" in path` on `prepared_ds_path`, which will throw an error if it is a `PosixPath` object.

* update test too

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-21 19:55:20 -04:00
Wing Lian
7d1d22f72f ORPO Trainer replacement (#1551)
* WIP use trl ORPOTrainer

* fixes to make orpo work with trl

* fix the chat template laoding

* make sure to handle the special tokens and add_generation for assistant turn too
2024-04-19 17:25:36 -04:00
Wing Lian
6319da1f9b Unsloth gradient checkpointing offload (#1528)
* unsloth gradient checkpointing

* fix validation too

* fixes to make it work with mistral

* monkeypatch the checkpoint fn earlier
2024-04-16 14:53:57 -04:00
Wing Lian
132eb740f0 DBRX Model Support (#1462)
* wip for dbrx finetuning

* add fastcore for parallel loading of sharded weights

* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback

* update to use v2 of the converted model

* more fixes for dbrx loras

* make sure to enable fsdp activation checkpointing

* fix support for 8bit loras too for dbrx

* apply z3 leaf moe fix for DBRX with deepspeed

* don't raise value error since child module searches could fail and be ok

* revert a previous change to fix fsdp

* update mistral/mistral qlora+fsdp yamls

* fix qlora+fsdp quant storage type

* more edge cases for qlora-fsdp

* fixes for fsdp+qlora w optimizer in 8bit

* add bigstral z3 config and make sure to use full_state_dict for fsdp
2024-04-12 09:02:36 -04:00
Thomas Capelle
5ed29393e3 Update SaveAxolotlConfigtoWandBCallback to use artifact instead of save (#1483)
* deprecated wandb.save

* also use wandb.save for axolotl yaml

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-09 18:58:38 -04:00
Wing Lian
da9b1a3196 use locale agnostic seperator to make large nums easier to read (#1503) 2024-04-09 17:28:43 -04:00
DavidFarago
057fa44191 WIP: Support table logging for mlflow, too (#1506)
* WIP: Support table logging for mlflow, too

Create a `LogPredictionCallback` for both "wandb" and "mlflow" if
specified.

In `log_prediction_callback_factory`, create a generic table and make it
specific only if the newly added `logger` argument is set to "wandb"
resp. "mlflow".

See https://github.com/OpenAccess-AI-Collective/axolotl/issues/1505

* chore: lint

* add additional clause for mlflow as it's optional

* Fix circular imports

---------

Co-authored-by: Dave Farago <dfarago@innoopract.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-09 17:28:27 -04:00
Scott Fleming
8fa0785f74 Correctly handle splits for datasets.arrow_dataset.Dataset objects (#1504)
* Correctly handle splits for datasets.arrow_dataset.Dataset objects

The `load_tokenized_prepared_datasets` function currently has logic for loading a dataset from local path that always checks if a split is in the dataset. The problem is, if the dataset is loaded using `load_from_disk` and it is an Arrow-based dataset, *there is no* split information. Instead what happens is, by calling `split in ds`, it presumably searches through all the rows and columns of the arrow dataset object to find e.g., 'train' assuming `split == 'train'`. This causes the program to hang.

See https://chat.openai.com/share/0d567dbd-d60b-4079-9040-e1de58a4dff3 for context.

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-09 16:40:26 -04:00
Wing Lian
4313b1a6a0 Print versions (#1496)
* print out dependency versions for easier debugging

* improve readability
2024-04-09 11:05:15 -04:00
Wing Lian
ff01c45127 add field to sft dataset pydantic for completion support (#1497) 2024-04-08 21:37:54 -04:00
Wing Lian
2fa65b9599 ignore issues with calculating # params when printing (#1493) 2024-04-08 11:04:22 -04:00
xzuyn
9430b6e868 Remove validate_quantized_dora (#1485)
DoRA with quantized layers is supported with PEFT 0.10.0
2024-04-08 01:25:23 -04:00
Wing Lian
934fc851da drop empty token from beginning if tokenizer has no bos_token (in the case of qwen) (#1490) 2024-04-06 19:55:19 -07:00
NanoCode012
bda48f0150 fix: reduce sample_packing warning (#1484) 2024-04-06 21:04:07 +09:00
NanoCode012
bf4cd67252 feat: validate sample packing requires flash_attention (#1465)
* feat: validate sample packing requires flash_attention

* fix: check for sdp_attn per suggestion

* feat: add FA to tests
2024-04-05 12:47:32 +09:00
Wing Lian
05b0b7e8ca add support for cohere chat template (#1478) 2024-04-04 18:20:50 -07:00
Wing Lian
87ca3f98c6 don't use deepspeed or fsdp when merging loras (#1479) 2024-04-04 18:20:32 -07:00
Wing Lian
e0fcef403f refactor utils.data module for line count linter (#1476) 2024-04-04 16:33:42 -07:00
Wing Lian
5aa50974ce Pretrain multipack v2 (#1470) 2024-04-02 05:42:16 -07:00
Nick Doiron
586bd8d221 fix pretraining_ on odd datasets (#1463)
* can configure name of split of pretraining dataset

* streaming data and dataset map

* text column customized

* allow text_column to be set in pretrain

* pretrain type

* load a bit of the dataset

* fix dataset where splits have separate configs

* ok name param here is the config

* whitespace
2024-04-01 20:48:59 -07:00
Wing Lian
0b103775ad reduce verbosity of the special tokens (#1472) 2024-04-01 21:47:27 +09:00
Wing Lian
0ddfb24fcf LISA (#1469)
* add lisa support

* fix default and fix attribute traversal for layers

* improve lisa callback logging

* fix LISA by ensuring params are not frozen during __init__

* example config for lisa

---------

Co-authored-by: Aman Karmani <aman@tmm1.net>
2024-04-01 04:54:53 -07:00
Wing Lian
6086be85f7 qwen2_moe support w multipack (#1455) 2024-03-29 11:04:53 -04:00
Wing Lian
05b398a072 fix some of the edge cases for Jamba (#1452)
* fix some of the edge cases for Jamba

* update requirements for jamba
2024-03-29 02:38:02 -04:00
Keith Stevens
e634118f90 Support loading datasets saved via save_to_disk (#1432)
* Support loading datasetes saved via save_to_disk

* Adding comprehensive unittests

* Fix dataset tests due to new hash changes
2024-03-29 00:19:36 -04:00
Wing Lian
02af0820f7 Jamba (#1451)
* fixes for larger models

* add qlora example for deepspeed

* add readme for jamba
2024-03-28 21:03:22 -04:00
Wing Lian
4155e9988f fix layer_replication arg to peft (#1446) 2024-03-27 10:18:56 -04:00
Wing Lian
25afd35842 support layer replication for peft and fix rslora integration (#1445) 2024-03-27 10:16:47 -04:00
Wing Lian
da265dd796 fix for accelerate env var for auto bf16, add new base image and expand torch_cuda_arch_list support (#1413) 2024-03-26 16:46:19 -04:00
WenboPan
e07347b188 Remove seq_len arg in rotary_emb (#1443)
* remove seq_len in llama rotary_emb

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-26 15:19:44 -04:00
Far El
bcdc9b1601 Fix falcon tokenization step (#1441) [skip ci]
* Fix falcon tokenization step

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-26 15:19:34 -04:00
Wing Lian
601b77bc9d make sure to capture non-null defaults from config validation (#1415) 2024-03-26 15:18:47 -04:00
NanoCode012
ff939d8a64 fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path (#1298)
* fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path

* fix: normalize config
2024-03-25 15:34:54 +09:00