Commit Graph

1167 Commits

Author SHA1 Message Date
NanoCode012
043c3860cd fix: train_on_inputs: true ignored for sharegpt (#1045) [skip ci]
* fix: `train_on_inputs: true` ignored for sharegpt

* enable unit test for train_on_inputs for sharegpt

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-09 23:00:09 -05:00
Wing Lian
0f100800e3 be more robust about checking embedding modules for lora finetunes (#1074) [skip ci]
* be more robust about checking embedding modules for lora finetunes

* update dynamic error message
2024-01-09 22:58:54 -05:00
Wing Lian
ead34c516a swap the data collator for evals if not using sample packing (#1076)
* swap the data collator for evals if not using sample packing

* drop last from dataloader to help with issues with evals
2024-01-09 22:16:24 -05:00
Wing Lian
ec02b7cc4e Update FUNDING.yml [skip ci] 2024-01-09 22:15:27 -05:00
Wing Lian
3b4c646f87 Update FUNDING.yml with bitcoin (#1079) [skip ci] 2024-01-09 21:56:52 -05:00
Wing Lian
788649fe95 attempt to also run e2e tests that needs gpus (#1070)
* attempt to also run e2e tests that needs gpus

* fix stray quote

* checkout specific github ref

* dockerfile for tests with proper checkout

ensure wandb is dissabled for docker pytests
clear wandb env after testing
clear wandb env after testing
make sure to provide a default val for pop
tryin skipping wandb validation tests
explicitly disable wandb in the e2e tests
explicitly report_to None to see if that fixes the docker e2e tests
split gpu from non-gpu unit tests
skip bf16 check in test for now
build docker w/o cache since it uses branch name ref
revert some changes now that caching is fixed
skip bf16 check if on gpu w support

* pytest skip for auto-gptq requirements

* skip mamba tests for now, split multipack and non packed lora llama tests

* split tests that use monkeypatches

* fix relative import for prev commit

* move other tests using monkeypatches to the correct run
2024-01-09 21:23:23 -05:00
Casper
9be92d1448 Separate AutoGPTQ dep to pip install -e .[auto-gptq] (#1077)
* Separate AutoGPTQ dep to `pip install -e .[auto-gptq]`

* Fix code review
2024-01-09 23:39:25 +01:00
Wing Lian
d7057ccd36 paired kto support (#1069) 2024-01-09 13:30:45 -05:00
mtenenholtz
768d348f42 update peft to 0.7.0 (#1073) 2024-01-09 12:22:14 -05:00
Johan Hansson
090c24dcb0 Add: mlflow for experiment tracking (#1059) [skip ci]
* Update requirements.txt

adding mlflow

* Update __init__.py

Imports for mlflow

* Update README.md

* Create mlflow_.py (#1)

* Update README.md

* fix precommits

* Update README.md

Update mlflow_tracking_uri

* Update trainer_builder.py

update trainer building

* chore: lint

* make ternary a bit more readable

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-09 09:34:09 -05:00
Wing Lian
651b7a31fc fix double eos token for chatml (#1054) [skip ci]
* fix double eos token for chatml

* isolate fix to chatml conversation

* fix add special tokens to include rstrip

* add test for train_on_inputs for sharegpt

* don't use rstrip for chatml
2024-01-09 09:33:38 -05:00
Ricardo Dominguez-Olmedo
04b978b428 Cosine learning rate schedule - minimum learning rate (#1062)
* Cosine min lr

* Cosine min lr - warn if using deepspeed

* cosine_min_lr_ratio readme

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-09 09:29:56 -05:00
NanoCode012
c3e8165f26 fix: torch_dtype mistral default to fp32 (#1050) 2024-01-09 07:48:15 -05:00
Wing Lian
7f381750d9 Update FUNDING.yml for Kofi link (#1067) 2024-01-08 19:26:51 -05:00
Wing Lian
14964417ee Sponsors (#1065)
* wip sponsors section in readme

* add ko-fi and contributors list
2024-01-08 18:52:00 -05:00
Ricardo Dominguez-Olmedo
81d384598e Efficiently get the length of the tokenized docs (#1063)
* Efficiently get the length of the tokenized docs

* chore: lint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-08 15:48:30 -05:00
Wing Lian
732851f105 Phi2 rewrite (#1058)
* restore to current phi modeling code from phi-2

* enable gradient checkpointing

* don't cast everything to float32 all the time

* gradient checkpointing for phi2 ParallelBlock module too

* fix enabling flash attn for phi2

* add comment about import

* fix phi2 example

* fix model type check for tokenizer

* revert float32 -> bf16 casting changes

* support fused dense flash attn

* fix the repo for flash-attn

* add package name for subdir pkg

* fix the data collator when not using sample packing

* install packaging for pytests in ci

* also fix setup to not install flash attn fused dense subdir if not extras

* split out the fused-dense-lib in extra requires

* don't train w group_by_length for phi

* update integration test to use phi2

* set max steps and save steps for phi e2e tests

* try to workaround ssave issue in ci

* skip phi2 e2e test for now
2024-01-08 14:04:22 -05:00
Hamel Husain
9ca358b671 Simplify Docker Unit Test CI (#1055) [skip ci]
* Update tests-docker.yml

* Update tests-docker.yml

* run ci tests on ci yaml updates

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-06 08:20:33 -05:00
JinK
553c80f79a streaming multipack for pretraining dataset (#959)
* [Feat] streaming multipack

* WIP make continued pretraining work w multipack

* fix up hadrcoding, lint

* fix dict check

* update test for updated pretraining multipack code

* fix hardcoded data collator fix for multipack pretraining

* fix the collator to be the max length for multipack pretraining

* don't bother with latest tag for test

* cleanup docker build/test

---------

Co-authored-by: jinwonkim93@github.com <jinwonkim>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-05 22:13:21 -05:00
Hamel Husain
eb4c99431b Update tests-docker.yml (#1052) [skip ci] 2024-01-05 14:26:18 -05:00
NanoCode012
cbdbf9e6e5 feat: always push checkpoint to hub if set (#1049) [skip ci] 2024-01-05 13:09:42 -05:00
kallewoof
bdfefaf054 feature: better device mapping for large models (#918)
* fix: improved memory handling when model is bigger than existing VRAM

* feature: add lora_on_cpu flag to do LoRA loading on CPU (RAM)

For big models where the models are taking up the entire GPU VRAM, the LoRA part will fail unless it is loaded on CPU only.

* doc: add README

* fix: enable progress bars in do_merge_lora()

* doc: mention gpu_memory_limit and lora_on_cpu in merge part of README

* Update src/axolotl/utils/models.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

* fix: remove deletion of removed model_kwargs key

* fix: validate that gpu_memory_limit and max_memory are not both set

---------

Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-05 22:22:21 +09:00
Hamel Husain
63fb3eb426 set default for merge (#1044) 2024-01-04 18:14:20 -08:00
Hamel Husain
31d23504a5 fix model card upload for PEFT models (#1043) 2024-01-04 18:13:54 -08:00
Wing Lian
f243c2186d RL/DPO (#935)
* ipo-dpo trainer

* fix missing abstract method

* chatml template, grad checkpointing kwargs support

* fix steps calc for RL and add dataloader kwargs

* wip to fix dpo and start ppo

* more fixes

* refactor to generalize map fn

* fix dataset loop and handle argilla pref dataset

* set training args

* load reference model on seperate gpu if more than one device

* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo

* fixes for rl training

* support for ipo from yaml

* set dpo training args from the config, add tests

* chore: lint

* set sequence_len for model in test

* add RLHF docs
2024-01-04 18:22:55 -05:00
xaviviro
59b2d302c8 Added chatglm3 conversation type for training models like TinyLLama (#1036)
* Added chatgml3 conversation type for training models like TinyLLama

* Added chatgml3 conversation type for training models like TinyLLama with lint

* Added chatgml3 conversation type for training models like TinyLLama with lint
2024-01-04 21:03:04 +09:00
Wing Lian
bcc78d8fa3 bump transformers and update attention class map name (#1023)
* bump transformers and update attention class map name

* also run the tests in docker

* add mixtral e2e smoke test

* fix base name for docker image in test

* mixtral lora doesn't seem to work, at least check qlora

* add testcase for mixtral w sample packing

* check monkeypatch for flash attn multipack

* also run the e2e tests in docker

* use all gpus to run tests in docker ci

* use privileged mode too for docker w gpus

* rename the docker e2e actions for gh ci

* set privileged mode for docker and update mixtral model self attn check

* use fp16/bf16 for mixtral w fa2

* skip e2e tests on docker w gpus for now

* tests to validate mistral and mixtral patches

* fix rel import
2024-01-03 12:11:04 -08:00
NanoCode012
74532ddc45 chore(config): clean up old log for Qwen (#1034) 2024-01-04 01:19:52 +09:00
NanoCode012
8ba27f3bde fix: lint (#1037) 2024-01-03 10:23:44 -05:00
Hamel Husain
a3e8783328 [Docs] delete unused cfg value lora_out_dir (#1029)
* Update README.md

* Update README.md

* Update README.md

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-01-02 21:35:20 -08:00
NanoCode012
b31038aae9 chore(readme): update instruction to set config to load from cache (#1030) 2024-01-03 11:56:19 +09:00
Tim Dolan
c75f916745 added tiny llama examples for lora and qlora (#1027)
* added tiny llama examples for lora and qlora

* corrected yml files and removed tiny-llama.yml from llama-2 example
2024-01-02 20:00:37 -05:00
Wing Lian
4d2e842e46 use recommended setting for use_reentrant w gradient checkpointing (#1021)
* use recommended setting for use_reentrant w gradient checkpointing

* add doc for gradient_checkpointing_kwargs
2024-01-01 22:17:27 -05:00
Tazik Shahjahan
3678a6c41d Fix: bf16 support for inference (#981)
* Fix: bf16 torch dtype

* simplify casting to device and dtype

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-29 16:15:53 -06:00
mhenrichsen
f8ae59b0a8 Adds chat templates (#1022) 2023-12-29 15:44:23 -06:00
Hamel Husain
4f4d638b84 [WandB] Push axolotl config to top level wandb files (#1014) 2023-12-29 10:52:12 -08:00
Wing Lian
ba043a361e add ultrachat prompt strategies (#996) 2023-12-29 12:23:29 -06:00
NanoCode012
41353d2ea0 feat: expose bnb kwargs (#1018)
* feat: expose bnb kwargs

* chore: added examples and link per suggestion

* Uncomment defaults per suggestion for readability

Co-authored-by: Hamel Husain <hamel.husain@gmail.com>

---------

Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
2023-12-29 18:16:26 +09:00
NanoCode012
f6ecf14dd4 feat: remove need to add load_in* during merge (#1017) 2023-12-29 18:15:30 +09:00
Hamel Husain
dec66d7c53 [Docs] Nit: Remind people to auth to wandb if they are going to use it (#1013) 2023-12-28 18:00:16 -08:00
Hamel Husain
76357dc5da Update README.md (#1012) 2023-12-28 18:00:02 -08:00
Wing Lian
70b46ca4f4 remove landmark attn and xpos rope implementations (#1010) 2023-12-27 21:07:27 -08:00
Hamel Husain
85dd4d525b add config to model card (#1005)
* add config to model card

* rm space

* apply black formatting

* apply black formatting

* fix formatting

* check for cfg attribute

* add version

* add version

* put the config in a collapsible element

* put the config in a collapsible element
2023-12-27 21:25:33 -06:00
Kevin Sydney
384b817dc0 Set eval_sample_packing to false in mistral config.yaml (#1003)
Without eval_sampling_packing set to false, ValueError occurs with eval dataset split is too small for sample_packing.
2023-12-27 16:11:55 -08:00
Younes Belkada
db9094df0f FEAT: add tagging support to axolotl (#1004)
* add tagging support to axolotl

* chore: lint

* fix method w self

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-27 16:25:20 -06:00
Evan Griffiths
6ef46f8dca Add an example config for finetuning a 34B model on a 24GB GPU (#1000)
* Add an example config for finetuning a 34B model on a 24GB GPU

* Remore wandb project
2023-12-25 10:29:55 -08:00
Wing Lian
628b754824 set output_router_logits for mixtral config: (#995) 2023-12-22 12:57:02 -05:00
Wing Lian
37820f6540 support for cuda 12.1 (#989) 2023-12-22 11:08:22 -05:00
NanoCode012
7d4185ffcb chore: Update transformers to latest (#986) 2023-12-23 00:29:36 +09:00
mhenrichsen
93ebec1ac5 change val size (#992) 2023-12-22 16:18:16 +01:00