Commit Graph

1114 Commits

Author SHA1 Message Date
Hamel Husain
7bbaac98f7 fix mistral prompt assembly (#982)
* fix mistral prompts

* fix spacing

* remove elif
2023-12-21 08:00:55 -08:00
Wing Lian
161bcb6517 Dockerfile torch fix (#987)
* add torch to requirements.txt at build time to force version to stick

* fix xformers check

* better handling of xformers based on installed torch version

* fix for ci w/o torch
2023-12-21 09:38:20 -05:00
Ikko Eltociear Ashimine
d25c34caa6 Update README.md (#966) 2023-12-17 09:51:25 -05:00
NanoCode012
13e938149d fix: add lr scheduler kwargs to Trainer (#972) 2023-12-17 18:48:28 +09:00
Wing Lian
85de004dd4 fix for build for nccl in dockerfile (#970) 2023-12-16 19:12:01 -05:00
Wing Lian
80ec7af358 update to latest nccl in docker image (#965) 2023-12-16 18:31:25 -05:00
dumpmemory
f28e75513b update transformers to fix checkpoint saving (#963) 2023-12-15 21:03:17 -05:00
Hamel Husain
5ada140ff0 Fix prompt assembly for llama (#952)
* start at index 0

* add test to check for missing turns

* apply black

* Update test_prompt_tokenizers.py

* Update src/axolotl/monkeypatch/fastchat_conversation_turns.py

Co-authored-by: Motoki Wu <tokestermw@gmail.com>

* fix linting

* apply black

* add more tests for llama/sharegpt

* make logic clearer

---------

Co-authored-by: Motoki Wu <tokestermw@gmail.com>
2023-12-14 10:03:59 -08:00
Hamel Husain
712fd27b3f Add docs (#947)
* move section

* update README

* update README

* update README

* update README

* update README

* Update README.md

Co-authored-by: Wing Lian <wing.lian@gmail.com>

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-13 14:22:52 -08:00
kallewoof
ef24342538 fix: switch to using the HuggingFace Transformers NEFT implementation (#941)
* fix: switch to using the HuggingFace Transformers NEFT implementation

* linter

* add support for noisy_embedding_alpha with a warning about it being renamed

* restore pre/posttrain_hooks

* move validation of NEFT noise alpha into validate_config()

* linter
2023-12-13 17:15:34 -05:00
Wing Lian
5ea3aa31f0 Fix Deepspeed loading (#950)
* add check for zero3

* freeze parameters

* fixes for deepspeed loading

* fix model parameter check

* unfrozen parameters in example mixtral and logging when unfreezing
2023-12-13 16:03:23 -05:00
Wing Lian
f1f60cb5b2 Flash attn hotfix (#951)
* use previous  arg

* use eager to use legacy attention that can be patched
2023-12-13 13:42:23 -05:00
kallewoof
450e04d3c4 fix: remove excessive newlines in system prompt(s) for alpaca (#936) 2023-12-13 16:40:02 +09:00
Juraj Bednar
b0cf397ecb More hints on what to do with CUDA Out of memory errors (#925) 2023-12-13 16:38:38 +09:00
Wing Lian
5f79b8242f new evals_per_epoch and saves_per_epoch to make things cleaner (#944)
* new evals_per_epoch and saves_per_epoch to make things cleaner

* update per PR feedback
2023-12-12 15:35:23 -05:00
Hamel Husain
f1de29dd1e Respect sequence_len in config for type: llama2_chat (#926)
* Respect sequence_len in config for `type: llama2_chat`

It was hardcoded to `4096` I am not sure why?  This updates it to pull from the config. 

cc: @winglian

* Update llama2_chat.py

* apply black formatting

* fix tokenizer

* update test data

* lint fixtures
2023-12-12 09:39:22 -08:00
Wing Lian
7fabc4d95e Mixtral official (#942)
* multipack support for official mixtral implementation

* fix patch to load multipack for mixtral

* chore: lint
2023-12-11 23:44:33 -05:00
Motoki Wu
9a5eb3990c Update requirements.txt (#940) 2023-12-11 22:57:28 -05:00
Casper
86487c2e96 Mixtral: More correct MoE, lower loss (#932)
* More correct MoE

* Fix formatting
2023-12-10 10:34:25 -05:00
Wing Lian
35f9b0f149 update to latest transformers for mixstral support (#929)
* update to latest transformers for mixstral support

* pin transformers

* fix typo
2023-12-10 10:32:27 -05:00
Wing Lian
68b227a7d8 Mixtral multipack (#928)
* mixtral multipack

* use mixtral model

* sample yml

* calculate cu_seqlens properly

* use updated flash ettention setting

* attn var checks

* force use of flash attention 2 for packing

* lint

* disable future fix for now

* update support table
2023-12-09 21:26:30 -05:00
Timothy Lim
03c6318ba3 fixing prompt template of chatml by removal of linebreak (#922)
Co-authored-by: Timothy  Lim <timothyyonglee.lim@kxrdev.com>
2023-12-09 13:07:44 -05:00
Wing Lian
40a6362c92 support for mamba (#915)
* support for mamba

* more mamba fixes

* use fork for mamba kwargs fix

* grad checkpointing doesn't work

* fix extras for mamaba

* mamba loss fix

* use fp32 and remove verbose logging

* mamba fixes

* fix collator for mamba

* set model_type on training_args

* don't save safetensors for mamba

* update mamba config to disable safetensor checkpooints, install for tests

* no evals for mamba tests

* handle save_pretrained

* handle unused safetensors arg
2023-12-09 12:10:41 -05:00
NanoCode012
d339beb9d9 chore: clarify Readme on sharegpt system role 2023-12-08 11:35:53 +09:00
NanoCode012
fde091cb12 fix(tokenizer): handle fast tokenizer properly for bos/eos (#914) 2023-12-08 11:31:13 +09:00
Casper
06ae39200b Pin flash-attn to 2.3.3 (#919) 2023-12-07 07:36:52 +01:00
NanoCode012
a581e9f8f6 feat: add check for quantized model (#913)
* feat: add check for quantized model

* chore: refactor and add another check

* Update src/axolotl/utils/models.py

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-05 01:20:06 +09:00
Bryan Thornbury
992e742cdc Support device_map=sequential & max_memory config parameters (#903)
* Support device_map sequential (and others). Support max_memory in cfg.

* Update documentation in README accordingly.

* Update README.md

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-04 09:29:21 -05:00
NanoCode012
a1da39cd48 Feat(wandb): Refactor to be more flexible (#767)
* Feat: Update to handle wandb env better

* chore: rename wandb_run_id to wandb_name

* feat: add new recommendation and update config

* fix: indent and pop disabled env if project passed

* feat: test env set for wandb and recommendation

* feat: update to use wandb_name and allow id

* chore: add info to readme
2023-12-04 22:17:25 +09:00
kallewoof
58ec8b1113 feature: loss watchdog for terminating training runs that are failing (#899)
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
2023-12-04 07:54:34 -05:00
Haoxiang Wang
476a205cea Remove learning rate scheduler in deepspeed config to avoid conflict (#909) 2023-12-04 05:17:38 -05:00
Wing Lian
3e3229e2d9 fix for qwen w lora (#906) 2023-11-30 12:45:50 -05:00
Wing Lian
1d21aa6b0a ensure merged model matches the training dtype (#902)
* ensure merged model matches the training dtype

* Update src/axolotl/cli/__init__.py

* Update src/axolotl/cli/__init__.py
2023-11-29 09:55:19 -05:00
kallewoof
71b7ea3c05 Determine FSDP/deepspeed settings on device select. (#883)
* Determine FSDP/deepspeed settings on device select.

Without this, the OS env check for accelerate will fail.

* rename and move env setup call

* chore: lint

---------

Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-11-29 08:36:35 -05:00
NanoCode012
a48dbf6561 fix: remove FA for qwen examples (#900)
* fix: remove FA for qwen lora

* fix: remove FA for qlora
2023-11-27 21:23:54 +09:00
Wing Lian
6a4562ac08 update datasets version to cut down the warnings due to pyarrow arg change (#897)
* update datasets to cut down the warnings

* set versions for tokenizers and gradio

* upgrade transformers to latest version
2023-11-25 16:30:00 -05:00
NanoCode012
1115c501b8 Feat: Add Qwen (#894)
* Feat: Add Qwen

* feat: add qwen lora example

* feat: update matrix

* fix: add trust_remote_code

* fix: disable gradient checkpointing

* chore: add warning about gradient checkpointing

* fix: config

* fix: turn off sample packing for this example and reduce seq len

* chore: add comment on seq len
2023-11-26 00:05:01 +09:00
NanoCode012
7ee3c4cacb fix: warning should not show if eval_batch_size not provided (#896) 2023-11-25 16:04:00 +09:00
NanoCode012
fb12895a17 Feat: Add warmup_ratio (#893)
* Feat: Add warmup_ratio

* fix: update readme with more details on conflict
2023-11-25 12:15:43 +09:00
NanoCode012
9fc29e082b chore(doc): Add info on changing role in sharegpt (#886) 2023-11-22 15:32:50 +09:00
NanoCode012
575a082aae fix: revert local dir dataset load (#878) 2023-11-18 22:50:41 +09:00
Mark Saroufim
ddf815022a Install from git url (#874)
* Install from git url

* Update README.md
2023-11-17 12:50:51 -05:00
Wing Lian
9bf854e59c Phi update 202311 (#876)
* add phi modeling from hf

* update for packing and use new modeling class for phi

* update e2e tests for phi to use new model name

* update example phi to also use new phi model name

* use AutoModelForCausalLM for phi lora since sample packing isn't supported
2023-11-17 12:47:17 -05:00
Wing Lian
797f3dd1de don't train if eval split is too small (#873)
* allow zero len dataset

* better handling and warning of small eval splits

* raise error if eval split is too small

* don't mess with calculating total num steps in distributed context

* fix eval_sample_packing training args logic
2023-11-16 11:35:42 -05:00
Wing Lian
0de1457189 try #2: pin hf transformers and accelerate to latest release, don't reinstall pytorch (#867)
* isolate torch from the requirements.txt

* fix typo for removed line ending

* pin transformers and accelerate to latest releases

* try w auto-gptq==0.5.1

* update README to remove manual peft install

* pin xformers to 0.0.22

* bump flash-attn to 2.3.3

* pin flash attn to exact version
2023-11-16 10:42:36 -05:00
NanoCode012
3cc67d2cdd Feat: Add dataset loading from S3, GCS (#765)
* Feat: Add dataset loading from S3, GCS

* chore: update docs

* chore: add more info on cloud loading
2023-11-16 14:33:58 +09:00
Wing Lian
1bc11868eb allow overriding of model_config parameters from the YML (#853)
* allow overriding of model_config parameters from the YML

* remove old logging, update readme

* move the updating of model config to the load_model_config function

* add warning for deprecated rope_scaling in the root of the YML config
2023-11-15 23:47:08 -05:00
Wing Lian
b3a61e8ce2 add e2e tests for checking functionality of resume from checkpoint (#865)
* use tensorboard to see if resume from checkpoint works

* make sure e2e test is either fp16 or bf16

* set max_steps and save limit so we have the checkpoint when testing resuming

* fix test parameters
2023-11-15 23:05:55 -05:00
Wing Lian
8a8d1c4023 make docker command more robust (#861)
* make docker command more robust

* update readme with more info
2023-11-15 23:03:54 -05:00
Wing Lian
332984db18 lint fix that didn't get caught by linter (#866) 2023-11-15 14:36:40 -05:00