1556 Commits

Author SHA1 Message Date
sunny
1f09f48d8f fixed small typo 2024-08-16 11:27:56 -04:00
sunny
b44546df6f fixed typo 2024-08-16 11:22:50 -04:00
Sunny
967fbf8152 fixed typo 2024-08-16 11:17:52 -04:00
Sunny
c144a1ae65 fixed small typo 2024-08-16 10:57:42 -04:00
NanoCode012
68a3c7678a fix: parse model_kwargs (#1825) 2024-08-16 07:51:19 -04:00
NanoCode012
f18925fb4b fix: parse eager_attention (#1824) 2024-08-14 09:46:46 -04:00
Wing Lian
1853d6021d bump hf dependencies (#1823)
* bump hf dependencies

* revert optimum version change

* don't bump tokenizers all the way to 0.20 yet since transformers doesn't support that
2024-08-11 16:27:41 -04:00
Chiwan Park
0801f239cc fix the incorrect max_length for chat template (#1818) 2024-08-09 11:50:31 -04:00
Wing Lian
54392ac8a6 Attempt to run multigpu in PR CI for now to ensure it works (#1815) [skip ci]
* Attempt to run multigpu in PR CI for now to ensure it works

* fix yaml file

* forgot to include multigpu tests

* fix call to cicd.multigpu

* dump dictdefault to dict for yaml conversion

* use to_dict instead of casting

* 16bit-lora w flash attention, 8bit lora seems problematic

* add llama fsdp test

* more tests

* Add test for qlora + fsdp with prequant

* limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test

* move multigpu tests to biweekly
2024-08-09 11:50:13 -04:00
Wing Lian
3e2b269d06 update tinyllama to use final instead of checkpoints (#1820) [skip ci] 2024-08-09 10:58:19 -04:00
Wing Lian
5ee4b7325f fix z3 leaf configuration when not using lists (#1817) [skip ci] 2024-08-09 10:54:52 -04:00
Wing Lian
70978467a0 skip no commit to main on ci (#1814) 2024-08-06 15:25:54 -04:00
Wing Lian
850f999a76 update peft and transformers (#1811) 2024-08-06 10:32:05 -04:00
Wing Lian
c56e0a79a5 logging improvements (#1808) [skip ci]
* logging improvements

* fix sort
2024-08-06 10:31:50 -04:00
Wing Lian
35d5e59d78 set z3 leaf for deepseek v2 (#1809) [skip ci]
* set z3 leaf for deepseek v2

* add deepseek v2 chat template
2024-08-06 09:30:46 -04:00
Wing Lian
fbbeb4fee0 remove un-necessary zero-first guard as it's already only called in a parent fn (#1810) [skip ci] 2024-08-06 09:29:23 -04:00
Wing Lian
ecdda006de One cycle lr (#1803)
* refactor one_cycle lr scheduler so it's reusable in more situations

* fix validation for lr_scheduler

* default to cosine anneal strategy

* one cycle lr expects cos
2024-08-05 13:12:05 -04:00
Ben Feuer
b7665c26c8 Update conversation.qmd (#1788) [skip ci] 2024-08-05 12:44:26 -04:00
Aaditya Ura (looking for PhD Fall’24)
cb023c70db Update instruct-lora-8b.yml (#1789) [skip ci]
Config is giving an error if not using the end of the token as the `pad_to_sequence_len` is true.
2024-08-05 12:43:20 -04:00
ripes
7402eb9dcb Fix setting correct repo id when pushing dataset to hub (#1657)
* use the ds hash as the dataset's config_name

* improve logging for loading/pushing ds to hub

* fix missing f string
2024-08-05 12:42:15 -04:00
Sri Kainkaryam
203816f7b4 Fix colab example notebook (#1805) [skip ci] 2024-08-04 13:24:26 -04:00
Wing Lian
78b42a3fe1 fix roles to train defaults and make logging less verbose (#1801) 2024-07-30 20:58:17 -04:00
Wing Lian
3ebf22464b qlora-fsdp ram efficient loading with hf trainer (#1791)
* fix 405b with lower cpu ram requirements

* make sure to use double quant and only skip output embeddings

* set model attributes

* more fixes for sharded fsdp loading

* update the base model in example to use pre-quantized nf4-bf16 weights

* upstream fixes for qlora+fsdp
2024-07-30 19:21:38 -04:00
Wing Lian
dbf8fb549e publish axolotl images without extras in the tag name (#1798) 2024-07-30 13:36:19 -04:00
Wing Lian
9a63884597 update test and main/nightly builds (#1797)
* update test and main/nightly builds

* don't install mamba-ssm on 2.4.0 since it has no wheels yet
2024-07-30 12:37:40 -04:00
Wing Lian
c5587b45ac use 12.4.1 instead of 12.4 [skip-ci] (#1796) 2024-07-30 08:50:23 -04:00
Wing Lian
d4f6a6b103 fix dockerfile and base builder (#1795) [skip-ci] 2024-07-30 08:34:37 -04:00
Wing Lian
d8d1788ffc move to supporting mostly 12.1 w 2.3.1 and add new 12.4 with 2.4.0 (#1793) 2024-07-30 08:06:11 -04:00
mhenrichsen
3bc8e64557 Update README.md (#1792) 2024-07-30 07:59:53 +02:00
Adam Brusselback
55cc214c76 Add flexible configuration options for chat_template dataset training (#1756)
* Add flexible configuration options for chat dataset training

- Introduce roles_to_train parameter to set training labels by role
- Add train_on_eos option to configure training on end-of-sequence tokens
- Implement per-message training configuration in dataset
- Allow fine-grained control over training specific portions of messages
- Add message_field_training and message_field_training_detail settings
- Implement mapping between dataset character offsets and tokenized prompt
- Enhance test suite to cover new functionality

* Fix missing field inits, things weren't working from yaml.

* chore: lint

* Revert test repo back to NousResearch after opening PR to fix the tokenizer_config.json.

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-07-28 21:48:57 -04:00
Wing Lian
94ba93259f various batch of fixes (#1785)
* various batch of fixes

* more tweaks

* fix autoawq requirement for torch flexibility

* simplify conditionals

* multi-node fixes wip

* bump transformers and include 405b qlora+fsdp yaml
2024-07-28 07:25:54 -04:00
Wing Lian
22680913f3 Bump deepspeed 20240727 (#1790)
* pin deepspeed to 0.14.4 otherwise it doesn't play nice with trl

* Add test to import to try to trigger import dependencies
2024-07-27 10:24:11 -04:00
Wing Lian
6a9cfec222 add support for simpo via cpo trainer (#1772)
* add support for simpo via cpo trainer

* add cpo_alpha / sft_weight from the paper

* make sure to use the right builder for simpo
2024-07-23 21:22:16 -04:00
Wing Lian
fe250ada78 fix fsdp loading of models, esp 70b (#1780) 2024-07-23 19:54:28 -04:00
Wing Lian
e6b299dd79 bump flash attention to 2.6.2 (#1781) [skip ci] 2024-07-23 19:54:15 -04:00
Wing Lian
608a2f3180 bump transformers for updated llama 3.1 (#1778)
* bump transformers for updated llama 3.1

* bump for patch fix
2024-07-23 13:21:03 -04:00
Wing Lian
87455e7f32 swaps to use newer sample packing for mistral (#1773)
* swaps to use newer sample packing for mistral

* fix multipack patch test

* patch the common fa utils

* update for refactor of flash attn unpad

* remove un-needed drop attn mask for mistral

* bump transformers to main to pick up latest mistral fix for 12b and refactor of fa2

* update test
2024-07-23 01:41:11 -04:00
Keith Stevens
985819d89b Add a chat_template prompt strategy for DPO (#1725)
* Implementing a basic chat_template strategy for DPO datasets

This mimics the sft chat_template strategy such that users can:
* Specify the messages field
* Specify the per message role and content fields
* Specify the chosen and rejected fields
* Let the tokenizer construct the raw prompt
* Ensure the chosen and rejected fields don't have any prefix tokens

* Adding additional dpo chat template unittests

* Rename test class
2024-07-21 09:10:42 -04:00
Wing Lian
fa91b698e9 Fix untrained tokens (#1771)
* fix untrained reserved tokens

* save model after fixing untrained embeddings

* don't need fsdp conditional here
2024-07-19 12:21:37 -04:00
Wing Lian
e4063d60a7 bump transformers and set roundup_power2_divisions for more VRAM improvements, low bit ao optimizers (#1769)
* bump transformers and set roundup_power2_divisions for more VRAM improvements

* support for low bit optimizers from torch ao

* fix check for alternate optimizers and use nous models on hf for llama3

* add missing check for ao_adamw_fp8

* fix check when using custom optimizers w adamw
2024-07-19 00:47:07 -04:00
Wing Lian
7830fe04b5 Unsloth rope (#1767)
* Add unsloth rope embeddings support

* support for models weights in 4bit and do some memory gc

* use accelerate logger

* add unsloth llama rms norm optims

* update docs for unsloth

* more docs info
2024-07-18 14:54:41 -04:00
Wing Lian
c86c32a627 set the number of dataset processes on the DPO Config rather than the trainer (#1762) 2024-07-17 15:38:37 -04:00
Wing Lian
8731b95d04 re-enable PYTORCH_CUDA_ALLOC_CONF expandable_segments (#1765) [skip ci] 2024-07-17 15:38:26 -04:00
Wing Lian
8619b2d855 add torch_compile_mode options (#1763) [skip ci]
* add torch_compile_mode options

* make sure n_gpu is an int
2024-07-17 15:38:07 -04:00
Wing Lian
976f85195a fixes to accelerator so that iterable pretraining datasets work (#1759)
* fixes to accelerator so that iterable pretraining datasets work

* fix the pretraining test params

* split batches, not dispatch batches needs to be set

* update c4 datasets

* set epochs in pretrain config test

* need to set both split_batches and dispatch_batches to false for pretraining

* fix bool val in comment
2024-07-17 10:58:38 -04:00
Wing Lian
152ab76623 fix num gpu check (#1760) 2024-07-17 10:58:14 -04:00
Wing Lian
5f58555bd0 support for llama multipack using updated code/patches (#1754)
* support for llama multipack using updated code/patches

* also support unsloth patches

* incorrect arg

* add config validation for unsloth

* add missing return to validation

* add another missing return to validation
2024-07-16 17:36:29 -04:00
Wing Lian
cfc533a7f7 torch compile and cuda alloc improvements (#1755)
* enable experimental expandable_segments

* hf trainer seems to be missing torch compile

* disable PYTORCH_CUDA_ALLOC_CONF to see if that fixes cicd
2024-07-16 16:00:23 -04:00
Wing Lian
e1725aef2b update modal package and don't cache pip install (#1757)
* update modal package and cleanup pip cache

* more verbosity on the test
2024-07-16 14:45:38 -04:00
Wing Lian
78e12f8ca5 add basic support for the optimi adamw optimizer (#1727)
* add support for optimi_adamw optimizer w kahan summation

* pydantic validator for optimi_adamw

* workaround for setting optimizer for fsdp

* make sure to install optimizer packages

* make sure to have parity for model parameters passed to optimizer

* add smoke test for optimi_adamw optimizer

* don't use foreach optimi by default
2024-07-14 19:12:57 -04:00