Wing Lian
3202f19f52
add save_only_model arg
2024-04-10 16:09:08 -04:00
Thomas Capelle
5ed29393e3
Update SaveAxolotlConfigtoWandBCallback to use artifact instead of save ( #1483 )
* deprecated wandb.save
* also use wandb.save for axolotl yaml
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-09 18:58:38 -04:00
Wing Lian
da9b1a3196
use locale-agnostic separator to make large nums easier to read ( #1503 )
2024-04-09 17:28:43 -04:00
DavidFarago
057fa44191
WIP: Support table logging for mlflow, too ( #1506 )
* WIP: Support table logging for mlflow, too
Create a `LogPredictionCallback` for both "wandb" and "mlflow" if
specified.
In `log_prediction_callback_factory`, create a generic table and make it
specific only if the newly added `logger` argument is set to "wandb" or
"mlflow", respectively.
See https://github.com/OpenAccess-AI-Collective/axolotl/issues/1505
* chore: lint
* add additional clause for mlflow as it's optional
* Fix circular imports
---------
Co-authored-by: Dave Farago <dfarago@innoopract.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-09 17:28:27 -04:00
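The factory pattern the commit above describes — build a backend-agnostic table, then specialize only at hand-off — can be sketched as follows; the function signature and table shape here are illustrative assumptions, not axolotl's actual API:

```python
# Hedged sketch of a logger-agnostic prediction-table factory; names are
# illustrative, not the real axolotl implementation.
def log_prediction_callback_factory(logger: str):
    def callback(predictions):
        # build a generic table first, independent of the backend
        table = [{"id": i, "prediction": p} for i, p in enumerate(predictions)]
        # only the final hand-off is backend-specific
        if logger == "wandb":
            return ("wandb", table)
        if logger == "mlflow":
            return ("mlflow", table)
        raise ValueError(f"unsupported logger: {logger}")
    return callback

cb = log_prediction_callback_factory("mlflow")
print(cb(["a", "b"])[0])  # mlflow
```

This keeps the table-building logic shared, so supporting another tracker only means adding one more branch at the hand-off.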
Scott Fleming
8fa0785f74
Correctly handle splits for datasets.arrow_dataset.Dataset objects ( #1504 )
* Correctly handle splits for datasets.arrow_dataset.Dataset objects
The `load_tokenized_prepared_datasets` function's logic for loading a dataset from a local path always checks whether a split is in the dataset. The problem is that if the dataset was loaded via `load_from_disk` and is an Arrow-based dataset, *there is no* split information. Instead, evaluating `split in ds` presumably searches through all the rows and columns of the Arrow dataset object to find e.g. 'train' (assuming `split == 'train'`), which causes the program to hang.
See https://chat.openai.com/share/0d567dbd-d60b-4079-9040-e1de58a4dff3 for context.
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-09 16:40:26 -04:00
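The hang described in that commit comes down to how Python's `in` operator behaves on mapping-like versus sequence-like objects; a minimal, dependency-free illustration (plain dict/list stand-ins for `DatasetDict` and `Dataset`):

```python
# A DatasetDict is mapping-like: `split in ds` is a cheap key lookup over
# split names. A bare arrow Dataset is sequence-like: the same expression
# iterates over every row, which on a large dataset looks like a hang.
dataset_dict_like = {"train": [{"text": "a"}, {"text": "b"}]}  # stand-in for DatasetDict
dataset_like = [{"text": "a"}, {"text": "b"}]                  # stand-in for Dataset

print("train" in dataset_dict_like)  # True: checks split names only
print("train" in dataset_like)       # False: scanned every row first
```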
Wing Lian
4313b1a6a0
Print versions ( #1496 )
* print out dependency versions for easier debugging
* improve readability
2024-04-09 11:05:15 -04:00
Maziyar Panahi
7f17eff81a
Fix the wrong adapter in qwen2-moe-qlora example ( #1501 ) [skip ci]
It should be `qlora` instead of `lora`
2024-04-09 10:57:24 -04:00
Wing Lian
ff01c45127
add field to sft dataset pydantic for completion support ( #1497 )
2024-04-08 21:37:54 -04:00
Wing Lian
2fa65b9599
ignore issues with calculating # params when printing ( #1493 )
2024-04-08 11:04:22 -04:00
xzuyn
9430b6e868
Remove validate_quantized_dora ( #1485 )
DoRA with quantized layers is supported with PEFT 0.10.0
2024-04-08 01:25:23 -04:00
Wing Lian
934fc851da
drop empty token from beginning if tokenizer has no bos_token (in the case of qwen) ( #1490 )
2024-04-06 19:55:19 -07:00
NanoCode012
bda48f0150
fix: reduce sample_packing warning ( #1484 )
2024-04-06 21:04:07 +09:00
NanoCode012
bf4cd67252
feat: validate sample packing requires flash_attention ( #1465 )
* feat: validate sample packing requires flash_attention
* fix: check for sdp_attn per suggestion
* feat: add FA to tests
2024-04-05 12:47:32 +09:00
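The validation added above amounts to a simple config check: sample packing only works with an attention implementation that understands packed sequences. The key names below mirror axolotl's YAML options, but the function itself is a hypothetical sketch:

```python
# Hedged sketch: reject sample_packing unless flash attention (or SDP
# attention, per the review suggestion in the commit) is enabled.
def validate_sample_packing(cfg: dict) -> None:
    if cfg.get("sample_packing") and not (
        cfg.get("flash_attention") or cfg.get("sdp_attention")
    ):
        raise ValueError(
            "sample_packing requires flash_attention (or sdp_attention)"
        )

validate_sample_packing({"sample_packing": True, "flash_attention": True})  # ok
```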
Wing Lian
05b0b7e8ca
add support for cohere chat template ( #1478 )
2024-04-04 18:20:50 -07:00
Wing Lian
87ca3f98c6
don't use deepspeed or fsdp when merging loras ( #1479 )
2024-04-04 18:20:32 -07:00
Wing Lian
e0fcef403f
refactor utils.data module for line count linter ( #1476 )
2024-04-04 16:33:42 -07:00
NanoCode012
c2b64e4dcf
Feat: update doc ( #1475 ) [skip ci]
* feat: update doc contents
* chore: move batch vs ga docs
* feat: update lambdalabs instructions
* fix: refactor dev instructions
2024-04-04 13:43:40 +09:00
Hamel Husain
5760099bd4
fix toc
2024-04-03 12:05:49 -07:00
Wing Lian
5aa50974ce
Pretrain multipack v2 ( #1470 )
2024-04-02 05:42:16 -07:00
James Melvin Ebenezer
cae608f587
Added pip install ninja to accelerate installation of flash-attn ( #1461 )
* Added pip install ninja to accelerate installation of flash-attn
* doc: cleanup
2024-04-02 17:36:41 +09:00
Nick Doiron
586bd8d221
fix pretraining_ on odd datasets ( #1463 )
* can configure name of split of pretraining dataset
* streaming data and dataset map
* text column customized
* allow text_column to be set in pretrain
* pretrain type
* load a bit of the dataset
* fix dataset where splits have separate configs
* ok name param here is the config
* whitespace
2024-04-01 20:48:59 -07:00
Hamel Husain
86b7d22f35
Reorganize Docs ( #1468 )
2024-04-01 08:00:52 -07:00
Wing Lian
0b103775ad
reduce verbosity of the special tokens ( #1472 )
2024-04-01 21:47:27 +09:00
NanoCode012
946b497c3f
feat: add deepspeed 3 with cpuoffload ( #1466 )
* feat: add deepspeed 3 with cpuoffload
* make bf16 explicit, add param only offload variant
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-04-01 21:42:52 +09:00
Wing Lian
0ddfb24fcf
LISA ( #1469 )
* add lisa support
* fix default and fix attribute traversal for layers
* improve lisa callback logging
* fix LISA by ensuring params are not frozen during __init__
* example config for lisa
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
2024-04-01 04:54:53 -07:00
Wing Lian
89134f2143
make sure to install causal_conv1d in docker ( #1459 )
2024-03-29 16:43:25 -04:00
Wing Lian
6086be85f7
qwen2_moe support w multipack ( #1455 )
2024-03-29 11:04:53 -04:00
Wing Lian
4a92a3b9ee
Nightlies fix v4 ( #1458 ) [skip ci]
* another attempt at github actions
* try again
2024-03-29 11:04:34 -04:00
Wing Lian
46a73e3d1a
fix yaml parsing for workflow ( #1457 ) [skip ci]
2024-03-29 10:21:08 -04:00
Wing Lian
da3415bb5a
fix how nightly tag is generated ( #1456 ) [skip ci]
2024-03-29 09:29:17 -04:00
Wing Lian
8cb127abeb
configure nightly docker builds ( #1454 ) [skip ci]
* configure nightly docker builds
* also test update pytorch in modal ci
2024-03-29 08:25:45 -04:00
Wing Lian
05b398a072
fix some of the edge cases for Jamba ( #1452 )
* fix some of the edge cases for Jamba
* update requirements for jamba
2024-03-29 02:38:02 -04:00
Keith Stevens
e634118f90
Support loading datasets saved via save_to_disk ( #1432 )
* Support loading datasets saved via save_to_disk
* Adding comprehensive unittests
* Fix dataset tests due to new hash changes
2024-03-29 00:19:36 -04:00
Wing Lian
02af0820f7
Jamba ( #1451 )
* fixes for larger models
* add qlora example for deepspeed
* add readme for jamba
2024-03-28 21:03:22 -04:00
Wing Lian
4155e9988f
fix layer_replication arg to peft ( #1446 )
2024-03-27 10:18:56 -04:00
Wing Lian
25afd35842
support layer replication for peft and fix rslora integration ( #1445 )
2024-03-27 10:16:47 -04:00
Wing Lian
da265dd796
fix for accelerate env var for auto bf16, add new base image and expand torch_cuda_arch_list support ( #1413 )
2024-03-26 16:46:19 -04:00
WenboPan
e07347b188
Remove seq_len arg in rotary_emb ( #1443 )
* remove seq_len in llama rotary_emb
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-26 15:19:44 -04:00
Far El
bcdc9b1601
Fix falcon tokenization step ( #1441 ) [skip ci]
* Fix falcon tokenization step
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-26 15:19:34 -04:00
Satpal Singh Rathore
c19d060a74
turn sample_packing on for training ( #1438 ) [skip ci]
2024-03-26 15:19:04 -04:00
Wing Lian
601b77bc9d
make sure to capture non-null defaults from config validation ( #1415 )
2024-03-26 15:18:47 -04:00
NanoCode012
ff939d8a64
fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path ( #1298 )
* fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path
* fix: normalize config
2024-03-25 15:34:54 +09:00
Phuc Van Phan
324d59ea0d
docs: update link to docs of advance topic in README.md ( #1437 )
2024-03-24 21:49:27 -07:00
NanoCode012
f1ebaa07c6
chore(config): refactor old mistral config ( #1435 )
* chore(config): refactor old mistral config
* chore: add link to colab on readme
2024-03-25 12:00:44 +09:00
Wing Lian
34ba634b8c
Fix ORPO multi gpu ( #1433 )
* don't drop attention_mask for orpo
* handle multi-gpu cases better for orpo
* revert change to not drop the attention_mask from inputs for orpo
2024-03-22 15:22:58 -07:00
Hamel Husain
4e69aa48ab
Update docs.yml
2024-03-21 22:36:57 -07:00
Hamel Husain
629450cecd
Bootstrap Hosted Axolotl Docs w/Quarto ( #1429 )
* precommit
* mv styes.css
* fix links
2024-03-21 22:28:36 -07:00
Wing Lian
2a1589f6f6
strip out hacky qlora-fsdp workarounds now that qlora-fsdp fixes are upstreamed ( #1428 )
2024-03-21 11:56:13 -04:00
Younes Belkada
7d55607368
HF / FEAT: Optimize HF tags ( #1425 ) [skip ci]
* optimize tags
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-03-21 11:55:56 -04:00
Wing Lian
7803f0934f
fixes for dpo and orpo template loading ( #1424 )
2024-03-20 11:36:24 -04:00