NanoCode012
9cd27b2f91
fix(readme): clarify custom user prompt [no-ci] (#1124)
* fix(readme): clarify custom user prompt
* chore: update example to show use case of setting field
2024-01-16 09:47:33 +09:00
Hamel Husain
2dc431078c
Add link on README to Docker Debugging (#1107)
* add docker debug
* Update docs/debugging.md
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* explain editable install
* explain editable install
* upload new video
* add link to README
* Update README.md
* Update README.md
* chore: lint
* make sure to lint markdown too
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-12 08:51:35 -05:00
Hamel Husain
b502392e82
Update README.md (#1103)
* Update README.md
* Update README.md
2024-01-11 16:41:58 -08:00
Hamel Husain
7512c3ad20
Add Debugging Guide (#1089)
* add debug guide
* add background
* add .gitignore
* Update devtools/dev_sharegpt.yml
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Update docs/debugging.md
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* simplify example axolotl config
* add additional comments
* add video and TOC
* try jsonc for better md rendering
* style video thumbnail better
* fix footnote
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-10 20:49:24 -08:00
Wing Lian
d7057ccd36
paired kto support (#1069)
2024-01-09 13:30:45 -05:00
Johan Hansson
090c24dcb0
Add: mlflow for experiment tracking (#1059) [skip ci]
* Update requirements.txt
adding mlflow
* Update __init__.py
Imports for mlflow
* Update README.md
* Create mlflow_.py (#1)
* Update README.md
* fix precommits
* Update README.md
Update mlflow_tracking_uri
* Update trainer_builder.py
update trainer building
* chore: lint
* make ternary a bit more readable
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-09 09:34:09 -05:00
Ricardo Dominguez-Olmedo
04b978b428
Cosine learning rate schedule - minimum learning rate (#1062)
* Cosine min lr
* Cosine min lr - warn if using deepspeed
* cosine_min_lr_ratio readme
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-09 09:29:56 -05:00
Wing Lian
14964417ee
Sponsors (#1065)
* wip sponsors section in readme
* add ko-fi and contributors list
2024-01-08 18:52:00 -05:00
kallewoof
bdfefaf054
feature: better device mapping for large models (#918)
* fix: improved memory handling when model is bigger than existing VRAM
* feature: add lora_on_cpu flag to do LoRA loading on CPU (RAM)
For big models where the models are taking up the entire GPU VRAM, the LoRA part will fail unless it is loaded on CPU only.
* doc: add README
* fix: enable progress bars in do_merge_lora()
* doc: mention gpu_memory_limit and lora_on_cpu in merge part of README
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix: remove deletion of removed model_kwargs key
* fix: validate that gpu_memory_limit and max_memory are not both set
---------
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-05 22:22:21 +09:00
Hamel Husain
63fb3eb426
set default for merge (#1044)
2024-01-04 18:14:20 -08:00
Hamel Husain
a3e8783328
[Docs] delete unused cfg value lora_out_dir (#1029)
* Update README.md
* Update README.md
* Update README.md
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-01-02 21:35:20 -08:00
NanoCode012
b31038aae9
chore(readme): update instruction to set config to load from cache (#1030)
2024-01-03 11:56:19 +09:00
Wing Lian
4d2e842e46
use recommended setting for use_reentrant w gradient checkpointing (#1021)
* use recommended setting for use_reentrant w gradient checkpointing
* add doc for gradient_checkpointing_kwargs
2024-01-01 22:17:27 -05:00
mhenrichsen
f8ae59b0a8
Adds chat templates (#1022)
2023-12-29 15:44:23 -06:00
NanoCode012
41353d2ea0
feat: expose bnb kwargs (#1018)
* feat: expose bnb kwargs
* chore: added examples and link per suggestion
* Uncomment defaults per suggestion for readability
Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
---------
Co-authored-by: Hamel Husain <hamel.husain@gmail.com>
2023-12-29 18:16:26 +09:00
NanoCode012
f6ecf14dd4
feat: remove need to add load_in* during merge (#1017)
2023-12-29 18:15:30 +09:00
Hamel Husain
dec66d7c53
[Docs] Nit: Remind people to auth to wandb if they are going to use it (#1013)
2023-12-28 18:00:16 -08:00
Hamel Husain
76357dc5da
Update README.md (#1012)
2023-12-28 18:00:02 -08:00
Wing Lian
70b46ca4f4
remove landmark attn and xpos rope implementations (#1010)
2023-12-27 21:07:27 -08:00
Ikko Eltociear Ashimine
d25c34caa6
Update README.md (#966)
2023-12-17 09:51:25 -05:00
Hamel Husain
712fd27b3f
Add docs (#947)
* move section
* update README
* update README
* update README
* update README
* update README
* Update README.md
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-13 14:22:52 -08:00
kallewoof
ef24342538
fix: switch to using the HuggingFace Transformers NEFT implementation (#941)
* fix: switch to using the HuggingFace Transformers NEFT implementation
* linter
* add support for noisy_embedding_alpha with a warning about it being renamed
* restore pre/posttrain_hooks
* move validation of NEFT noise alpha into validate_config()
* linter
2023-12-13 17:15:34 -05:00
Juraj Bednar
b0cf397ecb
More hints on what to do with CUDA Out of memory errors (#925)
2023-12-13 16:38:38 +09:00
Wing Lian
5f79b8242f
new evals_per_epoch and saves_per_epoch to make things cleaner (#944)
* new evals_per_epoch and saves_per_epoch to make things cleaner
* update per PR feedback
2023-12-12 15:35:23 -05:00
Wing Lian
68b227a7d8
Mixtral multipack (#928)
* mixtral multipack
* use mixtral model
* sample yml
* calculate cu_seqlens properly
* use updated flash attention setting
* attn var checks
* force use of flash attention 2 for packing
* lint
* disable future fix for now
* update support table
2023-12-09 21:26:30 -05:00
NanoCode012
d339beb9d9
chore: clarify Readme on sharegpt system role
2023-12-08 11:35:53 +09:00
Bryan Thornbury
992e742cdc
Support device_map=sequential & max_memory config parameters (#903)
* Support device_map sequential (and others). Support max_memory in cfg.
* Update documentation in README accordingly.
* Update README.md
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-12-04 09:29:21 -05:00
NanoCode012
a1da39cd48
Feat(wandb): Refactor to be more flexible (#767)
* Feat: Update to handle wandb env better
* chore: rename wandb_run_id to wandb_name
* feat: add new recommendation and update config
* fix: indent and pop disabled env if project passed
* feat: test env set for wandb and recommendation
* feat: update to use wandb_name and allow id
* chore: add info to readme
2023-12-04 22:17:25 +09:00
kallewoof
58ec8b1113
feature: loss watchdog for terminating training runs that are failing (#899)
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
2023-12-04 07:54:34 -05:00
NanoCode012
1115c501b8
Feat: Add Qwen (#894)
* Feat: Add Qwen
* feat: add qwen lora example
* feat: update matrix
* fix: add trust_remote_code
* fix: disable gradient checkpointing
* chore: add warning about gradient checkpointing
* fix: config
* fix: turn off sample packing for this example and reduce seq len
* chore: add comment on seq len
2023-11-26 00:05:01 +09:00
NanoCode012
fb12895a17
Feat: Add warmup_ratio (#893)
* Feat: Add warmup_ratio
* fix: update readme with more details on conflict
2023-11-25 12:15:43 +09:00
NanoCode012
9fc29e082b
chore(doc): Add info on changing role in sharegpt (#886)
2023-11-22 15:32:50 +09:00
Mark Saroufim
ddf815022a
Install from git url (#874)
* Install from git url
* Update README.md
2023-11-17 12:50:51 -05:00
Wing Lian
0de1457189
try #2: pin hf transformers and accelerate to latest release, don't reinstall pytorch (#867)
* isolate torch from the requirements.txt
* fix typo for removed line ending
* pin transformers and accelerate to latest releases
* try w auto-gptq==0.5.1
* update README to remove manual peft install
* pin xformers to 0.0.22
* bump flash-attn to 2.3.3
* pin flash attn to exact version
2023-11-16 10:42:36 -05:00
NanoCode012
3cc67d2cdd
Feat: Add dataset loading from S3, GCS (#765)
* Feat: Add dataset loading from S3, GCS
* chore: update docs
* chore: add more info on cloud loading
2023-11-16 14:33:58 +09:00
Wing Lian
1bc11868eb
allow overriding of model_config parameters from the YML (#853)
* allow overriding of model_config parameters from the YML
* remove old logging, update readme
* move the updating of model config to the load_model_config function
* add warning for deprecated rope_scaling in the root of the YML config
2023-11-15 23:47:08 -05:00
Wing Lian
8a8d1c4023
make docker command more robust (#861)
* make docker command more robust
* update readme with more info
2023-11-15 23:03:54 -05:00
Wing Lian
332984db18
lint fix that didn't get caught by linter (#866)
2023-11-15 14:36:40 -05:00
Zongheng Yang
b33c1d55a2
Docs: add instructions to 1-click launching on public clouds (#862)
* Update README.md
* Update ToC
2023-11-15 14:11:27 -05:00
NanoCode012
501b4d1379
chore(doc): Separate section on runpod (#860)
2023-11-16 01:06:51 +09:00
NanoCode012
306fe19c54
feat(doc): add more info on train_on_split (#855)
2023-11-15 23:42:26 +09:00
Jason Stillerman
738a057674
Feat: Added Gradio support (#812)
* Added gradio support
* queuing and title
* pre-commit run
2023-11-04 23:59:22 -04:00
Wing Lian
cdc71f73c8
update table for rwkv4 support, fix process count for dataset (#822)
2023-11-04 23:45:44 -04:00
Wing Lian
8b79ff0e94
fix eval_steps to be a sane default (#797)
* fix eval_steps to be a sane default
* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
Aleksa Gordić
2e71ff03a6
Add docker advanced instruction to README (#792)
2023-10-27 09:24:04 -04:00
Casper
e50ab072e2
Create preprocess CLI (#785)
* Create preprocess CLI
* Print prompt template if debugging
* Add print for unsupported prompters
* Formatting
* Formatting
* Refactor variables
* Formatting
* Formatting
* Formatting
* Formatting
2023-10-26 09:35:42 -04:00
NanoCode012
20aa4b57d2
chore(readme): Improve documentation on conversation field (#782)
* chore(readme): Improve documentation on conversation field
* fix: clarify where the option is
2023-10-24 12:52:32 +09:00
NanoCode012
afedc470bd
Fix: Cannot tokenize with bf16 and on cpu (#766)
2023-10-23 01:32:26 +09:00
Casper
15d3a654bf
Implement fused modules (#747)
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-21 16:08:25 -04:00
Wing Lian
a21935f07a
add to docs (#703)
2023-10-19 21:32:30 -04:00