NanoCode012
94d03c8402
Clarify pre-tokenize before multigpu (#359)
2023-08-11 11:27:42 +09:00
Aman Gupta Karmani
11ddccb80f
Merge pull request #356 from tmm1/load_model-args
simplify `load_model` signature
2023-08-09 18:24:34 -07:00
Aman Gupta Karmani
964312199e
Merge pull request #354 from tmm1/gpu-util
GPU memory usage logging
2023-08-09 15:44:18 -07:00
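The two commits behind this PR ("log GPU memory usage" and "use newer pynvml package", below) suggest a small helper along these lines. This is a hedged sketch built on the `pynvml` bindings, not the exact axolotl code; it degrades to `None` on machines without an NVIDIA driver.

```python
def gpu_memory_gb(device_index=0):
    """Return (used_gb, total_gb) for one GPU, or None when NVML is
    unavailable (pynvml not installed, or no NVIDIA driver)."""
    try:
        import pynvml
        pynvml.nvmlInit()
    except Exception:
        return None
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return info.used / 1024**3, info.total / 1024**3
    except Exception:
        return None
    finally:
        pynvml.nvmlShutdown()
```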
Aman Karmani
718102271f
simplify load_model signature
2023-08-09 22:36:02 +00:00
Aman Gupta Karmani
f5c11f8262
Merge pull request #350 from tmm1/group-len-false-examples
set `group_by_length` to false in all examples
2023-08-09 14:48:48 -07:00
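For context on the flag these examples flip: `group_by_length` batches similar-length samples so less compute is spent on padding tokens, at the cost of an upfront length scan and less shuffling between batches. A stdlib-only toy model of the padding effect, with made-up sample lengths (not the HF Trainer sampler):

```python
def padded_tokens(lengths, batch_size):
    """Total tokens after padding each batch to its longest member."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        total += max(batch) * len(batch)
    return total

lengths = [10, 500, 12, 480, 8, 510]  # hypothetical sample lengths
mixed = padded_tokens(lengths, batch_size=2)            # 2980 tokens
grouped = padded_tokens(sorted(lengths), batch_size=2)  # 2000 tokens
```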
Aman Karmani
9c314101d5
use newer pynvml package
2023-08-09 21:06:28 +00:00
Aman Karmani
e303d64728
log GPU memory usage
2023-08-09 18:26:28 +00:00
Aman Karmani
b4d1d22782
note pattern when using groups
2023-08-07 16:18:42 -07:00
Aman Karmani
9f99104038
update comment for group_by_length
2023-08-07 01:04:56 -07:00
Aman Karmani
36fefcf94b
set group_by_length to false in examples
2023-08-06 23:59:09 -07:00
Wing Lian
176b888a63
ensure enable_input_require_grads is called on model before getting the peft model (#345)
2023-08-06 18:13:10 -04:00
Jan Philipp Harries
3392270544
experimental llama 2 chat support (#296)
* experimental llama 2 chat support
* few small fixes
* llama2_chat
* small fix to follow original implementation
* small fixes and added fixtures/tests
* fix - mixed up inference and finetuning conversations
* args - small fix
* small fix
* small adjustment and warning
* fix with pre-commit
---------
Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com>
2023-08-06 17:40:52 -04:00
Wing Lian
bb53a165f5
add a basic ds zero3 config (#347)
better defaults for ds
2023-08-06 17:19:51 -04:00
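For reference, a basic ZeRO-3 DeepSpeed config in the spirit of this commit might look like the fragment below. This is a hedged sketch using standard DeepSpeed keys (with `"auto"` values resolved by the HF integration); the actual file added by the PR may differ.

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true
  },
  "bf16": {
    "enabled": "auto"
  },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```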
ssmi153
10405b9995
Update XFormers Attention Monkeypatch to handle Llama-2 70B (GQA) (#339)
* Fix XFormers attention for Llama-2 70B (GQA)
Updated XFormers MonkeyPatch to handle GQA as used in Llama-2 70B. All the updated code is taken directly from the Transformers library: 07360b6c9c (diff-06392bad3b9e97be9ade60d4ac46f73b6809388f4d507c2ba1384ab872711c51) from their modeling_llama.py file.
* Catch configs without pretraining_tp
* Whitespace bug fix
Command had accidentally been moved out of if-else block.
* pre-commit formatting fixes
Thanks to @winglian
2023-08-06 11:09:04 -04:00
Jan Philipp Harries
c93655c0a3
Added Orca Mini prompt strategy (#263)
* added Orca Mini prompt strategy
* maybe this fixed precommit errors?
* pre-commits passing
---------
Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com>
2023-08-06 03:16:41 +09:00
Wing Lian
fe285430bc
optimize the iteration when tokenizing large datasets (#332)
2023-08-04 12:12:05 -04:00
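One common shape for this kind of optimization is to tokenize in batches rather than one sample at a time. The helper below is an illustrative stdlib sketch of batched iteration, not the code from the PR:

```python
from itertools import islice

def batches(iterable, size):
    """Yield lists of up to `size` items, so each batch can be handed
    to a tokenizer in one call instead of per-sample calls."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk
```

With Hugging Face `datasets`, the equivalent lever is `dataset.map(fn, batched=True, batch_size=...)`.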
Aman Gupta Karmani
0d2e34f056
Merge pull request #336 from tmm1/flash-attn
Fix flash-attn + qlora not working with llama models
2023-08-03 16:25:30 -07:00
Aman Gupta Karmani
b56a6c0101
Merge pull request #337 from tmm1/readme-fix
update README
2023-08-03 15:14:17 -07:00
Aman Karmani
2eda9e02a9
fix typo
2023-08-03 21:04:12 +00:00
Aman Karmani
78b9efb7f4
scope flash-attn+qlora fix correctly, scope to llama, add comment
2023-08-03 19:19:39 +00:00
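The scoping described here (only patch llama-family models) can be sketched as a guard like the following; the function and names are hypothetical illustrations, not the repo's actual patch code:

```python
from types import SimpleNamespace

def maybe_apply_llama_patch(model_config, patch_fn):
    """Apply `patch_fn` only when the config identifies a llama-type
    model, mirroring the scoping this commit describes."""
    if getattr(model_config, "model_type", None) == "llama":
        patch_fn()
        return True
    return False
```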
Aman Karmani
312a9fad07
move flash-attn monkey patch alongside the others
2023-08-03 17:20:49 +00:00
Aman Karmani
58d665943e
python 3.10 and 3.11 both work fine, as does pytorch 2.1.0.dev
2023-08-03 16:47:25 +00:00
Aman Karmani
cc7e80026e
there is no configs folder
2023-08-03 16:31:37 +00:00
mhenrichsen
dc71d8872a
feat/llama-2 examples (#319)
* qlora llama-2
* qlora llama-2
* linting
* readme
* lora added
* linting
* change group_by_length
* 13b fitting on 24gb
* grouped lengths true
* add pad token
* change out dir
---------
Co-authored-by: Mads Henrichsen <mads@Brbar-tilhrende-Mads.local>
2023-08-03 19:22:48 +09:00
Aman Karmani
248bf90f89
ensure flash-attn fixes happen in both adapter/lora modes, and use torch_dtype
2023-08-02 20:15:03 +00:00
Wing Lian
77085ea24e
qlora w flash attention fixes (#333)
2023-08-01 23:26:16 -04:00
Wing Lian
db2a3586f3
add peft install back since it doesn't get installed by setup.py (#331)
2023-07-31 16:31:53 -04:00
Wing Lian
6c9a87c8ee
pin accelerate so it works with llama2 (#330)
2023-07-30 22:20:06 -04:00
Wing Lian
894cba09f3
fix FSDP save of final model (#329)
2023-07-30 21:46:44 -04:00
Wing Lian
41a4d15d43
update README for updated docker images (#328)
* update README for updated docker images
* update readme from pr feedback
2023-07-28 16:50:03 -04:00
Wing Lian
2c37bf6c21
Prune cuda117 (#327)
* drop cuda117/torch 1.13.1 from support, pin flash attention to v2.0.1, rm torchvision/torchaudio install
* gptq base build not needed. add sm 9.0 support
2023-07-26 16:27:49 -04:00
Wing Lian
9f69c4d8c1
latest HEAD of accelerate causes 0 loss immediately w FSDP (#321)
2023-07-24 11:23:56 -04:00
Wing Lian
3d4984b9a5
update prompts for open orca to match the paper (#317)
fix the test for the updated system tokenizer
2023-07-22 13:49:11 -04:00
Wing Lian
ff7f18d1ed
disable gh cache for first step of docker builds too
2023-07-22 11:46:37 -04:00
Wing Lian
cf62cfd661
add runpod envs to .bashrc, fix bnb env (#316)
* hopper support for base dockerfile, add runpod envs to .bashrc
* set BNB_CUDA_VERSION env for latest bnb
* don't support hopper yet w 118
2023-07-22 10:09:38 -04:00
Wing Lian
c5df969262
don't use the gha cache w docker
2023-07-22 08:46:21 -04:00
Wing Lian
40a53ff181
Merge pull request #307 from OpenAccess-AI-Collective/xgen-user-sharegpt-tokens
better handling since xgen tokenizer breaks with convert_tokens_to_ids
2023-07-22 04:10:38 -04:00
Wing Lian
dcdec44347
Merge pull request #306 from ethanhs/xgen
Add XGen info to README and example config
2023-07-22 04:10:18 -04:00
Wing Lian
3ffb018a4c
Merge pull request #313 from OpenAccess-AI-Collective/tokenizer-llama2-embeddings
don't resize embeddings to multiples of 32x by default
2023-07-22 04:09:59 -04:00
Wing Lian
a94f2eecb1
Merge pull request #299 from OpenAccess-AI-Collective/flash-attention-2
Flash attention 2
2023-07-22 04:07:48 -04:00
Wing Lian
1066751358
don't resize embeddings to multiples of 32x by default
2023-07-22 01:52:38 -04:00
Wing Lian
1b63bf13bc
Merge pull request #308 from OpenAccess-AI-Collective/apache2-license
add apache 2.0 license
2023-07-21 09:50:14 -04:00
Wing Lian
5cce2a42ff
add apache 2.0 license
2023-07-21 09:49:29 -04:00
Wing Lian
2a428e8014
better handling since xgen tokenizer breaks with convert_tokens_to_ids
2023-07-21 09:24:11 -04:00
Wing Lian
cdf85fdbd5
pin flash attention 2 to the fix for backwards pass
2023-07-21 08:18:53 -04:00
Wing Lian
9b790d359b
flash attention 2
2023-07-21 08:17:46 -04:00
Ethan Smith
38811434e6
Add XGen info to README and example config
2023-07-21 00:44:50 -07:00
NanoCode012
06c61d6f13
Merge pull request #304 from OpenAccess-AI-Collective/NanoCode012-patch-1
Fix(readme): Improve wording for push model
2023-07-21 13:39:45 +09:00
Wing Lian
262dc29df2
Merge pull request #300 from OpenAccess-AI-Collective/pytorch-201
Pytorch 2.0.1
2023-07-21 00:28:38 -04:00
NanoCode012
165907fddb
Fix(readme): Improve wording for push model
2023-07-21 11:28:35 +09:00