Aman Karmani
a213d9972a
fix eval regression caused in 13f7efaf74
2023-08-21 10:40:06 -07:00
Wing Lian
fbf49a4770
is_causal fix for evals?
2023-08-21 10:36:26 -04:00
Wing Lian
58cf7e7fed
add missing positional arg ( #450 )
2023-08-21 04:10:19 -04:00
NanoCode012
04a42b6db1
feat(docs): improve user customized prompts ( #443 )
* feat(docs): improve user customized prompts
* feat(doc): add custom pretokenized instructions
* chore: clean old data folder
* chore: add new line
2023-08-20 23:59:43 -04:00
NanoCode012
919f4cac90
feat(doc): add pillow to lambda instructions ( #445 )
2023-08-20 23:59:23 -04:00
Wing Lian
ee262818ef
fix evals ( #447 )
2023-08-20 23:39:42 -04:00
Wing Lian
9d629d8bff
gracefully handle empty input ( #442 )
2023-08-20 09:18:18 -04:00
Wing Lian
d2e7f27240
support user defined prompters, pretokenized datasets in config, local parquet, local arrow files ( #348 )
* support user defined prompters, pretokenized datasets in config, local parquet, local arrow files
* fix user defined dataset types
* fix for system prompts
* fix tests
* fix checks for parquet and arrow
* aha moment that d.data_files isn't used
* add documentation for ds_type to add support for parquet and arrow
2023-08-20 09:17:49 -04:00
Philpax
d21318dfb9
docs(readme): add cd axolotl ( #440 )
2023-08-19 19:14:05 -04:00
Wing Lian
f733d0f31e
disable eval using multipack for now ( #437 )
2023-08-19 10:35:04 -04:00
Wing Lian
008505c8ae
fix comma, not a tuple ( #436 )
2023-08-19 00:57:40 -04:00
Wing Lian
b3f5e00ff5
use save_strategy from config if available ( #434 )
* use save_strategy from config if available
* update docs for save_strategy
2023-08-18 20:28:23 -04:00
Wing Lian
5247c5004e
set env for FSDP offload params ( #433 )
2023-08-18 20:28:09 -04:00
mhenrichsen
cf6654769a
flash attn pip install ( #426 )
* flash attn pip
* add packaging
* add packaging to apt get
* install flash attn in dockerfile
* remove unused whls
* add wheel
* clean up pr
fix packaging requirement for ci
upgrade pip for ci
skip build isolation for requirements to get flash-attn working
install flash-attn separately
* install wheel for ci
* no flash-attn for basic cicd
* install flash-attn as pip extras
---------
Co-authored-by: Ubuntu <mgh@mgh-vm.wsyvwcia0jxedeyrchqg425tpb.ax.internal.cloudapp.net>
Co-authored-by: mhenrichsen <some_email@hey.com>
Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-18 19:00:27 -04:00
Aman Gupta Karmani
06edf175ac
standardize attn hijack patches ( #381 )
* split sdp attn into its own patch
* sync xformers patch to follow shared format and be diffable
* update flash-attn patch for 70B/GQA and inference using helper from flash-attn tests
* speed up flash-attn inference
* fix patch to check position ids and don't use multipack for evals
* copy LlamaModel.forward and LlamaDecoderLayer.forward into monkeypatch
* update forwards so we only calculate cu_seqlens once
* enable eval dataloader using multipack again
* fix the patch to work properly and work with FSDP
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-18 12:54:16 -04:00
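One bullet above notes that the patched forwards now calculate cu_seqlens only once per forward pass. For context: flash-attn's varlen kernels take cumulative sequence lengths rather than a padding mask, so computing them once and reusing them across layers avoids redundant work. A minimal stdlib sketch of the computation (the function name is hypothetical; the real patch works on torch tensors derived from position ids):

```python
from itertools import accumulate

def cu_seqlens_from_lengths(lengths):
    """Cumulative sequence lengths in the flash-attn varlen format:
    a prefix sum starting at 0, with one entry per packed sequence plus one."""
    return [0] + list(accumulate(lengths))

# Three sequences of lengths 3, 5, and 2 packed into one batch row.
print(cu_seqlens_from_lengths([3, 5, 2]))  # [0, 3, 8, 10]
```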
mhenrichsen
0a228479b3
adds color ( #425 )
* adds color
* chore: lint
* fix for colorama
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-18 10:59:43 -04:00
Wing Lian
82e111aba9
remove extra accelerate in requirements ( #430 )
2023-08-18 10:56:14 -04:00
Wing Lian
8cace80175
fix fixture for new tokenizer handling in transformers ( #428 )
2023-08-17 17:01:52 -04:00
Wing Lian
1b7e8604bb
fix orca prompts ( #422 )
2023-08-16 11:21:03 -04:00
NanoCode012
3d1f203b62
Fix(docs): Remove gptq+lora and fix xformer compat list ( #423 )
2023-08-16 13:56:48 +09:00
Wing Lian
d3d6fd6ae6
just resort to tags and use main-latest ( #424 )
2023-08-16 00:39:57 -04:00
NanoCode012
b7449a997f
Fix(template): Inform to place stack trace in Issue ( #417 )
* Fix(template): Inform to place stack trace in Issue
* Update following suggestions
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-16 11:55:48 +09:00
Wing Lian
5f80b3560b
use inputs for image rather than outputs for docker metadata ( #420 )
2023-08-15 18:26:59 -04:00
Wing Lian
24959091d7
hopefully improve the README ( #419 )
* hopefully improve the README
* exitcode -9 help
* table of contents
* formatting
2023-08-15 15:30:53 -04:00
Wing Lian
7af816699e
tag with latest as well for axolotl-runpod ( #418 )
* tag with latest as well for axolotl-runpod
* no dev branch for now
2023-08-15 15:30:41 -04:00
mhenrichsen
f806e86a6e
Merge pull request #413 from mhenrichsen/chore/update-deepseed-config
update path to align with fsdp example
2023-08-15 20:08:23 +02:00
NanoCode012
2b990eb628
Feat(doc): Add lr_quadratic_warmup to readme ( #412 )
2023-08-16 02:55:48 +09:00
mhenrichsen
bd8cab49c9
update path to align with fsdp example
2023-08-15 19:51:58 +02:00
NanoCode012
c01015f33f
Fix(config): Update handling of deepspeed config ( #404 )
* Fix(config): Update handling of deepspeed config
* feat: auto set deepspeed env if deepspeed passed
* fix: update new deepspeed instructions
2023-08-16 01:22:43 +09:00
NanoCode012
72fe3f8e3d
Fix(docs): Update flash attn requirements ( #409 )
2023-08-15 22:40:52 +09:00
Wing Lian
47961fdb8b
update docs for tokenizer_legacy ( #401 )
* update docs for tokenizer_legacy
* add default info
2023-08-15 22:34:42 +09:00
NanoCode012
7ad37cb6d7
Fix(template): Remove iPhone/android from Issue template ( #407 )
2023-08-15 22:32:51 +09:00
Wing Lian
29241cf1e4
Ax art ( #405 )
* axolotl text art :D
* only print art on rank0
* lint and pr feedback
2023-08-15 08:34:30 -04:00
lightningRalf
31db0ecce4
add templates, CoC and contributing guide ( #126 )
* add templates, CoC and contributing guide
* Update .github/SECURITY.md
correct responsible person
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Update bug-report.yaml
axolotl version switch with axolotl branch-commit
* update CONTRIBUTING doc
* update reporting link
* linter fixes
* chore: fix linter
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-08-15 07:41:05 -04:00
Wing Lian
da10af03e9
fix eval steps and strategy ( #403 )
2023-08-15 07:28:50 -04:00
Wing Lian
85cf4f8e2c
better handling of empty input ids when tokenizing ( #395 )
* better handling of empty input ids when tokenizing
* Add warning if tokenizer resulted in empty result
* fix len comparison for linter
2023-08-15 01:09:59 -04:00
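The commit above replaces a hard failure with a warning when tokenization yields no input ids. A hedged sketch of the idea (the `tokenize_prompt` helper, the toy tokenizer, and the warning text are illustrative, not the project's actual code):

```python
import logging

LOG = logging.getLogger(__name__)

def tokenize_prompt(tokenize, text):
    """Tokenize text, warning (instead of crashing) when the result is empty."""
    result = tokenize(text)
    if len(result.get("input_ids", [])) == 0:
        LOG.warning("Tokenizer result is empty; you may want to audit your dataset.")
    return result

# Toy tokenizer: whitespace split, with token ids assigned by position.
toy = lambda s: {"input_ids": list(range(len(s.split())))}
print(tokenize_prompt(toy, ""))  # {'input_ids': []}
```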
Aman Karmani
2e22404d2d
add utils.data.prepare_dataset
2023-08-14 21:28:29 -07:00
NanoCode012
be294fd605
Feat(doc): Add how to save by epochs ( #396 )
2023-08-15 13:24:25 +09:00
Wing Lian
fc2d6be96d
use context manager to run things on rank0 before others ( #397 )
2023-08-15 00:10:47 -04:00
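The rank0-first pattern from #397 can be sketched as a context manager: non-zero ranks block at a barrier until rank 0 has run the body (e.g. downloading or preprocessing a dataset), then rank 0 waits for the others to catch up. This sketch injects the barrier as a callable so it runs single-process; a real distributed setup would pass torch.distributed.barrier (names here are illustrative, not the project's actual API):

```python
from contextlib import contextmanager

@contextmanager
def zero_first(is_rank0, barrier):
    """Run the with-block on rank 0 before the other ranks proceed."""
    if not is_rank0:
        barrier()  # non-zero ranks wait here until rank 0 finishes the body
    yield
    if is_rank0:
        barrier()  # rank 0 releases the waiting ranks

# Single-process demonstration: record the order of operations on rank 0.
events = []
with zero_first(is_rank0=True, barrier=lambda: events.append("barrier")):
    events.append("work")
print(events)  # ['work', 'barrier']
```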
Wing Lian
1687be6a35
don't use mask expansion for inference ( #392 )
2023-08-14 20:52:54 -04:00
NanoCode012
41ecb451c2
Feat(doc): Add max_steps to readme ( #389 )
2023-08-15 00:34:22 +09:00
Gabriel Puliatti
3c2ad00d07
Feat(config): add max steps ( #387 )
2023-08-14 11:19:29 -04:00
florian peyron
5d48a10548
Added "epoch" evaluation_strategy ( #388 )
2023-08-14 10:59:23 -04:00
NanoCode012
73a0b6ead5
Feat(config): Add hub_strategy ( #386 )
2023-08-14 07:12:55 -04:00
florian peyron
63fdb5a7fb
Error msg for sharegpt if conv has less than 2 msg ( #379 )
2023-08-14 17:40:40 +09:00
mhenrichsen
fdffef5940
new llama-2 default settings ( #370 )
* new default settings
* fix whitespace
* rm max packed sequence length
---------
Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan>
2023-08-14 17:39:09 +09:00
Wing Lian
919246fbc1
don't pass rope_scaling kwarg if it's None ( #383 )
2023-08-13 18:57:38 -04:00
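The fix in #383 is the common pattern of omitting an optional kwarg entirely rather than forwarding an explicit None, since a downstream consumer may treat the two differently. An illustrative sketch (the kwarg names are assumptions, not the project's exact call):

```python
def model_kwargs(rope_scaling=None):
    """Build loading kwargs, dropping rope_scaling when it is None
    (per the commit above, passing None explicitly caused problems)."""
    kwargs = {"torch_dtype": "auto"}
    if rope_scaling is not None:
        kwargs["rope_scaling"] = rope_scaling
    return kwargs

print(model_kwargs())  # {'torch_dtype': 'auto'}
print(model_kwargs({"type": "linear", "factor": 2.0}))
```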
Wing Lian
ffac902c1b
bump flash-attn to 2.0.4 for the base docker image ( #382 )
2023-08-13 17:55:04 -04:00
Charles Goddard
15f6e57eaa
Fix crash when running without CUDA
2023-08-13 13:36:40 -07:00
NanoCode012
729c299256
Feat(doc): Improve sharegpt doc ( #378 )
* Feat(doc): Improve sharegpt doc
* Fix typo
2023-08-14 00:36:00 +09:00