Wing Lian
5bb4a782ce
dataloader defaults
2023-12-12 17:33:31 -05:00
Casper
86487c2e96
Mixtral: More correct MoE, lower loss ( #932 )
...
* More correct MoE
* Fix formatting
2023-12-10 10:34:25 -05:00
Wing Lian
35f9b0f149
update to latest transformers for mixtral support ( #929 )
...
* update to latest transformers for mixtral support
* pin transformers
* fix typo
2023-12-10 10:32:27 -05:00
Wing Lian
68b227a7d8
Mixtral multipack ( #928 )
...
* mixtral multipack
* use mixtral model
* sample yml
* calculate cu_seqlens properly
* use updated flash attention setting
* attn var checks
* force use of flash attention 2 for packing
* lint
* disable future fix for now
* update support table
2023-12-09 21:26:30 -05:00
Timothy Lim
03c6318ba3
fixing prompt template of chatml by removal of linebreak ( #922 )
...
Co-authored-by: Timothy Lim <timothyyonglee.lim@kxrdev.com >
2023-12-09 13:07:44 -05:00
Wing Lian
40a6362c92
support for mamba ( #915 )
...
* support for mamba
* more mamba fixes
* use fork for mamba kwargs fix
* grad checkpointing doesn't work
* fix extras for mamba
* mamba loss fix
* use fp32 and remove verbose logging
* mamba fixes
* fix collator for mamba
* set model_type on training_args
* don't save safetensors for mamba
* update mamba config to disable safetensor checkpoints, install for tests
* no evals for mamba tests
* handle save_pretrained
* handle unused safetensors arg
2023-12-09 12:10:41 -05:00
NanoCode012
d339beb9d9
chore: clarify Readme on sharegpt system role
2023-12-08 11:35:53 +09:00
NanoCode012
fde091cb12
fix(tokenizer): handle fast tokenizer properly for bos/eos ( #914 )
2023-12-08 11:31:13 +09:00
Casper
06ae39200b
Pin flash-attn to 2.3.3 ( #919 )
2023-12-07 07:36:52 +01:00
NanoCode012
a581e9f8f6
feat: add check for quantized model ( #913 )
...
* feat: add check for quantized model
* chore: refactor and add another check
* Update src/axolotl/utils/models.py
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-12-05 01:20:06 +09:00
Bryan Thornbury
992e742cdc
Support device_map=sequential & max_memory config parameters ( #903 )
...
* Support device_map sequential (and others). Support max_memory in cfg.
* Update documentation in README accordingly.
* Update README.md
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-12-04 09:29:21 -05:00
NanoCode012
a1da39cd48
Feat(wandb): Refactor to be more flexible ( #767 )
...
* Feat: Update to handle wandb env better
* chore: rename wandb_run_id to wandb_name
* feat: add new recommendation and update config
* fix: indent and pop disabled env if project passed
* feat: test env set for wandb and recommendation
* feat: update to use wandb_name and allow id
* chore: add info to readme
2023-12-04 22:17:25 +09:00
kallewoof
58ec8b1113
feature: loss watchdog for terminating training runs that are failing ( #899 )
...
Co-authored-by: Karl-Johan Alm <kalle@gmail.com >
2023-12-04 07:54:34 -05:00
Haoxiang Wang
476a205cea
Remove learning rate scheduler in deepspeed config to avoid conflict ( #909 )
2023-12-04 05:17:38 -05:00
Wing Lian
3e3229e2d9
fix for qwen w lora ( #906 )
2023-11-30 12:45:50 -05:00
Wing Lian
1d21aa6b0a
ensure merged model matches the training dtype ( #902 )
...
* ensure merged model matches the training dtype
* Update src/axolotl/cli/__init__.py
* Update src/axolotl/cli/__init__.py
2023-11-29 09:55:19 -05:00
kallewoof
71b7ea3c05
Determine FSDP/deepspeed settings on device select. ( #883 )
...
* Determine FSDP/deepspeed settings on device select.
Without this, the OS env check for accelerate will fail.
* rename and move env setup call
* chore: lint
---------
Co-authored-by: Karl-Johan Alm <kalle@gmail.com >
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-11-29 08:36:35 -05:00
NanoCode012
a48dbf6561
fix: remove FA for qwen examples ( #900 )
...
* fix: remove FA for qwen lora
* fix: remove FA for qlora
2023-11-27 21:23:54 +09:00
Wing Lian
6a4562ac08
update datasets version to cut down the warnings due to pyarrow arg change ( #897 )
...
* update datasets to cut down the warnings
* set versions for tokenizers and gradio
* upgrade transformers to latest version
2023-11-25 16:30:00 -05:00
NanoCode012
1115c501b8
Feat: Add Qwen ( #894 )
...
* Feat: Add Qwen
* feat: add qwen lora example
* feat: update matrix
* fix: add trust_remote_code
* fix: disable gradient checkpointing
* chore: add warning about gradient checkpointing
* fix: config
* fix: turn off sample packing for this example and reduce seq len
* chore: add comment on seq len
2023-11-26 00:05:01 +09:00
NanoCode012
7ee3c4cacb
fix: warning should not show if eval_batch_size not provided ( #896 )
2023-11-25 16:04:00 +09:00
NanoCode012
fb12895a17
Feat: Add warmup_ratio ( #893 )
...
* Feat: Add warmup_ratio
* fix: update readme with more details on conflict
2023-11-25 12:15:43 +09:00
NanoCode012
9fc29e082b
chore(doc): Add info on changing role in sharegpt ( #886 )
2023-11-22 15:32:50 +09:00
NanoCode012
575a082aae
fix: revert local dir dataset load ( #878 )
2023-11-18 22:50:41 +09:00
Mark Saroufim
ddf815022a
Install from git url ( #874 )
...
* Install from git url
* Update README.md
2023-11-17 12:50:51 -05:00
Wing Lian
9bf854e59c
Phi update 202311 ( #876 )
...
* add phi modeling from hf
* update for packing and use new modeling class for phi
* update e2e tests for phi to use new model name
* update example phi to also use new phi model name
* use AutoModelForCausalLM for phi lora since sample packing isn't supported
2023-11-17 12:47:17 -05:00
Wing Lian
797f3dd1de
don't train if eval split is too small ( #873 )
...
* allow zero len dataset
* better handling and warning of small eval splits
* raise error if eval split is too small
* don't mess with calculating total num steps in distributed context
* fix eval_sample_packing training args logic
2023-11-16 11:35:42 -05:00
Wing Lian
0de1457189
try #2: pin hf transformers and accelerate to latest release, don't reinstall pytorch ( #867 )
...
* isolate torch from the requirements.txt
* fix typo for removed line ending
* pin transformers and accelerate to latest releases
* try w auto-gptq==0.5.1
* update README to remove manual peft install
* pin xformers to 0.0.22
* bump flash-attn to 2.3.3
* pin flash attn to exact version
2023-11-16 10:42:36 -05:00
NanoCode012
3cc67d2cdd
Feat: Add dataset loading from S3, GCS ( #765 )
...
* Feat: Add dataset loading from S3, GCS
* chore: update docs
* chore: add more info on cloud loading
2023-11-16 14:33:58 +09:00
Wing Lian
1bc11868eb
allow overriding of model_config parameters from the YML ( #853 )
...
* allow overriding of model_config parameters from the YML
* remove old logging, update readme
* move the updating of model config to the load_model_config function
* add warning for deprecated rope_scaling in the root of the YML config
2023-11-15 23:47:08 -05:00
Wing Lian
b3a61e8ce2
add e2e tests for checking functionality of resume from checkpoint ( #865 )
...
* use tensorboard to see if resume from checkpoint works
* make sure e2e test is either fp16 or bf16
* set max_steps and save limit so we have the checkpoint when testing resuming
* fix test parameters
2023-11-15 23:05:55 -05:00
Wing Lian
8a8d1c4023
make docker command more robust ( #861 )
...
* make docker command more robust
* update readme with more info
2023-11-15 23:03:54 -05:00
Wing Lian
332984db18
lint fix that didn't get caught by linter ( #866 )
2023-11-15 14:36:40 -05:00
MilesQLi
48630f5b34
Update data.py for signature generation ( #851 )
...
* Update data.py
Change of conversation formatting type should also trigger updating the preprocessed dataset, so it should be part of the signature.
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-11-15 14:12:32 -05:00
Zongheng Yang
b33c1d55a2
Docs: add instructions to 1-click launching on public clouds ( #862 )
...
* Update README.md
* Update ToC
2023-11-15 14:11:27 -05:00
Wing Lian
0c2a630326
multipack len should use max, not min ( #863 )
2023-11-15 12:52:32 -05:00
Wing Lian
db8a8afcba
adds llama and mistral dropout support ( #858 )
...
* adds llama and mistral dropout support
* gracefully handle attention dropout if not available yet
2023-11-15 12:28:50 -05:00
Wing Lian
14706504e3
various bugfixes ( #856 )
...
* various bugfixes
use latest tinyllama release
check if val_set_size is empty first
update sdp and xformers llama patches for updated upstream transformers
fix system prompt when no input
calculate total and total supervised tokens even when not sample packing
* add fix for when eval size is estimated to be too small
* should be len 1 for dataset length
* add catchall kwargs
2023-11-15 12:23:18 -05:00
NanoCode012
501b4d1379
chore(doc): Separate section on runpod ( #860 )
2023-11-16 01:06:51 +09:00
NanoCode012
306fe19c54
feat(doc): add more info on train_on_split ( #855 )
2023-11-15 23:42:26 +09:00
Fabian Preiß
614cff4107
include the suffix modified string in ascii art ( #852 )
2023-11-15 07:12:28 -05:00
Wing Lian
1a6309c8a6
cleanup the old multipack dataloader ( #841 )
2023-11-12 05:39:09 -05:00
Bryan Thornbury
105d0b350b
Pin optimum package ( #838 )
2023-11-09 22:36:15 -05:00
Wing Lian
f544ab2bed
don't compile deepspeed or bitsandbytes from source ( #837 )
2023-11-08 19:49:55 -05:00
Wing Lian
641e6f7e51
multipack w batch sampler ( #795 )
...
* test batch sampler w varying batch lens
* wip
* multipack batchsampler wip
* wip
* fix for prepare data loader to get correct # of steps based on gpus
* lint and clean up
* calculate len estimate
* fix total num steps calc
* add options for dataloader_num_workers and dataloader_pin_memory
* remove gitbook
* support prefetch_factor for dataloader optimization
* fix the kwarg
2023-11-07 20:27:40 -05:00
Wing Lian
6dc68a653f
use temp_dir kwarg instead
2023-11-06 18:33:01 -05:00
Wing Lian
7de6a5639c
missing dunder-init
2023-11-06 18:33:01 -05:00
Wing Lian
c74f045ba7
chore: lint
2023-11-06 18:33:01 -05:00
Wing Lian
0402d19759
make sure to cleanup tmp output_dir for e2e tests
2023-11-06 18:33:01 -05:00
Wing Lian
b2430ce670
use accelerate logging for zero/main logging only
2023-11-06 18:32:26 -05:00
Wing Lian
4c834bf25d
cleanup verbosity a bit
2023-11-06 18:32:26 -05:00
Fabian Preiß
8056ecd30e
add deepspeed-kernels dependency for deepspeed>=0.12.0 ( #827 )
2023-11-05 07:52:56 -05:00
Jason Stillerman
738a057674
Feat: Added Gradio support ( #812 )
...
* Added gradio support
* queuing and title
* pre-commit run
2023-11-04 23:59:22 -04:00
Wing Lian
cdc71f73c8
update table for rwkv4 support, fix process count for dataset ( #822 )
2023-11-04 23:45:44 -04:00
NanoCode012
6459ac7357
fix: pin autogptq ( #818 )
2023-11-03 10:14:55 -04:00
Wing Lian
964d858da0
fix model parallel ( #816 )
2023-11-02 21:34:22 -04:00
NanoCode012
10388a8daf
fix(tokenizer): update log order after update ( #806 )
2023-10-31 13:21:20 +09:00
NanoCode012
9f7e8a971d
feat(doc): add dummyoptim faq fix ( #802 )
2023-10-29 23:06:06 +09:00
NanoCode012
637ed095a0
fix(config): Set eos/bos to tokenizer if different ( #801 )
...
* fix(config): Set eos/bos to tokenizer if different
* chore: fix lint
2023-10-29 21:32:37 +09:00
Wing Lian
827ec3d274
refactor neft patch to be more re-usable similar to trl's impl ( #796 )
2023-10-29 04:33:13 -04:00
Wing Lian
8b79ff0e94
fix eval_steps to be a sane default ( #797 )
...
* fix eval_steps to be a sane default
* update docs for fractional eval_steps
2023-10-27 22:36:30 -04:00
MilesQLi
0800885e2f
Update to adapt to sharegpt datasets with "assistant" rather than "gp… ( #774 )
...
* Update to adapt to sharegpt datasets with "assistant" rather than "gpt" as the machine answers.
* use a strict option for handling incorrect turn data
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-10-27 22:00:16 -04:00
Teknium
d3193beac3
Fix Deepspeed Zero3 Config ( #791 )
...
* Update zero3.json
Take away CPU Offload by default (it slows things down horribly; better off reducing batch size), and change the LR Scheduler to a properly decaying one
* Update zero3.json
fix something
2023-10-27 21:57:02 -04:00
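The zero3.json change described above can be sketched as a DeepSpeed config fragment of this shape; CPU offload is disabled simply by omitting the `offload_optimizer`/`offload_param` sections, and the decaying scheduler corresponds to DeepSpeed's `WarmupDecayLR`. The `"auto"` values assume the config is driven by the HF Trainer integration; this is an illustrative sketch, not the repository's actual file:

```json
{
  "zero_optimization": {
    "stage": 3
  },
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto",
      "total_num_steps": "auto"
    }
  }
}
```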
Aleksa Gordić
2e71ff03a6
Add docker advanced instruction to README ( #792 )
2023-10-27 09:24:04 -04:00
chanvichetvong
facc49f32b
GitBook: No commit message
2023-10-26 15:11:00 +00:00
Casper
e50ab072e2
Create preprocess CLI ( #785 )
...
* Create preprocess CLI
* Print prompt template if debugging
* Add print for unsupported prompters
* Formatting
* Formatting
* Refactor variables
* Formatting
* Formatting
* Formatting
* Formatting
2023-10-26 09:35:42 -04:00
Casper
05bd6f1122
Threaded MultipackDistributedDataloader with prefetched samples ( #759 )
...
* Multithreading implementation [WIP]
* Added benchmarking
* 35% increased throughput
* Memory pinning
* Start threads in init
* Correct print of samples
* Sleep if queue is full
* Remove pin_memory (worse)
* Simplify logic to one thread
* Remove benchmark
* Use deque for constant speed
* Formatting
* Formatting
* Formatting
* Formatting
* Rollback to use queue
* Fix multi-epoch training
* Add num epochs arg
* Start thread in __iter__
* Formatting
* Use is_alive correctly
* Simplify loading thread
2023-10-26 07:49:52 +02:00
NanoCode012
20aa4b57d2
chore(readme): Improve documentation on conversation field ( #782 )
...
* chore(readme): Improve documentation on conversation field
* fix: clarify where the option is
2023-10-24 12:52:32 +09:00
NanoCode012
11d1d607db
chore: refactor truthy check and fix mypy ( #780 )
2023-10-24 12:28:40 +09:00
Wing Lian
6c81c61bc4
refactor setup trainer so we can add more hooks ( #773 )
...
* refactor setup trainer so we can add more hooks
* Remove stray comma
2023-10-23 17:38:41 -04:00
Wing Lian
9b43e7ea15
disable eval table w sample packing in examples ( #778 )
2023-10-23 09:18:44 -04:00
Wing Lian
2d8def68dc
simplify by removing duplicate base_model_config ( #772 )
2023-10-23 01:42:38 -04:00
NanoCode012
44c9d0151a
Fix: Warn when fullfinetune without adapter ( #770 )
2023-10-22 15:41:43 -04:00
Wing Lian
ca84cca2c0
convert exponential notation lr to floats ( #771 )
2023-10-22 15:37:03 -04:00
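The motivation for the fix above: YAML 1.1 loaders such as PyYAML parse `3e-4` (no dot in the mantissa) as a string rather than a float. A minimal sketch of the kind of coercion involved, where `coerce_lr` is a hypothetical helper and not the project's actual function:

```python
import re

# Matches scientific notation such as "3e-4" or "1.5E+6"
_SCI_RE = re.compile(r"^-?\d+(\.\d+)?[eE][+-]?\d+$")

def coerce_lr(value):
    """Coerce a YAML-parsed learning rate to float if it arrived as a string."""
    if isinstance(value, str) and _SCI_RE.match(value):
        return float(value)
    return value
```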
Casper
32eeeb5b64
Hotfix for not saving correctly ( #762 )
2023-10-22 13:22:32 -04:00
NanoCode012
afedc470bd
Fix: Cannot tokenize with bf16 and on cpu ( #766 )
2023-10-23 01:32:26 +09:00
NanoCode012
9923b72649
Fix: eval table conflict with eval_sample_packing ( #769 )
2023-10-23 01:18:12 +09:00
Wing Lian
21cf09b608
remove lora fused packing test ( #758 )
2023-10-21 22:59:35 -04:00
Casper
15d3a654bf
Implement fused modules ( #747 )
...
* MLP: Memory saving
* Remove RMSNorm restrictions
* Map packed weights to original
* FusedAttention module
* Simplify code
* Move fused modules
* Fix critical typo
* Split inplace
* Add FFT config
* Add validation of fused arguments
* Add fused arguments to config
* Update docs
* Fix validation logic
* Add fused modules to flash attn
* Only fuse during training
* Remove timing
* Formatting
* Formatting
* Formatting
* chore: lint
* chore: lint
* add e2e tests for fused llama
* no lora for tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-10-21 16:08:25 -04:00
Wing Lian
a21935f07a
add to docs ( #703 )
2023-10-19 21:32:30 -04:00
NanoCode012
8966a6f566
chore: bump transformers to v4.34.1 to fix tokenizer issue ( #745 )
2023-10-19 20:18:22 -04:00
Motoki Wu
e4d1585c4e
Fix DeepSpeed Zero 3 Saving ( #709 )
...
* Update train.py
* add zero3 check
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-10-19 19:18:24 -04:00
Wing Lian
70157ccb8f
add a latest tag for regular axolotl image, cleanup extraneous print statement ( #746 )
2023-10-19 12:28:29 -04:00
seungduk.kim.2304
3a99495b05
improve: Enhance code readability of prompt_tokenizers.py ( #707 )
2023-10-19 08:12:17 -04:00
NanoCode012
440c3ab527
Fix(model): Linear detected and added to target module with rope linear ( #738 )
...
* Fix(model): Linear detected and added to target module with rope linear
* fix: exclude layer instead
2023-10-18 22:13:20 -04:00
Napuh
992d57f20a
catch ConnectionError when checking dataset from HuggingFace ( #743 )
2023-10-18 22:11:54 -04:00
mhenrichsen
91a016f410
badge ( #739 )
...
* badge
* fixed text
2023-10-18 10:21:34 -04:00
Casper
a045db0214
Mistral: Sliding Window Attention with Flash Attention and Sample Packing ( #732 )
...
* Implement Mistral FA + SWA + Sample Packing
* Handle unbroadcastable tensor
* chore: lint
* Simplify _prepare_decoder_attention_mask
* Uncomment window size
* Upgrade flash-attn to minimum of 2.3.0 to support SWA
* Add original condition to avoid error during inference
* chore: lint
* use torchscript to prevent oom
* chore: pylint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-10-16 15:13:46 -04:00
Casper
e1b214c62b
Clarify custom format example ( #729 )
...
* Clarify custom prompt format
* Simplify format
2023-10-14 09:28:12 -04:00
Wing Lian
3553172e3c
fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention ( #728 )
2023-10-14 09:27:07 -04:00
Wing Lian
7f2027d93f
tweak for xformers install w pytorch 2.1.0 ( #727 )
2023-10-13 15:21:17 -04:00
Wing Lian
8d288a2ad4
workaround for installing xformers w torch 2.1.0 ( #725 )
2023-10-13 11:19:30 -04:00
Wing Lian
f30afe4544
misc sharegpt fixes ( #723 )
...
* support for sharegpt with assistant talking first, better masking of assistant token, allow remap of roles from dataset
* invalid role is actually not possible
* update tokenized fixture for corrected labels
2023-10-13 11:04:39 -04:00
Wing Lian
bfbdba8614
pin xformers >= 0.0.22 ( #724 )
2023-10-13 10:27:56 -04:00
Maxime
3bd9528390
add noisy embedding ( #721 )
...
* add noisy embedding
* fix format
* Update README.md
* Update README.md
* linter issues
* caseus fixes
---------
Co-authored-by: Maxime <maxime@nope.no >
2023-10-13 10:00:42 -04:00
Wing Lian
2aa1f71464
fix pytorch 2.1.0 build, add multipack docs ( #722 )
2023-10-13 08:57:28 -04:00
Wing Lian
1c412c7e9d
improve handling of the prepared ds path and other cfg defaults ( #701 )
2023-10-13 07:46:07 -04:00
Jan Philipp Harries
490923fb78
Save Axolotl config as WandB artifact ( #716 )
2023-10-11 07:28:12 -04:00
NanoCode012
5855dded3d
fix(doc): update default doc according to arg ( #714 )
2023-10-10 21:51:56 +09:00
atgctg
ace70b33c6
Fix: lowercase True values in config ( #713 )
...
* Fix: lowercase `True` values in config
* Fix: lowercase `True` values in config
2023-10-10 21:32:20 +09:00
NanoCode012
11c48c5e03
fix(doc): Add note on inference w sample packing ( #712 )
2023-10-10 21:08:17 +09:00
lukemarsden
295b2662e1
Get qlora mistral-7b fine tuning working on a single 4090 ( #708 )
2023-10-10 15:14:23 +09:00
seungduk.kim.2304
77c84e02fd
Update README with some explanations ( #700 )
...
* Update README with some explanations
* revert commit-hook change
* add more explanation about batch size and gradient accum
* don't use latex format
* decorate
* git hook again
* Attach a link that explains about LoRA hyperparameters
* update table of content
* Explanation about lora_modules_to_save
2023-10-08 13:37:54 -04:00
mhenrichsen
f91db198f3
fix unneeded space ( #699 )
2023-10-07 14:19:25 -04:00
Wing Lian
7f2618b5f4
add docker images for pytorch 2.1.0 ( #697 )
2023-10-07 12:23:31 -04:00
Wing Lian
aca0398315
apex not needed as amp is part of pytorch ( #696 )
2023-10-07 12:20:45 -04:00
mhenrichsen
29b8f46aed
Merge pull request #693 from OpenAccess-AI-Collective/update-mistral-example
...
update mistral lr, sample pack
2023-10-07 11:04:58 +02:00
mhenrichsen
83a950bb87
lint
2023-10-07 11:04:35 +02:00
Wing Lian
de87ea68f6
fix multiline for docker ( #694 )
2023-10-06 22:38:15 -04:00
mhenrichsen
4c8ddf2c6f
new lr, sample pack
2023-10-06 22:58:13 +02:00
NanoCode012
669f1d052c
Fix: Higher vram usage for mistral and sample_packing ( #691 )
...
* Fix: Higher vram usage for mistral and sample_packing
* chore: update comment
* chore: lint
2023-10-06 12:33:43 -04:00
Abhishek Mishra
d4a88e4eca
Adding qlora config for Mistral ( #675 )
...
* Adding qlora config for Mistral
Contains fix for Mistral FA issue - ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to call tokenizer.padding_side = 'left' before tokenizing the input.
Fix for now is to set sample_packing: true and pad_to_sequence_len: true
* Renamed to qlora.yml
2023-10-06 21:05:56 +09:00
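The workaround named in the commit body maps to two config keys; a hedged fragment of the kind of YAML involved (the model name shown is illustrative, not necessarily what the example uses):

```yaml
base_model: mistralai/Mistral-7B-v0.1
# Workaround for the Mistral FA padding_side='right' ValueError
sample_packing: true
pad_to_sequence_len: true
```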
Wing Lian
2d60ba3a6e
flash_attention + sample packing for stablelm 3b ( #671 )
...
* stablelm epoch fa patch
* is causal for fa
* working stablelm fa w packing
* chore: pre-commit linting
2023-10-05 16:03:43 -04:00
NanoCode012
eb480dfd68
Fix: ValueError when FA + Mistral when padding_side=right ( #681 )
...
* Fix: ValueError when FA + Mistral when padding_side=right
* fix: remove tokenizer class check
2023-10-06 04:12:54 +09:00
NanoCode012
133e676bcc
Feat: Set WORKDIR to /workspace/axolotl ( #679 )
2023-10-06 04:09:14 +09:00
NanoCode012
69fac9a020
Fix: Future deprecation warning with use_auth_token ( #680 )
2023-10-06 03:56:18 +09:00
NanoCode012
e0b7eeabfd
Fix(tokenizer): Set rstrip,lstrip,norm to False ( #678 )
2023-10-06 03:50:49 +09:00
NanoCode012
43856c0a39
Fix(version): Update FA to work with Mistral SWA ( #673 )
2023-10-04 21:32:19 +09:00
NanoCode012
e62d5901b5
chore: Clean up repetitive model kwargs ( #670 )
2023-10-04 20:41:26 +09:00
NanoCode012
697c50d408
Feat: Allow usage of native Mistral FA when no sample_packing ( #669 )
...
* Allow usage of native Mistral FA when no sample_packing
* fix: do not apply custom patch when sample_pack off
* chore: lint
* chore: pin transformer to v4.35.0.dev0
* fix: split sample_packing to separate test
2023-10-04 20:40:47 +09:00
NanoCode012
90e0d673f7
Feat: Add config yaml to section for reprod in bug-report.yaml ( #667 )
...
* Update bug-report.yaml
* Update bug-report.yaml
* Update bug-report.yaml
2023-10-03 23:38:42 +09:00
Wing Lian
2642caedf2
refactor to set eval_batch_size earlier if unset, so we can warn if mismatched ( #662 )
2023-10-02 21:08:07 -04:00
Wing Lian
f34648c8b9
remove patch fix for phi ( #664 )
2023-10-02 21:07:41 -04:00
Wing Lian
e50a64e85e
prepared dataset caching, other misc fixes ( #665 )
...
* prepared dataset caching, other misc fixes
* also don't load from disk cache unless explicit
2023-10-02 21:07:24 -04:00
Wing Lian
f4868d733c
make sure we also run CI tests when requirements.txt changes ( #663 )
2023-10-02 08:43:40 -04:00
Napuh
a7e56d83c2
removed duplicate on requirements.txt ( #661 )
2023-10-02 08:40:05 -04:00
Wing Lian
5b0bc48fbc
add mistral e2e tests ( #649 )
...
* mistral e2e tests
* make sure to enable flash attention for the e2e tests
* use latest transformers full sha
* uninstall first
2023-09-29 00:22:40 -04:00
Kyle Corbitt
9ec20777ba
Make dataset_processes configurable ( #651 )
...
I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have some kind of bug where if I try to run `datasets.filter` with too high a `num_proc`, it throws an error and dies.
This PR adds a new configuration option `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If not included, this defaults to the current behavior of setting that to `os.cpu_count()`.
2023-09-29 00:22:22 -04:00
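As described, the option caps the `num_proc` passed to `datasets` map/filter calls; a minimal config sketch (the value shown is illustrative):

```yaml
# Number of processes for dataset map/filter
# (defaults to os.cpu_count() when unset)
dataset_processes: 4
```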
ich
590d6032fd
Fix bug when using pretokenized datasets ( #652 )
...
* fix pretokenized datasets readme
* check if dataset type is not set to handle pretokenized datasets
2023-09-28 22:54:10 -04:00
Wing Lian
409ca0f21c
add support for defined train split ( #654 )
2023-09-28 20:14:14 -04:00
Wing Lian
8662e8ffe8
don't strip the prompt for check since we don't strip to tokenize anymore ( #650 )
2023-09-28 12:21:51 -04:00
Wing Lian
b2edaaeff6
fix for flash attn w mistral w/o sample packing ( #648 )
2023-09-28 10:57:37 -04:00
Adarsh Shirawalmath
b88f51512a
Update mistral/README.md ( #647 )
2023-09-28 10:24:56 -04:00
NanoCode012
eb41f76f92
Feat: Add example for Mistral ( #644 )
...
* Feat: Add example for Mistral
* chore: turn off flash
* chore: add is_mistral_derived_model
* chore: update following PR
2023-09-28 20:15:00 +09:00
NanoCode012
383f88d7a7
Fix(cfg): Add validation for save_strategy and eval_strategy ( #633 )
...
* Fix(cfg): Check save_strategy cfg conflict with save_steps
* Fix(cfg): Check evaluation_strategy cfg conflict with eval_steps
* chore: add extra check for steps only
2023-09-28 10:14:41 +09:00
Wing Lian
b6ab8aad62
Mistral flash attn packing ( #646 )
...
* add mistral monkeypatch
* add arg for decoder attention mask
* fix lint for duplicate code
* make sure to update transformers too
* tweak install for e2e
* move mistral patch to conditional
2023-09-27 18:41:00 -04:00
Napuh
85b0be2ba7
Warn users to login to HuggingFace ( #645 )
...
* added warning if user is not logged in HF
* updated doc to suggest logging in to HF
2023-09-27 17:43:35 -04:00
Ethan Smith
8fe0e633d2
Fix bug in dataset loading ( #284 )
...
* Fix bug in dataset loading
This fixes a bug when loading datasets. `d.data_files` is a list, so it cannot be directly passed to `hf_hub_download`
* Check type of data_files, and load accordingly
2023-09-27 13:41:31 -04:00
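The bug and fix described above amount to normalizing `data_files` before handing each filename to `hf_hub_download`, which expects a single filename per call. A minimal sketch under that assumption; `normalize_data_files` is a hypothetical helper, not the repository's code:

```python
def normalize_data_files(data_files):
    """Return a list of filenames whether a single string or a list was given.

    `hf_hub_download` takes one filename per call, so a list must be
    iterated rather than passed through directly.
    """
    if isinstance(data_files, str):
        return [data_files]
    return list(data_files)
```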
Felix Yan
d1236f2c41
Correct typos in datasets.py ( #639 )
2023-09-27 12:12:10 -04:00
Wing Lian
895f0a0723
skip some flash attn patches unless explicitly enabled ( #643 )
...
* skip some flash attn patches if explicitly disabled
* make the other patches optional
2023-09-27 12:11:07 -04:00
Wing Lian
e7d3e2dbb6
use fastchat conversations template ( #578 )
...
* use fastchat conversations template
* require fastchat (fschat) pip install
* handle roles dynamically from conversation
* tweak fastchat conversation with a monkeypatch to get individual turns
* fix up so it works with multiple conversation styles, and don't strip the turns
* fix sharegpt fixture now that we're using a more correct tokenization
* use a new prompter and support fastchat conversation type
* use sharegpt from prompt strategies now
* update docs, add chatml template
* add a newline after im_end token
* ensure we correctly set system message
* update per PR feedback to handle deprecated sharegpt types
* don't add duplicate wandb req
* make sharegpt fields configurable from yml
* llama2 fixes
* don't fail fatally when turns are improper
2023-09-27 12:10:45 -04:00
Wing Lian
60c7c48c97
update for recent transformers updates ( #636 )
...
* update for recent transformers updates
* fix checkpoint forward kwargs
* just pass args into torch checkpoint
2023-09-27 12:10:32 -04:00
Wing Lian
e8cbf50be6
attention_mask not needed for training ( #642 )
...
* attention_mask not needed for training
* specifically don't use attention mask for phi
* use a different check for phi
* small fixes since phi removed some values from their config
2023-09-27 11:12:08 -04:00
Wing Lian
d887ad86c3
eval_table isn't quite stable enough to be in default llama configs ( #637 )
2023-09-26 10:13:20 -04:00
NanoCode012
19a600a8b8
Feat: Add support for upstream FA2 ( #626 )
...
* Feat: Add support for upstream FA2
* chore: add is_falcon_derived_model: true to examples
* chore: add config to readme for documentation
* feat: add extra model types
* fix: remove old falcon flash patch
* chore: pin transformers and accelerate
2023-09-26 09:53:28 -04:00
Fernando Tarin Morales
5e5296a77c
Added quotes to the pip install -e command to fix an incompatibility with shells that do glob expansion like zsh ( #632 )
2023-09-25 11:50:14 -04:00
mhenrichsen
f3d939016a
Merge pull request #629 from OpenAccess-AI-Collective/chore/-change-default-model
...
default model changed
2023-09-25 09:32:01 +02:00
NanoCode012
cfbce020e9
Fix: Fail bf16 check when running on cpu during merge ( #631 )
2023-09-25 13:48:18 +09:00
mhenrichsen
4fecbfe5e1
default model changed
2023-09-24 18:52:53 +02:00
NanoCode012
67b9888630
Feat(doc): Add eval_sample_packing to doc ( #625 )
2023-09-23 13:11:27 +09:00
Maxime
923eb91304
tweak: improve base builder for smaller layers ( #500 )
2023-09-22 16:17:50 -04:00
Wing Lian
a363604dcf
better handling and logging of empty sharegpt turns ( #603 )
2023-09-22 16:13:42 -04:00
Wing Lian
501958bb6f
create a model card with axolotl badge ( #624 )
2023-09-22 16:13:26 -04:00
Wing Lian
c25ba7939b
update README w deepspeed info ( #605 )
2023-09-22 00:15:52 -04:00
NanoCode012
d5f8589021
chore(callback): Remove old peft saving code ( #510 )
2023-09-22 12:31:33 +09:00
Wing Lian
03e59077a0
misc fixes to add gptq tests ( #621 )
...
* misc fixes to add gptq tests
* set bf16 needed for fa2
2023-09-21 21:52:12 -04:00
Wing Lian
97d3776ce6
split completion text to sequence_len ( #616 )
2023-09-21 21:51:25 -04:00
Wing Lian
2844eb22b6
run eval on the first step to get a baseline ( #617 )
...
* run eval on the first step to get a baseline
* wandb keeps getting moved around by pre-commit ...
2023-09-21 21:51:09 -04:00
Wing Lian
e85d2eb06b
let MAX_JOBS use the default since we're not resource constrained on our self-hosted runners ( #427 )
2023-09-21 20:36:30 -04:00
Wing Lian
196ff1181e
skip the gpu memory checks if the device is set to 'auto' ( #609 )
...
* skip the gpu memory checks if the device is set to 'auto'
* skip gpu mem logging if cpu too
* don't worry about log_gpu_memory_usage since it calls another annotated fn
* rename decorator internal
2023-09-21 15:20:31 -04:00
Wing Lian
92512c390b
ignore wandb to resolve isort headaches ( #619 )
2023-09-21 11:50:09 -04:00
Maxime
2fe95cdcc1
fix distributed devices ( #612 )
...
* fix distributed devices
* Update distributed.py
* Update distributed.py
2023-09-21 09:11:34 -04:00
Maxime
c1382e79b6
Create multi-node.md ( #613 )
...
* Create multi-node.md
* Update multi-node.md
* Update multi-node.md
2023-09-20 22:02:16 -04:00
Maxime
5d931cc042
Only run tests when a change to python files is made ( #614 )
...
* Update tests.yml
* Update .github/workflows/tests.yml
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-09-20 22:02:04 -04:00
Javier
ec0958f4f8
Update requirements.txt ( #610 )
2023-09-20 08:40:49 -04:00
Wing Lian
faecff9798
support to disable exllama for gptq ( #604 )
...
* support to disable exllama for gptq
* update property instead of item
* fix config key
2023-09-19 17:51:08 -04:00
bofeng huang
aa656e04bd
Delete duplicate lines ( #606 )
2023-09-19 16:40:05 -04:00
Wing Lian
b53e77775b
update dockerfile to not build evoformer since it fails the build ( #607 )
2023-09-19 16:28:29 -04:00
Wing Lian
674c57692d
more sane defaults for openllama 3b used for quickstarts ( #602 )
...
* more sane defaults for openllama 3b used for quickstarts
* don't use bf16 for quickstart to simplify gpu compatibility
* use the updated openlm-research/open_llama_3b_v2 models
2023-09-19 09:15:10 -04:00
Wing Lian
1eebbd09c3
improve handling for empty text on the tokenization step ( #502 )
2023-09-19 08:09:56 -04:00
Wing Lian
62a774140b
Fix for check with cfg and merge_lora ( #600 )
2023-09-18 21:14:32 -04:00
Wing Lian
31b9e0c6e8
minor tweaks to simplify ( #597 )
2023-09-18 11:45:44 -04:00
Wing Lian
6b9b229356
btlm and falcon monkey patches for flash attn ( #566 )
2023-09-17 13:49:18 -04:00
Wing Lian
131afdbd89
add bf16 check ( #587 )
2023-09-17 13:49:03 -04:00
NanoCode012
00dce35fb2
Feat(data): Allow loading local csv and text ( #594 )
...
* Feat(data): Allow loading local csv and text
* chore: update readme for loading data
2023-09-17 11:32:27 -04:00
Wing Lian
b15b19eb8d
gather/broadcast the max value of the packing efficiency automatically ( #463 )
2023-09-17 11:08:18 -04:00
Wing Lian
ab534d75ba
don't add position_ids for evals ( #591 )
2023-09-16 16:11:57 -04:00
Wing Lian
21ec195c9f
optionally configure sample packing for evals ( #589 )
2023-09-16 00:09:48 -04:00
Wing Lian
62eaee7649
make phi training work with Loras ( #588 )
...
* validation for phi loras
* fix model config class check
* update readme for phi training
2023-09-15 20:51:55 -04:00
Jan Philipp Harries
be75668400
set fsdp state dict ( #584 )
...
Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com >
2023-09-15 17:47:36 -04:00
Wing Lian
aeec7c4688
pop block_cls since it's not an actual kwarg
2023-09-15 15:54:06 -04:00
Wing Lian
360788296a
don't resize embeddings if it's already large enough ( #577 )
...
* don't resize embeddings if it's already large enough
* make sure to tie weights, even if we aren't resizing
2023-09-15 15:47:09 -04:00
Wing Lian
12a2dbbc2c
Support Sample packing for phi arch ( #586 )
...
* phi sequence packing
* sample packing fixes
* fix linting
* fix inference and phi e2e tests
* update phi example now that sample packing works
* wandb import keeps getting moved around
2023-09-15 15:46:54 -04:00
NanoCode012
3a2edc85c3
Feat(doc): Add features to doc ( #583 )
2023-09-16 01:14:15 +09:00
Wing Lian
f7a22632d7
support custom field for completion from yml ( #580 )
...
* support custom field for completion from yml
* remove legacy completion check and add doc
* update README docs
2023-09-15 07:48:21 -04:00
Doan Minh Phuong
1aa400721e
Fix Codellama examples ( #582 )
...
* Fix seq_len
* Update lora.yml
* Update qlora.yml
* Update lora.yml
* Update lora.yml
* Update qlora.yml
2023-09-15 04:19:13 -04:00
Wing Lian
8dcd40ac78
prevent cli functions from getting fired on import ( #581 )
2023-09-15 04:03:32 -04:00
Wing Lian
a5a625f47e
update support matrix with btlm and phi ( #579 )
2023-09-15 02:46:15 -04:00
Wing Lian
861cecac2a
refactor scripts/finetune.py into new cli modules ( #550 )
...
* refactor scripts/finetune.py into new cli modules
* continue to support scripts/finetune.py
* update readme with updated cli commands
* Update scripts/finetune.py
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-09-15 01:43:52 -04:00
Wing Lian
1078d3eae7
E2e passing tests ( #576 )
...
* run e2e tests after all other checks have passed
* tweak tests so they get run on PRs or push to main
* change dependent action for checking
* one test workflow to rule them all
* no need for custom action, just use needs
* whoops, python version should be a string
* e2e tests can run on any available gpu
2023-09-15 01:03:49 -04:00
Wing Lian
24146733db
E2e device cuda ( #575 )
...
* use torch.cuda.current_device() instead of local_rank
* ignore NVML errors for gpu stats
* llama lora packing e2e tests
2023-09-14 22:49:27 -04:00
Wing Lian
9218ebecd2
e2e testing ( #574 )
2023-09-14 21:56:11 -04:00
Wing Lian
228420972e
Phi examples ( #569 )
...
* add phi full ft example
* Add readme to point out that deepspeed should be used
* zero1 is better than zero2 for phi
2023-09-14 11:17:47 -04:00
Wing Lian
c6d870b91d
mypy wandb ignore ( #572 )
...
* mypy wandb ignore
* fix isort for wandb
2023-09-14 11:17:30 -04:00
Wing Lian
115795079d
remove columns after tokenizing for pretraining ( #571 )
2023-09-14 11:08:22 -04:00
Wing Lian
3b18c963cc
set auto for other params that hf trainer sets for ds. include zero1 json ( #570 )
2023-09-14 11:04:37 -04:00
Wing Lian
3fbde762ab
fix save_steps so it doesn't get duplicated ( #567 )
2023-09-13 20:40:33 -04:00
Wing Lian
f6060a664e
Model parallel ( #538 )
...
* model-parallel for single process
* fix device/device_map
* fix handling for device
2023-09-13 11:45:30 -04:00
Wing Lian
a4e1bb6606
let hf trainer handle torch compile ( #516 )
...
* let hf trainer handle torch compile
* remove torch compile checks, include option for backend
* suppress torch errors to get further
* require min torch version of 2.1.0 for torch compile to work
---------
Co-authored-by: Aman Karmani <aman@tmm1.net >
2023-09-13 11:42:12 -04:00
Wing Lian
36e53c7442
improve how we setup eval/save strategies and steps ( #547 )
...
* setup save and eval strategies to be consistent with trainer logic
* add comments
* better eval handling
2023-09-13 11:37:23 -04:00
Wing Lian
e7aa7b1a1e
gracefully handle length feature used for group by ( #565 )
2023-09-13 11:23:30 -04:00
Wing Lian
e5bb22a56b
add optimization for group-by-len ( #563 )
2023-09-13 10:57:12 -04:00
Wing Lian
fdb777bc06
check for the existence of the default accelerate config that can create headaches ( #561 )
2023-09-13 10:38:28 -04:00
Wing Lian
bf0804447c
fix wandb so mypy doesn't complain ( #562 )
...
* fix wandb so mypy doesn't complain
* fix wandb so mypy doesn't complain
* no need for mypy override anymore
2023-09-13 10:36:16 -04:00
Glavin Wiechert
5b67ea98a6
Add training callback to send predictions to WandB table ( #521 )
...
* WIP Add training callback to send predictions to WandB table
* WIP improve wandb table reporting callback
* WIP improve wandb table reporting callback (cont)
* Add VSCode launching for debugging
* Add tiny llama example
* WIP attempt to improve post-eval prediction generation for table
* WIP attempt to improve post-eval prediction generation for table - part 2
* WIP batch generation
* WIP attempt to handle sample_packing using position_ids for wandb prediction table
* WIP add code for debugging
* Fix sample_packing support for wandb prediction table
* Clean up code for PR review
* Add eval_table_size, eval_table_max_new_tokens configs & clean up code
* Clean up PR, delete VSCode config, add tiny-llama example
* Add eval_table_size, eval_table_max_new_tokens documentation. Fix linting/formatting
2023-09-13 09:51:08 -04:00
Jan Philipp Harries
2f586d18db
Fix pretraining with iterable/streaming Dataset ( #556 )
...
* return without packing prep/len
* fix remove columns
* fix encode arguments
* add error when max steps not set
* fix test
---------
Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com >
2023-09-13 00:16:40 -04:00
Wing Lian
9845c5e12d
document that packaging needs to be installed before flash-attn ( #559 )
2023-09-12 12:18:30 -04:00
Wing Lian
772cd870d4
fix the sed command to replace the version w the tag
2023-09-11 13:44:19 -04:00
Wing Lian
6c5fbe6223
add long_description for pypi push ( #555 )
2023-09-11 13:34:29 -04:00
Wing Lian
bcbc9597e9
replace tags, build dist for pypi publish ( #553 )
...
* replace tags, build dist for pypi publish
* missing trailing comma
2023-09-11 13:25:41 -04:00
The Objective Dad
6d57f2f0f0
ergonomic update to optimizer config doc ( #548 )
2023-09-11 12:35:45 -04:00
Wing Lian
20ed4c1f9e
pypi on tag push ( #552 )
2023-09-11 10:33:42 -04:00
Wing Lian
c5dedb17ad
remove with section, doesn't seem to work ( #551 )
2023-09-11 10:27:17 -04:00
Wing Lian
b56503d423
publish to pypi workflow on tagged release ( #549 )
2023-09-11 09:44:47 -04:00
Wing Lian
a94f9cb99e
fix for quant config from model ( #540 )
2023-09-10 12:40:52 -04:00
dongxiaolong
c1921c9acb
Update requirements.txt ( #543 )
...
fix fsdp
2023-09-08 16:07:11 -04:00
Wing Lian
0b4cf5bc8c
workaround for md5 variations ( #533 )
...
* workaround for md5 variations
* refactor the prepared hash too
2023-09-08 16:01:05 -04:00
SlapDrone
78ee2cdab2
add git environment variables to compose: avoid checkout failure error 128 on build ( #534 )
2023-09-08 15:59:49 -04:00
Wing Lian
34c0a86a11
update readme to point to direct link to runpod template, cleanup install instructions ( #532 )
...
* update readme to point to direct link to runpod template, cleanup install instructions
* default install flash-attn and auto-gptq now too
* update readme w flash-attn extra
* fix version in setup
2023-09-08 11:58:54 -04:00
The Objective Dad
5e2d8a42d9
Adding NCCL Timeout Guide ( #536 )
...
* fixes NCCL_P2P_LEVEL=NVL #429
* adding more insights into various values of NCCL_P2P_LEVEL
2023-09-08 11:57:47 -04:00
Wing Lian
e30f1e3cf7
Early stopping metric ( #537 )
...
* set early stopping metric to check
* tweak how load_best_model_at_end gets set for early stopping
* add validation for early stopping patience
* remove negation
* save results to metrics in callback
* move early stopping callback after the benchmark evals
* broadcast metrics so early stopping works
2023-09-08 11:57:02 -04:00
Wing Lian
343714972b
recommend padding when using sample packing ( #531 )
2023-09-06 17:00:21 -04:00
Wing Lian
245c5c41e2
log rank too ( #527 )
2023-09-06 08:37:51 -04:00
Wing Lian
a546ca2813
misc fixes/improvements ( #513 )
...
fix per pr feedback
2023-09-05 16:40:13 -04:00
Wing Lian
3355706e22
Add support for GPTQ using native transformers/peft ( #468 )
...
* auto gptq support
* more tweaks and add yml
* remove old gptq docker
* don't need explicit peft install for tests
* fix setup.py to use extra index url
install torch for tests
fix cuda version for autogptq index
set torch in requirements so that it installs properly
move gptq install around to work with github cicd
* gptq doesn't play well with sample packing
* address pr feedback
* remove torch install for now
* set quantization_config from model config
* Fix the implementation for getting quant config from model config
2023-09-05 12:43:22 -04:00
mhenrichsen
daa4faca12
Merge pull request #520 from bdashore3/sharegpt-fixes
...
Allow for custom system prompts with ShareGPT
2023-09-05 09:02:55 +02:00
Aman Karmani
fc8766e502
reorg a bit
2023-09-05 02:21:24 +00:00
Aman Gupta Karmani
72a6fe1c1f
use flash_attn rmsnorm when available ( #526 )
...
* use flash_attn xentropy when available
* use flash_attn.ops.rms_norm when available
* log when xentropy is not found
* log how to install RMSNorm
* add quotes so pip install works
2023-09-04 19:44:51 -04:00
Aman Gupta Karmani
5fe30b1497
use flash_attn xentropy when available ( #525 )
...
* use flash_attn xentropy when available
* log when xentropy is not found
2023-09-04 17:49:16 -04:00
Aman Gupta Karmani
44454ae4c4
move is_llama_derived_model into normalize_config ( #524 )
2023-09-04 00:19:03 -04:00
Wing Lian
09f154397e
No gather single gpu ( #523 )
...
* don't attempt to gather on multi-gpu
* also check distributed status in bench callback
2023-09-03 23:24:28 -04:00
kingbri
995557bdf3
Prompters: ShareGPT: Allow for custom system prompts
...
If a system prompt is present in a conversation, add it instead of
using the default.
Signed-off-by: kingbri <bdashore3@proton.me >
2023-09-01 13:53:05 -04:00
Maxime
1991946c5a
fix: bad dtype for full finetune ( #504 )
...
* fix: bad dtype for full finetune
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* Update models.py
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-09-01 07:11:45 -07:00
NanoCode012
f51c9c56c6
Fix(doc): Inform Windows users to use WSL/docker ( #518 )
2023-09-01 00:08:21 -07:00
Wing Lian
7710e81f50
log supervised token count ( #448 )
2023-08-31 15:45:23 -07:00
Tom Jobbins
48434bec54
Debug tokenization output: Add ability to output text only (no tokens), and/or specify num samples to see ( #511 )
2023-08-31 14:26:52 -07:00
Jan Philipp Harries
396a7a74fc
Added advanced DDP args ( #515 )
...
* add ddp_config
* add advanced ddp config
* add ddp_config
* add advanced ddp config
---------
Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com >
2023-08-31 10:37:47 -07:00
Wing Lian
b21e4a20fe
split train from other cli options ( #503 )
2023-08-30 22:01:47 -07:00
Alpay Ariyak
42f9642792
Changed Bench Eval to report metrics correctly by split. Added total accuracy and renamed previously used bench_accuracy to bench_average_accuracy. ( #512 )
...
* Added "eval_" prefix
* Added total bench accuracy and renamed the previous one to bench_average_accuracy. Changed naming to use bench_split instead of always using eval_ prefix.
2023-08-30 22:00:50 -07:00
Wing Lian
c56b450cf5
drop empty tokenized rows too ( #509 )
2023-08-30 06:55:26 -07:00
Aman Gupta Karmani
1e07c162f1
set zero3 optimizer betas to auto so they inherit from HF trainer config ( #507 )
2023-08-30 08:10:33 -04:00
Wing Lian
76576323df
add eval benchmark callback ( #441 )
...
* add mmlu callback
* use hf dataset for mmlu evals
* default to mmlu-zs
* make sure to define all the explicit positional args
* include metrics in callback
* another callback fix for collator max len attribute
* fix mmlu evals
* sample benchmarks, ensure we drop long samples
* fix the data file
* fix elif and add better messaging
* more fixes
* rename mmlu to bench
* more fixes
* dataset handling and aggregate across benchmark
* better handling when no subjects
* benchmark callback has its own dataloader and collator
* fixes
* updated dataset
* more fixes
* missing transformers import
* improve support for customized dataset for bench evals
* gather benchmarks from all ranks
* fix for gather across multiple gpus
2023-08-29 13:24:19 -07:00
Wing Lian
548787daae
customizable ascii art ( #506 )
2023-08-29 10:13:42 -07:00
Wing Lian
5ac3392075
support for datasets with multiple names ( #480 )
...
* support for datasets with multiple names
* update docs
2023-08-29 06:18:17 -07:00
Aman Gupta Karmani
e356b297cb
remove --force-reinstall from Dockerfile to ensure correct pytorch version ( #492 )
2023-08-29 06:17:51 -07:00
NanoCode012
48c56470d0
Fix(doc): Clarify no amp to full yaml docs ( #496 )
2023-08-29 06:17:37 -07:00
Maxime
36b2e1cfee
tweak: use default config file when only one file is present ( #501 )
2023-08-29 06:17:10 -07:00
Wing Lian
125cccb786
Refactor train cfg cli ( #499 )
...
* wip to cleanup cfg cli options
* fix launcher
* fix cli args
2023-08-29 05:37:53 -07:00
Aman Karmani
fd55bc87e2
use math.ceil instead of round /cc #498
2023-08-29 01:03:41 +00:00
Birch-san
8e197f6fb4
pad_to_worst_case_seq_len boolean, for testing memory limits ( #498 )
...
* pad_to_worst_case_seq_len boolean, for testing memory limits
* remove collator_pad_to_longest option since it does nothing
see docs: https://huggingface.co/docs/transformers/main_classes/data_collator#transformers.DataCollatorWithPadding.padding
True and "longest" mean the same thing
* rename to `pad_to_sequence_len, and ensure 64 alignment
---------
Co-authored-by: Aman Karmani <aman@tmm1.net >
2023-08-28 18:47:16 -04:00
Aman Karmani
267b7b24e5
simplify linear layer locator
2023-08-28 09:45:16 -04:00
Wing Lian
98bf76e236
fsdp requires params be the same type too ( #493 )
2023-08-28 04:33:50 -04:00
NanoCode012
4c37bd0b54
Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer ( #489 )
2023-08-28 09:39:10 +09:00
Aman Gupta Karmani
f144e98a32
Merge pull request #485 from maximegmd/patch-4
...
fix: finetune model inference needs the dtype fix to work with flash-attn
2023-08-27 16:27:47 -04:00
Aman Karmani
3a011ea1ef
fix condition and add logging
2023-08-27 20:09:26 +00:00
Aman Karmani
1f613e5aa7
Merge branch 'main' into patch-4
2023-08-27 19:57:34 +00:00
Aman Karmani
f319b0bc67
rename var and reformat
2023-08-27 19:55:11 +00:00
Maxime
7fd662dd89
Update src/axolotl/utils/models.py
...
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net >
2023-08-27 21:01:43 +02:00
Maxime
9e699683d7
Update src/axolotl/utils/models.py
...
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net >
2023-08-27 21:01:37 +02:00
mhenrichsen
35130711d6
Feat(cfg): Add code-llama configs for all sizes ( #479 )
...
* configs for all sizes
* update tokenizer type
---------
Co-authored-by: mhenrichsen <some_email@hey.com >
2023-08-27 10:20:17 +09:00
mhenrichsen
3fc9006298
Feat(deepspeed): Add zero2 config ( #476 )
...
* zero2 config
* config added
* linting
---------
Co-authored-by: mhenrichsen <some_email@hey.com >
2023-08-27 10:10:33 +09:00
NanoCode012
ad8be435ad
Feat(doc): Update eval_steps doc ( #487 )
2023-08-27 10:09:09 +09:00
Charles O. Goddard
fe4d6baf92
Add example Llama 2 ReLoRA config ( #471 )
...
* Add example Llama 2 ReLoRA config
* Use adamw_bnb_8bit in example relora config
2023-08-27 10:08:34 +09:00
Aman Gupta Karmani
f31301063d
Merge pull request #486 from OpenAccess-AI-Collective/adam-bnb-simpler
...
let transformers handle adamw_bnb_8bit
2023-08-26 20:44:19 -04:00
Aman Karmani
868530c39c
let transformers handle adamw_bnb_8bit
2023-08-26 21:40:12 +00:00
Maxime
d03887fad5
ignore: address pr review
2023-08-26 22:45:45 +02:00
Maxime
17605b85d8
fix: inference did not move the model to the correct device ( #483 )
2023-08-26 16:40:56 -04:00
Maxime
a184549e4c
ignore: linter
2023-08-26 22:36:14 +02:00
Maxime
f311df9462
fix: finetune model inference needs the dtype fix to work with flash-attn
2023-08-26 22:34:11 +02:00
Maxime
c500d02517
Fix missing 'packaging' wheel ( #482 )
2023-08-26 12:02:15 -04:00
Wing Lian
31f3e71764
fix checkpoints on multigpu ( #481 )
2023-08-26 12:00:03 -04:00
Aman Gupta Karmani
56c4a94caf
Merge pull request #484 from OpenAccess-AI-Collective/reqs
...
allow newer deps in requirements.txt
2023-08-26 11:13:41 -04:00
Aman Karmani
c29117a0d7
allow newer deps
2023-08-26 15:06:05 +00:00
Wing Lian
0b7ba57ec4
fix types w lora ( #478 )
2023-08-25 02:03:24 -04:00
NanoCode012
71bd06243c
Fix(tokenizer): Fix condition to add pad token ( #477 )
...
* Fix(tokenizer): Fix condition to add pad token
* chore: fix lint
2023-08-25 14:30:50 +09:00
Wing Lian
cb9797ef5a
improve llama pad token handling ( #475 )
...
* improve llama pad token handling
* tweak logic to not clobber
2023-08-24 13:20:35 -04:00
Charles O. Goddard
bde3c5a478
ReLoRA implementation (with quantization) ( #322 )
...
* Experimental ReLoRA (+qlora) implementation
* Add CPU offload
* Remove local config
* Fix saving logic
* Remove redundant assert
* Fix logic errors
* Move ReLoRA into its own trainer class with a method override to create the proper scheduler
* Formatting & typing fixes
* Use safe_serialization
* Don't allow fsdp/deepspeed with ReLoRA
* Fix cpu-offload logic, enable multi gpu
* Document parameters and add comment
* Fix merge issue
* Smooth over some sharp edges
* Implement resume from checkpoint for relora
* Address review comments
* Fix saving logic
* Add necessary metadata to safetensors
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-08-23 23:07:18 -04:00
NanoCode012
55c23c7bcb
Fix(doc): Clarify config ( #466 )
2023-08-23 11:56:01 -04:00
Wing Lian
c69faee7a7
workaround so training doesn't hang when packed dataloader batches aren't even ( #461 )
...
* workaround so training doesn't hang when packed dataloader batches aren't even
* don't bother labeling anything in the no-op data
2023-08-23 10:39:11 -04:00
Wing Lian
d5dcf9c350
fix test fixture b/c hf trainer tokenization changed ( #464 )
2023-08-23 04:04:49 -04:00
TearGosling
f4746507f6
feat: add Metharme prompt strategy ( #446 )
...
* Add Metharme tokenizing strategy
This strategy accounts for how the Metharme JSONLs are formatted as well as adds duplicated EOS tokens which can help trim model output length.
I haven't gotten the chance to test this yet, and probably won't have the chance for quite a bit, so I'm committing this now.
* Redo Metharme tokenizing strategy
lol
* fix: oops
* Rearrange a conditional
* chore: reformat code in accordance with linter
* chore: Make lint not freak out
* chore: fix lint
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-08-22 11:21:45 +09:00
Wing Lian
96deb6bd67
recast loralayer, norm, lmhead + embed token weights per original qlora ( #393 )
...
* recast loralayer, norm, lmhead + embed token weights per original qlora
* try again for the fix
* refactor torch dtype picking
* linter fixes
* missing import for LoraLayer
* fix install for tests now that peft is involved
2023-08-21 18:41:12 -04:00
Wing Lian
50682a3c06
always drop samples that are too long ( #452 )
2023-08-21 16:43:33 -04:00
Wing Lian
5a1985ba24
set env var for FSDP layer to wrap ( #453 )
2023-08-21 16:43:22 -04:00
Aman Gupta Karmani
5e9c6afa10
Merge pull request #451 from OpenAccess-AI-Collective/eval-is-causal
...
is_causal fix for evals?
2023-08-21 10:43:46 -07:00
Aman Karmani
a213d9972a
fix eval regression caused in 13f7efaf74
2023-08-21 10:40:06 -07:00
Wing Lian
fbf49a4770
is_causal fix for evals?
2023-08-21 10:36:26 -04:00
Wing Lian
58cf7e7fed
add missing positional arg ( #450 )
2023-08-21 04:10:19 -04:00
NanoCode012
04a42b6db1
feat(docs): improve user customized prompts ( #443 )
...
* feat(docs): improve user customized prompts
* feat(doc): add custom pretokenized instructions
* chore: clean old data folder
* chore: add new line
2023-08-20 23:59:43 -04:00
NanoCode012
919f4cac90
feat(doc): add pillow to lambda instructions ( #445 )
2023-08-20 23:59:23 -04:00
Wing Lian
ee262818ef
fix evals ( #447 )
2023-08-20 23:39:42 -04:00
Wing Lian
9d629d8bff
gracefully handle empty input ( #442 )
2023-08-20 09:18:18 -04:00
Wing Lian
d2e7f27240
support user defined prompters, pretokenized datasets in config, local parquet, local arrow files ( #348 )
...
* support user defined prompters, pretokenized datasets in config, local parquet, local arrow files
* fix user defined dataset types
* fix for system prompts
* fix tests
* fix checks for parquet and arrow
* aha moment that d.data_files isn't used
* add documentation for ds_type to add support for parquet and arrow
2023-08-20 09:17:49 -04:00
Philpax
d21318dfb9
docs(readme): add cd axolotl ( #440 )
2023-08-19 19:14:05 -04:00
Wing Lian
f733d0f31e
disable eval using multipack for now ( #437 )
2023-08-19 10:35:04 -04:00
Wing Lian
008505c8ae
fix comma, not a tuple ( #436 )
2023-08-19 00:57:40 -04:00
Wing Lian
b3f5e00ff5
use save_strategy from config if available ( #434 )
...
* use save_strategy from config if available
* update docs for save_strategy
2023-08-18 20:28:23 -04:00
Wing Lian
5247c5004e
set env for FSDP offload params ( #433 )
2023-08-18 20:28:09 -04:00
mhenrichsen
cf6654769a
flash attn pip install ( #426 )
...
* flash attn pip
* add packaging
* add packaging to apt get
* install flash attn in dockerfile
* remove unused whls
* add wheel
* clean up pr
fix packaging requirement for ci
upgrade pip for ci
skip build isolation for requirements to get flash-attn working
install flash-attn separately
* install wheel for ci
* no flash-attn for basic cicd
* install flash-attn as pip extras
---------
Co-authored-by: Ubuntu <mgh@mgh-vm.wsyvwcia0jxedeyrchqg425tpb.ax.internal.cloudapp.net >
Co-authored-by: mhenrichsen <some_email@hey.com >
Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan >
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-08-18 19:00:27 -04:00
Aman Gupta Karmani
06edf175ac
standardize attn hijack patches ( #381 )
...
* split sdp attn into its own patch
* sync xformers patch to follow shared format and be diffable
* update flash-attn patch for 70B/GQA and inference using helper from flash-attn tests
* speed up flash-attn inference
* fix patch to check position ids and don't use multipack for evals
* copy LlamaModel.forward and LlamaDecoderLayer.forward into monkeypatch
* update forwards so we only calculate cu_seqlens once
* enable eval dataloader using multipack again
* fix the patch to work properly and work with FSDP
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-08-18 12:54:16 -04:00
mhenrichsen
0a228479b3
adds color ( #425 )
...
* adds color
* chore: lint
* fix for colorama
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-08-18 10:59:43 -04:00
Wing Lian
82e111aba9
remove extra accelearate in requirements ( #430 )
2023-08-18 10:56:14 -04:00
Wing Lian
8cace80175
fix fixture for new tokenizer handling in transformers ( #428 )
2023-08-17 17:01:52 -04:00
Wing Lian
1b7e8604bb
fix orca prompts ( #422 )
2023-08-16 11:21:03 -04:00
NanoCode012
3d1f203b62
Fix(docs): Remove gptq+lora and fix xformer compat list ( #423 )
2023-08-16 13:56:48 +09:00
Wing Lian
d3d6fd6ae6
just resort to tags and use main-latest ( #424 )
2023-08-16 00:39:57 -04:00
NanoCode012
b7449a997f
Fix(template): Inform to place stack trace to Issue ( #417 )
...
* Fix(template): Inform to place stack trace to Issue
* Update following suggestions
Co-authored-by: Wing Lian <wing.lian@gmail.com >
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-08-16 11:55:48 +09:00
Wing Lian
5f80b3560b
use inputs for image rather than outputs for docker metadata ( #420 )
2023-08-15 18:26:59 -04:00
Wing Lian
24959091d7
hopefully improve the README ( #419 )
...
* hopefully improve the README
* exitcode -9 help
* table of contents
* formatting
2023-08-15 15:30:53 -04:00
Wing Lian
7af816699e
tag with latest as well for axolotl-runpod ( #418 )
...
* tag with latest as well for axolotl-runpod
* no dev branch for now
2023-08-15 15:30:41 -04:00
mhenrichsen
f806e86a6e
Merge pull request #413 from mhenrichsen/chore/update-deepseed-config
...
update path to align with fsdp example
2023-08-15 20:08:23 +02:00
NanoCode012
2b990eb628
Feat(doc): Add lr_quadratic_warmup to readme ( #412 )
2023-08-16 02:55:48 +09:00
mhenrichsen
bd8cab49c9
update path to align with fsdp example
2023-08-15 19:51:58 +02:00
NanoCode012
c01015f33f
Fix(config): Update handling of deepspeed config ( #404 )
...
* Fix(config): Update handling of deepspeed config
* feat: auto set deepspeed env if deepspeed passed
* fix: update new deepspeed instructions
2023-08-16 01:22:43 +09:00
NanoCode012
72fe3f8e3d
Fix(docs): Update flash attn requirements ( #409 )
2023-08-15 22:40:52 +09:00
Wing Lian
47961fdb8b
update docs for tokenizer_legacy ( #401 )
...
* update docs for tokenizer_legacy
* add default info
2023-08-15 22:34:42 +09:00
NanoCode012
7ad37cb6d7
Fix(template): Remove iPhone/android from Issue template ( #407 )
2023-08-15 22:32:51 +09:00
Wing Lian
29241cf1e4
Ax art ( #405 )
...
* axolotl text art :D
* only print art on rank0
* lint and pr feedback
2023-08-15 08:34:30 -04:00
lightningRalf
31db0ecce4
add templates, CoC and contributing guide ( #126 )
...
* add templates, CoC and contributing guide
* Update .github/SECURITY.md
correct responsible person
Co-authored-by: Wing Lian <wing.lian@gmail.com >
* Update bug-report.yaml
axolotl version switch with axolotl branch-commit
* update CONTRIBUTING doc
* update reporting link
* linter fixes
* chore: fix linter
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com >
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-08-15 07:41:05 -04:00
Wing Lian
da10af03e9
fix eval steps and strategy ( #403 )
2023-08-15 07:28:50 -04:00
Wing Lian
85cf4f8e2c
better handling of empty input ids when tokenizing ( #395 )
...
* better handling of empty input ids when tokenizing
* Add warning if tokenizer resulted in empty result
* fix len comparison for linter
2023-08-15 01:09:59 -04:00
Aman Karmani
2e22404d2d
add utils.data.prepare_dataset
2023-08-14 21:28:29 -07:00
NanoCode012
be294fd605
Feat(doc): Add how to save by epochs ( #396 )
2023-08-15 13:24:25 +09:00
Wing Lian
fc2d6be96d
use context manager to run things on rank0 before others ( #397 )
2023-08-15 00:10:47 -04:00
Wing Lian
1687be6a35
don't use mask expansion for inference ( #392 )
2023-08-14 20:52:54 -04:00
NanoCode012
41ecb451c2
Feat(doc): Add max_steps to readme ( #389 )
2023-08-15 00:34:22 +09:00
Gabriel Puliatti
3c2ad00d07
Feat(config): add max steps ( #387 )
2023-08-14 11:19:29 -04:00
florian peyron
5d48a10548
Added "epoch" evaluation_strategy ( #388 )
2023-08-14 10:59:23 -04:00
NanoCode012
73a0b6ead5
Feat(config): Add hub_strategy ( #386 )
2023-08-14 07:12:55 -04:00
florian peyron
63fdb5a7fb
Error msg for sharegpt if conv has less than 2 msg ( #379 )
2023-08-14 17:40:40 +09:00
mhenrichsen
fdffef5940
new llama-2 default settings ( #370 )
...
* new default settings
* fix whitespace
* rm max packed sequence length
---------
Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan >
2023-08-14 17:39:09 +09:00
Wing Lian
919246fbc1
don't pass rope_scaling kwarg if it's None ( #383 )
2023-08-13 18:57:38 -04:00
Wing Lian
ffac902c1b
bump flash-attn to 2.0.4 for the base docker image ( #382 )
2023-08-13 17:55:04 -04:00
Charles Goddard
15f6e57eaa
Fix crash when running without CUDA
2023-08-13 13:36:40 -07:00
NanoCode012
729c299256
Feat(doc): Improve sharegpt doc ( #378 )
...
* Feat(doc): Improve sharegpt doc
* Fix typo
2023-08-14 00:36:00 +09:00
Wing Lian
86a91e260b
save tokenizer before training starts ( #380 )
2023-08-13 11:28:58 -04:00
Aman Gupta Karmani
094fc2c6e6
try to detect accelerate and only use device_map=None in that case ( #373 )
2023-08-13 00:32:07 -04:00
Wing Lian
2dafa730ef
Create FUNDING.yml
2023-08-13 00:30:34 -04:00
Wing Lian
343ac84e5a
fix check for flash attn branching ( #377 )
2023-08-12 22:48:08 -04:00
Aman Karmani
0c967279ce
remove unnecessary local variable
2023-08-13 01:58:39 +00:00
Aman Karmani
efb3b2c95e
simplify load_tokenizer
2023-08-12 18:55:06 -07:00
Aman Karmani
7b55fe6419
improve GPU logging to break out pytorch cache and system mem
2023-08-12 18:52:57 -07:00
Aman Karmani
e029ab34ea
quiet noise from llama tokenizer by setting pad token earlier
2023-08-12 18:31:40 -07:00
Aman Karmani
8cec513447
extract module for working with cfg
2023-08-12 18:25:27 -07:00
Aman Karmani
a13e45d548
fix DefaultDict.__or__
2023-08-13 01:15:50 +00:00
Wing Lian
918f1b0dfb
revert previous change and build ax images w docker on gpu ( #371 )
2023-08-12 20:23:00 -04:00
Wing Lian
c3fde36ada
attempt to run non-base docker builds on regular cpu hosts ( #369 )
2023-08-12 19:07:38 -04:00
Wing Lian
2bb0b78975
Attention mask and position id fixes for packing ( #285 )
...
* fix attetion mask with packing
* set position ids and use block diagonal attn mask
* fix expand mask for multiple batch items, make sure we pad position_ids
* don't move masks to cpu
* use multi pack dataloader w random sampler
* add position_ids back
* more fixes for dataloader integration
* est total tokens, fix field loop
* more fixes, position_ids seems broken
* more fixes for sample packing
* use distributed sampler, avoid accelerate prepare
* use accelerator prepare for dataloader
* fix for position_ids w packing
* Update src/axolotl/utils/dataloader.py
* validation for sample packing and doc
* more fixes for 4k and optimizations
* optimized expand mask fn
* better handling of variance in multipack dataloader length and trainer hanging when it runs out of data
* fix rounding of len of batches to int
* better handling so that all devices have the same dataloader len
* fix step calc for packing
* pass sample packing efficiency to training args
* add a test for the mask expansion for sequence packing
* only process eval dataset for packing if not None
* don't split batches when packing
* weighted CE losses
* weighted CEL fixes
* limit packing to sequences of max seq len
* seq_len_multiple for packing
* make sure the chunk size is an int
* sample_packing_seq_len_multiplier config
* use cumulative seq len with var len flash attn v2 w packing
* properly calculate max len
* fix flash-attn, xformers, packing, support chatml
* fix chatml system prompt for openorca, legacy tokenizer opts
* add chatml
* add unit tests for cum seq lens, add ability to build cu_seq_lens from positional ids, fix prompt test
* fix test and pylint checks
* more packing and dataset optimizations and fixes
* filter w multiple cpus
* more fixes and optimizations
* fixes and go back to distributed sampler since batch sampler won't work
* fix counts by accounting for num devices
* fix steps calculation
* previous accelerate is still most performant
* add numba to requirements.
* use custom distributed checks
* fix sampler to prevent overfit w new epochs
* let's not cleanup the cached datasets
* calculate cum seq lens with pos_ids instead of mask, simplify packing params, fix distributed barrier
* speed optimizations and set accelerate fsdp env vars
* optimize dataset concatenation?
* more optimizations for dataset handling
* fix import for annotation
* manual pre-commit fixes
* another sum optimization and bug fix for calc steps
* fix packing estimations
* fix formatting
* pylint problems
* add back flash attention branch for handling unpacked sequences separately
* Address PR feedback
* add optional sample packing config params to readme
2023-08-12 15:14:56 -04:00
NanoCode012
a276c9c88d
Fix(save): Save as safetensors ( #363 )
2023-08-13 01:22:52 +09:00
Morgan McGuire
7019509daa
Add wandb_entity to wandb options, update example configs, update README ( #361 )
...
* Update wandb_entity and add wandb descriptions
* add wandb to config section
* remove trailing whitespace for pre-commit hook
* remove trailing whitespace for pre-commit hook
---------
Co-authored-by: Morgan McGuire <morganmcguire@Morgans-MacBook-Pro.local >
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-08-12 12:17:11 -04:00
NanoCode012
96bd6ae1c4
Fix(model loading): Warn when model revision is passed to gptq ( #364 )
...
* fix(model loading): warn when model revision is passed to gptq
* chore: improve message
2023-08-13 01:16:59 +09:00
NanoCode012
e37d9358e6
Fix(message): Improve error message for bad format ( #365 )
2023-08-13 01:16:18 +09:00
NanoCode012
b5212068ac
Feat: Add rope scaling ( #343 )
...
* Feat: Add rope scaling
* fix: move rope config
2023-08-13 00:50:15 +09:00
NanoCode012
289d5c403d
feat(merge): save tokenizer on merge ( #362 )
2023-08-13 00:18:10 +09:00
Aman Gupta Karmani
35c8b90306
Merge pull request #355 from tmm1/bitsandbytes-fixes
...
bump to latest bitsandbytes release with major bug fixes
2023-08-11 15:15:38 -07:00
NanoCode012
fae6ed8092
Update README.md on pretraining_dataset ( #360 )
...
* Update README.md on pretraining_dataset
* Fix message
2023-08-11 12:17:07 +09:00
NanoCode012
94d03c8402
Clarify pre-tokenize before multigpu ( #359 )
2023-08-11 11:27:42 +09:00
Aman Gupta Karmani
11ddccb80f
Merge pull request #356 from tmm1/load_model-args
...
simplify `load_model` signature
2023-08-09 18:24:34 -07:00
Aman Gupta Karmani
964312199e
Merge pull request #354 from tmm1/gpu-util
...
GPU memory usage logging
2023-08-09 15:44:18 -07:00
Aman Karmani
718102271f
simplify load_model signature
2023-08-09 22:36:02 +00:00
Aman Gupta Karmani
f5c11f8262
Merge pull request #350 from tmm1/group-len-false-examples
...
set `group_by_length` to false in all examples
2023-08-09 14:48:48 -07:00
Aman Karmani
fce40aab23
bump to latest bitsandbytes release with major bug fixes
2023-08-09 21:47:11 +00:00
Aman Karmani
9c314101d5
use newer pynvml package
2023-08-09 21:06:28 +00:00
Aman Karmani
e303d64728
log GPU memory usage
2023-08-09 18:26:28 +00:00
Aman Karmani
b4d1d22782
note pattern when using groups
2023-08-07 16:18:42 -07:00
Aman Karmani
9f99104038
update comment for group_by_length
2023-08-07 01:04:56 -07:00
Aman Karmani
36fefcf94b
set group_by_length to false in examples
2023-08-06 23:59:09 -07:00
Wing Lian
176b888a63
ensure enable_input_require_grads is called on model before getting the peft model ( #345 )
2023-08-06 18:13:10 -04:00
Jan Philipp Harries
3392270544
experimental llama 2 chat support ( #296 )
...
* experimental llama 2 chat support
* few small fixes
* llama2_chat
* small fix to follow original implementation
* small fixes and added fixtures/tests
* fix -mixed up inference and finetuning conversations
* args - small fix
* small fix
* small adjustment and warning
* fix with pre-commit
---------
Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com >
2023-08-06 17:40:52 -04:00
Wing Lian
bb53a165f5
add a basic ds zero3 config ( #347 )
...
better defaults for ds
2023-08-06 17:19:51 -04:00
ssmi153
10405b9995
Update XFormers Attention Monkeypatch to handle Llama-2 70B (GQA) ( #339 )
...
* Fix XFormers attention for Llama-2 70B (GQA)
Updated XFormers MonkeyPatch to handle GQA as used in Llama-2 70B. All the updated code is taken directly from the Transformers library: 07360b6c9c (diff-06392bad3b9e97be9ade60d4ac46f73b6809388f4d507c2ba1384ab872711c51) from their llama_modeling.py file.
* Catch configs without pretraining_tp
* Whitespace bug fix
Command had accidentally been moved out of if-else block.
* pre-commit formatting fixes
Thanks to @winglian
2023-08-06 11:09:04 -04:00
Jan Philipp Harries
c93655c0a3
Added Orca Mini prompt strategy ( #263 )
...
* added Orca Mini prompt strategy
* maybe this fixed precommit errors?
* pre-commits passing
---------
Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com >
2023-08-06 03:16:41 +09:00
Wing Lian
fe285430bc
optimize the iteration when tokenizing large datasets ( #332 )
2023-08-04 12:12:05 -04:00
Aman Gupta Karmani
0d2e34f056
Merge pull request #336 from tmm1/flash-attn
...
Fix flash-attn + qlora not working with llama models
2023-08-03 16:25:30 -07:00
Aman Gupta Karmani
b56a6c0101
Merge pull request #337 from tmm1/readme-fix
...
update README
2023-08-03 15:14:17 -07:00
Aman Karmani
2eda9e02a9
fix typo
2023-08-03 21:04:12 +00:00
Aman Karmani
78b9efb7f4
scope flash-attn+qlora fix correctly, scope to llama, add comment
2023-08-03 19:19:39 +00:00
Aman Karmani
312a9fad07
move flash-attn monkey patch alongside the others
2023-08-03 17:20:49 +00:00
Aman Karmani
58d665943e
python 3.10 and 3.11 both work fine, as does pytorch 2.1.0.dev
2023-08-03 16:47:25 +00:00
Aman Karmani
cc7e80026e
there is no configs folder
2023-08-03 16:31:37 +00:00
mhenrichsen
dc71d8872a
feat/llama-2 examples ( #319 )
...
* qlora llama-2
* qlora llama-2
* linting
* readme
* lora added
* linting
* change group_by_length
* 13b fitting on 24gb
* grouped lengths true
* add pad token
* change out dir
---------
Co-authored-by: Mads Henrichsen <mads@Baerbar-tilhoerende-Mads.local>
2023-08-03 19:22:48 +09:00
Aman Karmani
248bf90f89
ensure flash-attn fixes happen in both adapter/lora modes, and use torch_dtype
2023-08-02 20:15:03 +00:00
Wing Lian
77085ea24e
qlora w flash attention fixes ( #333 )
2023-08-01 23:26:16 -04:00
Wing Lian
db2a3586f3
add peft install back since it doesn't get installed by setup.py ( #331 )
2023-07-31 16:31:53 -04:00
Wing Lian
6c9a87c8ee
pin accelerate so it works with llama2 ( #330 )
2023-07-30 22:20:06 -04:00
Wing Lian
894cba09f3
fix FSDP save of final model ( #329 )
2023-07-30 21:46:44 -04:00
Wing Lian
41a4d15d43
update README for updated docker images ( #328 )
...
* update README for updated docker images
* update readme from pr feedback
2023-07-28 16:50:03 -04:00
Wing Lian
2c37bf6c21
Prune cuda117 ( #327 )
...
* drop cuda117/torch 1.13.1 from support, pin flash attention to v2.0.1, rm torchvision/torchaudio install
* gptq base build not needed. add sm 9.0 support
2023-07-26 16:27:49 -04:00
Wing Lian
9f69c4d8c1
latest HEAD of accelerate causes 0 loss immediately w FSDP ( #321 )
2023-07-24 11:23:56 -04:00
Wing Lian
3d4984b9a5
update prompts for open orca to match the paper ( #317 )
...
fix the test for the updated system tokenizer
2023-07-22 13:49:11 -04:00
Wing Lian
ff7f18d1ed
disable gh cache for first step of docker builds too
2023-07-22 11:46:37 -04:00
Wing Lian
cf62cfd661
add runpod envs to .bashrc, fix bnb env ( #316 )
...
* hopper support for base dockerfile, add runpod envs to .bashrc
* set BNB_CUDA_VERSION env for latest bnb
* don't support hopper yet w 118
2023-07-22 10:09:38 -04:00
Wing Lian
c5df969262
don't use the gha cache w docker
2023-07-22 08:46:21 -04:00
Wing Lian
40a53ff181
Merge pull request #307 from OpenAccess-AI-Collective/xgen-user-sharegpt-tokens
...
better handling since xgen tokenizer breaks with convert_tokens_to_ids
2023-07-22 04:10:38 -04:00
Wing Lian
dcdec44347
Merge pull request #306 from ethanhs/xgen
...
Add XGen info to README and example config
2023-07-22 04:10:18 -04:00
Wing Lian
3ffb018a4c
Merge pull request #313 from OpenAccess-AI-Collective/tokenizer-llama2-embeddings
...
don't resize embeddings to multiples of 32x by default
2023-07-22 04:09:59 -04:00
Wing Lian
a94f2eecb1
Merge pull request #299 from OpenAccess-AI-Collective/flash-attention-2
...
Flash attention 2
2023-07-22 04:07:48 -04:00
Wing Lian
1066751358
don't resize embeddings to multiples of 32x by default
2023-07-22 01:52:38 -04:00
Wing Lian
1b63bf13bc
Merge pull request #308 from OpenAccess-AI-Collective/apache2-license
...
add apache 2.0 license
2023-07-21 09:50:14 -04:00
Wing Lian
5cce2a42ff
add apache 2.0 license
2023-07-21 09:49:29 -04:00
Wing Lian
2a428e8014
better handling since xgen tokenizer breaks with convert_tokens_to_ids
2023-07-21 09:24:11 -04:00
Wing Lian
cdf85fdbd5
pin flash attention 2 to the fix for backwards pass
2023-07-21 08:18:53 -04:00
Wing Lian
9b790d359b
flash attention 2
2023-07-21 08:17:46 -04:00
Ethan Smith
38811434e6
Add XGen info to README and example config
2023-07-21 00:44:50 -07:00
NanoCode012
06c61d6f13
Merge pull request #304 from OpenAccess-AI-Collective/NanoCode012-patch-1
...
Fix(readme): Improve wording for push model
2023-07-21 13:39:45 +09:00
Wing Lian
262dc29df2
Merge pull request #300 from OpenAccess-AI-Collective/pytorch-201
...
Pytorch 2.0.1
2023-07-21 00:28:38 -04:00
NanoCode012
165907fddb
Fix(readme): Improve wording for push model
2023-07-21 11:28:35 +09:00
Wing Lian
a032c9f452
fix sdp attention to use the flash/mem-efficient context manager
2023-07-20 01:05:48 -04:00
Wing Lian
b06d3e3645
explicitly pin flash attention 1 to v1.0.9
2023-07-20 01:02:08 -04:00
Wing Lian
c58034d48c
use pytorch 2.0.1
2023-07-20 00:47:13 -04:00
NanoCode012
28fd429bcf
Merge pull request #293 from NanoCode012/fix/tokenize-speed
...
Fix(tokenizing): Use multi-core
2023-07-19 11:02:04 +09:00
NanoCode012
45ac7c4f88
feat: use multi-core
2023-07-19 10:16:54 +09:00
Wing Lian
edd6980dd9
Merge pull request #289 from OpenAccess-AI-Collective/hf_transfer
...
add hf_transfer to requirements for faster hf upload
2023-07-17 15:08:06 -04:00
Wing Lian
dc6d25124d
Merge pull request #288 from OpenAccess-AI-Collective/NanoCode012-patch-1
...
fix(readme): remove accelerate config
2023-07-17 14:46:43 -04:00
Wing Lian
6dd2e7d671
add hf_transfer to requirements for faster hf upload
2023-07-17 14:44:48 -04:00
NanoCode012
b64f411849
fix(readme): remove accelerate config
2023-07-18 01:31:02 +09:00
Wing Lian
03a59c1ed4
Merge pull request #287 from OpenAccess-AI-Collective/dataclass-fix
...
fix axolotl training args dataclass annotation
2023-07-17 06:09:23 -04:00
Wing Lian
ebaec3c406
fix axolotl training args dataclass annotation
2023-07-17 04:57:02 -04:00
Wing Lian
73e70e3996
Merge pull request #286 from OpenAccess-AI-Collective/logging-docker-fixes
...
misc fixes
2023-07-17 04:26:39 -04:00
Wing Lian
d75adb9835
misc fixes
2023-07-17 03:00:27 -04:00
Wing Lian
02224668c3
Merge pull request #283 from OpenAccess-AI-Collective/docker-git-fetch
...
git fetch fix for docker
2023-07-17 02:17:00 -04:00
Wing Lian
f162f3c7cc
set transformers cache env var in docker image
2023-07-16 23:03:54 -04:00
Wing Lian
eca3531329
git fetch fix for docker
2023-07-16 22:25:05 -04:00
Wing Lian
6f16c4569d
Merge pull request #276 from theobjectivedad/logging_enhancement
...
Logging update: added PID and formatting
2023-07-16 17:04:52 -04:00
Wing Lian
0bd09c077d
Merge pull request #280 from teknium1/main
...
Update requirements.txt
2023-07-16 16:08:58 -04:00
Wing Lian
469c08c9ba
Merge pull request #279 from NanoCode012/feat/multi-gpu-readme
...
Feat(readme): improve docs on multi-gpu
2023-07-16 16:08:37 -04:00
Wing Lian
334af625d0
Merge pull request #277 from cg123/dataset-name
...
Allow non-default dataset configurations
2023-07-16 16:08:15 -04:00
Teknium
273b3a3aa7
Update requirements.txt
...
Require latest git accelerate to fix saving checkpoint issue
2023-07-16 10:24:24 -07:00
Charles Goddard
3cdd8e4122
Add dataset name to all yaml options in README
2023-07-15 13:17:37 -07:00
NanoCode012
cf5ae6b649
Feat(readme): improve docs on multi-gpu
2023-07-16 01:07:27 +09:00
theobjectivedad
b1f4f7a34d
Fixed pre-commit problems, fixed small bug in logging_config to handle LOG_LEVEL env var
2023-07-15 12:29:35 +00:00
The Objective Dad
83237b8445
Merge branch 'OpenAccess-AI-Collective:main' into logging_enhancement
2023-07-15 06:16:04 -05:00
Charles Goddard
46032a1a1f
Fix formatting mistake
2023-07-14 20:57:27 -07:00
Charles Goddard
8bba64258e
Add example of dataset with configuration name to README
2023-07-14 20:46:21 -07:00
Charles Goddard
88089e8b32
Add ability to pass 'name' argument to load_dataset
2023-07-14 16:46:39 -07:00
NanoCode012
168a7a09cc
Merge pull request #274 from OpenAccess-AI-Collective/NanoCode012-patch-2
...
Feat: Set push to hub as private by default
2023-07-14 23:15:47 +09:00
NanoCode012
231031a0e1
Merge pull request #275 from NanoCode012/feat/safetensors
...
Feat: Add save_safetensors
2023-07-14 23:07:26 +09:00
theobjectivedad
9234b75cb4
Update log message format, IMO this is easier to read.
2023-07-14 07:36:21 -05:00
theobjectivedad
553a86b52c
Adding logging enhancement
2023-07-14 07:26:19 -05:00
NanoCode012
5daf7d5299
Merge pull request #273 from OpenAccess-AI-Collective/NanoCode012-patch-1
...
Feat(docs): Add model_revision arg
2023-07-14 21:09:50 +09:00
NanoCode012
5491278a79
Feat: Add save_safetensors
2023-07-14 13:21:47 +09:00
NanoCode012
1514739f0f
Set push to hub as private by default
2023-07-14 13:17:49 +09:00
NanoCode012
896c1aebcf
Feat(docs): Add model_revision arg
2023-07-14 12:56:07 +09:00
Wing Lian
ef17e15483
Merge pull request #272 from OpenAccess-AI-Collective/model-revision
...
support for loading a model by git revision
2023-07-13 23:12:00 -04:00
Wing Lian
69a235061b
support for loading a model by git revision
2023-07-13 22:58:25 -04:00
Wing Lian
687d889928
Merge pull request #271 from OpenAccess-AI-Collective/quadratic-warmup
...
Quadratic warmup
2023-07-10 12:48:02 -04:00
Wing Lian
c4cf567b55
Merge branch 'main' into quadratic-warmup
2023-07-10 12:42:12 -04:00
Wing Lian
c49729d2bc
better configuration for quadratic warmup
2023-07-10 11:52:59 -04:00
Wing Lian
13ac4d8de2
Merge pull request #268 from OpenAccess-AI-Collective/fix-adam-args
...
params are adam_*, not adamw_*
2023-07-08 12:33:34 -04:00
Wing Lian
19cf0bda99
params are adam_*, not adamw_*
2023-07-08 12:13:39 -04:00
Wing Lian
f74edd5b56
Merge pull request #266 from OpenAccess-AI-Collective/trust-remote-no-llama
2023-07-07 21:38:11 -04:00
Wing Lian
d69da99c2c
skip explicit model type too if using trust_remote_code
2023-07-07 21:33:11 -04:00
Wing Lian
66afb76a15
don't use llama if trust_remote_code is set since that needs to use AutoModel path
2023-07-07 21:31:02 -04:00
NanoCode012
a692ad3f4c
Merge pull request #264 from OpenAccess-AI-Collective/NanoCode012-patch-1
...
Fix(readme): local path loading and custom strategy type
2023-07-06 23:34:57 +09:00
NanoCode012
41da98b982
Fix for linter
2023-07-06 23:20:11 +09:00
NanoCode012
9e64f42e0f
Fix local path loading and custom strategy type
2023-07-06 23:08:09 +09:00
Wing Lian
b9b7d4ce92
Merge pull request #221 from utensil/local_dataset
...
[WIP] Support loading data files from a local directory
2023-07-03 09:10:13 -04:00
Wing Lian
9bed281867
Merge pull request #258 from NanoCode012/fix/deprecate-push
...
Fix future deprecation push_to_hub_model_id
2023-07-03 09:08:26 -04:00
NanoCode012
e79c8e617e
Fix future deprecation push_to_hub_model_id
2023-07-03 12:44:29 +09:00
Wing Lian
71456955f5
pin pydantic so deepspeed isn't broken
2023-07-02 22:26:51 -04:00
Wing Lian
3a783c04e4
Merge pull request #247 from OpenAccess-AI-Collective/fix-apex-base
...
update pip install command for apex
2023-07-01 06:18:25 -04:00
Wing Lian
1e5014acec
Merge pull request #255 from OpenAccess-AI-Collective/open-orca-prompts
...
open orca support
2023-07-01 01:11:23 -04:00
Wing Lian
a10da1caff
11.7.0 nvidia/cuda docker images are deprecated, move to 11.7.1
2023-07-01 00:29:07 -04:00
Wing Lian
4066c78631
Merge pull request #246 from OpenAccess-AI-Collective/sys-prompts-instruct
...
add option for instruct w sys prompts
2023-07-01 00:27:29 -04:00
Wing Lian
78a1e1fa12
open orca support
2023-07-01 00:19:41 -04:00
NanoCode012
bc8a2e5547
Merge pull request #249 from OpenAccess-AI-Collective/NanoCode012-patch-1
...
Fix typing list in prompt tokenizer
2023-06-30 15:01:41 +09:00
NanoCode012
910ebe47f5
Merge pull request #252 from OpenAccess-AI-Collective/NanoCode012-readme-fix
...
Add cfg.push_to_hub_model_id to readme
2023-06-30 14:56:55 +09:00
NanoCode012
c146880a75
Update README.md
2023-06-30 11:33:53 +09:00
NanoCode012
77bdb7d144
Fix typing list
2023-06-29 14:29:55 +09:00
Wing Lian
530809fd74
update pip install command for apex
2023-06-28 22:36:28 -04:00
Wing Lian
924bbfddec
add option for instruct w sys prompts
2023-06-28 22:27:17 -04:00
Wing Lian
f150c027e3
Merge pull request #224 from OpenAccess-AI-Collective/system-prompt-data
...
System prompt data
2023-06-27 17:57:43 -04:00
Wing Lian
5c39c006c9
Merge pull request #244 from OpenAccess-AI-Collective/push-to-hub
...
push intermediate model checkpoints to hub
2023-06-27 17:57:30 -04:00
Wing Lian
612aabd8c4
push intermediate model checkpoints to hub
2023-06-27 15:40:25 -04:00
Wing Lian
af05883f75
Merge pull request #243 from OpenAccess-AI-Collective/unprompted-instruct
...
skip the system prompt
2023-06-25 22:50:35 -04:00
Wing Lian
05ab9092e3
skip the system prompt
2023-06-25 22:40:50 -04:00
Wing Lian
7b57ed7618
pylint for duplicated code for system prompts
2023-06-25 22:28:07 -04:00
Wing Lian
3a38271276
add tests and support for loader for sys prompt data
2023-06-25 22:28:07 -04:00
Wing Lian
8d20e0a3d3
initial wip to get sys prompt from dataset
2023-06-25 22:28:07 -04:00
Wing Lian
de8ed229c3
Merge pull request #240 from OpenAccess-AI-Collective/tokenizer-fast
...
optionally define whether to use_fast tokenizer
2023-06-25 12:47:55 -04:00
Wing Lian
478d8c7b8e
Merge pull request #241 from OpenAccess-AI-Collective/py3-pre-commit
...
better py3 support w pre-commit
2023-06-25 12:47:02 -04:00
Wing Lian
645c13592c
better py3 support w pre-commit
2023-06-25 10:26:02 -04:00
Wing Lian
47d601fa23
optionally define whether to use_fast tokenizer
2023-06-25 10:19:49 -04:00
Wing Lian
756dfba97b
Merge pull request #218 from OpenAccess-AI-Collective/no-fail-fast
...
don't fail fast
2023-06-23 15:42:54 -04:00
Wing Lian
91ab0592af
Merge pull request #235 from msinha251/Fixing-data-readme
2023-06-23 13:52:01 -04:00
Mahesh Sinha
0aeb7c7802
Fixing Data Readme
2023-06-21 15:34:48 +02:00
Utensil
9bdd30cdfd
Support loading data files from a local directory
...
ref: https://huggingface.co/docs/datasets/v2.13.0/en/package_reference/loading_methods#datasets.load_dataset.path
2023-06-21 08:00:58 +00:00
Wing Lian
d35278aaf1
don't fail fast
2023-06-15 16:01:27 -04:00
Wing Lian
9492d4ebb7
Merge pull request #215 from OpenAccess-AI-Collective/adamw-hyperparams-cfg
...
support adamw and grad norm hyperparams
2023-06-15 12:20:55 -04:00
Wing Lian
ad5ca4f734
Additional test case per pr
2023-06-15 10:12:47 -04:00
Wing Lian
cb9d3af5c0
add validation and tests for adamw hyperparam
2023-06-15 09:39:42 -04:00
Wing Lian
c969f0a9dc
add docs
2023-06-15 08:43:20 -04:00
Wing Lian
6d0ee4ba34
support adamw and grad norm hyperparams
2023-06-15 08:40:41 -04:00
Wing Lian
a81f52d575
Merge pull request #212 from OpenAccess-AI-Collective/doc-20230615-v1
...
add float16 docs and tweak typehints
2023-06-15 08:28:57 -04:00
Wing Lian
1925eaf1e6
Merge pull request #214 from OpenAccess-AI-Collective/fix-tokenizing-labels
...
Fix tokenizing labels
2023-06-15 08:13:43 -04:00
Wing Lian
1ab3bf3e67
fix test name
2023-06-15 02:09:33 -04:00
Wing Lian
d7635b7148
hint to what AMP means
2023-06-15 02:06:27 -04:00
Wing Lian
88e17ffc50
add float16 docs and tweak typehints
2023-06-15 02:05:31 -04:00
Wing Lian
baed440fa1
ignore duplicate code in tests
2023-06-15 02:03:53 -04:00
Wing Lian
7925ddce86
bugfix for potential off by one
2023-06-15 01:59:33 -04:00
Wing Lian
6f849809c5
Merge pull request #206 from MaciejKarasek/issue205
...
issue #205 bugfix
2023-06-14 14:23:38 -04:00
Wing Lian
c16644d05e
Merge pull request #209 from sroecker/fix_redpajama_example_tokenizer
...
Use AutoTokenizer for redpajama example
2023-06-14 14:23:21 -04:00
Steffen Röcker
945c4191a3
Use AutoTokenizer for redpajama example
2023-06-14 20:09:26 +02:00
maciej.karasek
136522f9c9
style correction
2023-06-14 20:02:09 +02:00
maciej.karasek
556fe408b3
issue #205 bugfix
2023-06-14 16:59:57 +02:00
Wing Lian
16bb6276a5
Merge pull request #92 from OpenAccess-AI-Collective/flash-optimum
...
add support for optimum bettertransformers
2023-06-14 07:50:15 -04:00
NanoCode012
06674a11f2
Merge pull request #202 from OpenAccess-AI-Collective/NanoCode012-patch-1
...
Fix sharegpt type in doc
2023-06-14 09:48:35 +09:00
NanoCode012
3513885f43
Fix sharegpt type
2023-06-14 01:10:58 +09:00
Wing Lian
06652c1c39
Merge pull request #196 from OpenAccess-AI-Collective/openllama-ft-config
...
tweak config to work
2023-06-13 11:51:04 -04:00
NanoCode012
068fc48978
Merge pull request #199 from NanoCode012/chore/prompter-arg
...
chore: Refactor inf_kwargs out
2023-06-13 17:56:22 +09:00
Wing Lian
aaadacf6b3
Merge pull request #200 from PocketDocLabs/main
...
Update README.md to include a community showcase
2023-06-13 04:44:34 -04:00
PocketDoc Labs
5ff547dc70
Update README.md to include a community showcase
2023-06-12 22:38:10 -07:00
NanoCode012
dc77c8ebce
chore: Refactor inf_kwargs out
2023-06-13 12:01:46 +09:00
NanoCode012
51a4c12242
Merge pull request #197 from mhenrichsen/chore/update-readme
...
chore: Fix inference README.
2023-06-13 11:53:26 +09:00
Wing Lian
4b43a66a0b
update alpaca_chat prompts for instructions to explain the conversation
2023-06-12 18:38:38 -04:00
mhenrichsen
34ae69989f
fix inference
2023-06-12 21:39:19 +02:00
Wing Lian
7dc580b837
add axolotl trainer and quadratic warmup
2023-06-12 13:16:40 -04:00
Wing Lian
fd2c9814c9
Merge branch 'main' into flash-optimum
2023-06-12 13:12:15 -04:00
Wing Lian
2ba4ae8f46
tweak config to work
2023-06-12 10:07:18 -04:00
Wing Lian
93dacba228
Merge pull request #187 from OpenAccess-AI-Collective/strip-peft-device-map
...
peft no longer needs device_map
2023-06-12 09:10:49 -04:00
Wing Lian
8002ffb41f
Merge pull request #177 from NanoCode012/fix/landmark-patch
...
Fix landmark attention patch
2023-06-12 08:27:12 -04:00
Wing Lian
74ef5cc083
Merge pull request #192 from OpenAccess-AI-Collective/sharegpt-custom-prompt
...
misc fixes
2023-06-12 08:26:38 -04:00
Wing Lian
5e616d91c0
Merge branch 'main' into strip-peft-device-map
2023-06-12 08:25:54 -04:00
Wing Lian
94f310c7a6
Merge pull request #193 from OpenAccess-AI-Collective/config-fixes-20230612
...
config fixes
2023-06-12 08:24:52 -04:00
NanoCode012
8e568bbdae
Merge pull request #159 from AngainorDev/patch-1
...
Fix training over existing lora
2023-06-12 20:27:11 +09:00
NanoCode012
e21dab49fd
Merge pull request #194 from NanoCode012/fix/config-path
...
Fix config path after config moved
2023-06-12 19:28:12 +09:00
NanoCode012
52cde69288
Fix config path after config moved
2023-06-12 17:06:15 +09:00
Wing Lian
9a58e99e81
config fixes
2023-06-12 01:52:58 -04:00
Wing Lian
c7dee56b87
add typehints
2023-06-11 19:52:34 -04:00
Wing Lian
aac4b7691e
add new sharegpt, refactor prompt so it can be customized later, add exception if no data is processed
2023-06-11 19:42:25 -04:00
NanoCode012
f31a338cbb
Merge pull request #191 from OpenAccess-AI-Collective/NanoCode012-patch-1
...
Add save_steps and eval_steps to Readme
2023-06-12 02:55:37 +09:00
NanoCode012
4cd1deeef2
Add save_steps and eval_steps to Readme
2023-06-12 02:44:46 +09:00
Wing Lian
9ac16ed8d1
Merge pull request #190 from OpenAccess-AI-Collective/fixes-20230711-v2
...
more config pruning and migrating
2023-06-11 13:27:08 -04:00
Wing Lian
6b3f509d9e
forgot to add this file
2023-06-11 11:50:12 -04:00
Wing Lian
336aa3fd48
gptq lora llama is obviously good
2023-06-11 11:05:29 -04:00
Wing Lian
d0d7eaa4f3
update openllama and clean up paths
2023-06-11 11:03:31 -04:00
Wing Lian
a6ebf57e82
fix table formatting
2023-06-11 10:55:32 -04:00
Wing Lian
280832cec2
more matrix updates
2023-06-11 10:52:36 -04:00
Wing Lian
a43bae9ff0
update the support matrix
2023-06-11 10:44:03 -04:00
Wing Lian
effbbf6dd1
more pruning
2023-06-11 10:38:24 -04:00
Wing Lian
c9a149f9e8
add check for attr
2023-06-11 10:11:17 -04:00
Wing Lian
c530e4b9c8
more config pruning and migrating
2023-06-11 10:09:05 -04:00
Wing Lian
f620706776
Merge pull request #189 from OpenAccess-AI-Collective/fixes-20230711
...
various fixes
2023-06-11 09:49:23 -04:00
Wing Lian
77762a5d6b
get rid of some configs, formalize pythioa lora config
2023-06-11 09:41:41 -04:00
Wing Lian
14668fa54e
new validation for mpt w grad checkpoints
2023-06-11 09:26:10 -04:00
AngainorDev
b565ecf0a1
Fix strict and Lint
2023-06-11 15:23:38 +02:00
Wing Lian
fe0b76854e
match up gradient checkpointing when using lora w config
2023-06-11 09:20:40 -04:00
NanoCode012
e944311442
Merge pull request #186 from akj2018/main
...
Update FAQS.md
2023-06-11 19:45:06 +09:00
Akshay Jain
e3e7b52a5b
Update FAQS.md
...
Converted (```) to single backtick (') uniformly.
2023-06-10 23:36:14 -07:00
NanoCode012
974dc00a7d
Fix set mem_id for inference and refactor
2023-06-11 14:00:54 +09:00
NanoCode012
572d1141e6
Set mem cache args on inference
2023-06-11 12:05:37 +09:00
NanoCode012
a6190c8094
Clean up landmark patching
2023-06-11 11:59:03 +09:00
NanoCode012
563b6d89e6
Fix undefined LlamaForCausalLM and del try except
2023-06-11 11:58:31 +09:00
Wing Lian
cd0a6f6027
peft no longer needs device_map
2023-06-10 22:50:09 -04:00
Akshay Jain
0e664a5ebc
Update FAQS.md
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-06-10 19:26:12 -07:00
Akshay Jain
dd7d16d2eb
Update FAQS.md
...
Updated FAQS.md with backticks around error message
2023-06-10 19:15:50 -07:00
NanoCode012
e285e24f7f
Address PR suggestion
...
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-06-11 10:52:12 +09:00
NanoCode012
919727b4d7
Refactor landmark attention patch
2023-06-11 10:51:05 +09:00
Akshay Jain
5ffefee37f
Update FAQS.md
...
Update FAQS.md with the following statement
Error invalid argument at line 359 in file /workspace/bitsandbytes/csrc/pythonInterface.c
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
try reinstalling bitsandbytes and transformers from source
2023-06-10 18:34:54 -07:00
Wing Lian
d9f713e4e3
Merge pull request #183 from OpenAccess-AI-Collective/inference-from-stdin
...
pass a prompt in from stdin for inference
2023-06-10 17:06:55 -04:00
Wing Lian
958da70376
fix formatting
2023-06-10 15:28:08 -04:00
Wing Lian
c4e4f8115c
pass a prompt in from stdin for inference
2023-06-10 15:07:40 -04:00
Angainor Development
a808bf913f
Fix missing cfg.
2023-06-10 20:28:49 +02:00
Wing Lian
01248253a3
Merge pull request #182 from OpenAccess-AI-Collective/fix-llama-ref
...
fix for local variable 'LlamaForCausalLM' referenced before assignment
2023-06-10 14:25:51 -04:00
Wing Lian
759e8673ce
Update scripts/finetune.py
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-06-10 14:25:21 -04:00
Wing Lian
0c6f928601
address PR feedback
2023-06-10 14:23:56 -04:00
Wing Lian
eea2731a5e
add streaming dataset support for pretraining datasets
2023-06-10 14:23:56 -04:00
Wing Lian
1db46a9c72
linting fix
2023-06-10 14:23:56 -04:00
Wing Lian
ab5cd28acf
more gpt-neox long ctx fixes
2023-06-10 14:23:55 -04:00
Wing Lian
1a82082e91
fix bettertransformers save, force it to skip after saving correctly in callback
2023-06-10 14:23:55 -04:00
Wing Lian
1210dc8fd5
more tweaks to do pre-training with bettertransformers
2023-06-10 14:23:55 -04:00
Wing Lian
488a67d75a
experimental expansion of ctx len
2023-06-10 14:23:53 -04:00
Wing Lian
71a43f8479
add validation/warning for bettertransformers and torch version
2023-06-10 14:22:31 -04:00
Wing Lian
39619028a3
use pythia-12b, neox-20b is flaky
2023-06-10 14:22:30 -04:00
Wing Lian
8792199799
add flash attn context for efficient training and attempt setting model to train mode:
2023-06-10 14:22:30 -04:00
Wing Lian
1edc30c786
add support for optimum bettertransformers
2023-06-10 14:22:30 -04:00
Wing Lian
14163c15d9
fix for local variable 'LlamaForCausalLM' referenced before assignment
2023-06-10 14:11:13 -04:00
Wing Lian
41e4f6ca31
Merge pull request #181 from OpenAccess-AI-Collective/xpos-rope
...
add support to extend context with xpos rope
2023-06-10 14:04:03 -04:00
Angainor Development
79e2a6f140
Merge branch 'main' into patch-1
2023-06-10 19:07:54 +02:00
Angainor Development
c2508987a6
Remove explicit definition of cfg.inference
2023-06-10 19:06:10 +02:00
Wing Lian
215d775147
Merge pull request #180 from Glavin001/feat/stream-inference
...
Add streaming inference & fix stopping at EOS
2023-06-10 12:04:34 -04:00
Wing Lian
f36e227eaf
formatting for linter
2023-06-10 12:00:52 -04:00
Wing Lian
5878bb1f3a
add option to readme
2023-06-10 11:57:41 -04:00
Wing Lian
a03a7d7d8b
add support to extend context with xpos rope
2023-06-10 10:29:46 -04:00
Glavin Wiechert
fec6bcc3e6
Add streaming inference & fix stopping at EOS
2023-06-10 08:14:47 +00:00
Wing Lian
931e606459
Merge pull request #179 from OpenAccess-AI-Collective/fix-max_seq_len
...
fix for max sequence len across different model types
2023-06-09 20:52:03 -04:00
Wing Lian
7f09106437
fix for max sequence len across different model types
2023-06-09 20:42:33 -04:00
NanoCode012
6b50200234
Merge pull request #178 from PocketDocLabs/main
...
Update README.md to reflect current gradient checkpointing support
2023-06-10 08:26:48 +09:00
PocketDocLabs
16f9e28048
Update README.md to reflect current gradient checkpointing support
...
Previously the readme stated that gradient checkpointing was incompatible with 4-bit lora in the current implementation; however, this is no longer the case. I have replaced the warning with a link to the Hugging Face documentation on gradient checkpointing.
2023-06-09 16:10:58 -07:00
NanoCode012
b9083a7fc1
Merge pull request #176 from NanoCode012/fix/peft-import
...
Fix backward compat for peft
2023-06-10 07:56:35 +09:00
NanoCode012
aefb2fc681
Fix backward compat for peft
2023-06-10 07:46:36 +09:00
NanoCode012
b5aa8d854c
Merge pull request #169 from NanoCode012/feat/landmark
...
Feat: Add landmark attention
2023-06-10 07:26:06 +09:00
NanoCode012
4d6490bce2
Merge pull request #171 from OpenAccess-AI-Collective/NanoCode012-falcon-lora-matrix
...
Fix falcon support lora
2023-06-09 17:58:22 +09:00
NanoCode012
b242b69e10
Fix falcon support lora
2023-06-09 17:50:16 +09:00
NanoCode012
320beb20f4
Merge pull request #170 from OpenAccess-AI-Collective/NanoCode012-lambdalabs-fix
...
Feat: Improve lambda labs instruction
2023-06-09 16:52:27 +09:00
Angainor Development
bd3b537344
Feed cfg.inference
2023-06-09 08:59:05 +02:00
Angainor Development
813cfa4c14
WIP: Rely on cfg.inference
2023-06-09 08:49:32 +02:00
NanoCode012
2e13ceff37
Improve lambda labs instruction
2023-06-09 15:03:08 +09:00
NanoCode012
2a801b001a
Fix grad checkpoint and outputs param
2023-06-09 14:28:44 +09:00
NanoCode012
e44c9e0b3e
Fix patching via import instead of hijacking
2023-06-09 14:27:24 +09:00
NanoCode012
55b8542de8
Feat: Add landmark attention
2023-06-09 12:54:08 +09:00
Wing Lian
febe902517
Merge pull request #168 from bratao/main
...
Disable Wandb if no wandb project is specified
2023-06-08 22:05:56 -04:00
Bruno Cabral
f4df266842
Disable Wandb
2023-06-08 21:02:02 -03:00
NanoCode012
281dc3df59
Merge pull request #167 from NanoCode012/fix/redundant-save-eval-steps
...
Fix: Refactor out unmodified save_steps and eval_steps
2023-06-09 01:39:33 +09:00
NanoCode012
2ef4634d45
Refactor out unmodified save_steps and eval_steps
2023-06-09 01:23:13 +09:00
NanoCode012
7eae90333e
Merge pull request #166 from NanoCode012/fix/seed
...
Fix: Set to use cfg.seed or 42 for seed
2023-06-09 01:15:08 +09:00
NanoCode012
c8242de725
Merge pull request #132 from utensil/falcon-7b-qlora
...
Axolotl supports falcon + qlora
2023-06-09 01:14:03 +09:00
NanoCode012
2cfe9e9b16
Set to use cfg.seed or 42 for backward compat
2023-06-09 01:02:36 +09:00
Utensil
79a8f52181
Trim trailing whitespace
2023-06-08 23:48:57 +08:00
NanoCode012
afaa0d2c01
Merge pull request #164 from NanoCode012/fix/falcon-fsdp-validate
...
Fix: Validate falcon with fsdp
2023-06-09 00:44:12 +09:00
NanoCode012
bfd27ba55e
Fix failing test
2023-06-09 00:35:03 +09:00
NanoCode012
babf0fdb71
Validate falcon with fsdp
2023-06-09 00:29:04 +09:00
Utensil
a52f4816b0
Default wandb_project to empty as suggested
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-06-08 23:04:19 +08:00
NanoCode012
81911d112c
Merge pull request #163 from NanoCode012/feat/matmul-tf32
...
Feat: Set matmul tf32=True when tf32 passed
2023-06-09 00:01:31 +09:00
NanoCode012
52765ac588
Set matmul tf32
2023-06-08 23:41:12 +09:00
NanoCode012
73e9ea4069
Merge pull request #143 from NanoCode012/fix/deprecate-prepare-8bit-training
...
Fix future deprecate prepare_model_for_int8_training
2023-06-08 23:07:53 +09:00
NanoCode012
f8d379883d
Merge pull request #162 from NanoCode012/fix/custom-prompt-readme
...
Fix: Move custom prompts out of hidden
2023-06-08 23:05:17 +09:00
NanoCode012
04a1b77307
Merge pull request #161 from NanoCode012/fix/peft-setup
...
Fix: Update peft and gptq instruction
2023-06-08 23:01:53 +09:00
NanoCode012
2097a09d2d
Move custom prompts out of hidden
2023-06-08 22:53:56 +09:00
NanoCode012
cfff94b123
Add peft install for quickstart
2023-06-08 22:50:20 +09:00
NanoCode012
2b222de5b6
Update peft and gptq instruction
2023-06-08 22:48:26 +09:00
NanoCode012
df9528f865
Fix future deprecate prepare_model_for_int8_training
2023-06-08 21:42:10 +09:00
Angainor Development
193c73bce0
Fix training over existing lora
...
When training with Lora, and starting with an existing lora weights, current code produces a model with 0 trainable params and training can't work.
Adding the "is_trainable" param allows the loaded peft to be trained and fixes the bug.
2023-06-08 09:18:58 +02:00
Wing Lian
6abfd87d44
Merge pull request #158 from OpenAccess-AI-Collective/prompter-fixes
...
fix camel ai, add guanaco/oasst mapping for sharegpt
2023-06-07 11:02:30 -04:00
Wing Lian
59bb2197ed
fix camel ai, add guanaco/oasst mapping for sharegpt
2023-06-07 09:51:29 -04:00
Wing Lian
9a02e7e1ff
Merge pull request #155 from OpenAccess-AI-Collective/misc-fixes
...
new prompters, misc fixes for output dir missing using fsdp, and changing max seq len
2023-06-06 16:52:39 -04:00
Wing Lian
5b33e295bd
update docs
2023-06-05 22:48:16 -04:00
Wing Lian
4ac9e251b7
new prompters, misc fixes for output dir missing using fsdp, and changing max seq len
2023-06-05 22:41:00 -04:00
Utensil
c9c050316f
Default micro_batch_size to 1 for a safer start
2023-06-03 17:26:33 +08:00
Utensil
ca11ae9689
Add comments/alternatives for falcon-qlora configs
2023-06-03 15:04:02 +08:00
Wing Lian
328c3bce96
Merge pull request #149 from OpenAccess-AI-Collective/docker-clone-axolotl
...
clone in docker
2023-06-02 15:15:30 -04:00
Wing Lian
5cd2126439
shallow clone
2023-06-02 14:54:28 -04:00
Wing Lian
12620f3089
clone in docker
2023-06-02 14:52:50 -04:00
Wing Lian
4ab0c8b201
Merge pull request #148 from OpenAccess-AI-Collective/fix-device-load
2023-06-02 14:37:17 -04:00
Wing Lian
74ebbf4371
fix device map
2023-06-02 14:29:08 -04:00
Wing Lian
76a70fd739
Merge pull request #147 from OpenAccess-AI-Collective/winglian-rocker-images
...
Update README.md for correct image tags
2023-06-02 14:10:40 -04:00
Wing Lian
618816d4df
Update README.md for correct image tags
2023-06-02 14:10:23 -04:00
Wing Lian
91992cb8f5
Merge pull request #146 from FarisHijazi/main
...
added docker-compose file
2023-06-02 13:58:23 -04:00
FarisHijazi
84169d15b3
added docker-compose file
2023-06-02 18:17:43 +03:00
Wing Lian
ecfe8d0a1a
Merge pull request #142 from NanoCode012/feat/custom-prompt-readme
...
Feat: Add custom prompt readme and add missing prompt strategies to Readme
2023-06-02 07:21:04 -04:00
Wing Lian
eee44a3b47
Merge pull request #141 from NanoCode012/feat/lambdalabs-readme
...
Feat: Add lambdalabs instruction
2023-06-02 07:20:12 -04:00
NanoCode012
078a43eef8
Remove redundant instruction
2023-06-02 12:30:11 +09:00
NanoCode012
33e1890086
Add pygmalion
2023-06-02 12:27:51 +09:00
NanoCode012
1c38253692
Add other prompt_strategies
2023-06-02 12:24:44 +09:00
NanoCode012
496b83f778
Add short instruction for custom prompts
2023-06-02 12:16:20 +09:00
NanoCode012
ff68a95781
Add lambdalabs instruction
2023-06-02 12:09:40 +09:00
Utensil
fb3d40f197
falcon + qlora + xformer mbs 40 gas 2 on A6000
2023-06-01 18:29:20 +08:00
NanoCode012
288fd62431
Merge pull request #135 from NanoCode012/fix/grad-accu-readme
...
Fix: Update doc for grad_accu and add validation tests for batch size
2023-06-01 06:33:05 +09:00
NanoCode012
3c71c8debe
Update doc for grad_accu and add validation tests for batch size
2023-06-01 06:13:47 +09:00
Wing Lian
a6f5e5eaec
Merge pull request #134 from OpenAccess-AI-Collective/gas-batch-fix
...
fix batch size calculation
2023-05-31 14:24:48 -04:00
Wing Lian
5a631b305b
fix batch size calculation
2023-05-31 14:11:32 -04:00
Wing Lian
f94dd626f0
Merge pull request #130 from OpenAccess-AI-Collective/gas
...
swap batch size for gradient accumulation steps to decouple from num gpu
2023-05-31 13:03:51 -04:00
Wing Lian
5079753b7a
Merge pull request #131 from OpenAccess-AI-Collective/fix-packing-mask
...
fix packing so that concatenated sequences reset the attention
2023-05-31 13:03:37 -04:00
Wing Lian
0136f510f2
don't worry about duplicate code here
2023-05-31 12:05:43 -04:00
Utensil
72bf8aafb6
Create config-7b-qlora.yml
2023-06-01 00:00:37 +08:00
Utensil
8afb0fbaba
Axolotl supports falcon + qlora
2023-05-31 23:58:40 +08:00
Wing Lian
9b8585dc70
fix packing so that concatenated sequences reset the attention
2023-05-31 11:38:52 -04:00
Wing Lian
8eb5811d4e
Merge pull request #129 from OpenAccess-AI-Collective/builder-badge
...
add badge info to readme
2023-05-31 10:37:59 -04:00
Wing Lian
e0011fdf55
Fix base builder, missing tags
2023-05-31 09:52:03 -04:00
Wing Lian
6e9e98720e
Merge pull request #127 from OpenAccess-AI-Collective/py310-docker-runpod
...
add py310 support from base image
2023-05-31 09:39:42 -04:00
Wing Lian
c2a0792680
swap batch size for gradient accumulation steps to decouple from num gpu
2023-05-31 09:38:12 -04:00
Wing Lian
b267d24a2b
add badge info to readme
2023-05-31 09:28:44 -04:00
Wing Lian
5c3f5db38b
Add files via upload
2023-05-31 09:22:54 -04:00
Wing Lian
e3d03745ba
add py310 support from base image
2023-05-31 09:07:28 -04:00
NanoCode012
fac46002d4
Merge pull request #119 from NanoCode012/feat/update-inference
...
Feat(inference): Swap to GenerationConfig
2023-05-31 14:09:18 +09:00
NanoCode012
33d40179ba
Increase max_new_tokens
...
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-05-31 14:04:49 +09:00
Wing Lian
dcb03d6da4
Merge pull request #114 from OpenAccess-AI-Collective/accelerate-dep
...
Add accelerate dep
2023-05-31 00:47:17 -04:00
NanoCode012
0e4be625ae
Merge pull request #118 from NanoCode012/feat/torch-readme
...
Fix(readme): Fix torch missing from readme
2023-05-31 13:29:41 +09:00
NanoCode012
bdc4bd7d4e
Update README.md
2023-05-31 13:24:28 +09:00
Wing Lian
2d0ba3b818
Merge pull request #124 from OpenAccess-AI-Collective/xformers-fix
...
copy xformers attn from ooba since we removed dep on alpaca_lora_4bit
2023-05-31 00:11:40 -04:00
Wing Lian
c7021e191f
Merge pull request #120 from OpenAccess-AI-Collective/model-from-path
...
split up llama model loading so config can be loaded from base config and models can be loaded from a path
2023-05-31 00:08:38 -04:00
Wing Lian
c56818b119
don't worry about dupes
2023-05-31 00:06:47 -04:00
Wing Lian
2675fb756e
update readme for SDP
2023-05-31 00:04:54 -04:00
Wing Lian
1076bcbbca
Update src/axolotl/monkeypatch/llama_attn_hijack_xformers.py
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-05-31 00:00:19 -04:00
Wing Lian
2daa6835f0
Update src/axolotl/monkeypatch/llama_attn_hijack_xformers.py
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-05-30 23:59:05 -04:00
Wing Lian
e3c494ca7b
remove unused import and update readme
2023-05-30 23:55:45 -04:00
Wing Lian
ad0ea6aaab
black formatting
...
ignore copied file
fix linting
2023-05-30 23:50:29 -04:00
Wing Lian
876edd83d0
Merge pull request #123 from OpenAccess-AI-Collective/bas-batch
...
add support for gradient accumulation steps
2023-05-30 23:45:29 -04:00
Wing Lian
6cb2310592
copy xformers attn from ooba since we removed dep on alpaca_lora_4bit
2023-05-30 23:34:36 -04:00
Wing Lian
6fa40bf8ad
black formatting
2023-05-30 23:33:37 -04:00
Wing Lian
3aad5f3b3e
add support for gradient accumulation steps
2023-05-30 23:24:37 -04:00
Wing Lian
39a208c2bc
fix up tokenizer config, isort fix
2023-05-30 23:00:02 -04:00
Wing Lian
2520ecd6df
split up llama model loading so config can be loaded from base config and models can be loaded from a path
2023-05-30 22:32:44 -04:00
Wing Lian
c5b0af1a7e
define python version (3.10) explicitly as string in yaml
2023-05-30 22:23:35 -04:00
NanoCode012
988aeb9c34
Feat: Swap to GenerationConfig
2023-05-31 10:48:19 +09:00
NanoCode012
cf61f14bff
Fix(readme): Fix torch missing from readme
2023-05-31 10:28:49 +09:00
Wing Lian
0abcd71a85
Merge pull request #115 from OpenAccess-AI-Collective/docker-version-fixes
...
docker fixes: py310, fix cuda arg in deepspeed
2023-05-30 18:11:26 -04:00
Wing Lian
c43c5c84ff
py310, fix cuda arg in deepspeed
2023-05-30 18:02:34 -04:00
Wing Lian
36ec6e1a0e
Add accelerate dep
2023-05-30 16:36:13 -04:00
Wing Lian
13b80937f9
add release draft template for gh
2023-05-30 15:10:19 -04:00
Wing Lian
bbc5bc5791
Merge pull request #108 from OpenAccess-AI-Collective/docker-gptq
...
default to qlora support, make gptq specific image
2023-05-30 15:07:04 -04:00
Wing Lian
4df9da74e3
Merge pull request #105 from viktoriussuwandi/viktoriussuwandi-patch
...
Viktoriussuwandi patch
2023-05-30 15:05:23 -04:00
Wing Lian
2531ea24c1
Merge pull request #106 from fearnworks/qlora-openllama-3b-example
...
Qlora openllama 3b example
2023-05-30 15:05:05 -04:00
Wing Lian
01a75fd027
Merge pull request #98 from NanoCode012/feat/pre-commit
...
Add pre-commit: black+flake8+pylint+mypy+isort+bandit
2023-05-30 14:57:15 -04:00
NanoCode012
b81c97ff76
Fix pre-commit for rebased files
2023-05-31 03:01:38 +09:00
NanoCode012
594e72b6e8
Fix incorrect rebase
2023-05-31 02:58:50 +09:00
NanoCode012
25eeeeba0b
Fix sharegpt prompt
2023-05-31 02:55:21 +09:00
Wing Lian
cfcc549f6b
fix relative path for fixtures
2023-05-31 02:55:21 +09:00
NanoCode012
a1f9850b91
Fix security issue or ignore false positives
2023-05-31 02:53:53 +09:00
NanoCode012
83d29209f7
Add bandit
2023-05-31 02:53:53 +09:00
NanoCode012
d011422200
Add isort
2023-05-31 02:53:53 +09:00
NanoCode012
b1cc54b14a
Update pip install to also setup tests
2023-05-31 02:53:53 +09:00
NanoCode012
c17dae6d07
Update src/axolotl/prompt_strategies/alpaca_instruct.py
...
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-05-31 02:53:53 +09:00
NanoCode012
37293dce07
Apply isort then black
2023-05-31 02:53:53 +09:00
NanoCode012
96e8378692
Delete extract_lora.py
2023-05-31 02:53:53 +09:00
NanoCode012
e9650d3ae4
Fix mypy typing
2023-05-31 02:53:53 +09:00
NanoCode012
f1232b35ba
Update mypy dependencies
2023-05-31 02:53:53 +09:00
NanoCode012
741a3f2edc
Add mypy
2023-05-31 02:53:53 +09:00
NanoCode012
0dd35c74af
Ignore unsupported-binary-operation
2023-05-31 02:53:53 +09:00
NanoCode012
db288e9b13
Set python version
2023-05-31 02:53:53 +09:00
NanoCode012
be22551435
Fix unsupported operand type(s) for |
2023-05-31 02:53:53 +09:00
NanoCode012
b832a0ac62
Black formatting
2023-05-31 02:53:53 +09:00
NanoCode012
afb31e13a3
Add badge and update contribution section
2023-05-31 02:53:53 +09:00
NanoCode012
1bf1f59a41
Move black to dev requirements
2023-05-31 02:53:53 +09:00
NanoCode012
8e46c0fb0d
Refactor duplicate code between Prompter and Pygmalion
2023-05-31 02:53:53 +09:00
NanoCode012
1f3c3f5ea0
Lint validation
2023-05-31 02:53:53 +09:00
NanoCode012
0e952889dc
Lint test_dict
2023-05-31 02:53:53 +09:00
NanoCode012
9c6750a075
Lint wandb
2023-05-31 02:53:53 +09:00
NanoCode012
c2dbf2c526
Lint validation
2023-05-31 02:53:53 +09:00
NanoCode012
e6b57decbd
Lint tokenization
2023-05-31 02:53:53 +09:00
NanoCode012
fe1f4c4e7d
Lint schedulers
2023-05-31 02:53:53 +09:00
NanoCode012
dae14e5951
Ignore too-many-instance-attributes
2023-05-31 02:53:53 +09:00
NanoCode012
633ff2150f
Lint dict
2023-05-31 02:53:53 +09:00
NanoCode012
5d86137f70
Lint prompt_tokenizers
2023-05-31 02:53:53 +09:00
NanoCode012
01c8a333b3
Lint pygmalion
2023-05-31 02:53:53 +09:00
NanoCode012
7eb33a77dd
Lint test_prompters
2023-05-31 02:53:53 +09:00
NanoCode012
1645a4ddd5
Lint creative_acr
2023-05-31 02:53:53 +09:00
NanoCode012
145b060cbe
Lint alpaca_instruct
2023-05-31 02:53:53 +09:00
NanoCode012
8cc0aadcb8
Lint alpaca_chat
2023-05-31 02:53:53 +09:00
NanoCode012
6abb7f6a16
Lint datasets
2023-05-31 02:53:53 +09:00
NanoCode012
de2406c488
Lint convert.py
2023-05-31 02:53:53 +09:00
NanoCode012
8b617cc7f6
Lint setup.py
2023-05-31 02:53:53 +09:00
NanoCode012
ddb86ea821
Lint trainer.py
2023-05-31 02:53:53 +09:00
NanoCode012
1a2bd7ff62
Ignore too-few-public-methods
2023-05-31 02:53:23 +09:00
NanoCode012
82971e1565
Lint finetune.py
2023-05-31 02:53:23 +09:00
NanoCode012
f4e5d86268
Lint models.py
2023-05-31 02:53:23 +09:00
NanoCode012
daf47ccf45
Refactor disable pylint
2023-05-31 02:53:23 +09:00
NanoCode012
545cfeb5c7
Refactor error code to use full error message
2023-05-31 02:53:23 +09:00
NanoCode012
69722aeef4
Remove fixme disable
2023-05-31 02:53:23 +09:00
NanoCode012
5658717dbd
Remove disable too many arg
2023-05-31 02:53:23 +09:00
NanoCode012
e8717d3bef
Remove disable
2023-05-31 02:53:23 +09:00
NanoCode012
54c3b5b25f
Ignore too-many-arguments
2023-05-31 02:53:23 +09:00
NanoCode012
5062eca069
Lint callbacks.py
2023-05-31 02:53:23 +09:00
NanoCode012
cb4f0e9342
Lint prompters.py
2023-05-31 02:53:23 +09:00
NanoCode012
4c0eddb3f8
Refactor
2023-05-31 02:53:23 +09:00
NanoCode012
1c60c10e00
Lint flash_attn.py
2023-05-31 02:53:23 +09:00
NanoCode012
903ea3080d
Fix lint
2023-05-31 02:53:23 +09:00
NanoCode012
cb7cd3429f
Fix data.py lint
2023-05-31 02:53:23 +09:00
NanoCode012
d57ba56746
Ignore import and too many * pylint errors
2023-05-31 02:53:23 +09:00
NanoCode012
c3a4697016
Update ignores
2023-05-31 02:53:22 +09:00
NanoCode012
392dfd9b07
Lint and format
2023-05-31 02:53:22 +09:00
NanoCode012
a98deb31a6
Add config files
2023-05-31 02:53:22 +09:00
NanoCode012
36596adaf7
Add pre-commit: black+flake8+pylint
2023-05-31 02:53:22 +09:00
Wing Lian
a924a33b45
Merge pull request #111 from OpenAccess-AI-Collective/sharegpt-token-tests
...
add unit test for sharegpt tokenization
2023-05-30 11:18:31 -04:00
Wing Lian
e65aeedce7
fix relative path for fixtures
2023-05-30 10:38:20 -04:00
jphillips
6cee881d64
Update examples/qlora-openllama-3b/README.md
...
Co-authored-by: Wing Lian <wing.lian@gmail.com >
2023-05-30 09:33:33 -05:00
Wing Lian
e6fdeb087f
add unit test for sharegpt tokenization
2023-05-30 10:28:17 -04:00
Wing Lian
48612f8376
cleanup from pr feedback
2023-05-30 09:56:30 -04:00
Wing Lian
d91a769b88
update docs
2023-05-29 20:37:32 -04:00
Wing Lian
6ef96f569b
default to qlora support, make gptq specific image
2023-05-29 20:34:41 -04:00
jphillips
ac85c0ed36
Add Readme, Clean up comments
2023-05-29 14:35:58 -05:00
Wing Lian
e43bcc6c4f
move CUDA_VERSION_BNB arg inside of stage build scope
2023-05-29 13:30:15 -04:00
jphillips
f1fbf666f7
Merge branch 'main' of https://github.com/OpenAccess-AI-Collective/axolotl into qlora-openllama-3b-example
2023-05-29 09:09:43 -05:00
jphillips
370d057096
Add qlora-openllama-3b example
2023-05-29 09:07:46 -05:00
Wing Lian
00323f0a6f
fix CUDA_VERSION_BNB env var
2023-05-29 08:06:22 -04:00
Viktorius Suwandi
e0ccaccce2
Update wandb_log_model on vicuna_13B_4bit_reflect.yml
2023-05-29 16:34:13 +07:00
Viktorius Suwandi
15e57ba6ee
Update wandb_log_model on config.yml
2023-05-29 16:33:20 +07:00
Viktorius Suwandi
4eb68ac3f7
Update wandb_log_model on config-3b.yml
2023-05-29 16:32:49 +07:00
Viktorius Suwandi
b6a539b53c
Update wandb_log_model on cerebras_1_3B_alpaca.yml
2023-05-29 16:32:20 +07:00
Viktorius Suwandi
abddcf4dfe
Update wandb_log_model on pythia_1_2B_alpaca.yml
2023-05-29 16:31:53 +07:00
Viktorius Suwandi
15aabd2903
Update wandb_log_model on llama_7B_jeopardy.yml
2023-05-29 15:44:01 +07:00
Viktorius Suwandi
232b931081
Update wandb_log_model on llama_65B_alpaca.yml
2023-05-29 15:43:43 +07:00
Viktorius Suwandi
0736f4f9c1
Update wandb_log_model on llama_13B_alpaca.yml
2023-05-29 15:43:20 +07:00
Viktorius Suwandi
d77d736631
Update wandb_log_model on llama_7B_alpaca.yml
2023-05-29 15:43:01 +07:00
Viktorius Suwandi
fad06befee
Update wandb_log_model on config.yml
2023-05-29 15:42:38 +07:00
Viktorius Suwandi
2aacf75ee1
Update wandb_log_model on galactica_1_3B.yml
2023-05-29 15:42:19 +07:00
Viktorius Suwandi
71871345a6
Update wandb_log_model on llama_7B_4bit.yml
2023-05-29 15:41:59 +07:00
Viktorius Suwandi
0d14e951a8
Update wandb_log_model on stability_3b.yml
2023-05-29 15:41:42 +07:00
Viktorius Suwandi
84fc217f79
Update wandb_log_model on gpt_neox_20b.yml
2023-05-29 15:41:24 +07:00
Viktorius Suwandi
f317296259
Update wandb_log_model on quickstart.yml
2023-05-29 15:40:58 +07:00
Viktorius Suwandi
42a971df32
Update wandb_log_model on sample.yml
2023-05-29 15:39:42 +07:00
Wing Lian
7f7fd68e8e
Merge pull request #104 from OpenAccess-AI-Collective/training-fixes-20230529
...
bnb fix, trainer debug fix
2023-05-29 02:19:03 -04:00
Wing Lian
21f17cca69
bnb fixes
2023-05-29 00:06:35 -04:00
Wing Lian
319e34bfb5
Merge pull request #101 from OpenAccess-AI-Collective/sharegpt-conv
...
refactor conversation plucking in sharegpt
2023-05-28 19:43:54 -04:00
Wing Lian
809ccebb38
use python setup install, bdist wheel is unreliable in installing extension
2023-05-28 15:49:13 -04:00
Wing Lian
21c8e2deab
refactor conversation plucking in sharegpt
2023-05-28 14:36:33 -04:00
Wing Lian
8fe12e3bc1
Merge pull request #100 from OpenAccess-AI-Collective/py310-tests
...
add py310 to the test matrix
2023-05-28 14:31:07 -04:00
Wing Lian
37fc85ac52
Merge pull request #99 from OpenAccess-AI-Collective/hf_use_auth_token
...
new hf_use_auth_token setting so login to hf isn't required
2023-05-28 14:30:04 -04:00
Wing Lian
658ed86cb5
add py310 to the test matrix
2023-05-28 14:25:57 -04:00
Wing Lian
fd5f9656a2
update for pr feedback
2023-05-28 14:23:27 -04:00
Wing Lian
1c33eb88a7
new hf_use_auth_token setting so login to hf isn't required
2023-05-28 13:08:49 -04:00
Wing Lian
a798ba1659
ensure libbitsandbytes*.so gets included with wheel
2023-05-28 12:28:37 -04:00
NanoCode012
666febcfb5
Merge pull request #97 from NanoCode012/feat/test-validation
...
Feat: Update validate_config and add tests
2023-05-29 00:38:22 +09:00
NanoCode012
52dd92a0cd
Feat: Update validate_config and add tests
2023-05-29 00:25:54 +09:00
Wing Lian
88889590ec
Merge pull request #90 from NanoCode012/feat/addict
...
Feat: Convert attrdict to addict
2023-05-28 10:43:07 -04:00
NanoCode012
f87bd20555
Fix incorrect syntax in test
2023-05-28 23:35:29 +09:00
NanoCode012
dd83a20c27
Update test to run on PR
2023-05-28 23:30:17 +09:00
NanoCode012
7bf2069afd
Apply black formatter
2023-05-28 23:14:04 +09:00
NanoCode012
923151ffab
Add test for DictDefault
2023-05-28 23:06:10 +09:00
NanoCode012
56f9ca5709
refactor: fix previous refactors
2023-05-28 23:06:10 +09:00
NanoCode012
8bd7a49cd7
Refactor to use DictDefault instead
2023-05-28 23:06:10 +09:00
NanoCode012
18d41cee4a
Add DictDefault
2023-05-28 23:06:10 +09:00
NanoCode012
93acb648bd
Fix load error
2023-05-28 23:06:10 +09:00
NanoCode012
bdfe7c9201
Convert attrdict to addict
2023-05-28 23:06:10 +09:00
Wing Lian
0d4a7f4c04
Merge pull request #67 from OpenAccess-AI-Collective/refactor-tokenizer-load
...
load the tokenizer separately from the model
2023-05-28 08:49:34 -04:00
Wing Lian
af3aacbe16
Merge pull request #93 from OpenAccess-AI-Collective/dev-base
...
cuda properly compiled bitsandbytes for qlora support
2023-05-27 19:40:29 -04:00
Wing Lian
cc67862dd3
move list not in list logic to fn
2023-05-27 16:42:05 -04:00
Wing Lian
cf37980395
fix missing run continuation
2023-05-27 15:28:54 -04:00
NanoCode012
ed2dd77e35
Merge pull request #89 from OpenAccess-AI-Collective/NanoCode012-update-action-version
...
Feat: Update actions version
2023-05-28 02:12:26 +09:00
NanoCode012
2b8c28bab8
Update actions version
2023-05-28 01:51:10 +09:00
Wing Lian
312b8d51d6
update docker to compile latest bnb to properly support qlora
2023-05-27 12:36:53 -04:00
NanoCode012
782996d94a
Merge pull request #86 from OpenAccess-AI-Collective/NanoCode012-warning-remote-code
...
Feat: Add warning for `trust_remote_code`
2023-05-28 01:29:35 +09:00
NanoCode012
b50d7d311c
Merge pull request #88 from OpenAccess-AI-Collective/NanoCode012-completion-prompter-no-inherit
...
Fix: Remove base class inherit for CompletionPrompter
2023-05-28 01:29:03 +09:00
Wing Lian
35af017001
Merge pull request #87 from OpenAccess-AI-Collective/add_prompter_tests
...
automated testing in github actions
2023-05-27 12:21:23 -04:00
Wing Lian
a653392287
use requirements file for tests
2023-05-27 12:17:46 -04:00
Wing Lian
72b6ca0d9f
cache pip
2023-05-27 12:16:54 -04:00
Wing Lian
7f53fd2ab6
alright, just local install it
2023-05-27 12:16:06 -04:00
Wing Lian
c29d33352c
move python path to same step as tests
2023-05-27 12:06:23 -04:00
Wing Lian
403af0b1d7
fix path and streamline pip installs
2023-05-27 11:58:37 -04:00
NanoCode012
9ac1884323
Fix: Remove base class inherit for CompletionPrompter
2023-05-28 00:51:35 +09:00
Wing Lian
d199d6c261
automated testing in github actions
2023-05-27 11:51:01 -04:00
NanoCode012
2824423d10
Add warning for trust_remote_code
2023-05-28 00:46:56 +09:00
NanoCode012
cb18856fc2
Merge pull request #85 from NanoCode012/fix/add-dataset-shard-readme
...
Feat: Add `dataset_shard_num` and `dataset_shard_idx` to Readme
2023-05-27 23:52:50 +09:00
NanoCode012
8626b54aab
Add dataset_shard_num and dataset_shard_idx
2023-05-27 23:51:17 +09:00
Wing Lian
87dffbc451
Merge pull request #75 from Thytu/refactor/rename-4b-to-gptq
...
refactor: change 4bit nomenclature to gptq
2023-05-27 09:37:57 -04:00
Wing Lian
147241ca66
Merge branch 'main' into refactor/rename-4b-to-gptq
2023-05-27 09:37:52 -04:00
Wing Lian
7e974decb7
Merge pull request #76 from OpenAccess-AI-Collective/truthy-validation
...
Truthy validation
2023-05-27 09:36:10 -04:00
Wing Lian
11fd39b1f5
Merge pull request #78 from OpenAccess-AI-Collective/falcoln-support
...
falcon: sane starter defaults and add lora support
2023-05-27 09:35:56 -04:00
Wing Lian
157420df13
sane starter defaults and add lora
2023-05-27 09:33:14 -04:00
Wing Lian
679ffd7395
Merge pull request #77 from OpenAccess-AI-Collective/falcoln-support
...
add example for falcon support
2023-05-27 09:18:48 -04:00
Wing Lian
d5f944ce2a
add example for falcon support
2023-05-27 09:16:43 -04:00
Wing Lian
4c906339f7
fix auto linear modules for lora w/o any set already
2023-05-27 08:49:43 -04:00
Wing Lian
4c500f5830
checking for False is not sufficient for NoneType/unset configs
2023-05-27 08:43:48 -04:00
Thytu
7cf07fc8b3
refactor(example): rename 4bit-lora-7b by gptq-lora-7b
...
Signed-off-by: Thytu <vdmatos@gladia.io >
2023-05-27 12:37:53 +00:00
Thytu
dd0065773a
refactor(param): rename load_4bit config param by gptq
...
Signed-off-by: Thytu <vdmatos@gladia.io >
2023-05-27 12:36:03 +00:00
Wing Lian
ca1bb92337
Update src/axolotl/utils/models.py
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-05-26 17:51:24 -04:00
Wing Lian
933e970cb5
Update src/axolotl/utils/models.py
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-05-26 17:51:17 -04:00
Wing Lian
c3d256271e
fix wheel install glob
2023-05-26 10:37:02 -04:00
NanoCode012
46c5a44003
Merge pull request #69 from OpenAccess-AI-Collective/NanoCode012-quickstart-disable-xformers
...
Fix: Disable xformers for QuickStart config
2023-05-26 22:40:16 +09:00
NanoCode012
ec3c0314bf
Merge pull request #65 from NanoCode012/feat/target-linear
...
Feat: Add `cfg.lora_target_linear`
2023-05-26 22:39:38 +09:00
NanoCode012
79560934f9
Disable xformers for QuickStart config
2023-05-26 22:23:38 +09:00
NanoCode012
353cebd838
Merge pull request #68 from OpenAccess-AI-Collective/NanoCode012-patch-1
...
Fix: Incorrect recommendation condition
2023-05-26 22:20:31 +09:00
NanoCode012
fe0e69f4f9
Fix recommendation condition
2023-05-26 22:19:50 +09:00
Wing Lian
1fc9b44e3d
fix wheel blobs in dockerfile
2023-05-26 07:40:11 -04:00
Wing Lian
32e6fe9286
load the tokenizer separately from the model
2023-05-26 07:29:35 -04:00
NanoCode012
919623793a
Add cfg.lora_target_linear
2023-05-26 14:32:30 +09:00
Wing Lian
bbfc333a01
Merge pull request #62 from OpenAccess-AI-Collective/qlora-fixes
...
Qlora fixes
2023-05-26 00:28:16 -04:00
Wing Lian
a5bf838685
add logging and make sure model unloads to float16
2023-05-26 00:09:55 -04:00
Wing Lian
a4f12415a0
update readme and add typehints
2023-05-25 23:10:11 -04:00
Wing Lian
48f4c0571e
fix validation for qlora merge
2023-05-25 23:02:03 -04:00
Wing Lian
1987e5cf56
qlora and 4bit check so we are able to merge and unload
2023-05-25 22:55:13 -04:00
Wing Lian
e7e1a777bd
fix bool args according to python fire docs
2023-05-25 22:45:41 -04:00
Wing Lian
7b5e762be2
fix merge conflict failure, black format
2023-05-25 22:40:27 -04:00
Wing Lian
3f6017db9e
qlora merge and load requires that base model isn't loaded in 4 or 8 bit
2023-05-25 22:39:13 -04:00
Wing Lian
34c99f9812
fixes to make qlora actually work
2023-05-25 22:37:23 -04:00
NanoCode012
3815c054b6
Merge pull request #61 from NanoCode012/feat/update-readme
...
Feat: Update readme
2023-05-26 11:27:31 +09:00
NanoCode012
85326bfbf3
Update quickstart config
2023-05-26 11:15:57 +09:00
NanoCode012
e689069afd
Add xformers error
2023-05-26 11:12:03 +09:00
NanoCode012
d7d8bc739e
Add strict yml
2023-05-26 11:10:59 +09:00
NanoCode012
60e32ff457
Fix shard config
2023-05-26 11:09:28 +09:00
Wing Lian
259262bf42
fix xentropy wheel name typo
2023-05-25 17:25:38 -04:00
Wing Lian
2e56203b50
another fix for shard and train split
2023-05-25 17:23:57 -04:00
Wing Lian
be3d3963cd
Merge pull request #58 from OpenAccess-AI-Collective/shards-fix
...
shard fix
2023-05-25 16:32:31 -04:00
Wing Lian
ac79360161
shard fix
2023-05-25 16:31:59 -04:00
Wing Lian
b2fb61845e
Merge pull request #54 from OpenAccess-AI-Collective/winglian-patch-1
...
add discord link to #axolotl-help channel
2023-05-25 12:45:19 -04:00
Wing Lian
71d600fc43
Merge branch 'main' into winglian-patch-1
2023-05-25 12:45:13 -04:00
Wing Lian
4fd0c2d1b9
Merge pull request #57 from OpenAccess-AI-Collective/fixes-for-basic-samples
...
fixes w/ example for super basic lora starter
2023-05-25 12:43:22 -04:00
Wing Lian
943961fd10
missed ...
2023-05-25 12:42:56 -04:00
Wing Lian
d2a6f79fd1
change auth token setting back
2023-05-25 12:41:17 -04:00
Wing Lian
98b1bce57e
pr comments addressed
2023-05-25 12:25:07 -04:00
Wing Lian
004820209d
Update src/axolotl/prompters.py
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-05-25 12:21:02 -04:00
Wing Lian
8d6a28953f
fix relative path in flash-attn build
2023-05-25 12:18:28 -04:00
Wing Lian
e396654319
fix tokenizer loading, got openllama 3b working
2023-05-25 12:15:12 -04:00
Wing Lian
a5d739b66b
fixes w/ example for super basic lora starter
2023-05-25 11:59:08 -04:00
Wing Lian
951facbb1f
Merge pull request #56 from OpenAccess-AI-Collective/fix-build-flash-attn
...
fix cd within flash-attn
2023-05-25 11:29:47 -04:00
Wing Lian
f5fa3d131b
fix cd within flash-attn
2023-05-25 11:29:15 -04:00
NanoCode012
7ec105041d
Merge pull request #48 from NanoCode012/feat/update-readme
...
Feat: Minor update readme from dev changes
2023-05-25 23:49:58 +09:00
NanoCode012
a9e502ef45
Update 4bit notes
2023-05-25 23:48:18 +09:00
NanoCode012
68f0c71424
Merge pull request #49 from NanoCode012/feat/gitignore
...
Feat: Update gitignore using standard Python template
2023-05-25 23:42:49 +09:00
NanoCode012
52fb6d8a34
Update gitignore using standard Python template
2023-05-25 23:07:27 +09:00
NanoCode012
f92245dbd6
Fix missing closing code block
2023-05-25 23:06:33 +09:00
NanoCode012
e65c203e9e
Add more detail on minimum GPU
2023-05-25 23:06:33 +09:00
NanoCode012
1377400c33
Add info on Runtime Error
2023-05-25 23:06:33 +09:00
NanoCode012
2c34f8d0c7
Update dataset type
2023-05-25 23:06:33 +09:00
NanoCode012
7bc28eb8a8
Add more data formats
2023-05-25 23:06:33 +09:00
NanoCode012
29273b5a5b
Add other minor configs
2023-05-25 23:06:33 +09:00
NanoCode012
05c18340d6
Update scheduler configs
2023-05-25 23:06:33 +09:00
NanoCode012
5b712afbe4
Update bf16 options
2023-05-25 23:06:33 +09:00
NanoCode012
9083910036
Update lora config
2023-05-25 23:06:33 +09:00
NanoCode012
8552218491
Improve Inference instruction
2023-05-25 23:06:33 +09:00
Wing Lian
de2a7335e6
Merge pull request #55 from OpenAccess-AI-Collective/missing-validation-file
...
add missing file
2023-05-25 09:58:51 -04:00
Wing Lian
1d7da3b389
add missing file
2023-05-25 09:58:29 -04:00
Wing Lian
e07bd8a441
add discord link to #axolotl-help channel
2023-05-25 09:45:45 -04:00
Wing Lian
d092cdb19b
Merge pull request #52 from OpenAccess-AI-Collective/bugfix-cfg-cfg
...
cfg.cfg fix, also de-dupe lora module list
2023-05-25 09:35:24 -04:00
Wing Lian
f523a0894c
stray s
2023-05-25 09:23:56 -04:00
Wing Lian
676d7da661
cfg.cfg fix, also de-dupe lora module list
2023-05-25 09:18:57 -04:00
Wing Lian
a617f1b65e
Merge pull request #44 from OpenAccess-AI-Collective/qlora-add-modules-tuple
...
fix tuple add to list
2023-05-24 23:46:40 -04:00
Wing Lian
a8771b0aad
fix tuple add to list
2023-05-24 23:46:04 -04:00
Wing Lian
cf48ff7cac
Merge pull request #41 from OpenAccess-AI-Collective/qlora-modules
...
attempt to find linear modules for qlora
2023-05-24 23:31:19 -04:00
Wing Lian
1cf21daf51
Update src/axolotl/utils/models.py
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-05-24 23:31:12 -04:00
Wing Lian
ffd1043607
attempt to find linear modules for qlora
2023-05-24 23:18:08 -04:00
Wing Lian
3369c4dcf8
Merge pull request #39 from OpenAccess-AI-Collective/dev
...
Dev to main
2023-05-24 23:03:22 -04:00
Wing Lian
bc97f9c584
remove dev specific remark
2023-05-24 23:00:53 -04:00
Wing Lian
ce34d64e8a
apply black formatting
2023-05-24 22:59:33 -04:00
Wing Lian
ce694e20a3
Merge branch 'main' of github.com:OpenAccess-AI-Collective/axolotl into dev
2023-05-24 22:59:09 -04:00
Wing Lian
cebea372da
Merge pull request #36 from OpenAccess-AI-Collective/qlora
...
Qlora
2023-05-24 22:57:37 -04:00
Wing Lian
1f5d83ea72
remove un-needed code, add validation
2023-05-24 22:47:43 -04:00
Wing Lian
6e7d4d5344
Merge pull request #35 from NanoCode012/update-readme
...
Feat: Rewrite Readme
2023-05-24 21:31:32 -04:00
NanoCode012
362821ce84
Add trust_remote_code config
2023-05-25 09:53:49 +09:00
NanoCode012
224d186ec9
Simplify docker instruction
2023-05-25 09:51:22 +09:00
NanoCode012
5417824b31
Add seq length
2023-05-25 09:50:43 +09:00
NanoCode012
e1a91b0918
Remove redundant formats
2023-05-25 09:48:18 +09:00
NanoCode012
2a1b5728e6
Add line break
2023-05-25 09:37:18 +09:00
NanoCode012
702f2eee4b
Fix inference command
2023-05-25 09:36:33 +09:00
NanoCode012
88bba24d9e
Clean up data readme
2023-05-25 09:34:35 +09:00
NanoCode012
ba9ac723f1
Update quickstart. Add common error and contribution section.
2023-05-25 09:32:04 +09:00
NanoCode012
db73b94a58
Add image. Add quickstart. Simplify dataset.
2023-05-25 09:32:04 +09:00
NanoCode012
00dfe43b1d
Add image
2023-05-25 09:32:04 +09:00
NanoCode012
9aab0b8cfe
Update Docker instructions
2023-05-25 09:32:04 +09:00
NanoCode012
857a80b70e
Format dataset types
2023-05-25 09:32:04 +09:00
NanoCode012
cba0048067
Update typo
2023-05-25 09:32:04 +09:00
NanoCode012
c22df8db9b
Add all dataset types
2023-05-25 09:32:04 +09:00
NanoCode012
68237ea90a
Add extra note to Readme
2023-05-25 09:32:04 +09:00
NanoCode012
4ee79f2641
Fix typo
2023-05-25 09:32:04 +09:00
NanoCode012
2b436680a0
Add new config options to Readme
2023-05-25 09:32:04 +09:00
NanoCode012
04d281312c
Feat: Rewrite Readme
2023-05-25 09:32:04 +09:00
Wing Lian
7e81ca720b
Update requirements.txt
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-05-24 15:44:48 -04:00
Wing Lian
3960936bf7
Merge pull request #37 from Thytu/main
...
fix: handles AutoTokenizer from untrusted source
2023-05-24 15:42:41 -04:00
Valentin De Matos
88ad05df54
fix: handles AutoTokenizer from untrusted source
...
Set the trust_remote_code param depending on cfg.trust_remote_code when calling AutoTokenizer.from_pretrained
2023-05-24 20:57:10 +02:00
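The change described above can be sketched as follows. This is a minimal sketch, not the repo's actual code: `cfg` and `tokenizer_load_kwargs` are hypothetical stand-ins, while `trust_remote_code` is the real `AutoTokenizer.from_pretrained` keyword the commit refers to.

```python
# Sketch of plumbing cfg.trust_remote_code through to the tokenizer load.
# `cfg` is a hypothetical dict-like config; only the kwargs construction is
# shown so this runs without downloading a model.

def tokenizer_load_kwargs(cfg):
    # Default to False so remote code from untrusted sources is never
    # executed unless explicitly enabled in the config.
    return {"trust_remote_code": bool(cfg.get("trust_remote_code", False))}

# The actual call would then be (not executed here):
# tokenizer = AutoTokenizer.from_pretrained(base_model, **tokenizer_load_kwargs(cfg))

print(tokenizer_load_kwargs({"trust_remote_code": True}))  # {'trust_remote_code': True}
print(tokenizer_load_kwargs({}))                           # {'trust_remote_code': False}
```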
Wing Lian
e8aacfbd7c
more qlora support
2023-05-24 14:33:18 -04:00
Wing Lian
b9d07aa95a
prepare does all this already for qlora?
2023-05-24 14:32:39 -04:00
Wing Lian
3b4d055edd
integrate qlora? maybe?
2023-05-24 14:32:39 -04:00
Wing Lian
2ae936fbc4
fix missing fp16 kwarg
2023-05-23 20:44:24 -04:00
Wing Lian
fb100a9ee1
fix enum pass as value
2023-05-23 11:34:03 -04:00
Wing Lian
3a503770e4
Add qa style data for alpaca instructions, fix one_cycle scheduler
2023-05-22 22:58:10 -04:00
Wing Lian
b029a11e65
Merge pull request #34 from OpenAccess-AI-Collective/dev-unstable
...
lots of various improvements
2023-05-22 12:14:56 -04:00
Wing Lian
e3df3a9f5d
cuda/pytorch matrix builds
2023-05-22 12:14:21 -04:00
Wing Lian
f950a881e1
cuda, pytorch matrix for base builds
2023-05-22 12:12:08 -04:00
Wing Lian
de6da13e19
don't need to set here
2023-05-22 12:12:01 -04:00
Wing Lian
9493b1b137
be able to use adam bnb 8bit and one cycle scheduler w fsdp
2023-05-22 09:00:49 -04:00
Wing Lian
1b3e401241
Update src/axolotl/utils/models.py for info msg
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-05-21 23:01:35 -04:00
Wing Lian
3457810988
Update scripts/finetune.py
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-05-21 23:00:28 -04:00
Wing Lian
ae1719d30c
Update scripts/finetune.py for logging
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-05-21 23:00:23 -04:00
Wing Lian
98a6781f18
Update src/axolotl/utils/data.py for spelling
...
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com >
2023-05-21 23:00:13 -04:00
Wing Lian
607a4d33f2
make sure to use train split if loading from hf
2023-05-21 22:04:39 -04:00
Wing Lian
99383f14a3
make one cycle lr div factor configurable
2023-05-21 20:25:06 -04:00
Wing Lian
0f74464652
fix new dataset prompt tokenizers
2023-05-21 18:57:09 -04:00
Wing Lian
e0602a9e54
add missing __init__
2023-05-21 16:36:41 -04:00
Wing Lian
2809f3f21b
pygmalion dataset prompts format, cached tokenized datasets should be hashed on the tokenizer too
2023-05-21 16:16:09 -04:00
Wing Lian
4ea9a66dbd
tokenization fixes
2023-05-21 08:33:06 -04:00
Wing Lian
ed37b2268d
Merge pull request #32 from NanoCode012/patch-2
...
Feat: Set `half` using `cfg.fp16` for 4bit
2023-05-20 18:21:02 -04:00
Wing Lian
1d5ab84486
optionally be able to specify alpaca or chat style prompts
2023-05-20 18:16:22 -04:00
NanoCode012
641f8012f9
Set half using cfg.fp16 for 4bit
2023-05-20 02:29:31 +09:00
Wing Lian
fa8bd14be4
update entrypoint and force min accelerate
2023-05-18 06:25:34 -04:00
Wing Lian
13650732f8
concise multiple choice and tldr summarize
2023-05-17 11:29:17 -04:00
Wing Lian
8c2f3cb0f8
support for replit lm
2023-05-17 08:49:03 -04:00
Wing Lian
b46bc02f0a
add alpaca multiple choice instruct dataset support
2023-05-16 21:45:34 -04:00
Wing Lian
e553c9080b
Merge pull request #29 from NanoCode012/patch-1
2023-05-16 07:12:06 -04:00
NanoCode012
2c73c81348
Add lora_modules_to_save
2023-05-16 19:22:00 +09:00
Wing Lian
f98e173b59
reorder options so debug can happen in the same prepare step
2023-05-15 22:26:30 -04:00
Wing Lian
5e37144754
fix prompters, especially the sharegpt prompter
2023-05-15 22:15:36 -04:00
Wing Lian
bdbca8fa6c
more fixes
2023-05-15 14:07:17 -04:00
Wing Lian
42410c783c
more fixes
2023-05-14 09:16:41 -04:00
Wing Lian
aef00b6c13
fix torch_dtype for model load
2023-05-14 08:44:22 -04:00
Wing Lian
0d28df0fd2
move filter to before saving so it doesn't happen every time, update runpod manual script
2023-05-13 21:51:41 -04:00
Wing Lian
84c7bc4b68
whoops, gt vs lt
2023-05-12 14:03:25 -04:00
Wing Lian
aa3c3f97ae
optimize dataloading to use cache, fix model token embedding sizes
2023-05-12 13:53:27 -04:00
Wing Lian
f6d1fa4a85
Merge pull request #25 from NanoCode012/patch-2
...
Fix Trainer() got multiple values for keyword argument 'callbacks'
2023-05-11 09:20:15 -04:00
NanoCode012
89b7f26b9d
Merge branch 'main' into patch-2
2023-05-11 21:18:38 +09:00
Wing Lian
165da584b3
fix config for parity with previous change
...
5159d00a86#diff-65b4693504c4e8ffac76c7f2c90913faee381f802cf64e7f49c995a2134ed3b3R164
2023-05-11 08:13:09 -04:00
Wing Lian
4cc7ed8898
Merge pull request #27 from NanoCode012/patch-1
...
Fix save typo
2023-05-11 07:27:31 -04:00
NanoCode012
52aada7174
Fix typo
2023-05-11 20:22:30 +09:00
Wing Lian
688c73a81e
Merge pull request #26 from OpenAccess-AI-Collective/mpt-triton
...
Mpt triton
2023-05-10 16:02:05 -04:00
Wing Lian
2bc1a5bde1
black formatting
2023-05-10 16:01:08 -04:00
Wing Lian
7a490a4646
various fixes
2023-05-10 16:00:09 -04:00
NanoCode012
813aab378f
Fix Trainer() got multiple values for keyword argument 'callbacks'
2023-05-10 18:28:28 +09:00
Wing Lian
e2e68c3965
testing mpt triton
2023-05-09 20:57:40 -04:00
Wing Lian
a27d594788
fix conditional so alpaca doesn't choke
2023-05-09 20:57:07 -04:00
Wing Lian
1fb0376150
Merge pull request #23 from NanoCode012/patch-1
...
Fix: Save adapter for lora
2023-05-09 15:05:58 -04:00
Wing Lian
915c56cd97
Update finetune.py
2023-05-09 15:05:39 -04:00
Wing Lian
df9c5085b5
not everyone has bf16 available
2023-05-09 14:47:48 -04:00
Wing Lian
7967cd1039
add 4bit lora 7b
2023-05-09 14:38:32 -04:00
NanoCode012
cd2395987e
Don't save full model for lora
2023-05-10 03:18:38 +09:00
NanoCode012
71a1f7f38c
Save adapter for lora
2023-05-10 01:08:22 +09:00
Wing Lian
02c59832a3
push up redpajama 3b example
2023-05-08 19:19:18 -04:00
Wing Lian
3f9c9530ea
Merge pull request #15 from NanoCode012/feat/completion
...
Feat: Add Completion dataset type
2023-05-08 19:04:54 -04:00
NanoCode012
174b74ddc9
Rename variable to use same convention
2023-05-09 02:49:44 +09:00
NanoCode012
cf681537ec
Add CompletionPrompt type
2023-05-09 02:49:44 +09:00
Wing Lian
bd3c5a5cb3
Merge pull request #21 from NanoCode012/patch-1
...
Fix: Scheduler and optimizer condition
2023-05-08 13:34:44 -04:00
Wing Lian
bcbc99e655
Merge pull request #19 from NanoCode012/feat/callback-save-lora
...
Feat: Add callback save peft_model on_save
2023-05-08 13:34:07 -04:00
Wing Lian
b0d2594de9
Merge pull request #22 from NanoCode012/patch-2
...
Fix BNB OOM by pinning version
2023-05-08 13:33:52 -04:00
NanoCode012
fe582df7d3
Fix BNB OOM by pinning version
2023-05-09 02:10:31 +09:00
NanoCode012
36aaea02b9
Update trainer.py
2023-05-09 02:01:08 +09:00
NanoCode012
5b6690ac25
Fix condition scheduler
2023-05-09 01:44:12 +09:00
Wing Lian
a125693122
add support for trust_remote_code for mpt models
2023-05-08 12:07:27 -04:00
Wing Lian
709be5af81
use printf instead of echo in dockerfile for portability
2023-05-08 11:45:38 -04:00
NanoCode012
cc77bab526
Add callbacks to Trainer
2023-05-09 00:41:19 +09:00
NanoCode012
0d6708bfe4
Add callback save peft_model on_save
2023-05-09 00:38:27 +09:00
Wing Lian
807cca81c0
fix path name to workspace
2023-05-08 11:20:03 -04:00
Wing Lian
79deb35c68
setup runpod images
...
use github.ref_name
2023-05-08 10:48:32 -04:00
Wing Lian
7576d85c73
fix to cd to path in docker
2023-05-08 03:43:46 -04:00
Wing Lian
3b4b476828
use existing state of repo to build, not the checkout
2023-05-08 03:29:48 -04:00
Wing Lian
b5fe063687
fix base for dockerfile
2023-05-08 03:27:10 -04:00
Wing Lian
a12fb0a8da
Jeopardy bot! ( #17 )
...
* support for jeopardy dataset
* commit the final config for jeopardy bot
2023-05-08 03:21:40 -04:00
Wing Lian
a4329b1068
fix #16 load best model setting when using 8bit
2023-05-07 18:30:48 -04:00
Wing Lian
550502b321
use micro batch size for eval size if not specified
2023-05-07 18:26:05 -04:00
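The fallback described in the commit above can be sketched like this. A minimal sketch under assumed config key names (`eval_batch_size`, `micro_batch_size`), not the repo's actual implementation.

```python
def resolve_eval_batch_size(cfg):
    # Use eval_batch_size when explicitly set; otherwise fall back to
    # micro_batch_size. `cfg` is a hypothetical dict-like config.
    eval_bs = cfg.get("eval_batch_size")
    return eval_bs if eval_bs is not None else cfg.get("micro_batch_size")

print(resolve_eval_batch_size({"micro_batch_size": 4}))                        # 4
print(resolve_eval_batch_size({"micro_batch_size": 4, "eval_batch_size": 8}))  # 8
```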
Wing Lian
fae36c7111
blah, wrong base tag
2023-05-07 17:54:26 -04:00
Wing Lian
a31746baa2
whoops, build from base image
2023-05-07 17:47:54 -04:00
Wing Lian
17345c8a4b
hanging slash typo
2023-05-07 17:38:56 -04:00
Wing Lian
9cd5d3fcfc
build on self hosted GPU runners
2023-05-07 17:25:31 -04:00
Wing Lian
990bec63e6
docker layer caching, build w axolotl from base build
2023-05-07 17:16:05 -04:00
Wing Lian
0c46806ae2
typo in git repo for pip
2023-05-07 16:00:21 -04:00
Wing Lian
66fa751c18
add huggingface packages and awscli
2023-05-07 11:51:57 -04:00
Wing Lian
21b74397de
fix typo and add apex
2023-05-07 11:48:47 -04:00
Wing Lian
3f11b47488
needs libaio-dev from apt
2023-05-07 11:23:43 -04:00
Wing Lian
ece46b2504
pip install packaging dep
2023-05-07 11:09:03 -04:00
Wing Lian
92d800a394
build dependencies and aws-cli
2023-05-07 11:02:26 -04:00
Wing Lian
2734e3f1a2
build base separately
...
fix arg order for image
fix dockerfile var escaping
move args around
2023-05-07 10:56:12 -04:00
Wing Lian
14ebd2e007
build base too
2023-05-07 09:48:41 -04:00
Wing Lian
4a79dabff0
fix push to docker hub
2023-05-07 08:52:49 -04:00
Wing Lian
47ad3890bc
fix whitespace and instruction on inference
2023-05-07 08:28:15 -04:00
Wing Lian
76b24bca2e
push to docker hub
...
set docker image name
2023-05-07 08:06:50 -04:00
Wing Lian
73450d9de7
TORCH_CUDA_ARCH_LIST should be an ARG
2023-05-07 07:28:57 -04:00
Wing Lian
97cf77891e
run this on self hosted runner for now
...
fix typo
fixes to docker build
need pip wheel
don't duplicate pip install
2023-05-07 07:21:25 -04:00
Wing Lian
e2599edab9
runs on larger git runner?
2023-05-07 04:12:47 -04:00
Wing Lian
75bc8561c0
don't push the image
2023-05-07 03:39:05 -04:00
Wing Lian
15bdbae805
run on git commit
2023-05-07 03:37:59 -04:00
Wing Lian
6603b3744e
try docker build on gitlab
...
require docker in gitlab
use kaniko to build docker in gitlab
2023-05-07 03:21:08 -04:00
Wing Lian
2634689774
build dockerfile in gha
2023-05-07 02:58:21 -04:00
Wing Lian
4818380fa6
update stablelm config
2023-05-07 01:58:23 -04:00
Wing Lian
247825bd57
refactor inference, warn if model is frozen
2023-05-07 01:54:15 -04:00
Wing Lian
cb9a887047
Merge pull request #13 from winglian/dev
...
merge dev branch for various fixes
2023-05-07 01:48:02 -04:00
Wing Lian
a15d823b29
Merge pull request #12 from NanoCode012/feat/eval_config
...
Add eval_batch_size for evaluation
2023-05-07 01:46:53 -04:00
NanoCode012
0e74b6402e
Add eval_batch_size for evaluation
2023-05-06 22:21:24 +09:00
Wing Lian
a10a8265ef
fix log sweep lr
2023-05-03 15:06:03 -04:00
Wing Lian
9105935b00
support for multi line inference input, log sweep over learning rates
2023-05-03 13:48:54 -04:00
Wing Lian
7748f3d6da
fix adam bnb optimizer grouped parameters, fix peft model 8bit conversion logic, black formatting
2023-05-01 16:31:46 -04:00
Wing Lian
fe9c29d73e
install peft from main branch
2023-05-01 12:24:04 -04:00
Wing Lian
2255bb7f4f
support llama-adapter zero init attention
2023-05-01 10:42:21 -04:00
Wing Lian
55baef0e03
use prebuilt wheels for flash-attn and deepspeed
2023-05-01 09:52:03 -04:00
Wing Lian
ad2b48c0fa
fsdp config dict fix, todo list, add torchdistx support
2023-04-30 13:32:07 -04:00
Wing Lian
9190ada23a
8bit and deepspeed changes
2023-04-30 06:50:35 -04:00
Wing Lian
4dbef0941f
update ds_config
2023-04-30 04:24:58 -04:00
Wing Lian
6dfdd2dec0
don't load models in 8bit unless they are using an adapter, also fix tokenizer load in exceptional case
2023-04-30 03:19:56 -04:00
Wing Lian
29936bba7f
fix fsdp training args
2023-04-30 00:56:28 -04:00
Wing Lian
78821815de
fix for zero value warmup steps
2023-04-30 00:34:12 -04:00
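A guard like the one this commit describes might look like the following. This is a sketch under assumed semantics (treating zero/unset warmup as "not specified" and deriving a small default); the fraction used for the fallback is an illustration, and the real fix may differ.

```python
def resolve_warmup_steps(cfg, total_steps):
    # Treat zero or unset warmup_steps as "not specified" and derive a
    # small default, rather than passing 0 through to the scheduler.
    # The 1/10 fraction below is a hypothetical choice for illustration.
    ws = cfg.get("warmup_steps") or 0
    return ws if ws > 0 else max(1, total_steps // 10)

print(resolve_warmup_steps({}, 100))                    # 10
print(resolve_warmup_steps({"warmup_steps": 25}, 100))  # 25
```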
Wing Lian
5159d00a86
fix sharegpt tokenization, refactor tokenization debugging
2023-04-30 00:23:53 -04:00
Wing Lian
c0f50d9c61
wire up gradient checkpointing for 4bit
2023-04-28 22:28:41 -04:00
Wing Lian
4e705eda6d
Merge pull request #9 from winglian/dev
...
feature dump into main
2023-04-24 21:56:17 -04:00
Wing Lian
4a17a4c9a1
fix dataset handling, support galactica
2023-04-24 10:54:45 -04:00
Wing Lian
097d367af6
tweaks to data loading, 8 bit adam, accelerate and deepspeed
2023-04-24 09:41:35 -04:00
Wing Lian
4f2584f2dc
shuffle and split dataset after save/load
2023-04-24 09:41:35 -04:00
Wing Lian
8d437853c8
fix sharegpt handling from hf, don't worry about loading llama if using earlier transformers release
2023-04-24 09:41:35 -04:00
Wing Lian
8e2a5609b3
stablelm support
2023-04-24 09:41:34 -04:00
Wing Lian
94f5e415a3
various bugfixes
2023-04-24 09:41:34 -04:00
Eric Hartford
2624bc2f11
ignore config, add python 3.9 ( #8 )
2023-04-24 07:23:19 -04:00
Wing Lian
bb991fd870
fix bug when model_type not explicitly passed
2023-04-19 13:15:33 -04:00
Wing Lian
d65385912e
improve inference
2023-04-19 12:57:27 -04:00
Wing Lian
5749eb0a1c
fix runpod script
2023-04-19 08:39:54 -04:00
Wing Lian
7753cdee57
cleanup empty lines, tweak env for runpod setup
2023-04-19 08:24:58 -04:00
Wing Lian
f50de1b1cb
handle empty lines
2023-04-19 08:03:34 -04:00
Wing Lian
0a472e1e08
quickstart instructions for starting from runpod ( #5 )
2023-04-18 19:22:25 -04:00
Wing Lian
5cb7ea49a6
update readme w compat matrix
2023-04-18 14:42:37 -04:00
Wing Lian
8746b701fe
attempt xformers hijack attention
2023-04-18 14:03:50 -04:00
Wing Lian
6045345d6b
WIP large refactor to make finetune script a little more manageable ( #3 )
2023-04-18 14:01:38 -04:00