Wing Lian
8159cbd1ab
lm_eval harness post train ( #1926 )
...
* wip, lm_eval harness post train
* include latex parser
* add dtype and doc
* add validation when doing bench evals
* automatically add test dataset when doing benches
2024-10-10 15:04:17 -04:00
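The post-train benchmark added here runs through EleutherAI's lm-evaluation-harness. As a rough illustration of what that harness does with a finished checkpoint, a minimal standalone sketch (the checkpoint path, task list, and batch size are placeholders, not axolotl's own config keys):
```python
# Minimal sketch of an lm-eval harness run against a freshly trained checkpoint.
# The checkpoint path and task names below are illustrative placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./outputs/my-finetune,dtype=bfloat16",
    tasks=["arc_easy", "hellaswag"],
    batch_size=8,
)
print(results["results"])
```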
pandora
979534c851
add mistral templates ( #1927 )
...
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-10-10 09:22:53 -04:00
Boris Feld
6d3caadf90
Comet integration ( #1939 )
...
* Add first version of a Comet integration
* Remove debug prints
* Add test for Comet Configuration transformation to env variables
* Fix last lint warning
* Update Readme for Comet logging documentation
* Update Comet integration to be optional, update code and tests
* Add documentation for Comet configuration
* Add missing check
2024-10-09 16:03:37 -04:00
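The tests listed above cover translating the Comet configuration into environment variables, which is how the comet_ml client is normally configured. A hedged sketch of that general pattern (the env var names are the standard comet_ml ones; the exact mapping axolotl performs is an assumption):
```python
# Sketch: comet_ml reads COMET_* environment variables, so a config-to-env
# translation only needs to export the relevant keys before the run starts.
import os
from comet_ml import Experiment

os.environ["COMET_PROJECT_NAME"] = "axolotl-runs"  # illustrative value
os.environ["COMET_WORKSPACE"] = "my-workspace"     # illustrative value
# COMET_API_KEY is expected to be set in the environment as well.

experiment = Experiment()  # picks up project/workspace/API key from the env
experiment.log_parameter("learning_rate", 2e-5)
experiment.end()
```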
aarush gupta
dee77232fe
fix type annotations ( #1941 ) [skip ci]
2024-10-09 16:03:16 -04:00
NanoCode012
a560593b1d
fix(log): update perplexity log to clarify it is from the eval split ( #1952 ) [skip ci]
2024-10-09 16:02:32 -04:00
Wing Lian
e1915f5625
Multimodal Vision Llama - rudimentary support ( #1940 )
...
---------
Co-authored-by: Sunny <sunny@Sunnys-MacBook-Air.local>
Co-authored-by: sunny <sunnyliu19981005@gmail.com>
2024-10-02 21:02:48 -04:00
Wing Lian
61aa291119
fix for empty lora+ lr embedding ( #1932 )
2024-09-27 15:58:35 -04:00
Wing Lian
b98d7d7098
update upstream deps versions and replace lora+ ( #1928 )
...
* update upstream deps versions and replace lora+
* typo transformers version
2024-09-26 11:33:41 -04:00
Wing Lian
d7eea2ff34
validation fixes 20240923 ( #1925 )
...
* validation fixes 20240923
* fix run name for wandb and defaults for chat template fields
* fix gradio inference with llama chat template
2024-09-24 14:05:58 -04:00
Keith Stevens
7b9f669a3a
Trigger the original tokenization behavior when no advanced turn settings are provided ( #1915 )
2024-09-14 08:22:54 -04:00
Wing Lian
5c42f11411
remove dynamic module loader monkeypatch as this was fixed upstream ( #1914 )
2024-09-13 22:19:54 -04:00
Wing Lian
6e354682e3
fix zero3 integration ( #1897 )
...
* fix zero3 integration
* bump transformers and accelerate too
2024-09-05 10:58:50 -04:00
Wing Lian
4e5400c732
support for auto_find_batch_size when packing ( #1885 )
...
* support for auto_find_batch_size when packing
* make sure to return data from validation
* make sure to return data from validation
* actually expose multipack_real_batches in the config
* calculate gathered efficiency in sampler
* tweak to fix auto find and use actual sampler len for multipack
* uncomment
* use args for bsz when not available from auto find
2024-09-03 20:02:44 -04:00
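For background, auto_find_batch_size relies on accelerate's retry decorator, which halves the batch size and re-runs on CUDA OOM; the multipack sampler then has to be rebuilt for whatever batch size finally sticks. A minimal sketch of that mechanism (the training body is a placeholder):
```python
# Sketch of the mechanism behind auto_find_batch_size: accelerate re-invokes the
# wrapped function with a halved batch size each time it catches a CUDA OOM.
from accelerate.utils import find_executable_batch_size

@find_executable_batch_size(starting_batch_size=64)
def train(batch_size):
    # rebuild dataloaders/samplers for this batch_size (with sample packing,
    # the multipack sampler length depends on it), then run the training loop
    print(f"trying batch_size={batch_size}")
    ...

train()  # called without arguments; the decorator injects batch_size
```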
Chiwan Park
bdab3ec587
Fix RMSNorm monkey patch for Gemma models ( #1886 )
2024-09-01 18:34:24 -04:00
DocShotgun
15408d0f09
Update supported models for Liger Kernel ( #1875 )
...
* Update supported models for Liger Kernel
Add Mistral LCE, Gemma LCE, Gemma 2 without LCE (softcapping is not yet implemented for Gemma in Liger Kernel LCE forward), Phi3 without LCE
* move import to their appropriate conditions
* Integrate Phi3 LCE support
https://github.com/linkedin/Liger-Kernel/pull/103/
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-08-31 21:59:48 -04:00
Aman Gupta Karmani
7037e3c836
deepseekv2 liger support ( #1878 )
...
* deepseekv2 liger support
* add comment
* add missing impl
2024-08-27 23:52:40 -04:00
Aman Gupta Karmani
c1a61ae23c
fix liger plugin load issues ( #1876 )
2024-08-27 23:08:26 -04:00
Aman Gupta Karmani
159b8b9a74
monkey-patch transformers to simplify monkey-patching modeling code ( #1877 )
...
* monkey-patch transformers so that monkey-patched modeling code doesn't get overwritten
* unnecessary now
* add comment
2024-08-27 17:22:26 -07:00
Wing Lian
1e43660701
Sample pack trust remote code v2 ( #1873 )
...
* fix the multipack patch for remote code models
* add deepseek v2 lite example w fsdp
2024-08-27 13:39:24 -04:00
Chiwan Park
f6362d2a05
Add Liger Kernel support for Qwen2 ( #1871 )
2024-08-27 13:03:16 -04:00
Wing Lian
17af1d7081
clear cuda cache to help with memory leak/creep ( #1858 )
...
* clear cuda cache to help with memory leak/creep
* reverse order of gc
2024-08-26 15:50:26 -04:00
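The change boils down to freeing Python references before releasing the CUDA caching allocator's unused blocks; a minimal sketch of that ordering:
```python
# Minimal sketch: run the garbage collector first so tensors become unreferenced,
# then hand the freed blocks back from the CUDA caching allocator.
import gc
import torch

gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```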
Chiwan Park
2dac1edf72
Fix drop_long_seq bug due to truncation in prompt tokenization strategies when using chat_template ( #1867 )
2024-08-26 12:56:12 -04:00
Wing Lian
6819c12cee
update spectrum authors ( #1869 )
2024-08-26 12:00:36 -04:00
Wing Lian
8e29bdefdd
Spectrum plugin ( #1866 )
2024-08-25 17:54:02 -04:00
Wing Lian
f245964f22
better handling of llama-3 tool role ( #1782 )
2024-08-25 12:31:40 -04:00
Wing Lian
22f4eafa55
simplify logic ( #1856 )
2024-08-23 20:23:08 -04:00
Wing Lian
77a4b9cda2
change up import to prevent AttributeError ( #1863 )
...
* change up import to prevent AttributeError
* tweak patching check for updated upstream
2024-08-23 17:00:01 -04:00
Wing Lian
1f686c576c
Liger Kernel integration ( #1861 )
...
* add initial plugin support w Liger kernel patches
* integrate the input args classes
* fix liger plugin and dynamic configuration class
* drop untrainable samples and refactor config plugins integration
* fix incorrect inputs and circular imports
* fix bool comparison
* fix for dropping untrainable tokens
* fix licensing so liger integration is Apache 2.0
* add jamba support
* pylint ignore
2024-08-23 12:21:51 -04:00
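For reference, the Liger Kernel library exposes per-architecture patch functions that swap in its fused kernels; the plugin added here wraps that style of patching behind axolotl's plugin configuration. A hedged sketch using the upstream library's documented Llama patcher (the keyword flags are liger-kernel's, not necessarily axolotl's config keys, and the model name is illustrative):
```python
# Sketch: apply Liger's fused kernels to the Llama architecture before loading
# the model, so the patched modules are used during training.
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

apply_liger_kernel_to_llama(
    rope=True,                        # fused rotary position embeddings
    rms_norm=True,                    # fused RMSNorm
    swiglu=True,                      # fused SwiGLU MLP
    fused_linear_cross_entropy=True,  # memory-efficient LM head + loss
)
model = AutoModelForCausalLM.from_pretrained("NousResearch/Meta-Llama-3-8B")
```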
Wing Lian
328fd4b3b7
add axolotl community license ( #1862 )
2024-08-23 11:40:21 -04:00
Wing Lian
fefa95e350
most model types now support flash attention 2 regardless of multipack support ( #1854 )
2024-08-22 16:39:23 -04:00
Wing Lian
2f8037fee6
ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed ( #1850 ) [skip ci]
2024-08-22 13:10:40 -04:00
JohanWork
7ed92e61c2
fix: prompt phi ( #1845 ) [skip ci]
...
* correcting phi system prompt
* phi test
* update
* add test
2024-08-22 11:46:57 -04:00
Wing Lian
9caa3eb699
make the train_on_eos default to turn so all eos tokens are treated the same ( #1847 ) [skip ci]
2024-08-22 11:45:37 -04:00
Wing Lian
5b0b774e38
ensure that the bias is also in the correct dtype ( #1848 ) [skip ci]
...
* ensure that the bias is also in the correct dtype
* add nightly for dpo-qlora-fsdp
2024-08-22 11:45:00 -04:00
Gal Cohen (galco)
9f917245f6
feat: add jamba chat_template ( #1843 )
...
* feat: add jamba chat_template
* fix: black
* feat: jamba fsdp+qlora
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-21 13:37:17 -04:00
Aman Gupta Karmani
649c19aba3
pretrain: fix with sample_packing=false ( #1841 )
2024-08-21 13:36:51 -04:00
Gal Cohen (galco)
5aac4bc284
fix: don't change quant storage dtype in case of fsdp ( #1837 )
...
* fix: don't change quant storage dtype in case of fsdp
* fix black
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-20 12:41:48 -04:00
Wing Lian
e29931259b
optionally save the final FSDP model as a sharded state dict ( #1828 )
...
* efficiently save very large llms when using FSDP
* fix parsing and index of sharded chunks
* only save fsdp on main process
* debugging for rename
* save sharded state dict
* remove unused new param
* get state dict directly
* tweak acc merge fsdp to shard the weight files
* sharded_state_dict alongside save_safetensors seems to hang on checkpoint save
2024-08-19 14:59:24 -04:00
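As background, saving the final FSDP model as a sharded state dict lets each rank write its own shard instead of gathering the full model on rank 0. A minimal sketch of the underlying PyTorch API (how the trainer wires this in is not shown here):
```python
# Sketch: switch an FSDP-wrapped module to SHARDED_STATE_DICT so state_dict()
# returns per-rank shards rather than materializing the full model on rank 0.
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import ShardedStateDictConfig, StateDictType

def get_sharded_state_dict(fsdp_model: FSDP):
    with FSDP.state_dict_type(
        fsdp_model,
        StateDictType.SHARDED_STATE_DICT,
        ShardedStateDictConfig(offload_to_cpu=True),
    ):
        return fsdp_model.state_dict()
```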
Wing Lian
b1d2921222
add validation to prevent 8bit lora finetuning on H100s ( #1827 )
2024-08-16 21:32:00 -04:00
Wing Lian
803fed3e90
update sklearn version, torch compile env vars, don't worry about failure on preprocess load model ( #1821 )
...
* update sklearn version, torch compile env vars, don't worry about failure on preprocess load model
* There is already a condition check within the function. This outer one is not necessary
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-08-16 10:41:51 -04:00
NanoCode012
68a3c7678a
fix: parse model_kwargs ( #1825 )
2024-08-16 07:51:19 -04:00
NanoCode012
f18925fb4b
fix: parse eager_attention ( #1824 )
2024-08-14 09:46:46 -04:00
Chiwan Park
0801f239cc
fix the incorrect max_length for chat template ( #1818 )
2024-08-09 11:50:31 -04:00
Wing Lian
5ee4b7325f
fix z3 leaf configuration when not using lists ( #1817 ) [skip ci]
2024-08-09 10:54:52 -04:00
Wing Lian
c56e0a79a5
logging improvements ( #1808 ) [skip ci]
...
* logging improvements
* fix sort
2024-08-06 10:31:50 -04:00
Wing Lian
35d5e59d78
set z3 leaf for deepseek v2 ( #1809 ) [skip ci]
...
* set z3 leaf for deepseek v2
* add deepseek v2 chat template
2024-08-06 09:30:46 -04:00
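For context, marking a module class as a ZeRO-3 "leaf" tells DeepSpeed to gather its parameters as a unit instead of hooking each one individually, which MoE blocks such as DeepSeek-V2's require. A hedged sketch of DeepSpeed's API, shown with transformers' Mixtral MoE block since it needs no remote code:
```python
# Sketch: register a sparse-MoE block class as a ZeRO-3 leaf module so DeepSpeed
# treats it as a single unit during parameter partitioning and gathering.
from deepspeed.utils import set_z3_leaf_modules
from transformers.models.mixtral.modeling_mixtral import MixtralSparseMoeBlock

def mark_moe_leaf_modules(model):
    set_z3_leaf_modules(model, [MixtralSparseMoeBlock])
```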
Wing Lian
fbbeb4fee0
remove unnecessary zero-first guard as it's already only called in a parent fn ( #1810 ) [skip ci]
2024-08-06 09:29:23 -04:00
Wing Lian
ecdda006de
One cycle lr ( #1803 )
...
* refactor one_cycle lr scheduler so it's reusable in more situations
* fix validation for lr_scheduler
* default to cosine anneal strategy
* one cycle lr expects cos
2024-08-05 13:12:05 -04:00
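The scheduler in question is PyTorch's OneCycleLR, which only accepts "cos" or "linear" as its anneal strategy; a minimal sketch with illustrative optimizer and step counts:
```python
# Sketch: one-cycle LR schedule with the cosine anneal strategy.
import torch

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-4,
    total_steps=1000,
    pct_start=0.1,          # warmup fraction of the cycle
    anneal_strategy="cos",  # OneCycleLR expects "cos" or "linear"
)
```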
ripes
7402eb9dcb
Fix setting correct repo id when pushing dataset to hub ( #1657 )
...
* use the ds hash as the dataset's config_name
* improve logging for loading/pushing ds to hub
* fix missing f string
2024-08-05 12:42:15 -04:00
Wing Lian
78b42a3fe1
fix roles to train defaults and make logging less verbose ( #1801 )
2024-07-30 20:58:17 -04:00