pandora
31591bd94c
Fixing Validation - Mistral Templates (#1962)
2024-10-12 21:40:39 -04:00
Wing Lian
09bf1ceacc
update hf deps (#1964)
* update hf deps
* remove deprecated set_caching_enabled
2024-10-12 18:19:48 -04:00
Afrizal Hasbi Azizy
df359c8a6e
Handle image input as string paths for MMLMs (#1958)
* Update mm_chat.py
Handle string image (paths)
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-10-11 13:34:13 -04:00
Wing Lian
76883851d2
add warning that sharegpt will be deprecated (#1957)
* add warning that sharegpt will be deprecated
* add helper script for chat_templates and document deprecation
* Update src/axolotl/prompt_strategies/sharegpt.py
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2024-10-11 13:33:20 -04:00
Adam Hazell
922db77521
Add MLflow run name option in config (#1961)
Co-authored-by: Adam Hazell <adam.hazell@mindfoundry.ai>
2024-10-11 13:33:06 -04:00
Thomas Cleberg
e73b8dff8d
Add Support for revision Dataset Parameter to specify reading from Huggingface Dataset Revision (#1912)
* Add support for `revision` dataset parameter
* only use revision on hf hub backed datasets
* use revision tied to head
* set download to use revision
* feat: add config to model validator class
* feat: add revision config to RL and tests for it
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2024-10-11 13:32:50 -04:00
Wing Lian
2fbc6b0c64
Axo logo new (#1956)
* update axolotl ascii art
* spacing for logo
* cleanup dithering
* cleanup ascii logo a bit
2024-10-10 15:57:37 -04:00
Wing Lian
8159cbd1ab
lm_eval harness post train (#1926)
* wip, lm_eval harness post train
* include latex parser
* add dtype and doc
* add validation when doing bench evals
* automatically add test dataset when doing benches
2024-10-10 15:04:17 -04:00
pandora
979534c851
add mistral templates (#1927)
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-10-10 09:22:53 -04:00
Boris Feld
6d3caadf90
Comet integration (#1939)
* Add first version of a Comet integration
* Remove debug prints
* Add test for Comet Configuration transformation to env variables
* Fix last lint warning
* Update Readme for Comet logging documentation
* Update Comet integration to be optional, update code and tests
* Add documentation for Comet configuration
* Add missing check
2024-10-09 16:03:37 -04:00
aarush gupta
dee77232fe
fix type annotations (#1941) [skip ci]
2024-10-09 16:03:16 -04:00
NanoCode012
a560593b1d
fix(log): update perplexity log to clarify from eval split (#1952) [skip ci]
2024-10-09 16:02:32 -04:00
Wing Lian
e1915f5625
Multimodal Vision Llama - rudimentary support (#1940)
---------
Co-authored-by: Sunny <sunny@Sunnys-MacBook-Air.local>
Co-authored-by: sunny <sunnyliu19981005@gmail.com>
2024-10-02 21:02:48 -04:00
Wing Lian
61aa291119
fix for empty lora+ lr embedding (#1932)
2024-09-27 15:58:35 -04:00
Wing Lian
b98d7d7098
update upstream deps versions and replace lora+ (#1928)
* update upstream deps versions and replace lora+
* typo transformers version
2024-09-26 11:33:41 -04:00
Wing Lian
d7eea2ff34
validation fixes 20240923 (#1925)
* validation fixes 20240923
* fix run name for wandb and defaults for chat template fields
* fix gradio inference with llama chat template
2024-09-24 14:05:58 -04:00
Keith Stevens
7b9f669a3a
Trigger the original tokenization behavior when no advanced turn settings are provided (#1915)
2024-09-14 08:22:54 -04:00
Wing Lian
5c42f11411
remove dynamic module loader monkeypatch as this was fixed upstream (#1914)
2024-09-13 22:19:54 -04:00
Wing Lian
6e354682e3
fix zero3 integration (#1897)
* fix zero3 integration
* bump transformers and accelerate too
2024-09-05 10:58:50 -04:00
Wing Lian
4e5400c732
support for auto_find_batch_size when packing (#1885)
* support for auto_find_batch_size when packing
* make sure to return data from validation
* make sure to return data from validation
* actually expose multipack_real_batches in the config
* calculate gathered efficiency in sampler
* tweak to fix auto find and use actual sampler len for multipack
* uncomment
* use args for bsz when not available from auto find
2024-09-03 20:02:44 -04:00
Chiwan Park
bdab3ec587
Fix RMSNorm monkey patch for Gemma models (#1886)
2024-09-01 18:34:24 -04:00
DocShotgun
15408d0f09
Update supported models for Liger Kernel (#1875)
* Update supported models for Liger Kernel
Add Mistral LCE, Gemma LCE, Gemma 2 without LCE (softcapping is not yet implemented for Gemma in Liger Kernel LCE forward), Phi3 without LCE
* move import to their appropriate conditions
* Integrate Phi3 LCE support
https://github.com/linkedin/Liger-Kernel/pull/103/
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-08-31 21:59:48 -04:00
Aman Gupta Karmani
7037e3c836
deepseekv2 liger support (#1878)
* deepseekv2 liger support
* add comment
* add missing impl
2024-08-27 23:52:40 -04:00
Aman Gupta Karmani
c1a61ae23c
fix liger plugin load issues (#1876)
2024-08-27 23:08:26 -04:00
Aman Gupta Karmani
159b8b9a74
monkey-patch transformers to simplify monkey-patching modeling code (#1877)
* monkey-patch transformers so that monkey-patched modeling code doesn't get overwritten
* unnecessary now
* add comment
2024-08-27 17:22:26 -07:00
Wing Lian
1e43660701
Sample pack trust remote code v2 (#1873)
* fix the multipack patch for remote code models
* add deepseek v2 lite example w fsdp
2024-08-27 13:39:24 -04:00
Chiwan Park
f6362d2a05
Add Liger Kernel support for Qwen2 (#1871)
2024-08-27 13:03:16 -04:00
Wing Lian
17af1d7081
clear cuda cache to help with memory leak/creep (#1858)
* clear cuda cache to help with memory leak/creep
* reverse order of gc
2024-08-26 15:50:26 -04:00
Chiwan Park
2dac1edf72
Fix drop_long_seq bug due to truncation in prompt tokenization strategies when using chat_template (#1867)
2024-08-26 12:56:12 -04:00
Wing Lian
6819c12cee
update spectrum authors (#1869)
2024-08-26 12:00:36 -04:00
Wing Lian
8e29bdefdd
Spectrum plugin (#1866)
2024-08-25 17:54:02 -04:00
Wing Lian
f245964f22
better handling of llama-3 tool role (#1782)
2024-08-25 12:31:40 -04:00
Wing Lian
22f4eafa55
simplify logic (#1856)
2024-08-23 20:23:08 -04:00
Wing Lian
77a4b9cda2
change up import to prevent AttributeError (#1863)
* change up import to prevent AttributeError
* tweak patching check for updated upstream
2024-08-23 17:00:01 -04:00
Wing Lian
1f686c576c
Liger Kernel integration (#1861)
* add initial plugin support w Liger kernel patches
* integrate the input args classes
* fix liger plugin and dynamic configuration class
* drop untrainable samples and refactor config plugins integration
* fix incorrect inputs and circular imports
* fix bool comparison
* fix for dropping untrainable tokens
* fix licensing so liger integration is Apache 2.0
* add jamba support
* pylint ignore
2024-08-23 12:21:51 -04:00
Wing Lian
328fd4b3b7
add axolotl community license (#1862)
2024-08-23 11:40:21 -04:00
Wing Lian
fefa95e350
most model types now support flash attention 2 regardless of multipack support (#1854)
2024-08-22 16:39:23 -04:00
Wing Lian
2f8037fee6
ensure that the hftrainer deepspeed config is set before the trainer class is ever init'ed (#1850) [skip ci]
2024-08-22 13:10:40 -04:00
JohanWork
7ed92e61c2
fix: prompt phi (#1845) [skip ci]
* correcting phi system prompt
* phi test
* update
* add test
2024-08-22 11:46:57 -04:00
Wing Lian
9caa3eb699
make the train_on_eos default to true so all eos tokens are treated the same (#1847) [skip ci]
2024-08-22 11:45:37 -04:00
Wing Lian
5b0b774e38
ensure that the bias is also in the correct dtype (#1848) [skip ci]
* ensure that the bias is also in the correct dtype
* add nightly for dpo-qlora-fsdp
2024-08-22 11:45:00 -04:00
Gal Cohen (galco)
9f917245f6
feat: add jamba chat_template (#1843)
* feat: add jamba chat_template
* fix: black
* feat: jamba fsdp+qlora
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-21 13:37:17 -04:00
Aman Gupta Karmani
649c19aba3
pretrain: fix with sample_packing=false (#1841)
2024-08-21 13:36:51 -04:00
Gal Cohen (galco)
5aac4bc284
fix: don't change quant storage dtype in case of fsdp (#1837)
* fix: don't change quant storage dtype in case of fsdp
* fix black
---------
Co-authored-by: Gal Cohen <galc@ai21.com>
2024-08-20 12:41:48 -04:00
Wing Lian
e29931259b
optionally save the final FSDP model as a sharded state dict (#1828)
* efficiently save very large llms when using FSDP
* fix parsing and index of sharded chunks
* only save fsdp on main process
* debugging for rename
* save sharded state dict
* remove unused new param
* get state dict directly
* tweak acc merge fsdp to shard the weight files
* sharded_state_dict alongside save_safetensors seems to hang on checkpoint save
2024-08-19 14:59:24 -04:00
Wing Lian
b1d2921222
add validation to prevent 8bit lora finetuning on H100s (#1827)
2024-08-16 21:32:00 -04:00
Wing Lian
803fed3e90
update sklearn version, torch compile env vars, don't worry about failure on preprocess load model (#1821)
* update sklearn version, torch compile env vars, don't worry about failure on preprocess load model
* There is already a condition check within the function. This outer one is not necessary
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2024-08-16 10:41:51 -04:00
NanoCode012
68a3c7678a
fix: parse model_kwargs (#1825)
2024-08-16 07:51:19 -04:00
NanoCode012
f18925fb4b
fix: parse eager_attention (#1824)
2024-08-14 09:46:46 -04:00
Chiwan Park
0801f239cc
fix the incorrect max_length for chat template (#1818)
2024-08-09 11:50:31 -04:00