axolotl

Author	SHA1	Message	Date
Wing Lian	ec4272c3a0	add ds zero3 to multigpu biweekly tests (#1900 ) * add ds zero3 to multigpu biweekly tests * fix for upstream api change * use updated accelerate and fix deepspeed tests * stringify the Path, and run multigpu tests if the multigpu tests change for a PR * use correct json rather than yaml * revert accelerate for deepspeed	2024-10-13 17:34:37 -04:00
Wing Lian	68b1369de9	Reward model (#1879 )	2024-10-13 15:11:13 -04:00
Wing Lian	cd2d89f467	wip add new proposed message structure (#1904 ) * wip add new proposed message structure * tokenization * wip * wip transform builder * wip make the chat dataset loadable * wip chatml + llama 3 new chat objects * chore: lint * chore: lint * fix tokenization * remove dacite dependency since we're using pydantic now * fix handling when already correctly split in messages * make sure to remove chat features from tokenized ds * move chat to be a input transform for messages * make sure llama3 has the bos token * remove non-working special token code * fix messages strat loader	2024-10-13 12:15:18 -04:00
Vincent Haines	1834cdc364	Add support for qwen 2.5 chat template (#1934 )	2024-10-12 21:41:43 -04:00
NanoCode012	ac128b7b1d	fix: update eval causal lm metrics to add perplexity (#1951 ) [skip ci]	2024-10-12 21:41:13 -04:00
pandora	31591bd94c	Fixing Validation - Mistral Templates (#1962 )	2024-10-12 21:40:39 -04:00
Wing Lian	d20b48a61e	only install torchao for torch versions >= 2.4.0 (#1963 )	2024-10-12 20:53:48 -04:00
Wing Lian	09bf1ceacc	update hf deps (#1964 ) * update hf deps * remove deprecated set_caching_enabled	2024-10-12 18:19:48 -04:00
Afrizal Hasbi Azizy	df359c8a6e	Handle image input as string paths for MMLMs (#1958 ) * Update mm_chat.py Handle string image (paths) * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-10-11 13:34:13 -04:00
Wing Lian	76883851d2	add warning that sharegpt will be deprecated (#1957 ) * add warning that sharegpt will be deprecated * add helper script for chat_templates and document deprecation * Update src/axolotl/prompt_strategies/sharegpt.py Co-authored-by: NanoCode012 <nano@axolotl.ai> --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2024-10-11 13:33:20 -04:00
Adam Hazell	922db77521	Add MLFlow run name option in config (#1961 ) Co-authored-by: Adam Hazell <adam.hazell@mindfoundry.ai>	2024-10-11 13:33:06 -04:00
Thomas Cleberg	e73b8dff8d	Add Support for `revision` Dataset Parameter to specify reading from Huggingface Dataset Revision (#1912 ) * Add support for `revision` dataset parameter * only use revision on hf hub backed datasets * use revision tied to head * set download to use revision * feat: add config to model validator class * feat: add revision config to RL and tests for it --------- Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: NanoCode012 <nano@axolotl.ai>	2024-10-11 13:32:50 -04:00
Wing Lian	2fbc6b0c64	Axo logo new (#1956 ) * update axolotl ascii art * spacing for logo * cleanup dithering * cleanup ascii logo a bit	2024-10-10 15:57:37 -04:00
Wing Lian	8159cbd1ab	lm_eval harness post train (#1926 ) * wip, lm_eval harness post train * include latex parser * add dtype and doc * add validation when doing bench evals * automatically add test dataset when doing benches	2024-10-10 15:04:17 -04:00
pandora	979534c851	add mistral templates (#1927 ) Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-10-10 09:22:53 -04:00
Boris Feld	6d3caadf90	Comet integration (#1939 ) * Add first version of a Comet integration * Remove debug prints * Add test for Comet Configuration transformation to env variables * Fix last lint warning * Update Readme for Comet logging documentation * Update Comet integration to be optional, update code and tests * Add documentation for Comet configuration * Add missing check	2024-10-09 16:03:37 -04:00
aarush gupta	dee77232fe	fix type annotations (#1941 ) [skip ci]	2024-10-09 16:03:16 -04:00
NanoCode012	a560593b1d	fix(log): update perplexity log to clarify from eval split (#1952 ) [skip ci]	2024-10-09 16:02:32 -04:00
Wing Lian	e8d3da0081	upgrade pytorch from 2.4.0 => 2.4.1 (#1950 ) * upgrade pytorch from 2.4.0 => 2.4.1 * update xformers for updated pytorch version * handle xformers version case for torch==2.3.1	2024-10-09 11:53:56 -04:00
Wing Lian	4ca0a47cfb	add 2.4.1 to base models (#1953 )	2024-10-09 08:43:11 -04:00
Wing Lian	e1915f5625	Multimodal Vision Llama - rudimentary support (#1940 ) --------- Co-authored-by: Sunny <sunny@Sunnys-MacBook-Air.local> Co-authored-by: sunny <sunnyliu19981005@gmail.com>	2024-10-02 21:02:48 -04:00
Wing Lian	844331005c	bump transformers to 4.45.1 (#1936 )	2024-09-30 13:56:12 -04:00
Wing Lian	61aa291119	fix for empty lora+ lr embedding (#1932 )	2024-09-27 15:58:35 -04:00
Wing Lian	b98d7d7098	update upstream deps versions and replace lora+ (#1928 ) * update upstream deps versions and replace lora+ * typo transformers version	2024-09-26 11:33:41 -04:00
Wing Lian	d7eea2ff34	validation fixes 20240923 (#1925 ) * validation fixes 20240923 * fix run name for wandb and defaults for chat template fields * fix gradio inference with llama chat template	2024-09-24 14:05:58 -04:00
Keith Stevens	7b9f669a3a	Trigger the original tokenization behavior when no advanced turn settings are provided (#1915 )	2024-09-14 08:22:54 -04:00
Wing Lian	5c42f11411	remove dynamic module loader monkeypatch as this was fixed upstream (#1914 )	2024-09-13 22:19:54 -04:00
Wing Lian	3853ab7ae9	bump accelerate to 0.34.2 (#1901 ) * bump accelerate * add fixture to predownload the test model * change fixture	2024-09-07 14:39:31 -04:00
Wing Lian	6e354682e3	fix zero3 integration (#1897 ) * fix zero3 integration * bump transformers and accelerate too	2024-09-05 10:58:50 -04:00
Alpay Ariyak	ab461d83c4	Fix documentation for pre-tokenized dataset (#1894 ) It's currently asking to not add BOS and EOS, stating that Axolotl adds them, but this is not true	2024-09-05 23:11:31 +09:00
Wing Lian	93b769a979	lint fix and update gha regex (#1899 )	2024-09-05 09:58:21 -04:00
Tijmen de Haan	f18f4268b5	Docs for AMD-based HPC systems (#1891 ) * Add documentation for installing on AMD-based HPC systems. * Accept suggestion to add note about deepspeed Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * Update _quarto.yml with amd_hpc doc --------- Co-authored-by: Tijmen de Haan <tijmen.dehaan@gmail.com> Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-09-05 18:33:19 +09:00
Wing Lian	dca1fe47d4	fix optimizer + fsdp combination in example (#1893 )	2024-09-04 11:28:47 -04:00
Wing Lian	4e5400c732	support for auto_find_batch_size when packing (#1885 ) * support for auto_find_batch_size when packing * make sure to return data from validation * make sure to return data from validation * actually expose multipack_real_batches in the config * calculate gathered efficiency in sampler * tweak to fix auto find and use actual sampler len for multipack * uncomment * use args for bsz when not available from auto find	2024-09-03 20:02:44 -04:00
Wing Lian	0aeb277456	add e2e smoke tests for llama liger integration (#1884 ) * add e2e smoke tests for llama liger integration * fix import * don't use __main__ for test * consolidate line	2024-09-01 19:29:37 -04:00
Chiwan Park	bdab3ec587	Fix RMSNorm monkey patch for Gemma models (#1886 )	2024-09-01 18:34:24 -04:00
Wing Lian	3c6b9eda2e	run pytests with varied pytorch versions too (#1883 )	2024-08-31 22:49:35 -04:00
DocShotgun	15408d0f09	Update supported models for Liger Kernel (#1875 ) * Update supported models for Liger Kernel Add Mistral LCE, Gemma LCE, Gemma 2 without LCE (softcapping is not yet implemented for Gemma in Liger Kernel LCE forward), Phi3 without LCE * move import to their appropriate conditions * Integrate Phi3 LCE support https://github.com/linkedin/Liger-Kernel/pull/103/ --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-08-31 21:59:48 -04:00
Wing Lian	ce33e1ed83	pin liger-kernel to latest 0.2.1 (#1882 ) [skip ci]	2024-08-30 17:51:18 -04:00
Byron Hsu	e3a38450de	Add liger kernel to features (#1881 ) [skip ci]	2024-08-29 08:19:18 -04:00
Aman Gupta Karmani	7037e3c836	deepseekv2 liger support (#1878 ) * deepseekv2 liger support * add comment * add missing impl	2024-08-27 23:52:40 -04:00
Aman Gupta Karmani	c1a61ae23c	fix liger plugin load issues (#1876 )	2024-08-27 23:08:26 -04:00
Aman Gupta Karmani	159b8b9a74	monkey-patch transformers to simplify monkey-patching modeling code (#1877 ) * monkey-patch transformers so that monkey-patched modeling code doesnt get overwritten * unnecessary now * add comment	2024-08-27 17:22:26 -07:00
Wing Lian	1e43660701	Sample pack trust remote code v2 (#1873 ) * fix the multipack patch for remote code models * add deepseek v2 lite example w fsdp	2024-08-27 13:39:24 -04:00
Chiwan Park	f6362d2a05	Add Liger Kernel support for Qwen2 (#1871 )	2024-08-27 13:03:16 -04:00
Wing Lian	17af1d7081	clear cuda cache to help with memory leak/creep (#1858 ) * clear cuda cache to help with memory leak/creep * reverse order of gc	2024-08-26 15:50:26 -04:00
Chiwan Park	2dac1edf72	Fix `drop_long_seq` bug due to truncation in prompt tokenization strategies when using `chat_template` (#1867 )	2024-08-26 12:56:12 -04:00
Wing Lian	6819c12cee	update spectrum authors (#1869 )	2024-08-26 12:00:36 -04:00
Wing Lian	8e29bdefdd	Spectrum plugin (#1866 )	2024-08-25 17:54:02 -04:00
Wing Lian	f245964f22	better handling of llama-3 tool role (#1782 )	2024-08-25 12:31:40 -04:00

1626 Commits