axolotl

Author	SHA1	Message	Date
Sung Ching Liu	0ef1f011fe	Merge branch 'main' into flx_attn_support	2025-02-11 23:31:56 -05:00
Sung Ching Liu	44f64ab627	Update faq.qmd (#2319) * Update faq.qmd Added Q&A for being stuck on saving preprocessed datasets * Update faq.qmd added details on preprocessing on cpu * Update faq.qmd * Update faq.qmd	2025-02-11 13:18:31 -05:00
NanoCode012	826f1b1494	feat(doc): Add multi-node torchrun info (#2304)	2025-02-08 06:02:02 -05:00
NanoCode012	526e5ee8b8	fix(config): missing config not being documented and fix model_ override (#2317) * fix(config): missing config not being documented and fix model_ space override * fix: delete redundant field	2025-02-08 06:01:48 -05:00
NanoCode012	fd8cb32547	chore: remove redundant py310 from tests (#2316)	2025-02-07 21:34:16 -05:00
NanoCode012	e48e2df4dd	feat: update FA to 2.7.4.post1 which includes torch2.6 binary (#2315)	2025-02-07 21:34:01 -05:00
Wing Lian	b7616022ab	bump transformers to 4.48.3 (#2318)	2025-02-07 21:33:44 -05:00
Wing Lian	1faf1a5c5a	batch add of spectrum snr results (#2320)	2025-02-07 21:33:14 -05:00
Sunny Liu	c0a1d205c7	packed doc mask starts at 1, 0 means masked out	2025-02-07 14:44:52 -05:00
NanoCode012	5bbad5ef93	feat: add torch2.6 to ci (#2311)	2025-02-07 07:28:54 -05:00
Wing Lian	a971eb4ce6	Torch 2.6 support for base docker image (#2312)	2025-02-05 09:24:02 -05:00
Sunny Liu	d0e739da24	attempt at getting around bf16 error	2025-02-04 21:57:21 -05:00
Sunny Liu	3f6be519d5	stack	2025-02-04 21:25:13 -05:00
Sunny Liu	adcbc7459b	misc	2025-02-04 21:17:50 -05:00
Sunny Liu	470ba65c44	make doc mask instead of the whole block mask in collator	2025-02-04 20:27:39 -05:00
NanoCode012	a620d481e2	fix: drop long seq even if not sample packing (#2211) * fix: drop long seq even if not sample packing * fix: logging import * fix: cfg passed being none * fix: try to fix logging * fix: refactor call to not use accelerate log * fix: try to fix circular import issue * fix: don't drop when skip prepare * chore: remove duplicate line * fix: update warning to mention that sequences will be trimmed * fix: do not drop seq if input_ids don't exist * fix: increase RM unittest sequence length to reduce trim warnings * fix: solve conflicts * fix: default min_seq_len in case of None	2025-02-04 09:43:35 -05:00
Sunny Liu	8e1adc154d	stuff	2025-02-02 20:36:14 -05:00
Sunny Liu	e5b36900e4	misc	2025-02-02 20:32:03 -05:00
Sunny Liu	9f6c89b12b	undo my stupidity	2025-02-02 20:25:53 -05:00
Sunny Liu	b0871c8d3b	attempt - mask padding	2025-02-02 20:18:49 -05:00
bursteratom	d3ea379a23	figure out slight diff from flash result	2025-02-02 01:45:54 -05:00
bursteratom	0ebab63309	test	2025-02-02 01:27:15 -05:00
bursteratom	e98581f6f5	BLOCK SIZE	2025-02-02 01:22:23 -05:00
bursteratom	b832b11c8f	stuff	2025-02-02 00:51:43 -05:00
bursteratom	b692d394b1	more test	2025-02-02 00:48:57 -05:00
bursteratom	2319e5276d	more test	2025-02-02 00:48:15 -05:00
bursteratom	9a43a0925d	more test	2025-02-02 00:45:30 -05:00
bursteratom	10de67e8ea	more test	2025-02-02 00:43:41 -05:00
bursteratom	fa7355404c	test	2025-02-02 00:38:35 -05:00
bursteratom	907424a2e8	stuff	2025-02-02 00:29:09 -05:00
Sunny Liu	3f4fd3c1eb	remove padding self attention	2025-02-01 22:47:10 -05:00
Wing Lian	158330ab60	[feature] sweeps (#2171)	2025-02-01 21:11:18 -05:00
Wing Lian	80e1468b8d	better handling of multipack dataset length (#2296)	2025-02-01 21:10:34 -05:00
Sunny Liu	48c3c47071	vanills mask	2025-02-01 14:23:37 -05:00
Sunny Liu	3ed9c117fb	try vanilla mask	2025-02-01 14:09:13 -05:00
Wing Lian	a20f17689b	set MODAL_IMAGE_BUILDER_VERSION=2024.10 to 2024.10 to test latest builder (#2302) * set MODAL_IMAGE_BUILDER_VERSION=2024.10 to 2024.10 to test latest builder * chore: lint * remove fastapi and pydantic extras	2025-01-31 20:19:20 -05:00
Wing Lian	78ce268848	KD Trainer w logprobs (#2303 ) * refactor trainer to prevent circular dependencies later fix loader default KD dataset loading and KD with logprobs filter bad rows make batch smaller handle padding/collation for KD datasets make it work flipped the slice cross entropy loss coefficient during KD make sure to multiply against the correct loss chore: lint triton wip no where support v2 trial no torch.exp inside triton kernel no log etc no torch.tensor v3 fix kwarg don't use triton for now better rescaling for temperatures hash for temperature too use kd_alpha in the correct loss method fix kd loss so it's causal (fixes repeating tokens) var naming and add todo chore: lint refactor so we can easily add new loss functions add license block remove references to triton kd for now handle token/logprob shifting support for custom trainer classes from plugins refactor kd chat template loader move more things to kd plugin remove moved class from import make plugin setup concise increase logging around loading plugins add copyrights remove duplicate code more info on preprocess for kd and fix import be a bit pickier about loading dynamic prompt strategies kd sample packing make loss torch script compat support streaming for processing sft datasts? 
improve iterable support ensure that batch vs single is done properly tweak check for batched prompt data reward can use same batch check fix reward trainer calls for tokenization improve check for batched reward model doesn't work well with batched add kd trainer e2e test linting rename test files so it gets picked up make the kd e2e fit in vram for ci and add lora version set lora_dropout explicitly lower lr make sure to set tokenizer from l3 70b and save safetensors make sure to use the correct tokenizer fix adapter model check make sure to use tensorboard to capture loss for checks chore: lint chore: lint improve logprob masking and shift in trainer more fixes try tests for kd on l40s don't shift student logits for kd no batching for kd chat templates make sure to truncate logprobs if there are more than top_k change up logic so we always truncate to top_k use iter instead of tuple fix finding the top-k rather than assuming first position has the correct val apply z-score scaling to kd kd loss needs to be calculated in full precision Always re-normalize teacher distribution various fixes * support for configurable top-k/softmax ordering * add attribute check for filter rows and lint * fix logic * handle none case for conversion to int * fix student logit off by one * set kd_temp to 1.0 for test loss * address PR feedback	2025-01-31 20:18:52 -05:00
NanoCode012	d425d5d3c3	fix: add warning for invalid eval_steps or save_steps (#2298)	2025-01-31 08:58:25 -05:00
Wing Lian	cf17649ef3	Misc fixes 20250130 (#2301) * misc fixes for garbage collection and L40S w NCCL P2P * patch bnb fix for triton check * chore: lint * change up import * try patching differently * remove patch for bnb fix for now * more verbose checks and tweak train loss threshold	2025-01-31 08:58:04 -05:00
Sunny Liu	84960003ed	reset llama_patch_multipack.py	2025-01-30 14:40:18 -05:00
Sunny Liu	93a268e43d	--no-verify fixes silly mistake	2025-01-30 14:08:26 -05:00
Sunny Liu	065f6d477e	flex batching WIP	2025-01-30 14:04:59 -05:00
Dan Saunders	6f294c3d8d	refactor README; hardcode links to quarto docs; add additional quarto doc pages (#2295) * refactor README; hardcode links to quarto docs; add additional quarto doc pages * updates * review comments * update --------- Co-authored-by: Dan Saunders <dan@axolotl.ai>	2025-01-30 12:49:21 -05:00
Sunny Liu	96ad741cd5	flex batching WIP	2025-01-30 12:35:25 -05:00
Wing Lian	6f713226dd	make save_safetensors: true the default (#2292) * make save_safetensors: true the default * revert change to model output check	2025-01-30 11:48:48 -05:00
Wing Lian	1063d82b51	match the cuda version for 2.4.1 build w/o tmux (#2299)	2025-01-30 11:46:09 -05:00
salman	ac471a697a	updating to fused (#2293)	2025-01-30 11:45:56 -05:00
Wing Lian	8779997ba5	native support for modal cloud from CLI (#2237) * native support for modal cloud from CLI * do lm_eval in cloud too * Fix the sub call to lm-eval * lm_eval option to not post eval, and append not extend * cache bust when using branch, grab sha of latest image tag, update lm-eval dep * allow minimal yaml for lm eval * include modal in requirements * update link in README to include utm * pr feedback * use chat template * revision support * apply chat template as arg * add wandb name support, allow explicit a100-40gb * cloud is optional * handle accidental setting of tasks with a single task str * document the modal cloud yaml for clarity [skip ci] * cli docs * support spawn vs remote for lm-eval * Add support for additional docker commands in modal image build * cloud config shouldn't be a dir * Update README.md Co-authored-by: Charles Frye <cfrye59@gmail.com> * fix annotation args --------- Co-authored-by: Charles Frye <cfrye59@gmail.com>	2025-01-30 11:34:02 -05:00
bursteratom	ba88bc7840	wip flex block mask creation	2025-01-29 00:25:25 -05:00
Eric Tang	268543a3be	Ray Train Axolotl Integration (#2251 ) * current not clean working version move torch trainer to do_cli update code with config changes and clean up edit config cleanup add run name to trainer * address comments * use axolotl train in multigpu tests and add ray tests for multi-gpu * accelerate uses underscores for main_process_port arg * chore: lint * fix order of accelerate args * include ray train in docker images * current not clean working version move torch trainer to do_cli update code with config changes and clean up edit config cleanup add run name to trainer * address comments * use axolotl train in multigpu tests and add ray tests for multi-gpu * accelerate uses underscores for main_process_port arg * chore: lint * fix order of accelerate args * include ray train in docker images * fix bf16 resolution behavior * move dtype logic * x Signed-off-by: SumanthRH <sumanthrh@anyscale.com> * rename Signed-off-by: SumanthRH <sumanthrh@anyscale.com> * add to sidebar Signed-off-by: SumanthRH <sumanthrh@anyscale.com> * Apply suggestions from code review Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com> * Update docs/ray-integration.qmd Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com> * pre-commit fixes Signed-off-by: SumanthRH <sumanthrh@anyscale.com> * use output_dir instead of hardcoded saves path Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * bugfix storage dir * change type\ for resources_per_worker --------- Signed-off-by: SumanthRH <sumanthrh@anyscale.com> Co-authored-by: Wing Lian <wing@axolotl.ai> Co-authored-by: SumanthRH <sumanthrh@anyscale.com> Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com> Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2025-01-29 00:10:19 -05:00

1922 Commits