Commit Graph

1864 Commits

Author SHA1 Message Date
NanoCode012
13d458d0ae feat: update readme with inference instructions 2025-02-06 21:29:36 +07:00
NanoCode012
ebd406af1d fix: lin_attn_mask in wrong dtype 2025-02-06 15:25:33 +07:00
NanoCode012
caa49a9d7d fix: use existing model config 2025-02-06 00:12:14 +07:00
NanoCode012
c15ea6b956 fix: load vocab_size 2025-02-05 23:46:59 +07:00
NanoCode012
578fa764c8 chore: moved feature map into linear attention 2025-02-05 19:40:11 +07:00
NanoCode012
0e6efaa10c fix: manually set auto-map 2025-02-05 19:35:15 +07:00
NanoCode012
c4cb622590 fix: remove redundant files 2025-02-05 19:34:06 +07:00
NanoCode012
0f82bd2d18 chore: improve instructions and make linearize optional 2025-02-05 19:33:15 +07:00
NanoCode012
49746b184f chore: flatten directory structure and register to autoclass to save 2025-02-05 19:17:57 +07:00
NanoCode012
9e1c4de13c fix: assign linear head instead of loading state dict 2025-02-05 18:24:31 +07:00
NanoCode012
2d5f692fc0 refactor: move to modeling file and remove axolotl imports 2025-02-05 18:16:39 +07:00
NanoCode012
2fd5c45c2e chore: refactor register linear llama 2025-02-05 18:03:04 +07:00
NanoCode012
8294e6218f fix: freeze base_model and register config into Auto class 2025-02-05 15:59:06 +07:00
NanoCode012
253dcdd0cf fix: properly return causal model 2025-02-05 15:56:57 +07:00
NanoCode012
4cc60df876 fix: config to allow optional input 2025-02-05 15:52:30 +07:00
NanoCode012
2bc7833a4e feat: integrate new modelling into cli 2025-02-04 19:46:05 +07:00
NanoCode012
1fb8d86396 fix: handle num_items_in_batch 2025-02-04 19:32:20 +07:00
NanoCode012
adeefc1991 feat: refactor into modeling code 2025-02-04 19:29:42 +07:00
NanoCode012
fb88269dcb fix: set model_accepts_loss_kwargs=False 2025-02-04 02:01:05 +07:00
NanoCode012
433cf4a8c7 fix: compute_loss return sig 2025-02-04 01:53:18 +07:00
NanoCode012
0b7b58c8be feat: migrate to transformers 4.48 attention sig 2025-02-04 01:52:35 +07:00
NanoCode012
81731adc1d fix: missing input arg 2025-02-04 01:51:33 +07:00
NanoCode012
a1715aa317 chore: add todo 2025-02-03 22:47:25 +07:00
NanoCode012
ce0cd470f7 feat: add convert linear attention cli 2025-02-03 22:46:09 +07:00
NanoCode012
311d6eb5da feat: add lolcats with fixed types 2025-02-03 22:38:19 +07:00
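The run of commits above converts a Llama checkpoint to linear attention (LoLCATs-style), moving the feature map into the attention module along the way. As a generic illustration of the technique — a textbook formulation, not this repository's implementation — causal linear attention with an ELU-based feature map can be sketched as:

```python
import torch
import torch.nn.functional as F

def feature_map(x):
    # Positive feature map phi(x) = elu(x) + 1, a common choice for
    # linearized (kernelized) attention.
    return F.elu(x) + 1.0

def causal_linear_attention(q, k, v, eps=1e-6):
    """q, k, v: (batch, heads, seq, head_dim) -> same shape output."""
    q, k = feature_map(q), feature_map(k)
    # Causality via prefix sums: state_t = sum_{j<=t} phi(k_j) v_j^T.
    kv = torch.einsum("bhsd,bhse->bhsde", k, v).cumsum(dim=2)
    z = k.cumsum(dim=2)  # normalizer state: sum_{j<=t} phi(k_j)
    num = torch.einsum("bhsd,bhsde->bhse", q, kv)
    den = torch.einsum("bhsd,bhsd->bhs", q, z).clamp_min(eps)
    return num / den.unsqueeze(-1)
```

At position 0 the output reduces to `v[:, :, 0]`, a handy sanity check; and any mask applied on top of the feature-mapped activations must be in a matching dtype, which is the kind of issue the `lin_attn_mask` dtype fix above addresses.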
Wing Lian
158330ab60 [feature] sweeps (#2171) 2025-02-01 21:11:18 -05:00
Wing Lian
80e1468b8d better handling of multipack dataset length (#2296) 2025-02-01 21:10:34 -05:00
Wing Lian
a20f17689b set MODAL_IMAGE_BUILDER_VERSION=2024.10 to 2024.10 to test latest builder (#2302)
* set MODAL_IMAGE_BUILDER_VERSION=2024.10 to 2024.10 to test latest builder

* chore: lint

* remove fastapi and pydantic extras
2025-01-31 20:19:20 -05:00
Wing Lian
78ce268848 KD Trainer w logprobs (#2303)
* refactor trainer to prevent circular dependencies later

fix loader default
KD dataset loading and KD with logprobs
filter bad rows
make batch smaller
handle padding/collation for KD datasets
make it work
flipped the slice
cross entropy loss coefficient during KD
make sure to multiply against the correct loss
chore: lint
triton wip
no where support
v2 trial
no torch.exp inside triton kernel
no log etc
no torch.tensor
v3
fix kwarg
don't use triton for now
better rescaling for temperatures
hash for temperature too
use kd_alpha in the correct loss method
fix kd loss so it's causal (fixes repeating tokens)
var naming and add todo
chore: lint
refactor so we can easily add new loss functions
add license block
remove references to triton kd for now
handle token/logprob shifting
support for custom trainer classes from plugins
refactor kd chat template loader
move more things to kd plugin
remove moved class from import
make plugin setup concise
increase logging around loading plugins
add copyrights
remove duplicate code
more info on preprocess for kd and fix import
be a bit pickier about loading dynamic prompt strategies
kd sample packing
make loss torch script compat
support streaming for processing sft datasets?
improve iterable support
ensure that batch vs single is done properly
tweak check for batched prompt data
reward can use same batch check
fix reward trainer calls for tokenization
improve check for batched
reward model doesn't work well with batched
add kd trainer e2e test
linting
rename test files so it gets picked up
make the kd e2e fit in vram for ci and add lora version
set lora_dropout explicitly
lower lr
make sure to set tokenizer from l3 70b and save safetensors
make sure to use the correct tokenizer
fix adapter model check
make sure to use tensorboard to capture loss for checks
chore: lint
chore: lint
improve logprob masking and shift in trainer
more fixes
try tests for kd on l40s
don't shift student logits for kd
no batching for kd chat templates
make sure to truncate logprobs if there are more than top_k
change up logic so we always truncate to top_k
use iter instead of tuple
fix finding the top-k rather than assuming first position has the correct val
apply z-score scaling to kd
kd loss needs to be calculated in full precision
Always re-normalize teacher distribution
various fixes

* support for configurable top-k/softmax ordering

* add attribute check for filter rows and lint

* fix logic

* handle none case for conversion to int

* fix student logit off by one

* set kd_temp to 1.0 for test loss

* address PR feedback
2025-01-31 20:18:52 -05:00
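The KD trainer described above distills from truncated top-k teacher log-probabilities: shift for causality, compute in full precision, re-normalize the truncated teacher distribution, rescale by temperature, and blend with cross-entropy. A hedged sketch of such a loss follows — names like `kd_temperature` and `kd_alpha` are assumptions for illustration, not necessarily the PR's exact API:

```python
import torch
import torch.nn.functional as F

def topk_kd_loss(student_logits, teacher_topk_logprobs, teacher_topk_ids,
                 labels, kd_temperature=1.0, kd_alpha=0.5):
    """Blend cross-entropy with forward KL against top-k teacher log-probs."""
    # Causal shift: predictions at position t are scored against token t+1
    # (cf. "fix kd loss so it's causal (fixes repeating tokens)").
    student_logits = student_logits[:, :-1, :]
    labels = labels[:, 1:]
    teacher_topk_logprobs = teacher_topk_logprobs[:, 1:, :]
    teacher_topk_ids = teacher_topk_ids[:, 1:, :]

    # KD in full precision, as the commit log notes.
    student_logprobs = F.log_softmax(
        student_logits.float() / kd_temperature, dim=-1)
    # Student log-probs gathered at the teacher's top-k token ids.
    student_topk = student_logprobs.gather(-1, teacher_topk_ids)

    # Always re-normalize the truncated teacher distribution to sum to 1.
    teacher_probs = teacher_topk_logprobs.float().exp()
    teacher_probs = teacher_probs / teacher_probs.sum(-1, keepdim=True)

    # Forward KL over the top-k support, rescaled by T^2.
    kl = (teacher_probs
          * (teacher_probs.clamp_min(1e-12).log() - student_topk)).sum(-1)
    mask = labels != -100
    kd = kl[mask].mean() * kd_temperature**2

    ce = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)).float(),
        labels.reshape(-1), ignore_index=-100)
    return kd_alpha * kd + (1.0 - kd_alpha) * ce
```

This also shows why finding the true top-k matters (cf. "fix finding the top-k rather than assuming first position has the correct val"): the gather indexes whatever ids the teacher export provides.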
NanoCode012
d425d5d3c3 fix: add warning for invalid eval_steps or save_steps (#2298) 2025-01-31 08:58:25 -05:00
Wing Lian
cf17649ef3 Misc fixes 20250130 (#2301)
* misc fixes for garbage collection and L40S w NCCL P2P

* patch bnb fix for triton check

* chore: lint

* change up import

* try patching differently

* remove patch for bnb fix for now

* more verbose checks and tweak train loss threshold
2025-01-31 08:58:04 -05:00
Dan Saunders
6f294c3d8d refactor README; hardcode links to quarto docs; add additional quarto doc pages (#2295)
* refactor README; hardcode links to quarto docs; add additional quarto doc pages

* updates

* review comments

* update

---------

Co-authored-by: Dan Saunders <dan@axolotl.ai>
2025-01-30 12:49:21 -05:00
Wing Lian
6f713226dd make save_safetensors: true the default (#2292)
* make save_safetensors: true the default

* revert change to model output check
2025-01-30 11:48:48 -05:00
Wing Lian
1063d82b51 match the cuda version for 2.4.1 build w/o tmux (#2299) 2025-01-30 11:46:09 -05:00
salman
ac471a697a updating to fused (#2293) 2025-01-30 11:45:56 -05:00
Wing Lian
8779997ba5 native support for modal cloud from CLI (#2237)
* native support for modal cloud from CLI

* do lm_eval in cloud too

* Fix the sub call to lm-eval

* lm_eval option to not post eval, and append not extend

* cache bust when using branch, grab sha of latest image tag, update lm-eval dep

* allow minimal yaml for lm eval

* include modal in requirements

* update link in README to include utm

* pr feedback

* use chat template

* revision support

* apply chat template as arg

* add wandb name support, allow explicit a100-40gb

* cloud is optional

* handle accidental setting of tasks with a single task str

* document the modal cloud yaml for clarity [skip ci]

* cli docs

* support spawn vs remote for lm-eval

* Add support for additional docker commands in modal image build

* cloud config shouldn't be a dir

* Update README.md

Co-authored-by: Charles Frye <cfrye59@gmail.com>

* fix annotation args

---------

Co-authored-by: Charles Frye <cfrye59@gmail.com>
2025-01-30 11:34:02 -05:00
Eric Tang
268543a3be Ray Train Axolotl Integration (#2251)
* current

not clean working version
move torch trainer to do_cli
update code with config changes and clean up
edit config
cleanup
add run name to trainer

* address comments

* use axolotl train in multigpu tests and add ray tests for multi-gpu

* accelerate uses underscores for main_process_port arg

* chore: lint

* fix order of accelerate args

* include ray train in docker images

* fix bf16 resolution behavior

* move dtype logic

* x

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

* rename

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

* add to sidebar

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

* Apply suggestions from code review

Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>

* Update docs/ray-integration.qmd

Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>

* pre-commit fixes

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>

* use output_dir instead of hardcoded saves path

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* bugfix storage dir

* change type for resources_per_worker

---------

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2025-01-29 00:10:19 -05:00
salman
54dd7abfc1 Process reward models (#2241)
* adding model_cfg to set num_labels

* using a num_labels field instead

* linting

* WIP stepwise prompt tokenizer

* this should work?

* trainer working?

* pushing to runpod

* fixing saving

* updating conf

* updating config, adding docs

* adding stepwise supervision docpage

* updating tests

* adding test for dataset

* fixing tests

* linting

* addressing some comments

* adding additional cfg fields support

* updating tests, fixing cfg

* fixing tests

* updating loss

* Update test_process_reward_model_smollm2.py

* updating loss values and seed

* dumb pre-commit
2025-01-29 00:08:33 -05:00
salman
c071a530f7 removing 2.3.1 (#2294) 2025-01-28 23:23:44 -05:00
mashdragon
c015a76a23 Num epochs float (#2282) [skip ci]
* Change num_epochs type to float

* Handle float value for num_epochs in trainer.py
2025-01-28 23:23:26 -05:00
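Accepting a float `num_epochs` means the trainer must turn a fractional epoch count into a whole number of optimizer steps. A minimal sketch of that conversion (the helper name is illustrative, not the code in trainer.py):

```python
import math

def resolve_max_steps(num_epochs: float, steps_per_epoch: int) -> int:
    # A fractional epoch (e.g. 0.25) trains on that fraction of the data;
    # round up so a small fraction still yields at least one step.
    return max(1, math.ceil(num_epochs * steps_per_epoch))
```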
NanoCode012
067b442596 chore: refactor SaveModelCallback to stop handling fractional save_steps (#2291) [skip ci] 2025-01-28 23:22:10 -05:00
Wing Lian
0b52f06227 bump bnb to 0.45.1 (#2289) [skip ci] 2025-01-28 23:21:25 -05:00
Wing Lian
887513285d support for custom lr groups for non-embedding modules (#2213)
* support for custom lr groups for non-embedding modules

invert name check for group modules
include lr_groups in training args
additional conditional for creating optimizer
fix regular params as w weight decay
fix lookup and add docs

* address pr feedback
2025-01-24 12:56:28 -05:00
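A sketch of what custom lr groups amount to: partition `named_parameters()` by module-name keywords and hand the optimizer one param group per learning rate. The names and keyword matching here are assumptions for illustration, not the PR's exact config schema:

```python
import torch

def build_param_groups(model, base_lr, group_lr, group_keywords=("embed",)):
    """Split trainable params into a default group and a custom-lr group."""
    custom, default = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        (custom if any(k in name for k in group_keywords)
         else default).append(param)
    return [
        {"params": default, "lr": base_lr},
        {"params": custom, "lr": group_lr},
    ]

# Toy usage: give embedding weights a different lr than everything else.
class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = torch.nn.Embedding(10, 4)
        self.proj = torch.nn.Linear(4, 4)

optimizer = torch.optim.AdamW(build_param_groups(Toy(), 1e-4, 1e-5))
```

The "invert name check" fix above suggests the direction of the match (group vs. everything else) is exactly the subtle part of this logic.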
Wing Lian
20620771f1 Pretrain multipack (#2278)
* fix for pretrain with packing

* fix model name and loss expected

* make sure to check with micro batch size for pretraining

* change loss thresholds based on parametrization

* make tests smaller for CI

* fix pretrain packing

* fix pretrain packing test

* address pr feedback
2025-01-24 12:55:20 -05:00
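Multipack-style sample packing concatenates variable-length sequences into fixed-length bins so fewer tokens are wasted on padding. A generic first-fit-decreasing sketch (illustrative only; the repository's packing, and its pretrain variant above, are more involved):

```python
def pack_lengths(lengths, seq_len):
    """First-fit-decreasing: return bins of sequence indices whose total
    token count fits within seq_len each. Assumes each length <= seq_len."""
    bins, space = [], []  # parallel lists: indices per bin, remaining room
    for i, n in sorted(enumerate(lengths), key=lambda t: -t[1]):
        for b, rem in enumerate(space):
            if n <= rem:
                bins[b].append(i)
                space[b] -= n
                break
        else:
            bins.append([i])
            space.append(seq_len - n)
    return bins
```

The number of bins, `len(pack_lengths(...))`, is the packed dataset's effective length — the quantity that "better handling of multipack dataset length" above is concerned with.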
NanoCode012
6086162488 chore(doc): improve explanation for *_steps and *_strategy (#2270) 2025-01-24 10:07:02 -05:00
mashdragon
b2774af66c Take split param from config in all load_dataset instances (#2281) 2025-01-24 10:06:50 -05:00
NanoCode012
74f9782fc3 chore(doc): fix explanation on gcs creds retrieval (#2272) 2025-01-24 10:05:58 -05:00
Wing Lian
8a7a0b07dc support for latest transformers release 4.48.1 (#2256) 2025-01-23 21:17:57 -05:00
Wing Lian
8fb72cbc0b use the extracted field_messages to parse the role fields (#2265) 2025-01-21 15:39:30 -05:00
Adithya Kamath
bb9d4102c4 Add 5000 line history limit to tmux for docker cloud (#2268) 2025-01-21 15:39:17 -05:00