Commit Graph

  • c5355b9301 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-03 05:53:29 +00:00
  • 53a0c1f39c feat: add peft_trainable_token_indices (#3062) NanoCode012 2025-09-03 12:48:01 +07:00
  • 4cc6038d52 chore: update pre-commit hooks (#3122) [skip ci] github-actions[bot] 2025-09-03 01:41:34 -04:00
  • e48aa8a5b1 feat(doc): improve visibility for colab notebooks (#3110) [skip ci] NanoCode012 2025-09-03 12:40:53 +07:00
  • 24aba5caca Clamping the len of dataloader to minimum of 1 (#3100) [skip ci] xuyifann 2025-09-02 22:40:27 -07:00
  • 24b470d77e Built site for gh-pages Quarto GHA Workflow Runner 2025-09-02 17:19:18 +00:00
  • 06bebcb65f run cu128-2.8.0 e2e tests on B200 (#3126) Wing Lian 2025-09-02 13:13:23 -04:00
  • 81cb0968fe Built site for gh-pages Quarto GHA Workflow Runner 2025-09-02 16:14:17 +00:00
  • 231a67e70b Streaming SFT support (#3101) Dan Saunders 2025-09-02 12:08:44 -04:00
  • b64d2f50d2 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-29 17:58:30 +00:00
  • 0094a2d744 support for tiledmlp for GPT-OSS (#3116) Wing Lian 2025-08-29 13:52:49 -04:00
  • 7ed40f1d70 automatically set env vars for single gpu deepspeed zero3 (#3118) [skip ci] Wing Lian 2025-08-29 13:36:47 -04:00
  • 5b6ec2820f patch for ds_grads_remaining in deepspeed (#3102) [skip ci] VED 2025-08-29 21:42:09 +05:30
  • e37a768960 feat: add baseten to lmeval (refs: feat/lmeval-baseten) NanoCode012 2025-08-29 18:02:26 +07:00
  • 34cb679fb2 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-28 13:16:01 +00:00
  • 6afba3871d Add support for PyTorch 2.8.0 (#3106) Wing Lian 2025-08-28 09:10:40 -04:00
  • dc338c3b0e Update .coderabbit.yaml (#3109) [skip ci] Dan Saunders 2025-08-27 09:50:52 -04:00
  • 83ae13bb9f Built site for gh-pages Quarto GHA Workflow Runner 2025-08-27 08:15:26 +00:00
  • d0d2fc5606 Tokens per second logging [skip-e2e] (#3072) salman 2025-08-27 09:10:14 +01:00
  • 5e55a04a68 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-26 13:36:01 +00:00
  • e1131e9619 make always skip_move_to_device default as true (#3084) Wing Lian 2025-08-26 09:30:22 -04:00
  • c4c4b90638 add tokenizer_save_jinja_files to keep legacy behavior of including chat template in tokenizer_config.json (#3093) Wing Lian 2025-08-26 09:30:04 -04:00
  • 0e9945e3b9 deploy training jobs to baseten w truss in axolotl cli (#3086) [skip ci] Wing Lian 2025-08-26 09:29:50 -04:00
  • 9ad702ab99 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-26 09:52:59 +00:00
  • 0de254a0d0 feat: add gemma3_text attention handling for lora kernels (#3103) NanoCode012 2025-08-26 16:47:26 +07:00
  • d3bea3a2eb broken streaming-v2 Dan Saunders 2025-08-25 16:51:36 +00:00
  • 2e2302aae3 remove unused Dan Saunders 2025-08-25 15:46:25 +00:00
  • 3a35076513 seems to be working? Dan Saunders 2025-08-24 00:49:13 +00:00
  • 915434f0f3 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-24 03:43:00 +00:00
  • 79ddaebe9a Add ruff, remove black, isort, flake8, pylint (#3092) Dan Saunders 2025-08-23 23:37:33 -04:00
  • 21ba1cd3f1 wire up squash_position_ids (refs: squash_position_ids) Wing Lian 2025-08-23 16:21:28 -04:00
  • 372f471792 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-22 18:34:46 +00:00
  • eea7a006e1 make multipack sampler patch explicit (#3096) Dan Saunders 2025-08-22 14:29:10 -04:00
  • 78a039e1be add depr warning for preprocess --iterable streaming Dan Saunders 2025-08-22 16:02:10 +00:00
  • 69f356163e fix Dan Saunders 2025-08-22 15:39:28 +00:00
  • 53bbca2591 bugfix for sample packing Dan Saunders 2025-08-22 04:33:48 +00:00
  • 49bd6ece4a remove unused Dan Saunders 2025-08-22 00:37:43 +00:00
  • 42b38a718a remove eval streaming (not HF supported) Dan Saunders 2025-08-22 00:12:17 +00:00
  • 4121bcbc33 fix kd test Dan Saunders 2025-08-21 19:37:15 +00:00
  • 0caa24eab0 comments Dan Saunders 2025-08-21 17:35:24 +00:00
  • 68bb70bbae fix test Dan Saunders 2025-08-21 04:14:09 +00:00
  • 5d8d7ef327 lint Dan Saunders 2025-08-20 19:33:23 +00:00
  • 7836da9ed9 remove unuse Dan Saunders 2025-08-20 19:26:26 +00:00
  • 7eba3795fe fixes Dan Saunders 2025-08-20 17:41:33 +00:00
  • 1b7b67d06e smoke test Dan Saunders 2025-08-20 16:17:08 +00:00
  • 0843dc678a separate out train and eval datasets streaming; cleanup Dan Saunders 2025-08-20 15:08:31 +00:00
  • 067158e24a nits Dan Saunders 2025-08-20 13:46:44 +00:00
  • aa5a497a2c nits Dan Saunders 2025-08-20 13:46:29 +00:00
  • 2176962231 separate out train and eval dataset streaming Dan Saunders 2025-08-20 05:17:05 +00:00
  • 10335d5df9 add multidata strats Dan Saunders 2025-08-20 04:44:07 +00:00
  • e4e8ffd40c nits Dan Saunders 2025-08-20 04:28:18 +00:00
  • 846aa41baa nits Dan Saunders 2025-08-20 04:15:06 +00:00
  • 7bb52d00bb progress on streaming Dan Saunders 2025-08-20 03:33:59 +00:00
  • 3b2dd05798 remove iterable CLI arg Dan Saunders 2025-08-20 00:18:42 +00:00
  • b6431083be nit Dan Saunders 2025-08-19 18:12:09 +00:00
  • 16ff01df85 separate streaming and pretraining Dan Saunders 2025-08-19 18:05:05 +00:00
  • d5f76bb9fe Built site for gh-pages Quarto GHA Workflow Runner 2025-08-22 11:32:00 +00:00
  • ab4d604a8f upgrade peft for 0.17.1 (#3094) Wing Lian 2025-08-22 07:26:30 -04:00
  • c3e1882de5 progress tui Dan Saunders 2025-08-22 02:43:16 -04:00
  • 889b27ecf1 tui Dan Saunders 2025-08-22 05:08:02 +00:00
  • 8e72c085ff Built site for gh-pages Quarto GHA Workflow Runner 2025-08-21 19:09:51 +00:00
  • 0fa752e58b upgrade flash-attn to 2.8.3 for gpt-oss attn sink support (#3082) Wing Lian 2025-08-21 15:04:10 -04:00
  • 08e517ea48 Update .coderabbit.yaml (#3091) [skip ci] Dan Saunders 2025-08-20 22:14:13 -04:00
  • 4cecf29034 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-20 19:23:53 +00:00
  • 07fd22f39b better handling of lora w bias with fsdp2 and handling of files when saving model checkpoint (#3090) Wing Lian 2025-08-20 15:17:48 -04:00
  • 5c816ac2a0 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-20 12:57:58 +00:00
  • 06eaf6c448 misc fixes (#3085) Wing Lian 2025-08-20 08:52:26 -04:00
  • 54c4911355 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-20 00:31:06 +00:00
  • 050210e637 fix: Sweep runs overwrite each other because output_dir from base config is reused (#3080) goggle 2025-08-20 09:25:20 +09:00
  • 4870638734 initial impl of streaming preprocessing (refs: streaming-on-the-fly-preprocess) Dan Saunders 2025-08-19 23:10:54 +00:00
  • b25078397c nit Dan Saunders 2025-08-19 18:12:09 +00:00
  • ba681125d7 separate streaming and pretraining Dan Saunders 2025-08-19 18:05:05 +00:00
  • 20158e1c54 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-19 17:36:18 +00:00
  • 05cedbfb1e add baseten info for gpt-oss recipe (#3078) Wing Lian 2025-08-19 13:30:37 -04:00
  • c3db6dd307 remove hardcode no-seq-len Dan Saunders 2025-08-19 15:41:32 +00:00
  • 9a6e9d8d15 no sequence length support Dan Saunders 2025-08-19 10:25:37 -04:00
  • cf8c93e2ee wip (refs: diffusion-next-token-trainer) Wing Lian 2025-08-19 09:36:57 -04:00
  • 1f75287a3a diffusion custom models approach (refs: diffusion-custom-models) Dan Saunders 2025-08-19 04:09:46 +00:00
  • 64f349b7bb diffusion alt: custom loss impl (refs: diffusion-custom-loss) Dan Saunders 2025-08-18 20:50:34 +00:00
  • 260ebe4c93 diffusion alt: custom loss impl Dan Saunders 2025-08-18 20:50:20 +00:00
  • 63d2280999 nits Dan Saunders 2025-08-18 19:17:24 +00:00
  • b210db2d15 fixes Dan Saunders 2025-08-18 19:09:09 +00:00
  • 556a69118f sample generation, tests fixes Dan Saunders 2025-08-18 18:25:04 +00:00
  • 8569675b26 Merge branch 'main' into diffusion Dan Saunders 2025-08-18 10:07:55 -04:00
  • ee80de3cfb Built site for gh-pages Quarto GHA Workflow Runner 2025-08-18 12:50:03 +00:00
  • 3cf22ae23b tag v0.12.2 (refs: v0.12.2, release-v0.12.x) Wing Lian 2025-08-18 08:48:53 -04:00
  • c10eb811fa data_parallel_size in in VllmserveCliArgs (#3074) VED 2025-08-18 18:14:37 +05:30
  • 0eef385b1a [feat] truncation support with excess_length_strategy (#3068) [skip ci] VED 2025-08-18 18:09:13 +05:30
  • bb65157dcf fix conditional for None values (refs: split-batches-sizes) Wing Lian 2025-08-17 12:49:48 -04:00
  • 7fd3d8abc4 handle batch size correchtly when using split and dispatch batches Wing Lian 2025-08-16 22:05:31 -04:00
  • 077b5a4358 cleanup; tests draft Dan Saunders 2025-08-16 02:44:44 +00:00
  • 866a618cb4 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-16 01:30:52 +00:00
  • ecbe8b2b61 [GPT-OSS] improve FSDP shard merging and documentation for GPT-OSS (#3073) Wing Lian 2025-08-15 21:25:01 -04:00
  • 234b7b3126 nits Dan Saunders 2025-08-16 00:14:44 +00:00
  • f1f9851422 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-15 14:58:20 +00:00
  • 130ef7c51a Various fixes for VLMs (#3063) Wing Lian 2025-08-15 10:52:57 -04:00
  • e19be0c2d9 add back in reinit_weights (clobbered?); masking / pretrain fixes Dan Saunders 2025-08-15 02:21:25 +00:00
  • 479a454ae3 fixes + improvements Dan Saunders 2025-08-14 16:11:37 -04:00
  • 0a9341acde nits Dan Saunders 2025-08-14 01:53:24 -04:00
  • d8b63804bc cleanup Dan Saunders 2025-08-14 01:51:13 -04:00