Commit Graph

  • c1b920f291 Fixing OSX installation (#2231) salman 2025-01-07 13:42:01 +00:00
  • bcd9ad44e0 flex attention support Sunny 2025-01-06 19:54:11 -05:00
  • 981ad965d0 allow minimal yaml for lm eval Wing Lian 2025-01-06 17:41:10 -05:00
  • 7ba701a355 cache bust when using branch, grab sha of latest image tag, update lm-eval dep Wing Lian 2025-01-06 16:19:08 -05:00
  • 0390bce7aa lm_eval option to not post eval, and append not extend Wing Lian 2025-01-06 11:52:07 -05:00
  • 2741d8de23 Fix the sub call to lm-eval Wing Lian 2025-01-06 11:44:55 -05:00
  • 27a88f37cd do lm_eval in cloud too Wing Lian 2025-01-06 11:17:14 -05:00
  • 61ad375bf4 config validation for flex attention bursteratom 2025-01-05 23:27:49 -05:00
  • e7912a4a66 Merge branch 'main' into hymba_multipack2 hymba_multipack2 Sunny Liu 2025-01-05 23:15:57 -05:00
  • 6da8abc01f native support for modal cloud from CLI Wing Lian 2025-01-05 21:49:53 -05:00
  • 8937f97bcf Built site for gh-pages Quarto GHA Workflow Runner 2024-12-31 20:23:13 +00:00
  • 3915abee4c make sure padding is labeled as -100 for pretraining (#2227) Wing Lian 2024-12-31 15:22:18 -05:00
  • c7b095d77f resume from optimizer checkpoint only optimizer-checkpoint Wing Lian 2024-12-30 07:51:44 -05:00
  • 1dd7f087b3 support for custom lr groups for non-embedding modules grouped_lr_squashed Wing Lian 2024-12-22 13:31:29 -05:00
  • d24a9954d2 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-24 21:19:46 +00:00
  • 7a38dbe674 fix: allow trainer builder to use custom jinja chat template (#2219) NJordan72 2024-12-24 16:18:50 -05:00
  • 7fea60b36c Built site for gh-pages Quarto GHA Workflow Runner 2024-12-23 14:09:26 +00:00
  • e0a2eb2ebd fix untrained tokens if specified explicitly from a list (#2210) Wing Lian 2024-12-23 09:08:28 -05:00
  • d852d7af7a inference - don't default w accelerate, fix base model (#2216) [skip ci] Wing Lian 2024-12-23 07:48:41 -05:00
  • 3742deb1de add deepspeed example with torch compile enabled (#2212) [skip ci] Wing Lian 2024-12-22 12:11:39 -05:00
  • 31f0da0300 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-21 22:39:24 +00:00
  • 2312caaa98 GC every n steps (#2209) Wing Lian 2024-12-21 17:38:33 -05:00
  • 423891d1ad Built site for gh-pages Quarto GHA Workflow Runner 2024-12-21 02:44:43 +00:00
  • 307cf7c685 move the dataset loading from remote/disk to a shared function so we can re-use for RL (#2204) Wing Lian 2024-12-20 21:43:52 -05:00
  • 70541145f1 adding test_datasets compat with pretraining_dataset (streaming) (#2206) [skip ci] Dan Saunders 2024-12-20 21:43:33 -05:00
  • 2717b97103 adding yaml dumper preserving input config format Dan Saunders 2024-12-20 20:39:40 +00:00
  • e0adf11b76 removing extra pytest xdist args Dan Saunders 2024-12-19 02:51:39 +00:00
  • 544f2a8a27 moving tests around for flash_attn install Dan Saunders 2024-12-18 19:36:23 +00:00
  • d4e29e5b67 adding split_heads argument for retaining original (Q, K) dimensionality Dan Saunders 2024-12-18 05:56:29 +00:00
  • 80ba0d8dd1 isolating problematic test Dan Saunders 2024-12-18 03:30:35 +00:00
  • dda9b25994 fixes post-rebase Dan Saunders 2024-12-18 01:38:56 +00:00
  • 0e9c0c6680 plugin implementation Dan Saunders 2024-12-18 01:26:41 +00:00
  • b7cc117394 convert-differential-transformer test coverage Dan Saunders 2024-12-17 20:46:19 +00:00
  • 1fadc5cfe5 duplicate code ignore Dan Saunders 2024-12-17 18:54:49 +00:00
  • 6425d052bc differential flash attention 2; cleanup Dan Saunders 2024-12-17 18:44:47 +00:00
  • 594c42f169 moving monkeypatch Dan Saunders 2024-12-17 14:12:03 +00:00
  • ae494776e4 pre-commit fix Dan Saunders 2024-12-17 13:52:34 +00:00
  • 503c4e9ffa fix model save / load logic Dan Saunders 2024-12-17 04:43:08 +00:00
  • 845dbede53 various improvements Dan Saunders 2024-12-13 15:17:52 -05:00
  • 7108ca72b4 various improvements Dan Saunders 2024-12-13 15:03:45 -05:00
  • af1d8d69af training fixes, patching, minor cleanup Dan Saunders 2024-12-13 00:06:22 -05:00
  • e162d36fe9 adding CLI command for convert-diff-transformer Dan Saunders 2024-12-11 23:11:19 -05:00
  • 7af20b52d6 Adding script for doing conversion; fixes and updates Dan Saunders 2024-12-11 21:35:47 -05:00
  • 866d7b3040 initial diff attn layer / model conversion implementation (support for llama arch) Dan Saunders 2024-12-11 14:51:53 -05:00
  • 23ac14540b Basic evaluate CLI command / codepath (#2188) Dan Saunders 2024-12-16 15:46:31 -05:00
  • 26cd287cab switching test hymba order bursteratom 2024-12-11 16:15:44 -05:00
  • cce7007bf8 rebased hymba multipack bursteratom 2024-12-11 14:52:05 -05:00
  • 42bd32a233 add outputs (symlink) to gitignore [skip ci] (#2205) Wing Lian 2024-12-19 20:14:43 -05:00
  • 87bc102b7b Built site for gh-pages Quarto GHA Workflow Runner 2024-12-19 16:45:53 +00:00
  • 5b8fb5e939 remove cicd pytest xdist args (#2201) Dan Saunders 2024-12-19 11:44:53 -05:00
  • fae6b2df10 Update cicd.sh djsaunde-patch-1 Dan Saunders 2024-12-18 22:44:43 -05:00
  • 3b18c3c5d3 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-17 22:47:42 +00:00
  • bd2a594b89 use DataCollatorWithFlattening when not sample packing (#2167) Wing Lian 2024-12-17 17:46:44 -05:00
  • 3798229d85 handle torch_compile set to auto (#2172) [skip ci] Wing Lian 2024-12-17 16:42:41 -05:00
  • 10cfecf02e fix: use apply_chat_template to find turn boundaries and allow tool_calling field (#2179) [skip ci] NanoCode012 2024-12-18 04:42:21 +07:00
  • 102997a4b9 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-17 19:00:58 +00:00
  • 339f3c67e2 dataset tags don't support https uris (#2195) Wing Lian 2024-12-17 13:58:53 -05:00
  • d91feaffc8 upgrade to liger 0.5.2 (#2181) [skip ci] Wing Lian 2024-12-17 13:58:21 -05:00
  • e246ceffa4 use axolotl contribs for fix_untrained_tokens (#2194) [skip ci] Wing Lian 2024-12-17 13:57:16 -05:00
  • 8ddc18ec8d move the setting of PYTORCH_CUDA_ALLOC_CONF to the cli rather than train module (#2183) [skip ci] Wing Lian 2024-12-17 13:56:48 -05:00
  • 1c14c4a15c Add hub model id config options to all example yml files (#2196) [skip ci] Sunny Liu 2024-12-17 11:24:30 -05:00
  • 5e8d76ed19 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-17 16:02:14 +00:00
  • 1f623e6cc8 transformers 4.47.1 (#2187) Wing Lian 2024-12-17 11:01:21 -05:00
  • 96af760e08 add option for liger_pref_rl liger-dpo Wing Lian 2024-12-16 18:31:16 -05:00
  • 973d5e57d1 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-16 20:47:25 +00:00
  • f865464ae5 Basic evaluate CLI command / codepath (#2188) Dan Saunders 2024-12-16 15:46:31 -05:00
  • cfa80dace0 import typo Wing Lian 2024-12-16 14:27:26 -05:00
  • 0a661980ca wip for liger dpo integration Wing Lian 2024-12-16 14:16:36 -05:00
  • 33090486d7 [feature] add pytorch profiling (#2182) Wing Lian 2024-12-16 12:38:43 -05:00
  • 8428b3f2c7 feat: add dpo liger feat/pref_liger NanoCode012 2024-12-16 22:19:27 +07:00
  • 60c98a4353 stuff enable_tp bursteratom 2024-12-13 15:44:51 -05:00
  • 9eaae5925a set labels and fix datasets block pretrain-dataset Wing Lian 2024-12-13 13:04:24 -05:00
  • d000851eeb allow pretrain to be used with sft Wing Lian 2024-12-13 12:58:37 -05:00
  • 79612da5c8 perform flakey patched tests in individual runner pytest-each-flakey Wing Lian 2024-12-12 23:22:28 -05:00
  • e80b7f4d8c Built site for gh-pages Quarto GHA Workflow Runner 2024-12-13 01:18:20 +00:00
  • effc4dc409 pin to 4.47.0 (#2180) Wing Lian 2024-12-12 20:17:12 -05:00
  • 7ac9cbebb9 make sure to set forward first activation-offloading-torchtune Wing Lian 2024-12-12 17:29:34 -05:00
  • 15f2fa4c8e fix detab usage Wing Lian 2024-12-12 17:24:18 -05:00
  • 43a2f9a155 fix enable_act_offloading Wing Lian 2024-12-12 17:22:34 -05:00
  • 8b79f1cbf6 use as class methods Wing Lian 2024-12-12 17:19:43 -05:00
  • 3872d5eaed WIP experimental management of patches on custom model Wing Lian 2024-12-12 17:09:02 -05:00
  • d657ff9c94 Update README.md base-model-readme-update Dan Saunders 2024-12-12 15:01:29 -05:00
  • c760d2b815 test accelerator bursteratom 2024-12-12 12:29:35 -05:00
  • 02629c7cdf parity for nightly ci - make sure to install setuptools (#2176) [skip ci] Wing Lian 2024-12-11 20:14:55 -05:00
  • 78a4aa86d6 evaluation_strategy was fully deprecated in recent release (#2169) [skip ci] Wing Lian 2024-12-11 20:14:24 -05:00
  • 2014f58181 set os environ RANK bursteratom 2024-12-11 11:45:07 -05:00
  • b5f9dd44f2 set os environ RANK bursteratom 2024-12-11 11:40:20 -05:00
  • b17b1aada7 initialise process group for tp bursteratom 2024-12-11 11:37:21 -05:00
  • 85381b6b15 initialise process group for tp bursteratom 2024-12-11 11:35:16 -05:00
  • acde081321 test lora tp bursteratom 2024-12-11 11:19:34 -05:00
  • e4c68a0cbc test lora tp bursteratom 2024-12-11 11:11:52 -05:00
  • 3855f5c3d3 tp example tp auto bursteratom 2024-12-11 11:03:39 -05:00
  • 5dd566dc63 tp example bursteratom 2024-12-11 11:01:23 -05:00
  • 42389c1f78 enable tensor parallel bursteratom 2024-12-11 10:38:14 -05:00
  • c2b8feb446 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-10 21:26:21 +00:00
  • d009ead101 fix build w pyproject to respect installed torch version (#2168) Wing Lian 2024-12-10 16:25:25 -05:00
  • ec4ebeb953 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-09 19:21:11 +00:00
  • 6aa31b44c6 make sure to checkout tag before creating release (#2164) v0.6.0 Wing Lian 2024-12-09 14:20:16 -05:00
  • 9001859b0b fix release command (#2163) [skip ci] Wing Lian 2024-12-09 14:12:45 -05:00
  • 4396f8c3a2 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-09 19:04:11 +00:00