Commit Graph

  • 3156c605d4 diffusion training plugin Dan Saunders 2025-08-14 01:48:22 -04:00
  • d1de6f5f3d Add option to skip slow tests in PRs (#3060) [skip ci] salman 2025-08-14 03:57:51 +01:00
  • 950186348c Built site for gh-pages Quarto GHA Workflow Runner 2025-08-14 01:28:54 +00:00
  • 48b7ae1677 use updated patch releasE (#3066) Wing Lian 2025-08-13 21:23:05 -04:00
  • 506e3a3907 fix: fsdp_config validation being None (#3061) [skip ci] NanoCode012 2025-08-14 08:21:50 +07:00
  • 6e44f09a50 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-13 23:46:30 +00:00
  • 09145de8fa upgrade transformers==4.55.1 and bitsandbytes==0.47.0 (#3064) Wing Lian 2025-08-13 19:41:07 -04:00
  • 7d8e8c9ac2 nit [skip-e2e] fix-preview Salman Mohammadi 2025-08-13 12:58:30 +01:00
  • 7c2466b739 nit Salman Mohammadi 2025-08-13 12:58:13 +01:00
  • 3146cb56dd docs Salman Mohammadi 2025-08-13 12:53:58 +01:00
  • 338b82519f Built site for gh-pages Quarto GHA Workflow Runner 2025-08-13 10:45:04 +00:00
  • e0a2523a3b Workaround to unblock docs build in main (#3055) Wing Lian 2025-08-13 06:39:39 -04:00
  • c09b0a3bbf reverting change Salman Mohammadi 2025-08-13 11:27:15 +01:00
  • e05acccd77 linting Salman Mohammadi 2025-08-13 11:24:22 +01:00
  • c44abad531 debugging CI Salman Mohammadi 2025-08-13 11:24:05 +01:00
  • 817d70e669 debugging CI Salman Mohammadi 2025-08-13 10:45:41 +01:00
  • a28eb600e9 feat: add readme and better examples NanoCode012 2025-08-13 13:57:15 +07:00
  • 4b16f363bc fix: move NanoCode012 2025-08-13 10:46:42 +07:00
  • 0f2d196476 Remove deprecated configuration files: deleted config.qmd and finetune copy.yml to streamline project structure and eliminate unused resources. 775-option-to-drop-vs-truncate-on-rows-longer-than-context-length mhenrhcsen 2025-08-12 21:23:34 +02:00
  • f1a8474400 Remove transscribe.py file and clean up optimizer.py and rl.py for improved formatting and consistency. mhenrhcsen 2025-08-12 21:20:48 +02:00
  • dc5887c652 pre-commit: fix rl.py imports/types; add legacy drop_long_rl_seq wrapper; resolve config schema; run formatting mhenrhcsen 2025-08-12 21:12:07 +02:00
  • 54b542d312 remove unused files mhenrhcsen 2025-08-12 21:09:40 +02:00
  • 30a89b07b9 Refactor AxolotlInputConfig: clean up sequence_len and sequence_len_overflow_handling fields, ensuring consistent descriptions and removing conflict markers. mhenrhcsen 2025-08-12 21:03:28 +02:00
  • 746c03b097 Clean up conflict markers; finalize RL data split implementation; fix config schema conflicts; add truncation+post-filter behavior and alias handling mhenrhcsen 2025-08-12 20:53:28 +02:00
  • 47b3fe8af3 Resolve merge conflicts: unify pretraining utils imports, add alias handling; fix rl.py per new RL dataset API; resolve config schema conflict and add sequence_len_overflow_handling field mhenrhcsen 2025-08-12 20:45:26 +02:00
  • f5a3e3529e RL datasets: warn and drop unsalvageable over-length prompts post-truncate; add post-truncate filter; support alias config key 'excess_token_handling' mhenrhcsen 2025-08-12 20:37:41 +02:00
  • 03f5a7fd16 adding back Salman Mohammadi 2025-08-12 18:35:12 +01:00
  • 3d9b96a94f testing revert Salman Mohammadi 2025-08-12 15:53:43 +01:00
  • 42c16024a2 docs Salman Mohammadi 2025-08-12 15:34:46 +01:00
  • 272a456ec0 fix: remove lora in fft config NanoCode012 2025-08-12 20:31:48 +07:00
  • 7e83268662 feat: add wip fft offload config NanoCode012 2025-08-07 16:14:11 +07:00
  • b2a8c37a27 fix: use smaller model NanoCode012 2025-08-07 13:15:43 +07:00
  • 603166d9c5 feat: add example config NanoCode012 2025-08-07 13:12:57 +07:00
  • e8c9517ac8 feat: add to multipack NanoCode012 2025-08-07 13:11:40 +07:00
  • 0bbad9202c feat: add glm4moemoe to z3 NanoCode012 2025-08-07 13:10:48 +07:00
  • cb042e9775 feat: add cce for glm4_moe & deepseek v3 NanoCode012 2025-08-07 13:04:50 +07:00
  • ec94d632f3 docs Salman Mohammadi 2025-08-12 14:07:55 +01:00
  • e8bd3b0b3b Merge branch 'fix-preview' of github.com:axolotl-ai-cloud/axolotl into fix-preview Salman Mohammadi 2025-08-12 13:42:56 +01:00
  • 5a08b94668 update workflow Salman Mohammadi 2025-08-12 12:29:09 +01:00
  • ecb8c1f4b3 Merge branch 'main' into fix-preview salman 2025-08-12 09:43:39 +01:00
  • ab57be6526 render docs on python file change to preview api ref Salman Mohammadi 2025-08-12 09:43:23 +01:00
  • 9c0fa60220 fsdp2 w evals fixed upstream fa-check Wing Lian 2025-08-11 16:26:42 -04:00
  • 8efdc59796 just assume that fa supports window Wing Lian 2025-08-11 16:09:11 -04:00
  • 172b08b209 integration check for transformers#40002 Wing Lian 2025-08-11 10:06:11 -04:00
  • 160ba459ea tag v0.12.1 v0.12.1 Wing Lian 2025-08-11 09:37:40 -04:00
  • 7a09f76644 fix ray train and add fsdp2 smoke test for ray trainer (#3053) Wing Lian 2025-08-11 09:31:54 -04:00
  • 47304c7f8a use exec instead of subprocess to make ctrl+c nicer for cli (#3044) Wing Lian 2025-08-10 20:22:20 -04:00
  • 3d45620008 remove prepare-from-posids patch (#3052) [skip ci] Wing Lian 2025-08-11 09:34:41 -04:00
  • ce20e838b5 chore: update pre-commit hooks (#3050) [skip ci] github-actions[bot] 2025-08-11 09:32:21 -04:00
  • d4d84d48af fix ray train and add fsdp2 smoke test for ray trainer (#3053) Wing Lian 2025-08-11 09:31:54 -04:00
  • c9640bca2c attempt to fix quartodoc render for yields Wing Lian 2025-08-10 22:23:09 -04:00
  • 9b12c05660 use exec instead of subprocess to make ctrl+c nicer for cli (#3044) Wing Lian 2025-08-10 20:22:20 -04:00
  • 686933194e fix vllm tagging and add cloud images w/o tmux (#3049) [skip ci] Wing Lian 2025-08-10 20:21:56 -04:00
  • d12b461d19 follow up fix for plugin registration (#3054) [skip ci] Wing Lian 2025-08-10 20:21:38 -04:00
  • d6b81b3683 update training args check for new defaults (#3051) [skip ci] Wing Lian 2025-08-10 11:26:22 -04:00
  • 832b457557 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-09 18:39:29 +00:00
  • 05f1b4b2e8 run monkeypatch tests in seperate runner (#3047) Wing Lian 2025-08-09 14:34:07 -04:00
  • 7cfc80ec77 set dev version (#3045) [skip ci] Wing Lian 2025-08-08 13:56:53 -04:00
  • 0da6a95efa Add citation.tff (#3043) [skip ci] salman 2025-08-08 16:18:42 +01:00
  • 4ac3489822 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-08 12:29:45 +00:00
  • 2c8497e489 tag for v0.12.0 release (#3041) v0.12.0 Wing Lian 2025-08-08 08:24:09 -04:00
  • ec81cc4fc3 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-08 12:20:50 +00:00
  • f70d4de8c7 feat(doc): add links to new features on README (#2980) [skip ci] NanoCode012 2025-08-08 19:16:43 +07:00
  • 0ae06d756d use nanmean for loss aggregation (CP fix) (#3033) Dan Saunders 2025-08-08 08:15:17 -04:00
  • cf7cbdfd0e Built site for gh-pages Quarto GHA Workflow Runner 2025-08-08 12:14:47 +00:00
  • 2974670bf8 Feat: add arcee (#3028) NanoCode012 2025-08-08 19:09:11 +07:00
  • 50f2b94d50 add 120b and deepspeed zero3 examples (#3035) [skip ci] Wing Lian 2025-08-08 08:04:56 -04:00
  • eb2c87b525 Example for Slurm and various fixes (#3038) [skip ci] Wing Lian 2025-08-08 08:02:03 -04:00
  • 4db7f023c6 feat(doc): standardize the axolotl install to a release (#3040) [skip ci] NanoCode012 2025-08-08 19:00:26 +07:00
  • ceb67f0b89 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-08 11:51:10 +00:00
  • 4273d5cf7e feat: update nd parallelism readme (#3039) NanoCode012 2025-08-08 18:45:36 +07:00
  • 26f906cd89 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-08 06:35:51 +00:00
  • c5e5aba547 Add 2.8.0 base images and uv images (#3034) Wing Lian 2025-08-08 02:30:16 -04:00
  • e5ae08a364 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-08 01:27:51 +00:00
  • 9d5c95db6f Add support for Accelerate CP, ND examples, and fix for parallel config w fsdp (#3019) Wing Lian 2025-08-07 21:22:15 -04:00
  • ca796fb56e feat(doc): update gpt-oss readme (#3029) [skip ci] NanoCode012 2025-08-07 20:26:42 +07:00
  • 597953bef0 clear cache before clean up (#3031) [skip ci] VED 2025-08-07 18:55:58 +05:30
  • 39fbd3b2b5 fix: lora kernels for mistral3 (#3027) [skip ci] NanoCode012 2025-08-07 20:25:37 +07:00
  • 06f481c809 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-07 09:39:40 +00:00
  • 46dfacf255 ND Parallel Doc Nits (#3032) salman 2025-08-07 10:34:26 +01:00
  • 4bce713b39 allow custom trainer_cls to be defined as a module reference in the YAML (#3024) [skip ci] Wing Lian 2025-08-06 22:49:19 -04:00
  • 4de9b4bcff Built site for gh-pages Quarto GHA Workflow Runner 2025-08-07 00:25:40 +00:00
  • d09290f2f4 Lora kernels bias support (#3025) Dan Saunders 2025-08-06 20:20:08 -04:00
  • 940281cec5 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-06 18:34:40 +00:00
  • e442ff22aa fix keyerror on load_in_8bit/load_in_4bit access in _set_quantization_config (#3023) Wing Lian 2025-08-06 14:28:52 -04:00
  • 3a01ba3a16 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-06 13:53:34 +00:00
  • ba3dba3e4f add kernels for gpt oss models (#3020) Wing Lian 2025-08-06 09:47:55 -04:00
  • 75e142195a Built site for gh-pages Quarto GHA Workflow Runner 2025-08-06 12:07:50 +00:00
  • 97e86c6d47 drop old patches and code that are no longer needed (#3007) [skip ci] Wing Lian 2025-08-06 08:02:39 -04:00
  • 784f8c0e95 fix:kd_distillation key_error logprobs (#2990) VED 2025-08-06 17:32:07 +05:30
  • e3177c3210 feat: add complete optimizer docs (#3017) [skip ci] NanoCode012 2025-08-06 19:01:51 +07:00
  • 71710635d0 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-06 05:12:49 +00:00
  • 70faea331f add support for connecting via prime-intellect (#3021) Wing Lian 2025-08-06 01:06:52 -04:00
  • b62bddd5ba Built site for gh-pages Quarto GHA Workflow Runner 2025-08-06 04:18:49 +00:00
  • 8021c718ce use skip_move_to_device for all cases (#3015) Wing Lian 2025-08-06 00:13:12 -04:00
  • 764ee74967 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-05 20:34:40 +00:00
  • 42f5e6f9e9 upgrade transformers==4.55.0 (#3018) Wing Lian 2025-08-05 16:29:12 -04:00
  • 38b7b2d908 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-04 20:39:09 +00:00
  • ab49d16e34 Dion optimizer support (#3014) Wing Lian 2025-08-04 16:33:30 -04:00
  • d29ff6cdd2 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-04 14:29:25 +00:00