Commit Graph

  • b86117c718 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-07 19:29:33 +00:00
  • 21f1bf4805 chore: update pre-commit hooks (#2870) [skip ci] github-actions[bot] 2025-07-07 15:26:15 -04:00
  • de2c5ba103 mark flaky geglu tests and add torch seed (#2876) [skip ci] Wing Lian 2025-07-07 15:24:16 -04:00
  • 9c0d7ee761 TiledMLP support (#2865) Wing Lian 2025-07-07 15:23:49 -04:00
  • fff7e070d0 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-07 18:18:44 +00:00
  • 22d4a838dc feat(doc): add vllm and fa2 incompat error to faq (#2877) NanoCode012 2025-07-08 01:13:37 +07:00
  • a108e5db56 use latest version of cce fork for SP fix (#2871) [skip ci] Wing Lian 2025-07-07 13:05:11 -04:00
  • aa1c91ff22 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-07 14:16:56 +00:00
  • faff0cff41 manage jinja templates as nicely formatted files (#2795) Wing Lian 2025-07-07 10:11:48 -04:00
  • 759cefb741 setup defaults for dataloader to ensure GPU is kept busy (#2632) [skip ci] Wing Lian 2025-07-07 10:10:58 -04:00
  • 69cd49a7aa update transformers to 4.53.1 (#2844) [skip ci] Wing Lian 2025-07-07 09:35:22 -04:00
  • 454eea049f Merge branch 'main' into print_venv print_venv salman 2025-07-07 10:01:00 +01:00
  • 276bb1e53f Built site for gh-pages Quarto GHA Workflow Runner 2025-07-07 02:00:36 +00:00
  • 5a961ecadf Fix: do not call preprocess in multimodal or pretraining case (#2861) NanoCode012 2025-07-07 09:55:33 +08:00
  • b37ddf9778 don't use tokenizer parallelism when using packing (#2862) [skip ci] Wing Lian 2025-07-06 21:55:09 -04:00
  • bf38e507fb respect shuffle_merged_datasets for single dataset too (#2866) [skip ci] Wing Lian 2025-07-06 21:20:41 -04:00
  • b79996bdc4 tweak loss shared-prepared-ci Wing Lian 2025-07-06 19:42:43 -04:00
  • 68368de7ed add seed for stable reproducibility Wing Lian 2025-07-06 19:29:51 -04:00
  • a94c4a014b tweak acceptable loss from changed hyperparams Wing Lian 2025-07-06 19:25:26 -04:00
  • 0102ca5943 fix cfg merge Wing Lian 2025-07-06 19:11:46 -04:00
  • 97e8c01a70 tweak losses Wing Lian 2025-07-06 18:55:16 -04:00
  • 5c4705b185 unset fa Wing Lian 2025-07-06 13:27:55 -04:00
  • 47a88da330 set mbsz and revert non-packed test Wing Lian 2025-07-06 12:27:25 -04:00
  • 07ab737a55 set tokenizer_config in fixture Wing Lian 2025-07-06 12:24:21 -04:00
  • c40da3b5eb use shared fixture for preprocessed alpaca dataset Wing Lian 2025-07-06 11:44:31 -04:00
  • 31a0dce0b4 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-05 13:26:55 +00:00
  • a5946ff1f0 build fa2 from source for base image with torch2.6 and cu124 (#2867) Wing Lian 2025-07-05 09:21:18 -04:00
  • d00bd99279 Merge branch 'print_venv' of github.com:axolotl-ai-cloud/axolotl into print_venv Salman Mohammadi 2025-07-04 12:44:49 +01:00
  • 2b41bfe9eb reverting Salman Mohammadi 2025-07-04 12:40:58 +01:00
  • 5bbbd599b4 Merge branch 'main' into print_venv salman 2025-07-04 12:36:13 +01:00
  • 26c782183d merging commands Salman Mohammadi 2025-07-04 12:35:20 +01:00
  • 70ca1b2291 fix nightlies to use correct cache (#2848) [skip ci] Wing Lian 2025-07-03 12:21:39 -04:00
  • 1f2f285173 fix: missing key in enum feat/phi_35_vision NanoCode012 2025-06-18 06:27:30 +07:00
  • 98e912e416 feat: add custom processing strategy for phi35 vl NanoCode012 2025-06-17 16:23:42 +07:00
  • e1528fb381 feat: add phi_35_vl support NanoCode012 2025-06-17 16:15:01 +07:00
  • 8065fed126 adding venv to prompt Salman Mohammadi 2025-07-02 15:27:42 +01:00
  • 585909f5ef Built site for gh-pages Quarto GHA Workflow Runner 2025-07-02 12:14:24 +00:00
  • dc114f2f51 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-02 12:11:18 +00:00
  • 8ae5a2311b feat: update handling for mistraltokenizer decode and multiprocessing pickling fix (#2790) NanoCode012 2025-07-02 19:07:18 +07:00
  • 6383630155 Fix: tokenize stall due to not shuffling dataset (#2845) NanoCode012 2025-07-02 19:06:00 +07:00
  • f2b352f2e5 Add sample_packing_sequentially to trainer args (#2853) [skip ci] Vincenzo di Cicco 2025-07-02 14:05:35 +02:00
  • bf5928d0ee feat(doc): update docker tag examples (#2851) [skip ci] NanoCode012 2025-07-02 19:05:01 +07:00
  • d1224db8f4 Decouple generate_during_eval from wandb to support other visualizers (#2849) [skip ci] Dhruv Mullick 2025-07-02 06:04:40 -06:00
  • 594a5a8ba6 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-02 07:09:21 +00:00
  • 327b4e48e9 Add installation instructions for pip and Docker to README.md (#2854) mhenrichsen 2025-07-02 09:03:52 +02:00
  • 79a3dc725a Built site for gh-pages Quarto GHA Workflow Runner 2025-06-30 02:21:58 +00:00
  • 8a6eba9312 Built site for gh-pages Quarto GHA Workflow Runner 2025-06-30 02:16:39 +00:00
  • 35fdbce102 Ensure device mesh patching is applied (#2842) Dan Saunders 2025-06-29 22:16:32 -04:00
  • cb811f8bf1 upgrade to flash-attn 2.8.0.post2 (#2828) Wing Lian 2025-06-29 22:11:16 -04:00
  • 8324395e98 Built site for gh-pages Quarto GHA Workflow Runner 2025-06-30 02:10:42 +00:00
  • 7563e1bd30 set a different triton cache for each test to avoid blocking writes to cache (#2843) Wing Lian 2025-06-29 22:05:21 -04:00
  • 9e44c05ea8 Built site for gh-pages Quarto GHA Workflow Runner 2025-06-28 19:34:23 +00:00
  • 81893c775c Accelerate 1.8.1 and BNB 0.46.0 update (#2815) Wing Lian 2025-06-28 15:29:19 -04:00
  • 8eba033dc4 fix: correct attention class retrieval for gemma3n model in lora_kernels.py fix/gemma3n-text-attention mhenrhcsen 2025-06-27 19:30:09 +02:00
  • a9c0f43202 fix: update attention class import logic for gemma3n model mhenrhcsen 2025-06-27 19:27:36 +02:00
  • 8758dee211 Built site for gh-pages Quarto GHA Workflow Runner 2025-06-27 15:25:39 +00:00
  • a1a740608d add assertion for packing patch to _get_unpad_data (#2840) Wing Lian 2025-06-27 11:20:23 -04:00
  • ec15a7a691 Support --lora-on-cpu flag for DPO model merging (#2766) [skip ci] kallewoof 2025-06-28 00:19:24 +09:00
  • 0a7a216b60 allow for different sequence_len for evaluations (#2836) [skip ci] Wing Lian 2025-06-27 11:02:51 -04:00
  • 7865f02be7 Built site for gh-pages Quarto GHA Workflow Runner 2025-06-27 14:44:26 +00:00
  • d8280d45c1 feat: add chat_template kwargs (#2837) NanoCode012 2025-06-27 21:38:46 +07:00
  • 24f2887e87 don't fail during preprocess for sampling from iterable dataset (#2825) [skip ci] Wing Lian 2025-06-27 10:37:53 -04:00
  • 29289a4de9 feat: replace old colab notebook with newer one (#2838) [skip ci] NanoCode012 2025-06-27 21:35:47 +07:00
  • a24957fa04 fix for iterable datasets and pickling (#2831) [skip ci] Wing Lian 2025-06-27 10:35:23 -04:00
  • c4f4f81bed Merge branch 'main' into map-dataset-fetcher-fix map-dataset-fetcher-fix Dan Saunders 2025-06-26 11:20:05 -04:00
  • 4ebd4aae3d handle possibly empty batch Dan Saunders 2025-06-26 10:59:27 -04:00
  • 6034bb8cec Built site for gh-pages Quarto GHA Workflow Runner 2025-06-26 14:54:07 +00:00
  • 927bf530bc fix(doc): default messages example used wrong key (#2832) NanoCode012 2025-06-26 21:47:31 +07:00
  • 18954ba100 chore: update pre-commit hooks (#2821) [skip ci] github-actions[bot] 2025-06-26 10:46:53 -04:00
  • 979632f59c SP restore buffers sp-restore-buffers Dan Saunders 2025-06-26 02:44:58 +00:00
  • 29348a89b2 Built site for gh-pages Quarto GHA Workflow Runner 2025-06-25 17:22:39 +00:00
  • d8cf66edbd use fork for multiprocess start method for packing in parallel (#2830) Wing Lian 2025-06-25 13:17:33 -04:00
  • 840a824aff Built site for gh-pages Quarto GHA Workflow Runner 2025-06-25 13:55:35 +00:00
  • 181cc3106b fix: catch httperror from ratelimiting hf when checking user token (#2827) NanoCode012 2025-06-25 20:50:13 +07:00
  • 20106116da fix: 'NoneType' object has no attribute 'column_names' (#2822) [skip ci] NanoCode012 2025-06-25 20:49:55 +07:00
  • a27c4f8771 feat: add falcon-h1 into axolotl (#2811) [skip ci] Younes B 2025-06-25 15:49:42 +02:00
  • bb1109b81d feat: update CCE to use axolotl's fork (#2813) [skip ci] NanoCode012 2025-06-25 20:49:22 +07:00
  • 7e84479334 Built site for gh-pages Quarto GHA Workflow Runner 2025-06-25 12:39:05 +00:00
  • 8c69ec3a1e gating _gather_outputs (causes increased vram usage) (#2829) Dan Saunders 2025-06-25 08:33:55 -04:00
  • 26c3e80b1f Built site for gh-pages Quarto GHA Workflow Runner 2025-06-24 19:04:40 +00:00
  • 46675496a3 log config (#2819) Dan Saunders 2025-06-24 14:59:30 -04:00
  • b594f18f6e just redact api keys dump-config Dan Saunders 2025-06-24 14:19:00 -04:00
  • 328d99f54b Built site for gh-pages Quarto GHA Workflow Runner 2025-06-24 03:56:57 +00:00
  • c6b5d35e5d fix: re-add gemma3 patch (#2817) NanoCode012 2025-06-24 10:51:30 +07:00
  • 604ad748a7 Built site for gh-pages Quarto GHA Workflow Runner 2025-06-24 03:13:47 +00:00
  • 12c826816d chunked cross entropy loss (#2625) Wing Lian 2025-06-23 23:08:46 -04:00
  • 700791deb9 Merge branch 'main' into dump-config Dan Saunders 2025-06-23 09:46:08 -04:00
  • d6d2cc673b remove none-valued config before dumping Dan Saunders 2025-06-23 13:35:53 +00:00
  • cf2e5b58c5 Built site for gh-pages Quarto GHA Workflow Runner 2025-06-23 13:13:25 +00:00
  • 1d8f500709 deepspeed fix (#2820) Dan Saunders 2025-06-23 09:07:57 -04:00
  • a65dbe779f fix: suspected eval vram increased usage fix/eval-accu NanoCode012 2025-06-23 18:44:03 +07:00
  • 83525f14a0 revert pre-commit changes Dan Saunders 2025-06-21 09:55:40 -04:00
  • 68c0e31fd1 moving text art; adding sensitive value redaction + sorting Dan Saunders 2025-06-21 09:49:06 -04:00
  • 22f930c658 log config Dan Saunders 2025-06-20 20:52:20 +00:00
  • 159f0531f9 chore: fix docstring comment from distributed pr chore/docstring-distributed NanoCode012 2025-06-20 05:48:34 +07:00
  • 67c4c43e8d Built site for gh-pages Quarto GHA Workflow Runner 2025-06-19 15:33:21 +00:00
  • 0b61cd445a v0.10.1 release v0.10.1 release-0.10.x Wing Lian 2025-06-19 11:29:39 -04:00
  • 0494359c6c update trl to 0.18.2 (#2814) Wing Lian 2025-06-19 11:27:59 -04:00
  • d02103948c Built site for gh-pages Quarto GHA Workflow Runner 2025-06-19 15:22:06 +00:00
  • 26c39e1ca7 fix(doc): address exitcode formatting to help search (#2809) [skip ci] NanoCode012 2025-06-19 22:19:52 +07:00