Commit Graph

  • a6d28d19b1 feat: add glm and glm4 multipack and cce (#2546) NanoCode012 2025-04-23 21:27:51 +07:00
  • 38a031f9f2 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-22 14:19:07 +00:00
  • 32e335dd51 fix missing host/port for vllm (#2543) Wing Lian 2025-04-22 10:16:48 -04:00
  • 0179021780 fix attribute error feat_hqq Sunny Liu 2025-04-21 22:29:24 -04:00
  • c4910da015 update more tests + better hqq validation Sunny Liu 2025-04-21 22:17:08 -04:00
  • db7e92f6a6 check if self.cfg.quantization exists when directly setting load_in_4bit Sunny Liu 2025-04-21 21:42:23 -04:00
  • 136b37e4d4 restore support for legacy cfg.load_in_xbit Sunny Liu 2025-04-21 21:32:01 -04:00
  • 92644513c4 update relora Sunny Liu 2025-04-21 21:22:44 -04:00
  • 266ef3f479 skip set_quant_config if quantization not given Sunny Liu 2025-04-21 17:17:41 -04:00
  • fcef8c95fe skip set_quant_config if quantization not given Sunny Liu 2025-04-21 17:17:20 -04:00
  • 136407c556 update multigpu/test_qwen2 Sunny Liu 2025-04-21 17:04:17 -04:00
  • 3251b3235f update test_mixtral Sunny Liu 2025-04-21 17:01:07 -04:00
  • 1aa9f7d952 update multigpu/test_eval, multigpu/test_llama Sunny Liu 2025-04-21 16:49:08 -04:00
  • a20e753321 update test_falcon_samplepack Sunny Liu 2025-04-21 16:29:49 -04:00
  • cb121ab91b update test_mixtral [skip e2e] Sunny Liu 2025-04-21 16:27:26 -04:00
  • b59640a4c7 amend model loading for hqq + fix hqq version Sunny Liu 2025-04-21 15:53:43 -04:00
  • f0a189131b amend model loading for hqq + fix hqq version Sunny Liu 2025-04-21 15:53:29 -04:00
  • c8fb5baad6 amend unittests pt2 Sunny Liu 2025-04-21 13:28:52 -04:00
  • 9be971d47c update test_models.py to conform to new quantization config Sunny Liu 2025-04-21 11:34:37 -04:00
  • ffd4ef1ece nit Sunny Liu 2025-04-21 11:28:59 -04:00
  • 320aff1867 update config doc Sunny Liu 2025-04-21 10:59:04 -04:00
  • ac24eba2ac include HQQLinear in find target_linear Sunny Liu 2025-04-20 12:48:14 -04:00
  • 8a5ad8aee3 typo Sunny Liu 2025-04-19 22:28:12 -04:00
  • 843b50fdaa rigorous qlora validation Sunny Liu 2025-04-19 22:26:04 -04:00
  • 098ffcc5a2 removed redundant hqq config validation Sunny Liu 2025-04-19 17:32:44 -04:00
  • ba8e29c841 quantization config refactoring - better integration Sunny Liu 2025-04-19 17:24:02 -04:00
  • 143b2e082c nit [skip e2e] Sunny Liu 2025-04-19 14:06:01 -04:00
  • aba484de97 WIP quant config refactor Sunny Liu 2025-04-19 01:32:36 -04:00
  • f6f5f89c6d fix more typo Sunny Liu 2025-04-18 16:54:36 -04:00
  • 8926fe9981 lax config requirement - qlora + hqq Sunny Liu 2025-04-18 15:48:23 -04:00
  • 987c5217a0 fix typos Sunny Liu 2025-04-18 14:35:42 -04:00
  • feaef03cb9 didn't realise model_config.quantization_config is just a regular dict Sunny Liu 2025-04-18 11:24:04 -04:00
  • ba5d917845 add e2e test for hqq training Sunny Liu 2025-04-18 11:09:41 -04:00
  • 0e9b060b4d add doc + requirement for hqq Sunny Liu 2025-04-18 00:26:35 -04:00
  • 0c40d12a18 more comprehensive hqq config options Sunny Liu 2025-04-18 00:15:59 -04:00
  • f55b3c805b hqq_nbits triggers prepare_model_for_kbit_training Sunny Liu 2025-04-17 00:37:02 -04:00
  • a64601f957 fix wrong variable name Sunny Liu 2025-04-16 17:50:42 -04:00
  • eb7bc70b99 fix dumb mistake Sunny Liu 2025-04-16 16:58:24 -04:00
  • db6c76b147 forgot to return data in check Sunny Liu 2025-04-16 16:37:51 -04:00
  • 99730ce40a hqq integration Sunny Liu 2025-04-16 16:27:09 -04:00
  • 1767d357d1 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-21 14:34:05 +00:00
  • 7651550850 make sure to download fixtures for kd test (#2541) Wing Lian 2025-04-21 10:31:50 -04:00
  • 341e95aac9 prevent rate limiting to hf when using dispatch batches (#2536) [skip ci] Wing Lian 2025-04-21 10:31:35 -04:00
  • b882dfb63f Fixed Rex Scheduler Warm Up (#2535) [skip ci] Catgat 2025-04-21 10:30:55 -04:00
  • f889f18ca8 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-21 14:26:32 +00:00
  • b640db1dbc don't run multigpu tests twice, run SP in separate test (#2542) Wing Lian 2025-04-21 10:24:13 -04:00
  • bc807ddaee Built site for gh-pages Quarto GHA Workflow Runner 2025-04-18 16:59:56 +00:00
  • 4ce469d32e fix: upgrade liger to 0.5.8 and use native Gemma3 patches (#2527) Chiwan Park 2025-04-19 01:57:40 +09:00
  • a0670abc94 add output for train loss in assertian err smaller-rand-model Wing Lian 2025-04-18 08:11:11 -07:00
  • 356fbc174f Built site for gh-pages Quarto GHA Workflow Runner 2025-04-18 00:29:55 +00:00
  • 60a8f0958d zero val fix for beta (#2538) Wing Lian 2025-04-17 17:27:19 -07:00
  • 08f287b57f swap llama tests for 7m param model Wing Lian 2025-04-17 09:52:35 -07:00
  • b4c7d9c29d fix perplexity scores Wing Lian 2025-04-17 07:58:53 -07:00
  • f8e92407ff Update src/axolotl/common/datasets.py preprocess_grpo-fix Sung Ching Liu 2025-04-09 09:36:05 -04:00
  • c12906134d Update src/axolotl/prompt_strategies/base.py Sung Ching Liu 2025-04-09 09:35:22 -04:00
  • 8154d26614 nit Sunny Liu 2025-04-08 16:06:35 -04:00
  • fefcbc300d barebone-ify the test so we get rid of unneeded processes Sunny Liu 2025-04-08 14:55:14 -04:00
  • 7d479348ee custom reward function loading, proeprly done Sunny Liu 2025-04-08 14:40:20 -04:00
  • ce0259db13 add outputdir bursteratom 2025-04-08 09:17:49 -04:00
  • 2798817cf9 Update tests/e2e/solo/test_grpo.py Sung Ching Liu 2025-04-08 09:07:04 -04:00
  • 0e1b081e49 add unit test Sunny Liu 2025-04-07 22:07:43 -04:00
  • 8df37ad91f propoer import from file_path after all else fails Sunny Liu 2025-04-07 19:14:22 -04:00
  • 9b74298328 Update src/axolotl/prompt_strategies/base.py Sung Ching Liu 2025-04-02 14:33:52 -04:00
  • ae8738aa87 skip check_datasets_label during debug for grpo Sunny Liu 2025-03-19 12:26:11 -04:00
  • ec52561a0c import from filepath if can't import_module Sunny Liu 2025-03-15 20:25:53 -04:00
  • eadb16c709 test import-wihtin-import relative path Sunny Liu 2025-03-12 00:19:09 -04:00
  • d2637fb01d first pass at modifying tests to use llama-7m Wing Lian 2025-04-16 21:14:04 -07:00
  • 04e0259d9d Built site for gh-pages Quarto GHA Workflow Runner 2025-04-16 22:05:16 +00:00
  • 9da730d6a4 fix(doc): cut cross entropy installation instructions broken in qmd (#2532) NanoCode012 2025-04-17 05:02:51 +07:00
  • 32637fad00 fix: preprocess yielding whole dataset to each worker (#2503) [skip ci] NanoCode012 2025-04-17 05:02:35 +07:00
  • f776f889a1 adding codecov reporting (#2372) [skip ci] Dan Saunders 2025-04-16 18:02:17 -04:00
  • 67e2d07be0 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-16 21:51:47 +00:00
  • 69eda209a6 re-enable DS zero3 ci with updated transformers (#2533) Wing Lian 2025-04-16 14:48:40 -07:00
  • 42b3a43d66 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-16 17:53:19 +00:00
  • b8c633aa97 batch api HF adapter for ring-flash-attn; cleanup and improvements (#2520) Dan Saunders 2025-04-16 13:50:48 -04:00
  • 682a9cf79b Fix: add delinearization and make qlora work with fsdp2 (#2515) NanoCode012 2025-04-16 13:31:39 +07:00
  • 256d474bda Built site for gh-pages Quarto GHA Workflow Runner 2025-04-16 05:19:29 +00:00
  • 271b24cccc feat: update cce to latest (#2521) NanoCode012 2025-04-16 12:17:10 +07:00
  • 198d775d6d make sure the all of the model is on the same device, so this test will pass on multigpu (#2524) [skip ci] Wing Lian 2025-04-15 22:15:42 -07:00
  • 0aa7c72c59 bump transformers to 4.51.3 transformers-4513 Wing Lian 2025-04-14 07:49:18 -07:00
  • e42a1ae51f Built site for gh-pages Quarto GHA Workflow Runner 2025-04-12 14:27:43 +00:00
  • e4307fb7d7 feat: add examples for deepcoder (#2517) NanoCode012 2025-04-12 21:25:23 +07:00
  • dd8bad06d0 remove strict=false from example yamls [skip ci] (#2523) [skip ci] Wing Lian 2025-04-12 07:25:11 -07:00
  • de8a625dd7 make e2e tests a bit faster by reducing test split size (#2522) [skip ci] Wing Lian 2025-04-12 07:24:43 -07:00
  • 8b9c695c04 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-11 13:55:50 +00:00
  • 51267ded04 chore: update doc links (#2509) NanoCode012 2025-04-11 20:53:18 +07:00
  • 756a0559c1 feat(doc): explain deepspeed configs (#2514) [skip ci] NanoCode012 2025-04-11 20:52:43 +07:00
  • 9a8e3e9c7b Feat(examples): add deepcogito (#2516) [skip ci] NanoCode012 2025-04-11 20:52:23 +07:00
  • 7e7180fa10 add mocks for loading datasets in cli train tests (#2497) [skip ci] Wing Lian 2025-04-11 09:51:59 -04:00
  • 3ecb239742 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-10 15:35:28 +00:00
  • 22c562533d Update rlhf.qmd (#2519) Sung Ching Liu 2025-04-10 11:33:09 -04:00
  • d9bb8f5267 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-10 05:36:39 +00:00
  • 16823e1de6 feat: add CNAME (#2513) NanoCode012 2025-04-10 12:34:25 +07:00
  • c4f651c697 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-09 18:03:56 +00:00
  • e0420b3528 fix: allow merge lora on pre-quantized model (#2511) NanoCode012 2025-04-10 01:01:42 +07:00
  • 9f986f5e71 Add Llama4 maverick examples (#2512) Wing Lian 2025-04-09 14:01:28 -04:00
  • deb01959d2 raising value error flex_patching_update Salman Mohammadi 2025-04-09 17:54:24 +01:00
  • 76ae4ae238 Merge branch 'main' into flex_patching_update Salman Mohammadi 2025-04-09 16:51:05 +01:00
  • 82597bd868 Create CNAME Wing Lian 2025-04-09 10:48:30 -04:00
  • 747dafe5b2 Add Llama4 maverick examples maverick-example Wing Lian 2025-04-09 08:27:46 -04:00