Commit Graph

  • a34c15dc8f Built site for gh-pages Quarto GHA Workflow Runner 2024-12-04 17:27:26 +00:00
  • d7d2fd366e update from unsloth-zoo with additional fixes (#2122) Wing Lian 2024-12-04 12:26:08 -05:00
  • e2882dd749 drop unnecessary BNB_CUDA_VERSION env var from docker as it just results in warnings (#2121) [skip ci] Wing Lian 2024-12-04 12:25:47 -05:00
  • 4698eed43f set pixtral chat template bursteratom 2024-12-04 12:11:21 -05:00
  • f84c3b37e7 lint bursteratom 2024-12-04 11:59:45 -05:00
  • a1790f2652 replace tensorboard checks with helper function (#2120) [skip ci] Wing Lian 2024-12-03 21:06:20 -05:00
  • 418ad2b586 add missing fixture decorator for predownload dataset (#2117) [skip ci] Wing Lian 2024-12-03 18:08:46 -05:00
  • 71d3b08658 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-03 20:07:05 +00:00
  • d87df2c776 prepare plugins needs to happen so registration can occur to build the plugin args (#2119) Wing Lian 2024-12-03 15:06:09 -05:00
  • bc18c1ef60 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-03 13:59:16 +00:00
  • 1ef70312ba fix optimizer reset for relora sft (#1414) Wing Lian 2024-12-03 08:58:23 -05:00
  • 81ef3e45f7 fix(readme): update cuda instructions during preprocess (#2114) [skip ci] NanoCode012 2024-12-03 20:58:03 +07:00
  • fe2735ff6f Built site for gh-pages Quarto GHA Workflow Runner 2024-12-03 13:23:18 +00:00
  • bd8436bc6e feat: add cut_cross_entropy (#2091) NanoCode012 2024-12-03 20:22:22 +07:00
  • c318e302a8 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-03 12:44:13 +00:00
  • fc6188cd76 fix merge conflict of duplicate max_steps in config for relora (#2116) Wing Lian 2024-12-03 07:42:41 -05:00
  • e965886e95 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-03 05:03:56 +00:00
  • b9bb02406a fix so inference can be run against quantized models without adapters (#1834) Wing Lian 2024-12-03 00:02:38 -05:00
  • ff4794cd8e Add ds model card, rebased (#2101) [skip ci] Sunny Liu 2024-12-03 00:02:02 -05:00
  • 822c904092 fix(vlm): handle legacy conversation data format and check image in data (#2018) [skip ci] NanoCode012 2024-12-03 12:01:31 +07:00
  • 7d52f4fbf8 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-03 01:16:36 +00:00
  • d5f58b6509 Check torch version for ADOPT optimizer + integrating new ADOPT updates (#2104) Sunny Liu 2024-12-02 20:15:39 -05:00
  • 9f6d0b5587 use pytest sugar and verbose for more info during ci (#2112) [skip ci] Wing Lian 2024-12-02 20:14:40 -05:00
  • 53963c792c make the eval size smaller for the resume test (#2111) [skip ci] Wing Lian 2024-12-02 18:32:29 -05:00
  • 88e0de56c9 Built site for gh-pages Quarto GHA Workflow Runner 2024-12-02 23:28:42 +00:00
  • a4f4a56d77 build causal_conv1d and mamba-ssm into the base image (#2113) Wing Lian 2024-12-02 18:27:46 -05:00
  • 2a7052452c Built site for gh-pages Quarto GHA Workflow Runner 2024-12-02 22:29:55 +00:00
  • ce5bcff750 various tests fixes for flakey tests (#2110) Wing Lian 2024-12-02 17:28:58 -05:00
  • 31ae5b10fe Built site for gh-pages Quarto GHA Workflow Runner 2024-12-02 13:48:07 +00:00
  • b620ed94d0 Add Exact Deduplication Feature to Preprocessing Pipeline (#2072) Oliver Molenschot 2024-12-02 05:47:10 -08:00
  • a5c3737efa Built site for gh-pages Quarto GHA Workflow Runner 2024-11-30 01:39:58 +00:00
  • 5f1d98e8fc add e2e tests for Unsloth qlora and test the builds (#2093) Wing Lian 2024-11-29 20:38:49 -05:00
  • e060877a9c Built site for gh-pages Quarto GHA Workflow Runner 2024-11-30 01:38:42 +00:00
  • 1cf7075d18 support separate lr for embeddings, similar to loraplus (#1910) [skip ci] Wing Lian 2024-11-29 20:38:20 -05:00
  • f4cabc2351 fix: ds3 and fsdp lmbench eval (#2102) [skip ci] NanoCode012 2024-11-30 08:37:49 +07:00
  • 6e0fb4a6b2 add finetome dataset to fixtures, check eval_loss in test (#2106) [skip ci] Wing Lian 2024-11-29 20:37:32 -05:00
  • c39971c659 stuff bursteratom 2024-11-27 10:52:36 -05:00
  • 33a178c788 val config pixtral chat template bursteratom 2024-11-27 10:36:23 -05:00
  • db15605e7e pixtral chat template bursteratom 2024-11-27 10:34:19 -05:00
  • 9e112bc8b5 lint bursteratom 2024-11-27 10:33:35 -05:00
  • e038410778 lint bursteratom 2024-11-27 10:24:37 -05:00
  • f4385c3cf4 add special tokens bursteratom 2024-11-27 10:18:45 -05:00
  • d58c772df6 pixtral flash-attn false bursteratom 2024-11-27 10:16:17 -05:00
  • 69265a53b5 stuff bursteratom 2024-11-27 09:53:41 -05:00
  • 724b660d56 move shared pytest conftest to top level tests (#2099) [skip ci] Wing Lian 2024-11-22 15:05:42 -05:00
  • f8acc72dd8 proof of concept for sage attention sageattention Wing Lian 2024-11-22 14:47:19 -05:00
  • 51c9e1a035 .gitignore improvements (#349) [skip ci] Aman Karmani 2024-11-22 08:08:54 -08:00
  • 05ea741e04 Built site for gh-pages Quarto GHA Workflow Runner 2024-11-22 15:10:52 +00:00
  • 45c0825587 updated colab notebook (#2074) Sunny Liu 2024-11-22 10:09:10 -05:00
  • 94fc223f6c actions/create-release is unmaintained, and doesn't create proper release notes (#2098) [skip ci] Wing Lian 2024-11-21 14:32:41 -05:00
  • fb04127bee Built site for gh-pages Quarto GHA Workflow Runner 2024-11-21 18:37:42 +00:00
  • 151abb7a67 fix None-type not iterable error when deepspeed is left blank w/ use_… (#2087) Sunny Liu 2024-11-21 13:36:51 -05:00
  • 93b6c5fc8f Built site for gh-pages Quarto GHA Workflow Runner 2024-11-21 18:25:42 +00:00
  • bf416bdfd0 bump_liger_0.4.2 (#2096) Sunny Liu 2024-11-21 13:24:52 -05:00
  • 6066436d94 Built site for gh-pages Quarto GHA Workflow Runner 2024-11-21 02:29:37 +00:00
  • 838b74d05b Add Ascend NPU support (#1758) Mengqing Cao 2024-11-21 10:28:41 +08:00
  • 7d0a2548cf Built site for gh-pages Quarto GHA Workflow Runner 2024-11-20 19:08:48 +00:00
  • 2e99bb303e fix inference when no chat_template is set, fix unsloth dora check (#2092) Wing Lian 2024-11-20 14:07:54 -05:00
  • 3362409fd4 Built site for gh-pages Quarto GHA Workflow Runner 2024-11-20 19:06:59 +00:00
  • 68a26f1005 Fix duplication of plugin callbacks (#2090) Chirag Jain 2024-11-21 00:36:08 +05:30
  • db51a9e4cb use pep440 instead of semver (#2088) [skip ci] Wing Lian 2024-11-19 15:02:10 -05:00
  • 3732e2f75d Built site for gh-pages Quarto GHA Workflow Runner 2024-11-19 17:45:43 +00:00
  • 8961364bc9 release 0.5.2 (#2086) Wing Lian 2024-11-19 12:44:42 -05:00
  • 18b8f636f1 Built site for gh-pages Quarto GHA Workflow Runner 2024-11-19 17:44:21 +00:00
  • e9c3a2aec0 add missing dunder-init for monkeypatches and add tests for install from sdist (#2085) v0.5.2 Wing Lian 2024-11-19 12:43:30 -05:00
  • 88e1bfcb0d Built site for gh-pages Quarto GHA Workflow Runner 2024-11-19 16:32:46 +00:00
  • 02ca3f93b0 set manifest and fix for source dist (#2084) v0.5.1.post1 Wing Lian 2024-11-19 11:31:56 -05:00
  • 5f6f9186e4 make sure action has permission to create release (#2083) [skip ci] Wing Lian 2024-11-19 10:43:02 -05:00
  • afc0dab0f1 make sure action has permission to create release v0.5.1 Wing Lian 2024-11-19 10:41:19 -05:00
  • 87be7ed128 Built site for gh-pages Quarto GHA Workflow Runner 2024-11-19 15:36:50 +00:00
  • 6679e20f47 release version 0.5.1 (#2082) Wing Lian 2024-11-19 10:35:59 -05:00
  • ec59d4cb83 remove deprecated extra metadata kwarg from pydantic Field (#2081) [skip ci] Wing Lian 2024-11-19 10:30:10 -05:00
  • a77c8a71cf fix brackets on docker ci builds, add option to skip e2e builds [skip e2e] (#2080) [skip ci] Wing Lian 2024-11-19 10:29:31 -05:00
  • 775311f98f add optimizer step to prevent warning in tests (#1502) [skip ci] Wing Lian 2024-11-19 10:19:03 -05:00
  • f007c38e49 Feat: Drop long samples and shuffle rl samples (#2040) [skip ci] NanoCode012 2024-11-19 22:18:24 +07:00
  • adf4276221 Built site for gh-pages Quarto GHA Workflow Runner 2024-11-19 07:24:02 +00:00
  • d9b71edf84 bump transformers for fsdp-grad-accum fix, remove patch (#2079) Wing Lian 2024-11-19 02:23:09 -05:00
  • afb8218c67 fix the monkeypatch zero3-8bit-lora Wing Lian 2024-11-19 02:12:33 -05:00
  • 1ff78d6347 remove temp_dir decorator as we're using fixtures now Wing Lian 2024-11-19 01:28:27 -05:00
  • 613a217142 monkeypatch for zero3 w 8bit lora Wing Lian 2024-11-19 00:45:00 -05:00
  • 127953af4e zero3 can't use 8bit optimizer Wing Lian 2024-10-31 12:12:25 -04:00
  • 920ea77bdf reduce number of steps Wing Lian 2024-10-30 14:51:10 -04:00
  • ef60e3e851 bi-weekly 8bit lora zero3 check Wing Lian 2024-08-22 12:16:52 -04:00
  • a458b86bc4 Built site for gh-pages Quarto GHA Workflow Runner 2024-11-18 19:58:53 +00:00
  • c07bd2fa65 Readme updates v2 (#2078) Wing Lian 2024-11-18 14:58:03 -05:00
  • 3637e57b9b Built site for gh-pages Quarto GHA Workflow Runner 2024-11-18 19:00:27 +00:00
  • ed079d434a static assets, readme, and badges update v1 (#2077) Wing Lian 2024-11-18 13:59:32 -05:00
  • 8403c67156 don't build bdist (#2076) [skip ci] Wing Lian 2024-11-18 12:36:03 -05:00
  • 9871fa060b optim e2e tests to run a bit faster (#2069) [skip ci] Wing Lian 2024-11-18 12:35:31 -05:00
  • bcb34943e5 Built site for gh-pages Quarto GHA Workflow Runner 2024-11-18 16:54:30 +00:00
  • 70cf79ef52 upgrade autoawq==0.2.7.post2 for transformers fix (#2070) Wing Lian 2024-11-18 11:53:37 -05:00
  • c06b8f0243 increase worker count to 8 for basic pytests (#2075) [skip ci] Wing Lian 2024-11-18 11:52:35 -05:00
  • cb8bfab9cc multipack support for phi moe phi-moe Wing Lian 2024-11-15 22:56:25 -05:00
  • 58729e38bc Built site for gh-pages Quarto GHA Workflow Runner 2024-11-16 01:36:49 +00:00
  • 0c8b1d824a Update get_unpad_data patching for multipack (#2013) Chirag Jain 2024-11-16 07:05:50 +05:30
  • fd70eec577 fix: loading locally downloaded dataset (#2056) [skip ci] NanoCode012 2024-11-16 08:35:26 +07:00
  • 2118dd4e18 Built site for gh-pages Quarto GHA Workflow Runner 2024-11-16 00:12:00 +00:00
  • d42f202046 Fsdp grad accum monkeypatch (#2064) Wing Lian 2024-11-15 19:11:04 -05:00
  • 0dabde1962 support for schedule free and e2e ci smoke test (#2066) [skip ci] Wing Lian 2024-11-15 19:10:14 -05:00
  • 15f1462ccd support passing trust_remote_code to dataset loading (#2050) [skip ci] Wing Lian 2024-11-15 19:09:48 -05:00