Commit Graph

  • 2a6d7b3d35 Built site for gh-pages Quarto GHA Workflow Runner 2024-04-08 05:26:18 +00:00
  • 9430b6e868 Remove validate_quantized_dora (#1485) xzuyn 2024-04-08 01:25:23 -04:00
  • 07ef4d4915 Built site for gh-pages Quarto GHA Workflow Runner 2024-04-07 02:56:09 +00:00
  • 934fc851da drop empty token from beginning if tokenizer has no bos_token (in the case of qwen) (#1490) Wing Lian 2024-04-06 19:55:19 -07:00
  • 2ad78ef22d Built site for gh-pages Quarto GHA Workflow Runner 2024-04-06 12:04:57 +00:00
  • bda48f0150 fix: reduce sample_packing warning (#1484) NanoCode012 2024-04-06 21:04:07 +09:00
  • 744f7082f5 fix for fsdp for models that aren't qwen2 or jamba (branch: fsdp-fix) Wing Lian 2024-04-05 17:02:54 -07:00
  • 0cc41fe329 Built site for gh-pages Quarto GHA Workflow Runner 2024-04-05 03:48:26 +00:00
  • bf4cd67252 feat: validate sample packing requires flash_attention (#1465) NanoCode012 2024-04-05 12:47:32 +09:00
  • 233e0e1d33 Built site for gh-pages Quarto GHA Workflow Runner 2024-04-05 01:21:53 +00:00
  • d87d0e2f5b Built site for gh-pages Quarto GHA Workflow Runner 2024-04-05 01:21:21 +00:00
  • 05b0b7e8ca add support for cohere chat template (#1478) Wing Lian 2024-04-04 18:20:50 -07:00
  • 87ca3f98c6 don't use deepspeed or fsdp when merging loras (#1479) Wing Lian 2024-04-04 18:20:32 -07:00
  • 05f7034288 use deterministic seed for random LISA layers (branch: 20240404-lisa-determinism) Wing Lian 2024-04-04 18:16:55 -07:00
  • 029315ff94 Built site for gh-pages Quarto GHA Workflow Runner 2024-04-04 23:34:33 +00:00
  • e0fcef403f refactor utils.data module for line count linter (#1476) Wing Lian 2024-04-04 16:33:42 -07:00
  • c2b64e4dcf Feat: update doc (#1475) [skip ci] NanoCode012 2024-04-04 13:43:40 +09:00
  • 1fce4cdb41 Built site for gh-pages Quarto GHA Workflow Runner 2024-04-03 19:06:42 +00:00
  • 5760099bd4 fix toc Hamel Husain 2024-04-03 12:05:49 -07:00
  • 82dd304538 Built site for gh-pages Quarto GHA Workflow Runner 2024-04-02 12:43:11 +00:00
  • 5aa50974ce Pretrain multipack v2 (#1470) Wing Lian 2024-04-02 05:42:16 -07:00
  • 0d8b87270c Built site for gh-pages Quarto GHA Workflow Runner 2024-04-02 08:37:31 +00:00
  • cae608f587 Added pip install ninja to accelerate installation of flash-attn (#1461) James Melvin Ebenezer 2024-04-02 14:06:41 +05:30
  • 021e6ba2bb Built site for gh-pages Quarto GHA Workflow Runner 2024-04-02 03:49:52 +00:00
  • 586bd8d221 fix pretraining_ on odd datasets (#1463) Nick Doiron 2024-04-01 23:48:59 -04:00
  • dfe591435f make lisa training example work on one 24gb gpu (branch: lisa) Aman Karmani 2024-04-02 03:19:54 +00:00
  • 093b870d50 Built site for gh-pages Quarto GHA Workflow Runner 2024-04-01 15:02:07 +00:00
  • 86b7d22f35 Reorganize Docs (#1468) Hamel Husain 2024-04-01 08:00:52 -07:00
  • 829e8352ea Built site for gh-pages Quarto GHA Workflow Runner 2024-04-01 12:48:33 +00:00
  • 0b103775ad reduce verbosity of the special tokens (#1472) Wing Lian 2024-04-01 05:47:27 -07:00
  • 7de1178650 Built site for gh-pages Quarto GHA Workflow Runner 2024-04-01 12:43:42 +00:00
  • 946b497c3f feat: add deepspeed 3 with cpuoffload (#1466) NanoCode012 2024-04-01 21:42:52 +09:00
  • d0cc59f1e4 Built site for gh-pages Quarto GHA Workflow Runner 2024-04-01 11:55:49 +00:00
  • 0ddfb24fcf LISA (#1469) Wing Lian 2024-04-01 04:54:53 -07:00
  • 5dd9364c00 example config for lisa Aman Karmani 2024-04-01 07:00:59 +00:00
  • 6185cd5227 fix LISA by ensuring params are not frozen during __init__ Aman Karmani 2024-04-01 06:57:28 +00:00
  • b357c93f23 improve lisa callback logging Aman Karmani 2024-04-01 04:54:00 +00:00
  • 21a5094226 fix default and fix attribute traversal for layers Wing Lian 2024-03-31 00:27:04 -04:00
  • 3a9ad7c66e add lisa support Wing Lian 2024-03-30 22:55:15 -04:00
  • 9b3d6cadb4 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-29 20:44:14 +00:00
  • 89134f2143 make sure to install causal_conv1d in docker (#1459) Wing Lian 2024-03-29 16:43:25 -04:00
  • cf30b8be99 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-29 15:05:42 +00:00
  • 6086be85f7 qwen2_moe support w multipack (#1455) Wing Lian 2024-03-29 11:04:53 -04:00
  • 4a92a3b9ee Nightlies fix v4 (#1458) [skip ci] Wing Lian 2024-03-29 11:04:34 -04:00
  • 46a73e3d1a fix yaml parsing for workflow (#1457) [skip ci] Wing Lian 2024-03-29 10:21:08 -04:00
  • da3415bb5a fix how nightly tag is generated (#1456) [skip ci] Wing Lian 2024-03-29 09:29:17 -04:00
  • 8cb127abeb configure nightly docker builds (#1454) [skip ci] Wing Lian 2024-03-29 08:25:45 -04:00
  • fc430b26e6 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-29 06:39:11 +00:00
  • 05b398a072 fix some of the edge cases for Jamba (#1452) Wing Lian 2024-03-29 02:38:02 -04:00
  • 6afe66dd82 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-29 04:20:31 +00:00
  • e634118f90 Support loading datasets saved via save_to_disk (#1432) Keith Stevens 2024-03-29 13:19:36 +09:00
  • c922a3fedc Built site for gh-pages Quarto GHA Workflow Runner 2024-03-29 01:04:31 +00:00
  • 02af0820f7 Jamba (#1451) Wing Lian 2024-03-28 21:03:22 -04:00
  • 12b34e6584 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-27 14:20:11 +00:00
  • 4155e9988f fix layer_replication arg to peft (#1446) Wing Lian 2024-03-27 10:18:56 -04:00
  • c437562a97 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-27 14:17:45 +00:00
  • 25afd35842 support layer replication for peft and fix rslora integration (#1445) Wing Lian 2024-03-27 10:16:47 -04:00
  • 53806a2b90 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-26 20:47:47 +00:00
  • da265dd796 fix for accelerate env var for auto bf16, add new base image and expand torch_cuda_arch_list support (#1413) Wing Lian 2024-03-26 13:46:19 -07:00
  • 93b43bb493 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-26 19:20:35 +00:00
  • e07347b188 Remove seq_len arg in rotary_emb (#1443) WenboPan 2024-03-27 03:19:44 +08:00
  • 6cf409cfd6 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-26 19:19:42 +00:00
  • bcdc9b1601 Fix falcon tokenization step (#1441) [skip ci] Far El 2024-03-26 15:19:34 -04:00
  • c19d060a74 turn sample_packing on for training (#1438) [skip ci] Satpal Singh Rathore 2024-03-27 00:49:04 +05:30
  • 601b77bc9d make sure to capture non-null defaults from config validation (#1415) Wing Lian 2024-03-26 12:18:47 -07:00
  • 88fe47f542 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-25 06:35:46 +00:00
  • ff939d8a64 fix(dataset): normalize tokenizer config and change hash from tokenizer class to tokenizer path (#1298) (branch: main-base) NanoCode012 2024-03-25 15:34:54 +09:00
  • b349161781 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-25 04:50:15 +00:00
  • 324d59ea0d docs: update link to docs of advance topic in README.md (#1437) Phuc Van Phan 2024-03-25 11:49:27 +07:00
  • 3941ea7615 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-25 03:01:35 +00:00
  • f1ebaa07c6 chore(config): refactor old mistral config (#1435) NanoCode012 2024-03-25 12:00:44 +09:00
  • e5cee6d054 Built site for gh-pages Quarto GHA Workflow Runner 2024-03-22 22:23:50 +00:00
  • 34ba634b8c Fix ORPO multi gpu (#1433) Wing Lian 2024-03-22 15:22:58 -07:00
  • 9457bb0640 Built site for gh-pages Hamel Husain 2024-03-21 22:41:28 -07:00
  • 21558905cd Built site for gh-pages Quarto GHA Workflow Runner 2024-03-22 05:37:48 +00:00
  • 4e69aa48ab Update docs.yml Hamel Husain 2024-03-21 22:36:57 -07:00
  • e9bee26e1b Built site for gh-pages Hamel Husain 2024-03-21 22:31:44 -07:00
  • aeb8188bc3 Initializing gh-pages branch Hamel Husain 2024-03-21 22:31:27 -07:00
  • 629450cecd Bootstrap Hosted Axolotl Docs w/Quarto (#1429) Hamel Husain 2024-03-21 22:28:36 -07:00
  • 2a1589f6f6 strip out hacky qlora-fsdp workarounds now that qlora-fsdp fixes are upstreamed (#1428) Wing Lian 2024-03-21 11:56:13 -04:00
  • 7d55607368 HF / FEAT: Optimize HF tags (#1425) [skip ci] Younes Belkada 2024-03-21 16:55:56 +01:00
  • 7803f0934f fixes for dpo and orpo template loading (#1424) Wing Lian 2024-03-20 11:36:24 -04:00
  • e6b78c1fca override the entire create_optimzier method (branch: 4bit-optimizers) Wing Lian 2024-03-19 23:19:56 -04:00
  • a236f5eab5 add support for 4bit optimizers Wing Lian 2024-03-19 22:57:40 -04:00
  • dd449c5cd8 support galore once upstreamed into transformers (#1409) Wing Lian 2024-03-19 09:26:35 -04:00
  • 40a88e8c4a Feat: Add sharegpt multirole (#1137) NanoCode012 2024-03-19 20:51:49 +09:00
  • 43bdc5d3de Add a config not to shuffle merged dataset (#1394) [skip ci] Seungduk Kim 2024-03-19 20:51:00 +09:00
  • b1e3e1b25f fix(config): passing gradient_checkpoint_kwargs (#1412) NanoCode012 2024-03-19 12:57:43 +09:00
  • 2ea70ebbd8 ORPO (#1419) Wing Lian 2024-03-18 13:10:00 -04:00
  • 10328b3429 Simplify creating parameters (branch: scatter_moe) Casper Hansen 2024-03-18 12:32:59 +00:00
  • 5bfc470d57 Stop transformers from using all memory Casper Hansen 2024-03-18 11:47:47 +00:00
  • e8c8ea64b3 Update README.md (#1418) jbl 2024-03-17 20:47:46 -07:00
  • 04168801c9 Simplify conversion + more debug Casper Hansen 2024-03-17 20:21:46 +00:00
  • d43a79b7bf device_map auto Casper 2024-03-17 19:52:56 +01:00
  • 884d81331e Initialize ParallelExperts on device of first expert Casper 2024-03-17 19:51:31 +01:00
  • 2ea75b4160 temporary: inference validation script Casper 2024-03-17 19:48:52 +01:00
  • d485a08393 chore(script): remove redundant setting (#1411) NanoCode012 2024-03-16 21:10:38 +09:00
  • f083aed2c7 Fix(readme): Improve README QuickStart info (#1408) NanoCode012 2024-03-16 21:10:22 +09:00
  • 868c33954d Feat(readme): Add instructions for Google GPU VM instances (#1410) NanoCode012 2024-03-16 21:10:05 +09:00
  • 9c221a6761 code review feedback (branch: scatter_moe_eric) Eric Hartford 2024-03-15 14:10:22 -07:00