* Add: SFTPlugin with llmcompressor
* Update: review comments!
* Add:llmcompressor instalable
* pre commit hooks
* Use: warning over warn
* Revert: TODO's
* Update llmcompressor version to latest
* Apply suggestions from @markurtz
Co-authored-by: Mark Kurtz <mark.j.kurtz@gmail.com>
* Address review comments from @markurtz
* Add: llcompressor installable
* Rename: sft.yaml to sparse-finetuning.yaml
* Use: absolute import
* Update model config
* Move: LLMCompressorPlugin into it's own submodule
* Add: `llm_compressor` integration documentation
* Rebase and updates!
* Tests, Style, Updates
* Add: .qmd file
* Address Review Comments:
* deleted redundant docs/llm_compressor.qmd
* incorporated feedback in integration README.md
* added llmcompressor integration to docs/custom_integrations.qmd
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
* Add: line about further optimizations using llmcompressor
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
* Apply patch from @winglian
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
* Fix: Test
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
* additional fixes for docker and saving compressed
* split llmcompressor from vllm checks
* Reset session between tests
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
* move decorator to test method instead of class
* make sure to reset the session after each test
* move import of llmcompressor to reset session inside test
---------
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Mark Kurtz <mark.j.kurtz@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* misc fixes for garbage collection and L40S w NCCL P2P
* patch bnb fix for triton check
* chore: lint
* change up import
* try patching differently
* remove patch for bnb fix for now
* more verbose checks and tweak train loss threshold
* add helper to verify the correct model output file exists
* more checks using helper
* chore: lint
* fix import and relora model check
* workaround for trl trainer saves
* remove stray print
* support for true batches with multipack
* patch the map dataset fetcher to handle batches with packed indexes
* patch 4d mask creation for sdp attention
* better handling for BetterTransformer
* patch general case for 4d mask
* setup forward patch. WIP
* fix patch file
* support for multipack w/o flash attention for llama
* cleanup
* add warning about bf16 vs fp16 for multipack with sdpa
* bugfixes
* add 4d multipack tests, refactor patches
* update tests and add warnings
* fix e2e file check
* skip sdpa test if not at least torch 2.1.1, update docs
* use tensorboard to see if resume from checkpoint works
* make sure e2e test is either fp16 or bf16
* set max_steps and save limit so we have the checkpoint when testing resuming
* fix test parameters