feature: better device mapping for large models (#918)

* fix: improved memory handling when model is bigger than existing VRAM * feature: add lora_on_cpu flag to do LoRA loading on CPU (RAM) For big models where the models are taking up the entire GPU VRAM, the LoRA part will fail unless it is loaded on CPU only. * doc: add README * fix: enable progress bars in do_merge_lora() * doc: mention gpu_memory_limit and lora_on_cpu in merge part of README * Update src/axolotl/utils/models.py Co-authored-by: Wing Lian <wing.lian@gmail.com> * fix: remove deletion of removed model_kwargs key * fix: validate that gpu_memory_limit and max_memory are not both set --------- Co-authored-by: Karl-Johan Alm <kalle@gmail.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>
2024-01-05 22:22:21 +09:00
parent 63fb3eb426
commit bdfefaf054
4 changed files with 52 additions and 6 deletions
--- a/src/axolotl/cli/init.py
+++ b/src/axolotl/cli/init.py
@@ -73,7 +73,7 @@ def do_merge_lora(
    safe_serialization = cfg.save_safetensors is True

    LOG.info("running merge of LoRA with base model")
-    model = model.merge_and_unload()
+    model = model.merge_and_unload(progressbar=True)
    model.to(dtype=cfg.torch_dtype)

    if cfg.local_rank == 0:
@@ -81,6 +81,7 @@ def do_merge_lora(
        model.save_pretrained(
            str(Path(cfg.output_dir) / "merged"),
            safe_serialization=safe_serialization,
+            progressbar=True,
        )
        tokenizer.save_pretrained(str(Path(cfg.output_dir) / "merged"))