* fix: improved memory handling when model is bigger than existing VRAM
* feature: add lora_on_cpu flag to do LoRA loading on CPU (RAM)
For big models where the models are taking up the entire GPU VRAM, the LoRA part will fail unless it is loaded on CPU only.
* doc: add README
* fix: enable progress bars in do_merge_lora()
* doc: mention gpu_memory_limit and lora_on_cpu in merge part of README
* Update src/axolotl/utils/models.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix: remove deletion of removed model_kwargs key
* fix: validate that gpu_memory_limit and max_memory are not both set
---------
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* fix: switch to using the HuggingFace Transformers NEFT implementation
* linter
* add support for noisy_embedding_alpha with a warning about it being renamed
* restore pre/posttrain_hooks
* move validation of NEFT noise alpha into validate_config()
* linter
* mixtral multipack
* use mixtral model
* sample yml
* calculate cu_seqlens properly
* use updated flash ettention setting
* attn var checks
* force use of flash attention 2 for packing
* lint
* disable future fix for now
* update support table
* Support device_map sequential (and others). Support max_memory in cfg.
* Update documentation in README accordingly.
* Update README.md
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Feat: Update to handle wandb env better
* chore: rename wandb_run_id to wandb_name
* feat: add new recommendation and update config
* fix: indent and pop disabled env if project passed
* feat: test env set for wandb and recommendation
* feat: update to use wandb_name and allow id
* chore: add info to readme
* isolate torch from the requirements.txt
* fix typo for removed line ending
* pin transformers and accelerate to latest releases
* try w auto-gptq==0.5.1
* update README to remove manual peft install
* pin xformers to 0.0.22
* bump flash-attn to 2.3.3
* pin flash attn to exact version
* allow overriding of model_config parameters from the YML
* remove old logging, update readme
* move the updating of model config to the load_model_config function
* add warning for deprecated rope_scaling in the root of the YML config
* Update README with some explanations
* revert commit-hook change
* add more explanation about batch size and gradient accum
* not use latex foromat
* decorate
* git hook again
* Attach a link that explains about LoRA hyperparameters
* update table of content
* Explanation about lora_modules_to_save
I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have some kind of bug where if I try to run `datasets.filter` with too high a `num_proc`, it throws an error and dies.
This PR adds a new configuration option `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If not included, this defaults to the current behavior of setting that to `os.cpu_count()`.