* Feat: Auto add to modules_to_save when adding tokens
* fix: swap to error instead of warning
* feat: add check when special_tokens differ and add test
* start at index 0
* add test to check for missing turns
* apply black
* Update test_prompt_tokenizers.py
* Update src/axolotl/monkeypatch/fastchat_conversation_turns.py
Co-authored-by: Motoki Wu <tokestermw@gmail.com>
* fix linting
* apply black
* add more tests for llama/sharegpt
* make logic clearer
---------
Co-authored-by: Motoki Wu <tokestermw@gmail.com>
* fix: switch to using the HuggingFace Transformers NEFT implementation
* linter
* add support for noisy_embedding_alpha with a warning about it being renamed
* restore pre/posttrain_hooks
* move validation of NEFT noise alpha into validate_config()
* linter
* add check for zero3
* freeze parameters
* fixes for deepspeed loading
* fix model parameter check
* unfrozen parameters in example mixtral and logging when unfreezing
* Respect sequence_len in config for `type: llama2_chat`
It was hardcoded to `4096` I am not sure why? This updates it to pull from the config.
cc: @winglian
* Update llama2_chat.py
* apply black formatting
* fix tokenizer
* update test data
* lint fixtures
* mixtral multipack
* use mixtral model
* sample yml
* calculate cu_seqlens properly
* use updated flash ettention setting
* attn var checks
* force use of flash attention 2 for packing
* lint
* disable future fix for now
* update support table
* support for mamba
* more mamba fixes
* use fork for mamba kwargs fix
* grad checkpointing doesn't work
* fix extras for mamaba
* mamba loss fix
* use fp32 and remove verbose logging
* mamba fixes
* fix collator for mamba
* set model_type on training_args
* don't save safetensors for mamba
* update mamba config to disable safetensor checkpooints, install for tests
* no evals for mamba tests
* handle save_pretrained
* handle unused safetensors arg
* feat: add check for quantized model
* chore: refactor and add another check
* Update src/axolotl/utils/models.py
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Support device_map sequential (and others). Support max_memory in cfg.
* Update documentation in README accordingly.
* Update README.md
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Feat: Update to handle wandb env better
* chore: rename wandb_run_id to wandb_name
* feat: add new recommendation and update config
* fix: indent and pop disabled env if project passed
* feat: test env set for wandb and recommendation
* feat: update to use wandb_name and allow id
* chore: add info to readme
* Determine FSDP/deepspeed settings on device select.
Without this, the OS env check for accelerate will fail.
* rename and move env setup call
* chore: lint
---------
Co-authored-by: Karl-Johan Alm <kalle@gmail.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* add phi modeling from hf
* update for packing and use new modeling class for phi
* update e2e tests for phi to use new model name
* update example phi to also use new phi model name
* use AutoModelForCausalLM for phi lora since sample packing isn't supported
* allow zero len dataset
* better handling and warning of small eval splits
* raise error if eval split is too small
* don't mess with calculating total num steps in distributed context
* fix eval_sample_packing training args logic
* allow overriding of model_config parameters from the YML
* remove old logging, update readme
* move the updating of model config to the load_model_config function
* add warning for deprecated rope_scaling in the root of the YML config
* Update data.py
Change of conversation formatting type should also trigger updating the preprocessed dataset, so it should be part of the signature.
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* various bugfixes
use latest tinyllama release
check if val_set_size is empty first
update sdp and xformers llama patches for updated upstream transformers
fix system prompt when no input
calculate total and total supervised tokens even when not sample packing
* add fix for when eval size is estimated to be too small
* should be len 1 for dataset length
* add catchall kwargs
* test batch sampler w varying batch lens
* wip
* multipack batchsampler wip
* wip
* fix for prepare data loader to get correct # of steps based on gpues
* lint and clean up
* calculate len estimate
* fix total num steps calc
* add options for dataloader_num_workers and dataloader_pin_memory
* remove gitbook
* support prefetch_factor for dataloader optimization
* fix the kwarg
* Update to adapt to sharegpt datasets with "assistant" rather than "gpt" as the machine answers.
* use a strict option for hanedling incorrect turn data
* chore: lint
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>