* fix 405b with lower cpu ram requirements
* make sure to use doouble quant and only skip output embeddings
* set model attributes
* more fixes for sharded fsdp loading
* update the base model in example to use pre-quantized nf4-bf16 weights
* upstream fixes for qlora+fsdp
* various batch of fixes
* more tweaks
* fix autoawq requirement for torch flexibility
* simplify conditionals
* multi-node fixes wip
* bump transformers and include 405b qlora+fsdp yaml
* Implementing a basic chat_template strategy for DPO datasets
This mimics the sft chat_template strategy such that users can:
* Specify the messages field
* Specify the per message role and content fields
* speicfy the chosen and rejected fields
* Let the tokenizer construct the raw prompt
* Ensure the chosen and rejected fields don't have any prefix tokens
* Adding additional dpo chat template unittests
* Rename test class
* bump transformers and set roundup_power2_divisions for more VRAM improvements
* support for low bit optimizers from torch ao
* fix check for alternate optimizers and use nous models on hf for llama3
* add missing check for ao_adamw_fp8
* fix check when using custom optimizers w adamw
* Fix eval_sample_packing in llama-3 lora example
* Update examples/llama-3/lora-8b.yml
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
The current yml code throws an error: ValueError: Please set lora_modules_to_save to [`embed_tokens`, `lm_head`] when using an adapter and changing the special tokens.
I added the required changes to resolve it
The strategy now supports configuring several fields: * The data field holding message arrays * the role and
content fields for each message * role mapping from source to target types
additionally this adds a sample llama3-8b instruct template using the chat template