* WIP conversion to use pydantic for config validation
* wip, more fields, add capabilities
* wip
* update pydantic validation to match existing tests
* tweak requirements
* setup deprecated paams pydantic model
* more validations
* wrap up rest of the validations
* flesh out the rest of the options from the readme into pydantic
* fix model validators as class methods
remember to return in validator
missing return
add missing relora attributes
fix test for DictDefault change
fix sys template for mistral from fastchat change in PR 2872
fix test for batch size warning
* more missing attributes for cfg
* updates from PR feedback
* fix validation for datasets and pretrain datasets
* fix test for lora check
* make mlflow optional
* fix xformers
don't patch swiglu if xformers not working
fix the check for xformers swiglu
* fix install of xformers with extra index url for docker builds
* fix docker build arg quoting
* Allow load_best_model_at_end when using test_datasets and val_set_size is zero for custom evaluation datasets
* Fixed formatting following failed Lint check
* Add CausalLMBenchEvalCallback for measuring seq2seq performance
* Fix code for pre-commit
* Fix typing and improve logging
* eval_sample_packing must be false with CausalLMBenchEvalCallback
* add mps support
* linter stuff
* CI fixes
* install packaging for various tests
* Update setup.py
* Revert "install packaging for various tests"
This reverts commit 980e7aa44d.
* Revert "CI fixes"
This reverts commit 4609e3b166.
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* wip for pretraining/iterable data with arbitrary prompt strategies
* more fixes, wip
* more fixes for custom pretraining
* iterable ds wrapper not needed
* remove extra features
* chore: lint
* update pretraning example yml
* fix order for partials
* fixup for tests
* support for true batches with multipack
* patch the map dataset fetcher to handle batches with packed indexes
* patch 4d mask creation for sdp attention
* better handling for BetterTransformer
* patch general case for 4d mask
* setup forward patch. WIP
* fix patch file
* support for multipack w/o flash attention for llama
* cleanup
* add warning about bf16 vs fp16 for multipack with sdpa
* bugfixes
* add 4d multipack tests, refactor patches
* update tests and add warnings
* fix e2e file check
* skip sdpa test if not at least torch 2.1.1, update docs
* import deepspeed integration
* monkeypatch peft adapater with deepspeed for resume from checkpoint
* fix patch
* fix patches attempt 2
* make sure to set lora_model_dir
* skip pylint for deepspeed.utils
* pick up upstream fix in transformers
* remove monkeypatch for deepspeed/peft fix
* no need to set the lora_model_dir on resume
* unset load_in_*bit when using quant config
* guard before del
* better handling of load_in* kwargs
* Support for additional_special_tokens
* Support for additional_special_tokens. Adjust whitespace.
* Support for additional_special_tokens. Use correct quotes.
* Support for additional_special_tokens. Safe pop.
* Support for additional_special_tokens. nt.
* Support for additional_special_tokens. cfg.special_tokens may be None.
* add token if not in vocabulary when adding additional_special_tokens
* fix logic for copy/pasta
* bugfix for popping from config and tokenizer reload
* no need to add tokens manually now with previous bugfix
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* Make sure test_dataset are used and treat val_set_size.
* Add test_datasets docs.
* Apply suggestions from code review
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* loftq support for lora
* fix loftq check
* update readme for loftq
* readability cleanup
* use peft main for loftq fixes, remove unnecessary special tokens
* remove unused test from older deprecation
* wip modal for ci
* handle falcon layernorms better
* update
* rebuild the template each time with the pseudo-ARGS
* fix ref
* update tests to use modal
* cleanup ci script
* make sure to install jinja2 also
* kickoff the gh action on gh hosted runners and specify num gpus
* warning if hub model id set but no save
* add warning
* move the warning
* add test
* allow more public methods for tests for now
* fix tests
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>