* add ds zero3 to multigpu biweekly tests
* fix for upstream api change
* use updated accelerate and fix deepspeed tests
* stringify the Path, and run multigpu tests if the multigpu tests change for a PR
* use correct json rather than yaml
* revert accelerate for deepspeed
* wip add new proposed message structure
* tokenization
* wip
* wip transform builder
* wip make the chat dataset loadable
* wip chatml + llama 3 new chat objects
* chore: lint
* chore: lint
* fix tokenization
* remove dacite dependency since we're using pydantic now
* fix handling when already correctly split in messages
* make sure to remove chat features from tokenized ds
* move chat to be a input transform for messages
* make sure llama3 has the bos token
* remove non-working special token code
* fix messages strat loader
* Add support for `revision` dataset parameter
* only use revision on hf hub backed datasets
* use revision tied to head
* set download to use revision
* feat: add config to model validator class
* feat: add revision config to RL and tests for it
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* wip, lm_eval harness post train
* include latex parser
* add dtype and doc
* add validation when doing bench evals
* automatically add test dataset when doing benches
* Add first version of a Comet integration
* Remove debug prints
* Add test for Comet Configuration transformation to env variables
* Fix last lint warning
* Update Readme for Comet logging documentation
* Update Comet integration to be optional, update code and tests
* Add documentation for Comet configuration
* Add missing check
* support for auto_find_batch_size when packing
* make sure to return data from validation
* make sure to return data from validation
* actually expose multipack_real_batches in the config
* calculate gathered efficiency in sampler
* tweak to fix auto find and use actual sampler len for multipack
* uncomment
* use args for bsz when not available from auto find
* Update supported models for Liger Kernel
Add Mistral LCE, Gemma LCE, Gemma 2 without LCE (softcapping is not yet implemented for Gemma in Liger Kernel LCE forward), Phi3 without LCE
* move import to their appropriate conditions
* Integrate Phi3 LCE support
https://github.com/linkedin/Liger-Kernel/pull/103/
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>