* Add support for `revision` dataset parameter
* only use revision on hf hub backed datasets
* use revision tied to head
* set download to use revision
* feat: add config to model validator class
* feat: add revision config to RL and tests for it
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* wip, lm_eval harness post train
* include latex parser
* add dtype and doc
* add validation when doing bench evals
* automatically add test dataset when doing benches
* Add first version of a Comet integration
* Remove debug prints
* Add test for Comet Configuration transformation to env variables
* Fix last lint warning
* Update Readme for Comet logging documentation
* Update Comet integration to be optional, update code and tests
* Add documentation for Comet configuration
* Add missing check
* support for auto_find_batch_size when packing
* make sure to return data from validation
* actually expose multipack_real_batches in the config
* calculate gathered efficiency in sampler
* tweak to fix auto find and use actual sampler len for multipack
* uncomment
* use args for bsz when not available from auto find
* Update supported models for Liger Kernel
Add Mistral LCE, Gemma LCE, Gemma 2 without LCE (softcapping is not yet implemented for Gemma in Liger Kernel LCE forward), Phi3 without LCE
* move import to their appropriate conditions
* Integrate Phi3 LCE support
https://github.com/linkedin/Liger-Kernel/pull/103/
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
* efficiently save very large llms when using FSDP
* fix parsing and index of sharded chunks
* only save fsdp on main process
* debugging for rename
* save sharded state dict
* remove unused new param
* get state dict directly
* tweak acc merge fsdp to shard the weight files
* sharded_state_dict alongside save_safetensors seems to hang on checkpoint save
* update sklearn version, torch compile env vars, don't worry about failure on preprocess load model
* There is already a condition check within the function. This outer one is not necessary
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
* refactor one_cycle lr scheduler so it's reusable in more situations
* fix validation for lr_scheduler
* default to cosine anneal strategy
* one cycle lr expects cos