* misc fixes for garbage collection and L40S with NCCL P2P
* patch bnb fix for triton check
* chore: lint
* change up import
* try patching differently
* remove patch for bnb fix for now
* more verbose checks and tweak train loss threshold

* refactor README; hardcode links to quarto docs; add additional quarto doc pages
* updates
* review comments
* update

---------

Co-authored-by: Dan Saunders <dan@axolotl.ai>

* adding model_cfg to set num_labels
* using a num_labels field instead
* linting
* WIP stepwise prompt tokenizer
* this should work?
* trainer working?
* pushing to runpod
* fixing saving
* updating conf
* updating config, adding docs
* adding stepwise supervision docpage
* updating tests
* adding test for dataset
* fixing tests
* linting
* addressing some comments
* adding additional cfg fields support
* updating tests, fixing cfg
* fixing tests
* updating loss
* Update test_process_reward_model_smollm2.py
* updating loss values and seed
* dumb pre-commit