salman
54dd7abfc1
Process reward models (#2241)
* adding model_cfg to set num_labels
* using a num_labels field instead
* linting
* WIP stepwise prompt tokenizer
* this should work?
* trainer working?
* pushing to runpod
* fixing saving
* updating conf
* updating config, adding docs
* adding stepwise supervision docpage
* updating tests
* adding test for dataset
* fixing tests
* linting
* addressing some comments
* adding additional cfg fields support
* updating tests, fixing cfg
* fixing tests
* updating loss
* Update test_process_reward_model_smollm2.py
* updating loss values and seed
* dumb pre-commit
2025-01-29 00:08:33 -05:00