salman
54dd7abfc1
Process reward models (#2241)
* adding model_cfg to set num_labels
* using a num_labels field instead
* linting
* WIP stepwise prompt tokenizer
* this should work?
* trainer working?
* pushing to runpod
* fixing saving
* updating conf
* updating config, adding docs
* adding stepwise supervision docpage
* updating tests
* adding test for dataset
* fixing tests
* linting
* addressing some comments
* adding additional cfg fields support
* updating tests, fixing cfg
* fixing tests
* updating loss
* Update test_process_reward_model_smollm2.py
* updating loss values and seed
* dumb pre-commit
2025-01-29 00:08:33 -05:00