salman
54dd7abfc1
Process reward models (#2241)
* adding model_cfg to set num_labels
* using a num_labels field instead
* linting
* WIP stepwise prompt tokenizer
* this should work?
* trainer working?
* pushing to runpod
* fixing saving
* updating conf
* updating config, adding docs
* adding stepwise supervision docpage
* updating tests
* adding test for dataset
* fixing tests
* linting
* addressing some comments
* adding additional cfg fields support
* updating tests, fixing cfg
* fixing tests
* updating loss
* Update test_process_reward_model_smollm2.py
* updating loss values and seed
* dumb pre-commit
2025-01-29 00:08:33 -05:00