salman
54dd7abfc1
Process reward models (#2241)
* adding model_cfg to set num_labels
* using a num_labels field instead
* linting
* WIP stepwise prompt tokenizer
* this should work?
* trainer working?
* pushing to runpod
* fixing saving
* updating conf
* updating config, adding docs
* adding stepwise supervision docpage
* updating tests
* adding test for dataset
* fixing tests
* linting
* addressing some comments
* adding additional cfg fields support
* updating tests, fixing cfg
* fixing tests
* updating loss
* Update test_process_reward_model_smollm2.py
* updating loss values and seed
* dumb pre-commit
2025-01-29 00:08:33 -05:00