salman
54dd7abfc1
Process reward models (#2241)
* adding model_cfg to set num_labels
* using a num_labels field instead
* linting
* WIP stepwise prompt tokenizer
* this should work?
* trainer working?
* pushing to runpod
* fixing saving
* updating conf
* updating config, adding docs
* adding stepwise supervision docpage
* updating tests
* adding test for dataset
* fixing tests
* linting
* addressing some comments
* adding additional cfg fields support
* updating tests, fixing cfg
* fixing tests
* updating loss
* Update test_process_reward_model_smollm2.py
* updating loss values and seed
* dumb pre-commit
2025-01-29 00:08:33 -05:00