salman
54dd7abfc1
Process reward models (#2241)
* adding model_cfg to set num_labels
* using a num_labels field instead
* linting
* WIP stepwise prompt tokenizer
* this should work?
* trainer working?
* pushing to runpod
* fixing saving
* updating conf
* updating config, adding docs
* adding stepwise supervision docpage
* updating tests
* adding test for dataset
* fixing tests
* linting
* addressing some comments
* adding additional cfg fields support
* updating tests, fixing cfg
* fixing tests
* updating loss
* Update test_process_reward_model_smollm2.py
* updating loss values and seed
* dumb pre-commit
2025-01-29 00:08:33 -05:00