salman
ac471a697a
updating to fused (#2293)
2025-01-30 11:45:56 -05:00
salman
54dd7abfc1
Process reward models (#2241)
* adding model_cfg to set num_labels
* using a num_labels field instead
* linting
* WIP stepwise prompt tokenizer
* this should work?
* trainer working?
* pushing to runpod
* fixing saving
* updating conf
* updating config, adding docs
* adding stepwise supervision docpage
* updating tests
* adding test for dataset
* fixing tests
* linting
* addressing some comments
* adding additional cfg fields support
* updating tests, fixing cfg
* fixing tests
* updating loss
* Update test_process_reward_model_smollm2.py
* updating loss values and seed
* dumb pre-commit
2025-01-29 00:08:33 -05:00
Sunny Liu
1c14c4a15c
Add hub model id config options to all example yml files (#2196) [skip ci]
* added hub model_id in example yml
* add hub model id to example yml
2024-12-17 11:24:30 -05:00
Wing Lian
521e62daf1
remove the bos token from dpo outputs (#1733) [skip ci]
* remove the bos token from dpo outputs
* don't forget to fix prompt_input_ids too
* use processing_class instead of tokenizer
* fix for processing class
2024-11-15 19:09:20 -05:00
Wing Lian
5b0b774e38
ensure that the bias is also in the correct dtype (#1848) [skip ci]
* ensure that the bias is also in the correct dtype
* add nightly for dpo-qlora-fsdp
2024-08-22 11:45:00 -04:00
Wing Lian
00ac3022a1
add qwen2-72b fsdp example (#1696)
2024-06-07 16:38:29 -04:00