axolotl

Files

salman 54dd7abfc1 Process reward models (#2241)

* adding model_cfg to set num_labels

* using a num_labels field instead

* linting

* WIP stepwise prompt tokenizer

* this should work?

* trainer working?

* pushing to runpod

* fixing saving

* updating conf

* updating config, adding docs

* adding stepwise supervision docpage

* updating tests

* adding test for dataset

* fixing tests

* linting

* addressing some comments

* adding additional cfg fields support

* updating tests, fixing cfg

* fixing tests

* updating loss

* Update test_process_reward_model_smollm2.py

* updating loss values and seed

* dumb pre-commit

2025-01-29 00:08:33 -05:00

conversation.qmd

fix: use apply_chat_template to find turn boundaries and allow tool_calling field (#2179) [skip ci]

2024-12-17 16:42:21 -05:00

index.qmd

Reorganize Docs (#1468)

2024-04-01 08:00:52 -07:00

inst_tune.qmd

Feat: update doc (#1475) [skip ci]

2024-04-04 13:43:40 +09:00

pretraining.qmd

skip over rows in pretraining dataset (#2223)