* Add flexible configuration options for chat dataset training
- Introduce roles_to_train parameter to set training labels by role
- Add train_on_eos option to configure training on end-of-sequence tokens
- Implement per-message training configuration in dataset
- Allow fine-grained control over training specific portions of messages
- Add message_field_training and message_field_training_detail settings
- Implement mapping between dataset character offsets and tokenized prompt
- Enhance test suite to cover new functionality
* Fix missing field inits, things weren't working from yaml.
* Add flexible configuration options for chat dataset training
- Introduce roles_to_train parameter to set training labels by role
- Add train_on_eos option to configure training on end-of-sequence tokens
- Implement per-message training configuration in dataset
- Allow fine-grained control over training specific portions of messages
- Add message_field_training and message_field_training_detail settings
- Implement mapping between dataset character offsets and tokenized prompt
- Enhance test suite to cover new functionality
* Fix missing field inits, things weren't working from yaml.
* chore: lint
* Revert test repo back to NousResearch after opening PR to fix the tokenizer_config.json.
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>