Pretrain transforms (#1261)

* wip for pretraining/iterable data with arbitrary prompt strategies

* more fixes, wip

* more fixes for custom pretraining

* iterable ds wrapper not needed

* remove extra features

* chore: lint

* update pretraining example yml

* fix order for partials

* fixup for tests
Wing Lian
2024-02-06 00:37:03 -05:00
committed by GitHub
parent 8c2e05ade3
commit c7cf3810bd
5 changed files with 145 additions and 62 deletions

@@ -12,6 +12,7 @@ max_steps: 200
pretraining_dataset:
  path: c4
  name: en
  type: pretrain
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./model-out
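
The hunk above selects a `pretrain` strategy for a streamed dataset. As a rough illustration of the idea in the commit message (a pluggable prompt strategy applied to iterable data), here is a minimal sketch. It is not the code from this commit: `toy_tokenize`, `pretrain_strategy`, and `pack_stream` are hypothetical names, the whitespace tokenizer is a stand-in for a real one, and dropping the trailing partial sequence is an assumption made for brevity.

```python
from functools import partial
from typing import Callable, Dict, Iterable, Iterator, List


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Hypothetical stand-in tokenizer: one id per whitespace-separated word."""
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


def pretrain_strategy(
    record: Dict[str, str],
    tokenize: Callable[[str], List[int]],
    eos_id: int,
) -> List[int]:
    """Hypothetical 'pretrain' strategy: tokenize the raw text and append EOS."""
    return tokenize(record["text"]) + [eos_id]


def pack_stream(
    records: Iterable[Dict[str, str]],
    strategy: Callable[[Dict[str, str]], List[int]],
    seq_len: int,
) -> Iterator[List[int]]:
    """Lazily apply a strategy to each record and emit fixed-length sequences."""
    buffer: List[int] = []
    for record in records:
        buffer.extend(strategy(record))
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
    # Assumption: a trailing partial sequence is dropped rather than padded.


vocab: Dict[str, int] = {"<eos>": 0}
# Bind the strategy's extra arguments up front so the packer only sees
# a one-argument callable; any other strategy could be swapped in here.
strategy = partial(
    pretrain_strategy,
    tokenize=partial(toy_tokenize, vocab=vocab),
    eos_id=vocab["<eos>"],
)

records = [{"text": "a b c"}, {"text": "d e"}, {"text": "f"}]
packed = list(pack_stream(iter(records), strategy, seq_len=4))
```

Because `pack_stream` only requires a callable that maps a record to token ids, the strategy can change without touching the streaming or packing logic.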