* fix pretokenized datasets readme
* check if dataset type is not set to handle pretokenized datasets