Dan Saunders
2025-01-27 15:43:51 -05:00
parent f866157b74
commit 4d1553e53f
11 changed files with 159 additions and 39 deletions


@@ -0,0 +1,19 @@
# TokenizedPromptDataset { #axolotl.TokenizedPromptDataset }
```python
TokenizedPromptDataset(
self,
prompt_tokenizer,
dataset,
process_count=None,
keep_in_memory=False,
**kwargs,
)
```
Dataset that returns tokenized prompts from a stream of text files.
Args:
prompt_tokenizer (PromptTokenizingStrategy): The prompt tokenizing strategy used to process the data.
dataset (datasets.Dataset): Dataset with text files.
process_count (int, optional): Number of processes to use for tokenizing. Defaults to None.
keep_in_memory (bool): Whether to keep the tokenized dataset in memory. Defaults to False.
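
The contract described above can be illustrated with a minimal, dependency-free sketch. It is not axolotl's implementation: `WhitespaceStrategy` and `SketchTokenizedDataset` are hypothetical stand-ins, the `tokenize_prompt` method name and the `input_ids` / `attention_mask` / `labels` keys are assumptions, and `process_count` and `keep_in_memory` are accepted only to mirror the documented signature.

```python
from typing import Dict, List, Optional


class WhitespaceStrategy:
    """Hypothetical stand-in for a PromptTokenizingStrategy (assumption)."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def tokenize_prompt(self, row: Dict[str, str]) -> Dict[str, List[int]]:
        # Give each previously unseen whitespace-separated word the next free id.
        ids = [
            self.vocab.setdefault(word, len(self.vocab))
            for word in row["text"].split()
        ]
        return {
            "input_ids": ids,
            "attention_mask": [1] * len(ids),
            "labels": list(ids),
        }


class SketchTokenizedDataset:
    """Illustrative only: eagerly tokenizes every row of an in-memory dataset."""

    def __init__(
        self,
        prompt_tokenizer: WhitespaceStrategy,
        dataset: List[Dict[str, str]],
        process_count: Optional[int] = None,
        keep_in_memory: bool = False,
    ) -> None:
        # process_count and keep_in_memory are ignored here; this sketch is
        # single-process and always holds the tokenized rows in memory.
        self.rows = [prompt_tokenizer.tokenize_prompt(row) for row in dataset]

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Dict[str, List[int]]:
        return self.rows[idx]


raw = [{"text": "hello world"}, {"text": "hello again world"}]
tokenized = SketchTokenizedDataset(WhitespaceStrategy(), raw)
print(len(tokenized))             # 2
print(tokenized[1]["input_ids"])  # [0, 2, 1]
```

The wrapper separates what gets tokenized (the dataset rows) from how it gets tokenized (the strategy object), which is the division of responsibility the `prompt_tokenizer` and `dataset` arguments suggest.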