axolotl/api/TokenizedPromptDataset.qmd
Dan Saunders 4d1553e53f updates
2025-01-27 15:43:51 -05:00


# TokenizedPromptDataset { #axolotl.TokenizedPromptDataset }
```python
TokenizedPromptDataset(
    self,
    prompt_tokenizer,
    dataset,
    process_count=None,
    keep_in_memory=False,
    **kwargs,
)
```
Dataset that returns tokenized prompts from a stream of text files.

Args:

- prompt_tokenizer (PromptTokenizingStrategy): The prompt tokenizing strategy used to process the data.
- dataset (datasets.Dataset): Dataset with text files.
- process_count (int, optional): Number of processes to use for tokenizing. Defaults to None.
- keep_in_memory (bool): Whether to keep the tokenized dataset in memory. Defaults to False.
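
The sketch below illustrates the general idea of pairing a tokenizing strategy with a dataset of text rows. It is not axolotl's implementation: `WhitespaceStrategy` and `ToyTokenizedDataset` are hypothetical stand-ins written in plain Python, and the `tokenize_prompt` method name and the `input_ids` / `attention_mask` / `labels` output keys are assumptions made for illustration. It also ignores `process_count` and `keep_in_memory`, tokenizing every row eagerly in a single process.

```python
from typing import Dict, Iterable, List


class WhitespaceStrategy:
    """Hypothetical stand-in for a PromptTokenizingStrategy (not an axolotl class)."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}

    def tokenize_prompt(self, row: Dict[str, str]) -> Dict[str, List[int]]:
        # Give each previously unseen whitespace-separated word the next free id.
        ids = [self.vocab.setdefault(word, len(self.vocab)) for word in row["text"].split()]
        return {
            "input_ids": ids,
            "attention_mask": [1] * len(ids),
            "labels": list(ids),
        }


class ToyTokenizedDataset:
    """Illustrative only: applies the strategy to every row and stores the results."""

    def __init__(self, prompt_tokenizer, dataset: Iterable[Dict[str, str]]) -> None:
        self.rows = [prompt_tokenizer.tokenize_prompt(row) for row in dataset]

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, index: int) -> Dict[str, List[int]]:
        return self.rows[index]


raw = [{"text": "hello world"}, {"text": "hello again world"}]
tokenized = ToyTokenizedDataset(WhitespaceStrategy(), raw)
print(len(tokenized))             # 2
print(tokenized[1]["input_ids"])  # [0, 2, 1]
```

With the real class, the same two positional arguments (a strategy and a dataset) would be passed to `TokenizedPromptDataset`, with `process_count` and `keep_in_memory` supplied as keyword arguments where needed.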