# TokenizedPromptDataset { #axolotl.TokenizedPromptDataset }

```python
TokenizedPromptDataset(
    self,
    prompt_tokenizer,
    dataset,
    process_count=None,
    keep_in_memory=False,
    **kwargs,
)
```

Dataset that returns tokenized prompts from a stream of text files.

Args:

- `prompt_tokenizer` (PromptTokenizingStrategy): The prompt tokenizing method for processing the data.
- `dataset` (datasets.Dataset): Dataset with text files.
- `process_count` (int, optional): Number of processes to use for tokenizing. Defaults to `None`.
- `keep_in_memory` (bool): Whether to keep the tokenized dataset in memory. Defaults to `False`.
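
The idea behind the class can be illustrated with a toy stand-in that needs no external packages. This is a sketch under stated assumptions, not axolotl's implementation: `ToyPromptTokenizer`, `ToyTokenizedDataset`, the `tokenize_prompt` method name, the `text` column, and the returned keys are all hypothetical names chosen for illustration, and the sketch leaves out `process_count` and `keep_in_memory` entirely.

```python
from typing import Dict, Iterable, List


class ToyPromptTokenizer:
    """Hypothetical stand-in for a prompt tokenizing strategy."""

    def __init__(self) -> None:
        # Maps each word seen so far to an integer id.
        self.vocab: Dict[str, int] = {}

    def tokenize_prompt(self, row: Dict[str, str]) -> Dict[str, List[int]]:
        # Give every new whitespace-separated word the next free id.
        input_ids = [
            self.vocab.setdefault(word, len(self.vocab))
            for word in row["text"].split()
        ]
        return {
            "input_ids": input_ids,
            "attention_mask": [1] * len(input_ids),
            "labels": list(input_ids),
        }


class ToyTokenizedDataset:
    """Hypothetical, simplified analogue of TokenizedPromptDataset."""

    def __init__(self, prompt_tokenizer, dataset: Iterable[Dict[str, str]]) -> None:
        # Tokenize every row once, up front, in a single process.
        self.rows = [prompt_tokenizer.tokenize_prompt(row) for row in dataset]

    def __len__(self) -> int:
        return len(self.rows)

    def __getitem__(self, idx: int) -> Dict[str, List[int]]:
        return self.rows[idx]


raw = [{"text": "hello world"}, {"text": "hello again world"}]
tokenized = ToyTokenizedDataset(ToyPromptTokenizer(), raw)
print(len(tokenized))             # 2
print(tokenized[1]["input_ids"])  # [0, 2, 1]
```

The tokenizer is passed in as an object rather than hard-coded, mirroring the `prompt_tokenizer` argument above: the dataset wrapper only decides when rows are tokenized, while the strategy decides how.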