# datasets { #axolotl.datasets }

`datasets`

Module containing Dataset functionality

## Classes

| Name | Description |
| --- | --- |
| [ConstantLengthDataset](#axolotl.datasets.ConstantLengthDataset) | Iterable dataset that returns constant-length chunks of tokens from a stream of text files. |
| [TokenizedPromptDataset](#axolotl.datasets.TokenizedPromptDataset) | Dataset that returns tokenized prompts from a stream of text files. |

### ConstantLengthDataset { #axolotl.datasets.ConstantLengthDataset }

```python
datasets.ConstantLengthDataset(self, tokenizer, datasets, seq_length=2048)
```

Iterable dataset that returns constant-length chunks of tokens from a stream of text files.

Args:
    tokenizer (Tokenizer): The processor used for processing the data.
    datasets (dataset.Dataset): Datasets with text files.
    seq_length (int): Length of token sequences to return.

### TokenizedPromptDataset { #axolotl.datasets.TokenizedPromptDataset }

```python
datasets.TokenizedPromptDataset(
    self,
    prompt_tokenizer,
    dataset,
    process_count=None,
    keep_in_memory=False,
    **kwargs,
)
```

Dataset that returns tokenized prompts from a stream of text files.

Args:
    prompt_tokenizer (PromptTokenizingStrategy): The prompt tokenizing method for processing the data.
    dataset (dataset.Dataset): Dataset with text files.
    process_count (int): Number of processes to use for tokenizing.
    keep_in_memory (bool): Whether to keep the tokenized dataset in memory.
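The constant-length packing behaviour described above can be sketched in plain Python. This is an illustration of the general technique, not axolotl's implementation; `toy_tokenize` is a made-up stand-in for a real tokenizer:

```python
from typing import Iterable, Iterator, List


def toy_tokenize(text: str) -> List[int]:
    # Hypothetical tokenizer for illustration: one "token id" per character.
    return [ord(c) for c in text]


def constant_length_chunks(
    texts: Iterable[str], seq_length: int = 2048
) -> Iterator[List[int]]:
    """Concatenate tokenized texts and yield fixed-size token chunks.

    A stream of texts becomes a stream of chunks, each exactly
    seq_length tokens long; a trailing partial chunk is dropped.
    """
    buffer: List[int] = []
    for text in texts:
        buffer.extend(toy_tokenize(text))
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= seq_length:
            yield buffer[:seq_length]
            buffer = buffer[seq_length:]


# Two 11-token texts packed into 8-token chunks: 22 tokens -> 2 full chunks.
chunks = list(constant_length_chunks(["hello world", "foo bar baz"], seq_length=8))
```

Packing like this keeps every training batch densely filled with tokens instead of padding short examples, which is why a fixed `seq_length` is part of the constructor.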