axolotl/api/ConstantLengthDataset.qmd
Dan Saunders 4d1553e53f updates
2025-01-27 15:43:51 -05:00


# ConstantLengthDataset { #axolotl.ConstantLengthDataset }
```python
ConstantLengthDataset(self, tokenizer, datasets, seq_length=2048)
```
Iterable dataset that returns constant-length chunks of tokens from a stream of text files.
Args:
tokenizer (Tokenizer): The tokenizer used to process the data.
datasets (dataset.Dataset): Dataset with text files.
seq_length (int): Length of token sequences to return.
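
The chunking idea described above can be illustrated with a minimal, standalone sketch. This is not axolotl's implementation: the helper `constant_length_chunks` and the toy token lists are hypothetical, and the sketch assumes the inputs are already tokenized and that a trailing remainder shorter than `seq_length` is simply dropped.

```python
from typing import Iterable, Iterator, List


def constant_length_chunks(
    token_streams: Iterable[List[int]], seq_length: int = 2048
) -> Iterator[List[int]]:
    """Yield fixed-size chunks from a stream of tokenized examples.

    Hypothetical helper for illustration only; not part of axolotl.
    """
    buffer: List[int] = []
    for tokens in token_streams:
        buffer.extend(tokens)
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= seq_length:
            yield buffer[:seq_length]
            buffer = buffer[seq_length:]
    # Assumption: leftover tokens shorter than seq_length are dropped.


# Toy token ids standing in for tokenized text files.
toy_examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
chunks = list(constant_length_chunks(toy_examples, seq_length=4))
print(chunks)
```

Every yielded chunk has exactly `seq_length` tokens regardless of how long the individual inputs are, which is the property the class name refers to.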