This commit is contained in:
Dan Saunders
2025-01-27 15:43:51 -05:00
parent f866157b74
commit 4d1553e53f
11 changed files with 159 additions and 39 deletions


@@ -0,0 +1,11 @@
# ConstantLengthDataset { #axolotl.ConstantLengthDataset }
```python
ConstantLengthDataset(self, tokenizer, datasets, seq_length=2048)
```
Iterable dataset that returns constant-length chunks of tokens from a stream of text files.
Args:
tokenizer (Tokenizer): The tokenizer used to process the data.
datasets (dataset.Dataset): Dataset with text files.
seq_length (int): Length of token sequences to return.
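
The chunking idea described above can be illustrated with a minimal sketch. This is not axolotl's implementation: `ConstantLengthChunks` and `toy_tokenize` are hypothetical names introduced here, the tokenizer is a plain callable rather than a real `Tokenizer`, and dropping the trailing partial chunk is an assumption of this sketch.

```python
from typing import Callable, Iterable, Iterator, List


class ConstantLengthChunks:
    """Hypothetical stand-in for illustration; not axolotl's ConstantLengthDataset."""

    def __init__(
        self,
        tokenize: Callable[[str], List[int]],
        texts: Iterable[str],
        seq_length: int = 2048,
    ):
        if seq_length <= 0:
            raise ValueError("seq_length must be positive")
        self.tokenize = tokenize
        self.texts = texts
        self.seq_length = seq_length

    def __iter__(self) -> Iterator[List[int]]:
        buffer: List[int] = []
        for text in self.texts:
            # Append this text's tokens to whatever was left over.
            buffer.extend(self.tokenize(text))
            # Emit every full chunk currently available in the buffer.
            while len(buffer) >= self.seq_length:
                yield buffer[: self.seq_length]
                buffer = buffer[self.seq_length :]
        # Assumption: leftover tokens shorter than seq_length are dropped.


def toy_tokenize(text: str) -> List[int]:
    """Toy tokenizer (assumption): one id per word, equal to the word's length."""
    return [len(word) for word in text.split()]


chunks = list(
    ConstantLengthChunks(toy_tokenize, ["a bb ccc", "dddd eeeee", "f"], seq_length=2)
)
print(chunks)
```

Every yielded chunk has exactly `seq_length` tokens, and a chunk may span the boundary between two input texts because tokens are buffered across them.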