Built site for gh-pages
@@ -629,7 +629,7 @@
     "href": "docs/dataset-formats/pretraining.html",
     "title": "Pre-training",
     "section": "",
-    "text": "For pretraining, there is no prompt template or roles. The only required field is text:\n\n\ndata.jsonl\n\n{\"text\": \"first row\"}\n{\"text\": \"second row\"}\n...\n\n\n\n\n\n\n\nStreaming is recommended for large datasets\n\n\n\nAxolotl usually loads the entire dataset into memory. This will be challenging for large datasets. Use the following config to enable streaming:\n\n\nconfig.yaml\n\npretraining_dataset: # hf path only\n...",
+    "text": "For pretraining, there is no prompt template or roles. The only required field is text:\n\n\ndata.jsonl\n\n{\"text\": \"first row\"}\n{\"text\": \"second row\"}\n...\n\n\n\n\n\n\n\nStreaming is recommended for large datasets\n\n\n\nAxolotl usually loads the entire dataset into memory. This will be challenging for large datasets. Use the following config to enable streaming:\n\n\nconfig.yaml\n\npretraining_dataset:\n - name:\n path:\n split:\n text_column: # column in dataset with the data, usually `text`\n type: pretrain\n trust_remote_code:\n skip: # number of rows of data to skip over from the beginning\n...",
     "crumbs": [
       "Dataset Formats",
       "Pre-training"