Built site for gh-pages

Quarto GHA Workflow Runner
2025-01-13 15:46:05 +00:00
parent 34697fc683
commit 1b26b6a2f3
5 changed files with 44 additions and 37 deletions

@@ -629,7 +629,7 @@
"href": "docs/dataset-formats/pretraining.html",
"title": "Pre-training",
"section": "",
-"text": "For pretraining, there is no prompt template or roles. The only required field is text:\n\n\ndata.jsonl\n\n{\"text\": \"first row\"}\n{\"text\": \"second row\"}\n...\n\n\n\n\n\n\n\nStreaming is recommended for large datasets\n\n\n\nAxolotl usually loads the entire dataset into memory. This will be challenging for large datasets. Use the following config to enable streaming:\n\n\nconfig.yaml\n\npretraining_dataset: # hf path only\n...",
+"text": "For pretraining, there is no prompt template or roles. The only required field is text:\n\n\ndata.jsonl\n\n{\"text\": \"first row\"}\n{\"text\": \"second row\"}\n...\n\n\n\n\n\n\n\nStreaming is recommended for large datasets\n\n\n\nAxolotl usually loads the entire dataset into memory. This will be challenging for large datasets. Use the following config to enable streaming:\n\n\nconfig.yaml\n\npretraining_dataset:\n - name:\n path:\n split:\n text_column: # column in dataset with the data, usually `text`\n type: pretrain\n trust_remote_code:\n skip: # number of rows of data to skip over from the beginning\n...",
"crumbs": [
"Dataset Formats",
"Pre-training"