diff --git a/.nojekyll b/.nojekyll
index 1c338968d..b5db7fcae 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-cccf8acd
\ No newline at end of file
+30fb2d4a
\ No newline at end of file
diff --git a/docs/api/common.datasets.html b/docs/api/common.datasets.html
index 4697d3b49..5b6caf525 100644
--- a/docs/api/common.datasets.html
+++ b/docs/api/common.datasets.html
@@ -539,7 +539,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
sample_dataset
-Randomly sample num_samples samples from dataset.
+Randomly sample num_samples samples with replacement from dataset.
@@ -547,15 +547,15 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});

load_datasets

common.datasets.load_datasets(cfg, cli_args=None, debug=False)

Loads one or more training or evaluation datasets, calling
-axolotl.utils.data.prepare_dataset. Optionally, logs out debug information.
+axolotl.utils.data.prepare_datasets. Optionally, logs out debug information.

Parameters

----++++
@@ -581,7 +581,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
- +
@@ -591,9 +591,9 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});

Returns

debug
bool
-Whether to print out tokenization of sample
+Whether to print out tokenization of sample. This is duplicated in cfg and cli_args, but is kept due to use in our Colab notebooks.
False
---+++
@@ -606,12 +606,7 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});
- - - - - - +
-TrainDatasetMeta
-Dataclass with fields for training and evaluation datasets and the computed
-total_num_steps.
+TrainDatasetMeta
+Dataclass with fields for training and evaluation datasets and the computed total_num_steps.
@@ -621,15 +616,15 @@ gtag('config', 'G-9KYCVJBNMQ', { 'anonymize_ip': true});

load_preference_datasets

common.datasets.load_preference_datasets(cfg, cli_args)

Loads one or more training or evaluation datasets for RL training using paired
-preference data, calling axolotl.utils.data.rl.load_prepare_preference_datasets.
+preference data, calling axolotl.utils.data.rl.prepare_preference_datasets.
Optionally, logs out debug information.

Parameters

--++
@@ -649,7 +644,7 @@ Optionally, logs out debug information.

- +
@@ -689,63 +684,12 @@ Optionally, logs out debug information.

sample_dataset

common.datasets.sample_dataset(dataset, num_samples)
-

Randomly sample num_samples samples from dataset.

-
-

Parameters

-
cli_args
-Union[PreprocessCliArgs, TrainerCliArgs]
+PreprocessCliArgs | TrainerCliArgs
Command-specific CLI arguments.
required
-Name
-Type
-Description
-Default
-dataset
-Dataset
-Dataset.
-required
-num_samples
-int
-Number of samples to return.
-required
-
-
-

Returns

-Name
-Type
-Description
-Dataset
-Random sample (with replacement) of examples in dataset.
+

Randomly sample num_samples samples with replacement from dataset.

-
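The reworded summary states that sampling happens with replacement, so the same example can appear more than once in the result. The following is a minimal sketch of that behavior on a plain Python sequence. The helper name `sample_with_replacement` and its `seed` parameter are hypothetical and are not part of `common.datasets`; this is not the library's implementation, which may differ.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_with_replacement(
    dataset: Sequence[T], num_samples: int, seed: Optional[int] = None
) -> List[T]:
    """Hypothetical helper: draw num_samples items, allowing repeats."""
    # A dedicated Random instance keeps the sketch reproducible without
    # touching the global random state.
    rng = random.Random(seed)
    # Each index is drawn independently, so an index can be picked again.
    # Note: randrange raises ValueError if dataset is empty and num_samples > 0.
    indices = [rng.randrange(len(dataset)) for _ in range(num_samples)]
    return [dataset[i] for i in indices]


if __name__ == "__main__":
    rows = ["a", "b", "c"]
    # Asking for more samples than there are rows is fine with replacement.
    print(sample_with_replacement(rows, 5, seed=0))
```

Because draws are independent, `num_samples` may exceed the dataset length, which would not be possible when sampling without replacement.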