From fc1f985296dc1f506f151ac1533d6cff16918d36 Mon Sep 17 00:00:00 2001
From: Dan Saunders
Date: Fri, 14 Mar 2025 14:11:23 -0400
Subject: [PATCH] Update docs/.gitignore to exclude auto-generated API documentation files

---
 docs/.gitignore                              |  3 +
 docs/api/datasets.qmd                        | 44 -----------
 docs/api/prompt_strategies.base.qmd          |  5 --
 docs/api/prompt_strategies.chat_template.qmd | 80 --------------------
 docs/api/utils.tokenization.qmd              | 38 ----------
 5 files changed, 3 insertions(+), 167 deletions(-)
 delete mode 100644 docs/api/datasets.qmd
 delete mode 100644 docs/api/prompt_strategies.base.qmd
 delete mode 100644 docs/api/prompt_strategies.chat_template.qmd
 delete mode 100644 docs/api/utils.tokenization.qmd

diff --git a/docs/.gitignore b/docs/.gitignore
index 4c23a061f..92c0aa9a3 100644
--- a/docs/.gitignore
+++ b/docs/.gitignore
@@ -1,2 +1,5 @@
 /.quarto/
 _site/
+/api/*.qmd
+/api/*.html
+site_libs/
diff --git a/docs/api/datasets.qmd b/docs/api/datasets.qmd
deleted file mode 100644
index 2fea393a1..000000000
--- a/docs/api/datasets.qmd
+++ /dev/null
@@ -1,44 +0,0 @@
-# datasets { #axolotl.datasets }
-
-`datasets`
-
-Module containing Dataset functionality
-
-## Classes
-
-| Name | Description |
-| --- | --- |
-| [ConstantLengthDataset](#axolotl.datasets.ConstantLengthDataset) | Iterable dataset that returns constant length chunks of tokens from stream of text files. |
-| [TokenizedPromptDataset](#axolotl.datasets.TokenizedPromptDataset) | Dataset that returns tokenized prompts from a stream of text files. |
-
-### ConstantLengthDataset { #axolotl.datasets.ConstantLengthDataset }
-
-```python
-datasets.ConstantLengthDataset(self, tokenizer, datasets, seq_length=2048)
-```
-
-Iterable dataset that returns constant length chunks of tokens from stream of text files.
-    Args:
-        tokenizer (Tokenizer): The processor used for processing the data.
-        dataset (dataset.Dataset): Dataset with text files.
-        seq_length (int): Length of token sequences to return.
-
-### TokenizedPromptDataset { #axolotl.datasets.TokenizedPromptDataset }
-
-```python
-datasets.TokenizedPromptDataset(
-    self,
-    prompt_tokenizer,
-    dataset,
-    process_count=None,
-    keep_in_memory=False,
-    **kwargs,
-)
-```
-
-Dataset that returns tokenized prompts from a stream of text files.
-    Args:
-        prompt_tokenizer (PromptTokenizingStrategy): The prompt tokenizing method for processing the data.
-        dataset (dataset.Dataset): Dataset with text files.
-        process_count (int): Number of processes to use for tokenizing.
-        keep_in_memory (bool): Whether to keep the tokenized dataset in memory.
diff --git a/docs/api/prompt_strategies.base.qmd b/docs/api/prompt_strategies.base.qmd
deleted file mode 100644
index dff9733b4..000000000
--- a/docs/api/prompt_strategies.base.qmd
+++ /dev/null
@@ -1,5 +0,0 @@
-# prompt_strategies.base { #axolotl.prompt_strategies.base }
-
-`prompt_strategies.base`
-
-module for base dataset transform strategies
diff --git a/docs/api/prompt_strategies.chat_template.qmd b/docs/api/prompt_strategies.chat_template.qmd
deleted file mode 100644
index 212e0a1c9..000000000
--- a/docs/api/prompt_strategies.chat_template.qmd
+++ /dev/null
@@ -1,80 +0,0 @@
-# prompt_strategies.chat_template { #axolotl.prompt_strategies.chat_template }
-
-`prompt_strategies.chat_template`
-
-HF Chat Templates prompt strategy
-
-## Classes
-
-| Name | Description |
-| --- | --- |
-| [ChatTemplatePrompter](#axolotl.prompt_strategies.chat_template.ChatTemplatePrompter) | Prompter for HF chat templates |
-| [ChatTemplateStrategy](#axolotl.prompt_strategies.chat_template.ChatTemplateStrategy) | Tokenizing strategy for instruction-based prompts. |
-| [StrategyLoader](#axolotl.prompt_strategies.chat_template.StrategyLoader) | Load chat template strategy based on configuration. |
-
-### ChatTemplatePrompter { #axolotl.prompt_strategies.chat_template.ChatTemplatePrompter }
-
-```python
-prompt_strategies.chat_template.ChatTemplatePrompter(
-    self,
-    tokenizer,
-    chat_template,
-    processor=None,
-    max_length=2048,
-    message_property_mappings=None,
-    message_field_training=None,
-    message_field_training_detail=None,
-    field_messages='messages',
-    roles=None,
-    drop_system_message=False,
-)
-```
-
-Prompter for HF chat templates
-
-### ChatTemplateStrategy { #axolotl.prompt_strategies.chat_template.ChatTemplateStrategy }
-
-```python
-prompt_strategies.chat_template.ChatTemplateStrategy(
-    self,
-    prompter,
-    tokenizer,
-    train_on_inputs,
-    sequence_len,
-    roles_to_train=None,
-    train_on_eos=None,
-)
-```
-
-Tokenizing strategy for instruction-based prompts.
-
-#### Methods
-
-| Name | Description |
-| --- | --- |
-| [find_turn](#axolotl.prompt_strategies.chat_template.ChatTemplateStrategy.find_turn) | Locate the starting and ending indices of the specified turn in a conversation. |
-| [tokenize_prompt](#axolotl.prompt_strategies.chat_template.ChatTemplateStrategy.tokenize_prompt) | Public method that can handle either a single prompt or a batch of prompts. |
-
-##### find_turn { #axolotl.prompt_strategies.chat_template.ChatTemplateStrategy.find_turn }
-
-```python
-prompt_strategies.chat_template.ChatTemplateStrategy.find_turn(turns, turn_idx)
-```
-
-Locate the starting and ending indices of the specified turn in a conversation.
-
-##### tokenize_prompt { #axolotl.prompt_strategies.chat_template.ChatTemplateStrategy.tokenize_prompt }
-
-```python
-prompt_strategies.chat_template.ChatTemplateStrategy.tokenize_prompt(prompt)
-```
-
-Public method that can handle either a single prompt or a batch of prompts.
diff --git a/docs/api/utils.tokenization.qmd b/docs/api/utils.tokenization.qmd
deleted file mode 100644
index d3ac1f0d2..000000000
--- a/docs/api/utils.tokenization.qmd
+++ /dev/null
@@ -1,38 +0,0 @@
-# utils.tokenization { #axolotl.utils.tokenization }
-
-`utils.tokenization`
-
-Module for tokenization utilities
-
-## Functions
-
-| Name | Description |
-| --- | --- |
-| [color_token_for_rl_debug](#axolotl.utils.tokenization.color_token_for_rl_debug) | Helper function to color tokens based on their type. |
-| [process_tokens_for_rl_debug](#axolotl.utils.tokenization.process_tokens_for_rl_debug) | Helper function to process and color tokens. |
-
-### color_token_for_rl_debug { #axolotl.utils.tokenization.color_token_for_rl_debug }
-
-```python
-utils.tokenization.color_token_for_rl_debug(
-    decoded_token,
-    encoded_token,
-    color,
-    text_only,
-)
-```
-
-Helper function to color tokens based on their type.
-
-### process_tokens_for_rl_debug { #axolotl.utils.tokenization.process_tokens_for_rl_debug }
-
-```python
-utils.tokenization.process_tokens_for_rl_debug(
-    tokens,
-    color,
-    tokenizer,
-    text_only,
-)
-```
-
-Helper function to process and color tokens.