Compare commits

`optimizers` ... `update-lgp` (19 commits)

| SHA1 |
|---|
| 9cb05283b2 |
| aafa6245f4 |
| 3001e6d93c |
| ed0456557d |
| 09e4393a6a |
| 31a81106dd |
| 93c20cc0d5 |
| 3f5e2d6cc9 |
| 4a736986fa |
| 5d0f110a3b |
| 83f8698b8a |
| 60a11a6410 |
| 46a045e528 |
| 3b477e08a0 |
| 16dc6ee68d |
| fa7c79b3b9 |
| ae66374156 |
| 5e21b1a9da |
| 575e5f28ec |
````diff
@@ -14,7 +14,7 @@ COPY scripts/motd /etc/motd
 RUN pip install jupyterlab notebook ipywidgets && \
     jupyter lab clean
-RUN apt install --yes --no-install-recommends openssh-server tmux && \
+RUN apt install --yes --no-install-recommends openssh-server tmux iproute2 nvtop && \
     mkdir -p ~/.ssh && \
     chmod 700 ~/.ssh && \
     printf "\n[[ -z \"\$TMUX\" ]] && { tmux attach-session -t ssh_tmux || tmux new-session -s ssh_tmux; exit; }\n" >> ~/.bashrc && \
````
````diff
@@ -154,8 +154,6 @@ datasets:
       content: value
       # ...
 
-    message_property_mappings:
-
     # Optional[Dict[str, List]]. Roles mapping in the messages. The default is:
     roles:
       user: ["human", "user"]
````
````diff
@@ -556,6 +554,13 @@ special_tokens:
 # Add extra tokens.
 tokens:
 
+# Mapping token_id to new_token_string to override reserved added_tokens in the tokenizer.
+# Only works for tokens that are not part of the base vocab (aka are added_tokens).
+# Can be checked if they exist in tokenizer.json added_tokens.
+added_tokens_overrides: # Dict[int, str]
+  # 128041: "<|im_start|>"
+  # 128042: "<|im_end|>"
+
 # FSDP
 fsdp:
 fsdp_config:
````
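For readers skimming the diff, a minimal sketch of how the new option is meant to be used in a config; the token IDs and replacement strings are illustrative, taken from the commented defaults above:

```yaml
# Replace reserved added_tokens (by ID) with usable strings. The IDs must already
# exist in the tokenizer's added_tokens; otherwise loading raises a ValueError.
added_tokens_overrides:
  128041: "<|im_start|>"
  128042: "<|im_end|>"
```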
````diff
@@ -74,6 +74,10 @@ datasets:
     train_on_eos:
 ```
 
+::: {.callout-tip}
+If you receive an error like "`chat_template` choice is `tokenizer_default` but tokenizer's `chat_template` is null.", it means the tokenizer does not have a default `chat_template`. Follow the examples below instead to set a custom `chat_template`.
+:::
+
 2. Using the `gemma` chat template to override the tokenizer_config.json's chat template on OpenAI messages format, training on all assistant messages.
 
 ```yaml
````
````diff
@@ -52,3 +52,7 @@ description: Frequently asked questions
 **Q: The EOS/EOT token is incorrectly being masked or not being masked.**
 
 > A: This is because of the mismatch between `tokenizer.eos_token` and EOS/EOT token in template. Please make sure to set `eos_token` under `special_tokens` to the same EOS/EOT token as in template.
+
+**Q: "`chat_template` choice is `tokenizer_default` but tokenizer's `chat_template` is null. Please add a `chat_template` in tokenizer config"**
+
+> A: This is because the tokenizer does not have a chat template. Please add a chat template in the tokenizer config. See [chat_template](dataset-formats/conversation.qmd#chat-template) for more details.
````
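As a rough illustration of the fix both the callout and the FAQ entry point to, a hedged config sketch that overrides a missing tokenizer template with a built-in one (the dataset path is a placeholder):

```yaml
chat_template: chatml             # any built-in template, e.g. chatml / gemma / llama3
datasets:
  - path: your-org/your-dataset   # placeholder dataset
    type: chat_template
```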
````diff
@@ -28,6 +28,17 @@ val_set_size: 0.1
 eval_steps: 100
 ```
 
+Bradley-Terry chat templates expect single-turn conversations in the following format:
+
+```json
+{
+  "system": "...", // optional
+  "input": "...",
+  "chosen": "...",
+  "rejected": "..."
+}
+```
+
 ### Process Reward Models (PRM)
 
 Process reward models are trained using data which contains preference annotations for each step in a series of interactions. Typically, PRMs are trained to provide reward signals over each step of a reasoning trace and are used for downstream reinforcement learning.
````
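To make the schema concrete, here is a single hedged example record in that format (the values are invented purely for illustration):

```json
{
  "system": "You are a concise assistant.",
  "input": "Summarize the plot of Hamlet in one sentence.",
  "chosen": "A Danish prince feigns madness while avenging his father's murder, and nearly everyone dies.",
  "rejected": "Hamlet is a play by Shakespeare."
}
```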
````diff
@@ -45,3 +56,5 @@ datasets:
 val_set_size: 0.1
 eval_steps: 100
 ```
+
+Please see [stepwise_supervised](dataset-formats/stepwise_supervised.qmd) for more details on the dataset format.
````
````diff
@@ -3,6 +3,7 @@ title: "RLHF (Beta)"
 description: "Reinforcement Learning from Human Feedback is a method whereby a language model is optimized from data using human feedback."
 back-to-top-navigation: true
 toc: true
+toc-expand: 2
 toc-depth: 4
 ---
 
````
````diff
@@ -528,6 +529,7 @@ trl:
   vllm_gpu_memory_utilization: 0.15
   num_generations: 4
   reward_funcs: ["rewards.rand_reward_func"] # format: '{file_name}.{fn_name}'
+  reward_weights: [1.0]
 datasets:
   - path: openai/gsm8k
     name: main
````
````diff
@@ -536,6 +538,8 @@ datasets:
 
 To see other examples of custom reward functions, please see [TRL GRPO Docs](https://github.com/huggingface/trl/blob/main/docs/source/grpo_trainer.md#using-a-custom-reward-function).
 
+To see description of the configs, please see [TRLConfig](https://github.com/axolotl-ai-cloud/axolotl/blob/main/src/axolotl/utils/config/models/input/v0_4_1/trl.py).
+
 ### Using local dataset files
 
 ```yaml
````
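Since `reward_funcs` entries use the `'{file_name}.{fn_name}'` format, a hedged sketch of what a local `rewards.py` might contain follows; the TRL-style signature takes the generated completions and returns one float per completion, and both reward functions are purely illustrative:

```python
# rewards.py -- illustrative only; place it so "rewards.rand_reward_func" resolves
import random


def rand_reward_func(completions, **kwargs):
    """Return a random reward per completion (matches the example config above)."""
    return [random.random() for _ in completions]


def length_reward_func(completions, **kwargs):
    """Toy alternative: longer completions score higher."""
    return [float(len(c)) for c in completions]
```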
````diff
@@ -62,5 +62,5 @@ antlr4-python3-runtime==4.13.2
 torchao==0.7.0
 schedulefree==1.3.0
 
-axolotl-contribs-lgpl==0.0.3
+axolotl-contribs-lgpl @ git+https://github.com/axolotl-ai-cloud/axolotl-contribs-lgpl.git@import-issues-v2
 axolotl-contribs-mit==0.0.3
````
````diff
@@ -24,5 +24,5 @@ if cce_spec:
 
     print(
         UNINSTALL_PREFIX
-        + 'pip install "cut-cross-entropy @ git+https://github.com/apple/ml-cross-entropy.git@9c297c905f55b73594b5d650722d1e78183b77bd"'
+        + 'pip install "cut-cross-entropy[transformers] @ git+https://github.com/apple/ml-cross-entropy.git@24fbe4b5dab9a6c250a014573613c1890190536c"'
     )
````
````diff
@@ -113,7 +113,7 @@ class ModalCloud(Cloud):
             [
                 # Random id for cache busting of branch commits
                 f"RUN echo '{str(randint(0, 1000000))}'", # nosec B311
-                f"RUN cd /workspace/axolotl && git fetch && git checkout {self.config.branch}",
+                f"RUN cd /workspace/axolotl && git fetch && git checkout {self.config.branch} && git pull",
             ]
         )
 
````
````diff
@@ -270,6 +270,7 @@ def _preprocess(config_yaml: str, volumes=None):
 
 
 def _train(config_yaml: str, accelerate: bool = True, volumes=None, **kwargs):
+    Path("/workspace/mounts").mkdir(parents=True, exist_ok=True)
     with open("/workspace/mounts/config.yaml", "w", encoding="utf-8") as f_out:
         f_out.write(config_yaml)
     run_folder = "/workspace/mounts"
````
````diff
@@ -288,6 +289,7 @@ def _train(config_yaml: str, accelerate: bool = True, volumes=None, **kwargs):
 
 
 def _lm_eval(config_yaml: str, volumes=None):
+    Path("/workspace/mounts").mkdir(parents=True, exist_ok=True)
     with open("/workspace/mounts/config.yaml", "w", encoding="utf-8") as f_out:
         f_out.write(config_yaml)
     run_folder = "/workspace/mounts"
````
````diff
@@ -17,7 +17,7 @@ Run the following command to install `cut_cross_entropy[transformers]` if you do
 python scripts/cutcrossentropy_install.py | sh
 
 # if you are not in dev environment
-pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy @ git+https://github.com/apple/ml-cross-entropy.git@9c297c905f55b73594b5d650722d1e78183b77bd"'
+pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/apple/ml-cross-entropy.git@24fbe4b5dab9a6c250a014573613c1890190536c"
 ```
 
 ## Usage
````
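For completeness, enabling the integration in a config looks like the snippet below, mirroring the plugin path and flag used by the test added later in this comparison:

```yaml
plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
cut_cross_entropy: true
```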
````diff
@@ -33,7 +33,7 @@ LOG = logging.getLogger("axolotl.integrations.cut_cross_entropy")
 
 _CCE_INSTALL_MESSAGE = (
     "Please install cut_cross_entropy with transformers support using "
-    '`pip install "cut-cross-entropy[transformers]==24.11.4"`'
+    '`pip install "cut-cross-entropy[transformers] @ git+https://github.com/apple/ml-cross-entropy.git@24fbe4b5dab9a6c250a014573613c1890190536c"`'
 )
 
 
````
````diff
@@ -17,7 +17,7 @@ Module for handling Spectrum input arguments.
 """
 from typing import Optional
 
-from pydantic import BaseModel
+from pydantic import BaseModel, model_validator
 
 
 class SpectrumArgs(BaseModel):
````
````diff
@@ -27,3 +27,20 @@ class SpectrumArgs(BaseModel):
 
     spectrum_top_fraction: Optional[float] = 0.5
     spectrum_model_name: Optional[str] = None
+
+    @model_validator(mode="before")
+    @classmethod
+    def check_fsdp_use_orig_params(cls, data):
+        if (
+            data.get("fsdp")
+            and data.get("fsdp_config")
+            and not data["fsdp_config"].get("use_orig_params")
+            and data.get("plugins")
+            and any("SpectrumPlugin" in plugin for plugin in data["plugins"])
+        ):
+            # would otherwise raise
+            # ValueError: Must flatten tensors with uniform `requires_grad` when `use_orig_params=False`
+            raise ValueError(
+                "FSDP + SpectrumPlugin cannot be used together when `use_orig_params=False` is set"
+            )
+        return data
````
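In config terms, the combination this validator rejects can be avoided by enabling `use_orig_params`. A hedged sketch follows; the plugin path and FSDP settings are assumptions, and only the `use_orig_params` requirement comes from the validator above:

```yaml
plugins:
  - axolotl.integrations.spectrum.SpectrumPlugin   # assumed plugin path
fsdp:
  - full_shard
fsdp_config:
  use_orig_params: true   # required when combining FSDP with SpectrumPlugin
```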
````diff
@@ -72,7 +72,6 @@ class CustomSupportedOptimizers(str, Enum):
     ao_adamw_8bit = "ao_adamw_8bit" # pylint: disable=invalid-name
     ao_adamw_fp8 = "ao_adamw_fp8" # pylint: disable=invalid-name
     adopt_adamw = "adopt_adamw" # pylint: disable=invalid-name
-    lion_pytorch = "lion_pytorch" # pylint: disable=invalid-name
     muon = "muon" # pylint: disable=invalid-name
 
 
````
````diff
@@ -780,9 +779,9 @@ class AxolotlInputConfig(
 
     # torch_dtype: Optional[torch.dtype]
 
-    gradient_checkpointing: Optional[Union[Literal["unsloth"], bool]] = Field(
-        default=False
-    )
+    gradient_checkpointing: Optional[
+        Union[Literal["unsloth", "offload"], bool]
+    ] = Field(default=False)
     gradient_checkpointing_kwargs: Optional[Dict[str, Any]] = None
 
     unfrozen_parameters: Optional[List[str]] = None
````
````diff
@@ -857,6 +856,7 @@ class AxolotlInputConfig(
 
     special_tokens: Optional[SpecialTokensConfig] = None
     tokens: Optional[List[str]] = None
+    added_tokens_overrides: Optional[Dict[int, str]] = None
 
     torch_compile: Optional[Union[Literal["auto"], bool]] = None
     torch_compile_backend: Optional[str] = None
````
````diff
@@ -1155,6 +1155,15 @@ class AxolotlInputConfig(
             raise ValueError("gradient_checkpointing is not supported for MPT models")
         return self
 
+    @model_validator(mode="after")
+    def check_offload_grad_checkpointing(self):
+        if self.gradient_checkpointing and self.gradient_checkpointing == "unsloth":
+            LOG.warning(
+                "`unsloth` is deprecated for gradient_checkpointing, use `offload`"
+            )
+            self.gradient_checkpointing = "offload"
+        return self
+
     @model_validator(mode="after")
     def check_better_transformers(self):
         if self.flash_optimum is True:
````
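In user-facing config terms the deprecation is just a rename; a minimal sketch:

```yaml
# Before (still accepted; logs a warning and is rewritten to "offload" by the validator above)
gradient_checkpointing: unsloth

# After
gradient_checkpointing: offload
```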
````diff
@@ -1,7 +1,8 @@
 """
 GRPO specific configuration args
 """
-from typing import List, Optional
+
+from typing import Optional
 
 from pydantic import BaseModel, Field
 
````
````diff
@@ -11,7 +12,10 @@ class TRLConfig(BaseModel):
     Input args for TRL.
     """
 
-    beta: Optional[float] = None
+    beta: Optional[float] = Field(
+        default=None,
+        json_schema_extra={"description": "Beta for RL training"},
+    )
     max_completion_length: Optional[int] = Field(
         default=None,
         json_schema_extra={
````
````diff
@@ -20,17 +24,68 @@ class TRLConfig(BaseModel):
     )
 
     # GRPO specific args
-    use_vllm: Optional[bool] = False
-    vllm_device: Optional[str] = "auto"
-    vllm_gpu_memory_utilization: Optional[float] = 0.9
-    vllm_max_model_len: Optional[int] = None
-    vllm_dtype: Optional[str] = "auto"
-
-    reward_funcs: Optional[List[str]] = None
-    reward_weights: Optional[List[float]] = None
-    num_generations: Optional[int] = None
-    log_completions: Optional[bool] = False
-
-    sync_ref_model: Optional[bool] = False
-    ref_model_mixup_alpha: Optional[float] = 0.9
-    ref_model_sync_steps: Optional[int] = 64
+    # Ref: https://github.com/huggingface/trl/blob/e3244d2d096ff1e2e248c931d06d39e165e20623/trl/trainer/grpo_config.py#L22
+    use_vllm: Optional[bool] = Field(
+        default=False,
+        json_schema_extra={"description": "Whether to use VLLM for RL training"},
+    )
+    vllm_device: Optional[str] = Field(
+        default="auto",
+        json_schema_extra={"description": "Device to use for VLLM"},
+    )
+    vllm_gpu_memory_utilization: Optional[float] = Field(
+        default=0.9,
+        json_schema_extra={"description": "GPU memory utilization for VLLM"},
+    )
+    vllm_dtype: Optional[str] = Field(
+        default="auto",
+        json_schema_extra={"description": "Data type for VLLM"},
+    )
+    vllm_max_model_len: Optional[int] = Field(
+        default=None,
+        json_schema_extra={
+            "description": "Maximum length of the model context for VLLM"
+        },
+    )
+
+    reward_funcs: Optional[list[str]] = Field(
+        default=None,
+        json_schema_extra={"description": "List of reward functions to load"},
+    )
+    reward_weights: Optional[list[float]] = Field(
+        default=None,
+        json_schema_extra={
+            "description": "Weights for each reward function. Must match the number of reward functions."
+        },
+    )
+    num_generations: Optional[int] = Field(
+        default=None,
+        json_schema_extra={
+            "description": "Number of generations to sample. The global batch size (num_processes * per_device_batch_size) must be divisible by this value."
+        },
+    )
+    log_completions: Optional[bool] = Field(
+        default=False,
+        json_schema_extra={"description": "Whether to log completions"},
+    )
+    sync_ref_model: Optional[bool] = Field(
+        default=False,
+        json_schema_extra={
+            "description": (
+                "Whether to sync the reference model every `ref_model_sync_steps` "
+                "steps, using the `ref_model_mixup_alpha` parameter."
+            )
+        },
+    )
+    ref_model_mixup_alpha: Optional[float] = Field(
+        default=0.9,
+        json_schema_extra={
+            "description": "Mixup alpha for the reference model. Requires `sync_ref_model=True`."
+        },
+    )
+    ref_model_sync_steps: Optional[int] = Field(
+        default=64,
+        json_schema_extra={
+            "description": "Sync steps for the reference model. Requires `sync_ref_model=True`."
+        },
+    )
````
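These fields surface in the `trl:` block of a config. A hedged sketch using the defaults documented above follows; the `rl: grpo` line is an assumption about how the GRPO trainer is selected, and the `beta` value is illustrative:

```yaml
rl: grpo   # assumed selector for GRPO-style RL training
trl:
  beta: 0.04                       # illustrative value
  use_vllm: true
  vllm_gpu_memory_utilization: 0.9
  num_generations: 4
  sync_ref_model: true
  ref_model_mixup_alpha: 0.9
  ref_model_sync_steps: 64
```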
````diff
@@ -79,7 +79,7 @@ def is_main_process():
 
 
 def is_local_main_process():
-    return PartialState().is_main_process
+    return PartialState().is_local_main_process
 
 
 def get_world_size():
````
````diff
@@ -4,7 +4,7 @@ from axolotl.utils.gradient_checkpointing.unsloth import (
 )
 
 
-def hf_grad_checkpoint_unsloth_wrapper(
+def hf_grad_checkpoint_offload_wrapper(
     decoder_layer, *args, use_reentrant=None
 ):  # pylint: disable=unused-argument
     return Unsloth_Offloaded_Gradient_Checkpointer.apply(
````
````diff
@@ -57,8 +57,14 @@ from axolotl.prompt_tokenizers import LLAMA_DEFAULT_EOS_TOKEN
 from axolotl.utils.bench import log_gpu_memory_usage
 from axolotl.utils.chat_templates import get_chat_template_from_config
 from axolotl.utils.dict import DictDefault
-from axolotl.utils.distributed import get_device_count, get_device_type, zero_only
-from axolotl.utils.gradient_checkpointing import hf_grad_checkpoint_unsloth_wrapper
+from axolotl.utils.distributed import (
+    barrier,
+    get_device_count,
+    get_device_type,
+    is_local_main_process,
+    zero_only,
+)
+from axolotl.utils.gradient_checkpointing import hf_grad_checkpoint_offload_wrapper
 from axolotl.utils.lora_embeddings import get_linear_embedding_layers
 from axolotl.utils.model_shard_quant import load_sharded_model, load_sharded_model_quant
 
````
````diff
@@ -165,7 +171,95 @@ def load_model_config(cfg):
     return model_config
 
 
+def modify_tokenizer_files(
+    tokenizer_path: str, token_mappings: Dict[int, str], output_dir: str
+) -> str:
+    """
+    Modify tokenizer files to replace added_tokens strings, save to output directory, and return the path to the modified tokenizer.
+
+    This only works with reserved tokens that were added to the tokenizer, not tokens already part of the vocab.
+
+    Args:
+        tokenizer_path: Path or name of the original tokenizer
+        token_mappings: Dict mapping {token_id (int): new_token_string}
+        output_dir: Directory to save the modified tokenizer
+
+    Returns:
+        Path to the modified tokenizer directory
+
+    Ref: https://github.com/huggingface/transformers/issues/27974#issuecomment-1854188941
+    """
+
+    import json
+
+    # Create the tokenizer directory in output_dir if it doesn't exist
+    tokenizer_dir = os.path.join(output_dir, "tokenizer")
+    os.makedirs(tokenizer_dir, exist_ok=True)
+
+    if is_local_main_process():  # pylint: disable=too-many-nested-blocks
+        # Load the tokenizer
+        temp_tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, use_fast=True)
+
+        # Save the tokenizer to the output directory
+        temp_tokenizer.save_pretrained(tokenizer_dir)
+
+        # Get the token IDs and map them to their new values
+        token_id_mappings = {
+            int(token_id): new_value for token_id, new_value in token_mappings.items()
+        }
+
+        # 1. Update tokenizer_config.json - added_tokens_decoder
+        config_path = os.path.join(tokenizer_dir, "tokenizer_config.json")
+        if os.path.exists(config_path):
+            with open(config_path, "r", encoding="utf-8") as f:
+                config_data = json.load(f)
+
+            # Update added_tokens_decoder
+            if "added_tokens_decoder" in config_data:
+                for token_id, new_value in token_id_mappings.items():
+                    token_id_str = str(token_id)
+                    if token_id_str in config_data["added_tokens_decoder"]:
+                        config_data["added_tokens_decoder"][token_id_str][
+                            "content"
+                        ] = new_value
+                    else:
+                        raise ValueError(
+                            f"Token ID {token_id_str} not found in added_tokens_decoder"
+                        )
+
+            # Write the updated config back
+            with open(config_path, "w", encoding="utf-8") as f:
+                json.dump(config_data, f, indent=2)
+
+        # 2. Update tokenizer.json - added_tokens
+        tokenizer_path = os.path.join(tokenizer_dir, "tokenizer.json")
+        if os.path.exists(tokenizer_path):
+            with open(tokenizer_path, "r", encoding="utf-8") as f:
+                tokenizer_data = json.load(f)
+
+            # Update added_tokens
+            if "added_tokens" in tokenizer_data:
+                for token_id, new_value in token_id_mappings.items():
+                    for i, token_entry in enumerate(tokenizer_data["added_tokens"]):
+                        if token_entry["id"] == token_id:
+                            tokenizer_data["added_tokens"][i]["content"] = new_value
+                            break
+                    else:
+                        # Reaching this section means the token_id was not found in tokenizer.json added_tokens
+                        raise ValueError(
+                            f"Token ID {token_id} not found in added_tokens"
+                        )
+
+            # Write the updated tokenizer data back
+            with open(tokenizer_path, "w", encoding="utf-8") as f:
+                json.dump(tokenizer_data, f, indent=2)
+
+    barrier()
+    return tokenizer_dir
+
+
 def load_tokenizer(cfg):
+    """Load and configure the tokenizer based on the provided config."""
     model_config = load_model_config(cfg)
     tokenizer_kwargs = {}
     use_fast = True  # this is the default
````
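A hedged usage sketch of the helper added above, reusing the IDs from the documentation example (the tokenizer name, IDs, and output path are illustrative; the helper only rewrites tokens that already exist as added_tokens):

```python
from transformers import AutoTokenizer

# Illustrative only: rewrite two reserved added_tokens and reload the result.
modified_dir = modify_tokenizer_files(
    "NousResearch/Llama-3.2-1B",                     # base tokenizer (as in the tests below)
    {128041: "<|im_start|>", 128042: "<|im_end|>"},  # token_id -> new token string
    output_dir="./outputs/example",
)
tokenizer = AutoTokenizer.from_pretrained(modified_dir)
```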
````diff
@@ -180,8 +274,18 @@ def load_tokenizer(cfg):
     if cfg.tokenizer_type:
         tokenizer_cls = getattr(transformers, cfg.tokenizer_type)
 
+    # Set base tokenizer path
+    tokenizer_path = cfg.tokenizer_config
+
+    # Apply token string overrides if specified
+    if cfg.added_tokens_overrides:
+        # Modify tokenizer files and get path to modified tokenizer
+        tokenizer_path = modify_tokenizer_files(
+            tokenizer_path, cfg.added_tokens_overrides, output_dir=cfg.output_dir
+        )
+
     tokenizer = tokenizer_cls.from_pretrained(
-        cfg.tokenizer_config,
+        tokenizer_path,
         trust_remote_code=cfg.trust_remote_code or False,
         use_fast=use_fast,
         **tokenizer_kwargs,
````
````diff
@@ -14,7 +14,7 @@
 h1 {
   font-family: var(--font-title);
   font-weight: 400;
-  font-size: 6rem;
+  font-size: 5rem;
   line-height: 1.1;
   letter-spacing: -0.05em;
   font-feature-settings: "ss01" on;
````
````diff
@@ -69,6 +69,51 @@ class TestCutCrossEntropyIntegration:
         train(cfg=cfg, dataset_meta=dataset_meta)
         check_model_output_exists(temp_dir, cfg)
 
+    # pylint: disable=redefined-outer-name
+    def test_qwen2_w_cce(self, temp_dir):
+        cfg = DictDefault(
+            {
+                "base_model": "Qwen/Qwen2.5-0.5B",
+                "plugins": [
+                    "axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin",
+                ],
+                "cut_cross_entropy": True,
+                "sequence_len": 1024,
+                "val_set_size": 0.1,
+                "special_tokens": {
+                    "pad_token": "<|endoftext|>",
+                },
+                "datasets": [
+                    {
+                        "path": "mhenrichsen/alpaca_2k_test",
+                        "type": "alpaca",
+                    },
+                ],
+                "num_epochs": 1,
+                "micro_batch_size": 4,
+                "gradient_accumulation_steps": 1,
+                "learning_rate": 0.00001,
+                "optimizer": "adamw_torch_fused",
+                "output_dir": temp_dir,
+                "lr_scheduler": "cosine",
+                "save_safetensors": True,
+                "max_steps": 10,
+                "bf16": "auto",
+            }
+        )
+        prepare_plugins(cfg)
+        normalize_config(cfg)
+        cli_args = TrainerCliArgs()
+        dataset_meta = load_datasets(cfg=cfg, cli_args=cli_args)
+
+        major, minor, _ = get_pytorch_version()
+        if (major, minor) < (2, 4):
+            with pytest.raises(ImportError):
+                train(cfg=cfg, dataset_meta=dataset_meta)
+        else:
+            train(cfg=cfg, dataset_meta=dataset_meta)
+            check_model_output_exists(temp_dir, cfg)
+
     @pytest.mark.parametrize(
         "attention_type",
         [
````
````diff
@@ -1,6 +1,7 @@
 """
 Test cases for the tokenizer loading
 """
+
 import unittest
 
 import pytest
````
````diff
@@ -9,7 +10,7 @@ from axolotl.utils.dict import DictDefault
 from axolotl.utils.models import load_tokenizer
 
 
-class TestTokenizers(unittest.TestCase):
+class TestTokenizers:
     """
     test class for the load_tokenizer fn
     """
````
````diff
@@ -75,12 +76,48 @@ class TestTokenizers(unittest.TestCase):
             }
         )
         tokenizer = load_tokenizer(cfg)
-        self.assertEqual(tokenizer("<|im_start|>user")["input_ids"], [1, 32000, 1404])
-        self.assertEqual(len(tokenizer), 32001)
+        assert tokenizer("<|im_start|>user")["input_ids"] == [1, 32000, 1404]
+        assert len(tokenizer) == 32001
 
         # ensure reloading the tokenizer again from cfg results in same vocab length
         tokenizer = load_tokenizer(cfg)
-        self.assertEqual(len(tokenizer), 32001)
+        assert len(tokenizer) == 32001
+
+    def test_added_tokens_overrides(self, temp_dir):
+        cfg = DictDefault(
+            {
+                # use with tokenizer that has reserved_tokens in added_tokens
+                "tokenizer_config": "NousResearch/Llama-3.2-1B",
+                "added_tokens_overrides": {
+                    128041: "RANDOM_OVERRIDE_1",
+                    128042: "RANDOM_OVERRIDE_2",
+                },
+                "output_dir": temp_dir,
+            }
+        )
+
+        tokenizer = load_tokenizer(cfg)
+        assert tokenizer.encode("RANDOM_OVERRIDE_1", add_special_tokens=False) == [
+            128041
+        ]
+        assert tokenizer.encode("RANDOM_OVERRIDE_2", add_special_tokens=False) == [
+            128042
+        ]
+
+    def test_added_tokens_overrides_with_toolargeid(self, temp_dir):
+        cfg = DictDefault(
+            {
+                # use with tokenizer that has reserved_tokens in added_tokens
+                "tokenizer_config": "NousResearch/Llama-3.2-1B",
+                "added_tokens_overrides": {1000000: "BROKEN_RANDOM_OVERRIDE_1"},
+                "output_dir": temp_dir,
+            }
+        )
+
+        with pytest.raises(
+            ValueError, match=r".*Token ID 1000000 not found in added_tokens.*"
+        ):
+            load_tokenizer(cfg)
 
 
 if __name__ == "__main__":
````