make sure action has permission to create release

release version 0.5.1 (#2082)
remove deprecated extra metadata kwarg from pydantic Field (#2081) [skip ci]
2024-11-19 10:41:19 -05:00 · 2024-11-19 10:35:59 -05:00 · 2024-11-19 10:30:10 -05:00 · 2024-11-19 10:29:31 -05:00 · 2024-11-19 10:19:03 -05:00 · 2024-11-19 10:18:24 -05:00
14 changed files with 168 additions and 158 deletions
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -10,7 +10,7 @@ on:

 jobs:
  build-axolotl:
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    strategy:
      fail-fast: false
      matrix:
@@ -77,7 +77,7 @@ jobs:

  build-axolotl-cloud:
    needs: build-axolotl
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    # this job needs to be run on self-hosted GPU runners...
    strategy:
      matrix:
@@ -140,7 +140,7 @@ jobs:

  build-axolotl-cloud-no-tmux:
    needs: build-axolotl
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    # this job needs to be run on self-hosted GPU runners...
    strategy:
      matrix:
--- a/.github/workflows/multi-gpu-e2e.yml
+++ b/.github/workflows/multi-gpu-e2e.yml
@@ -15,7 +15,7 @@ concurrency:

 jobs:
  test-axolotl-multigpu:
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip e2e]') && github.repository_owner == 'axolotl-ai-cloud' }}
    strategy:
      fail-fast: false
      matrix:
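The old guard `'[skip docker]]'` had a stray closing bracket, so `contains()` could not match a normal `[skip docker]` marker and these jobs always ran. Below is a minimal Python model of the `if:` guards. It assumes, per the GitHub Actions expression docs, that `contains()` and `==` compare strings case-insensitively. `job_should_run` is a hypothetical name, not part of the workflow.

```python
def job_should_run(commit_message: str, repository_owner: str, skip_marker: str) -> bool:
    """Model of: ! contains(message, marker) && repository_owner == 'axolotl-ai-cloud'."""
    # GitHub expression string comparisons ignore case, so lowercase both sides.
    message = commit_message.lower()
    marker = skip_marker.lower()
    return marker not in message and repository_owner.lower() == "axolotl-ai-cloud"


if __name__ == "__main__":
    msg = "fix brackets on docker ci builds [skip docker]"
    # Old marker with the extra bracket never matches, so the job still runs.
    print(job_should_run(msg, "axolotl-ai-cloud", "[skip docker]]"))  # True
    # Corrected marker matches, so the job is skipped.
    print(job_should_run(msg, "axolotl-ai-cloud", "[skip docker]"))  # False
```

The same guard shape with `[skip e2e]` now gates `test-axolotl-multigpu` and `docker-e2e-tests-1st`.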
--- a/.github/workflows/nightlies.yml
+++ b/.github/workflows/nightlies.yml
@@ -7,7 +7,7 @@ on:

 jobs:
  build-axolotl:
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    strategy:
      fail-fast: false
      matrix:
@@ -71,7 +71,7 @@ jobs:

  build-axolotl-cloud:
    needs: build-axolotl
-    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]]') && github.repository_owner == 'axolotl-ai-cloud' }}
+    if: ${{ ! contains(github.event.commits[0].message, '[skip docker]') && github.repository_owner == 'axolotl-ai-cloud' }}
    # this job needs to be run on self-hosted GPU runners...
    strategy:
      matrix:
--- a/.github/workflows/pypi.yml
+++ b/.github/workflows/pypi.yml
@@ -10,6 +10,8 @@ jobs:
  setup_release:
    name: Create Release
    runs-on: ubuntu-latest
+    permissions:
+      contents: write
    steps:
      - name: Get the tag version
        id: extract_branch
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -78,7 +78,7 @@ jobs:
          find "$(pip cache dir)/http-v2" -type f -mtime +14 -exec rm {} \;

  docker-e2e-tests-1st:
-    if: github.repository_owner == 'axolotl-ai-cloud'
+    if: ${{ ! contains(github.event.commits[0].message, '[skip e2e]') && github.repository_owner == 'axolotl-ai-cloud' }}
    # this job needs to be run on self-hosted GPU runners...
    runs-on: [self-hosted, modal]
    timeout-minutes: 90
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,12 +1,12 @@
 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
 packaging==23.2
 peft==0.13.2
-transformers==4.46.2
+transformers==4.46.3
 tokenizers>=0.20.1
 bitsandbytes==0.44.1
 accelerate==1.1.0
 datasets==3.1.0
-deepspeed==0.15.3
+deepspeed==0.15.4
 pydantic==2.6.3
 addict
 fire
--- a/setup.py
+++ b/setup.py
@@ -96,7 +96,7 @@ install_requires, dependency_links = parse_requirements()

 setup(
    name="axolotl",
-    version="0.5.0",
+    version="0.5.1",
    description="LLM Trainer",
    long_description="Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.",
    package_dir={"": "src"},
@@ -108,7 +108,7 @@ setup(
            "flash-attn==2.7.0.post2",
        ],
        "deepspeed": [
-            "deepspeed==0.14.4",
+            "deepspeed==0.15.4",
            "deepspeed-kernels",
        ],
        "mamba-ssm": [
--- a/src/axolotl/monkeypatch/trainer_fsdp_grad_accum.py
+++ b/src/axolotl/monkeypatch/trainer_fsdp_grad_accum.py
@@ -1,83 +0,0 @@
-"""
-fix for FSDP gradient accumulation
-see https://github.com/huggingface/transformers/pull/34645
-"""
-import inspect
-
-from accelerate.logging import get_logger
-from transformers.trainer import Trainer
-
-from axolotl.monkeypatch.unsloth_ import detab_code
-
-LOG = get_logger("axolotl.monkeypatch.trainer_fsdp_grad_accumulation")
-
-ORIGINAL_CONTEXT_CODE = """
-                context = (
-                    functools.partial(self.accelerator.no_sync, model=model)
-                    if i == len(batch_samples) - 1
-                    else contextlib.nullcontext
-                )
-"""
-
-PATCHED_CONTEXT_CODE = """
-                context = (
-                    functools.partial(self.accelerator.no_sync, model=model)
-                    if i != len(batch_samples) - 1
-                    else contextlib.nullcontext
-                )
-"""
-
-
-def get_training_loop_code() -> str:
-    training_loop = inspect.getsource(
-        Trainer._inner_training_loop  # pylint: disable=protected-access
-    )
-    return training_loop
-
-
-def check_training_loop_is_patchable() -> bool:
-    train_loop = get_training_loop_code()
-    train_loop, _ = detab_code(train_loop)
-    return ORIGINAL_CONTEXT_CODE in train_loop
-
-
-def patch_training_loop_for_fsdp_grad_accum():
-    """
-    monkeypatch for fixing the training loop for FSDP gradient accumulation
-    """
-
-    train_loop = get_training_loop_code()
-    Trainer._original_inner_training_loop = (  # pylint: disable=protected-access
-        train_loop
-    )
-    train_loop, _ = detab_code(train_loop)
-    assert (
-        ORIGINAL_CONTEXT_CODE in train_loop
-    ), "Original _inner_training_loop code not found"
-
-    train_loop = train_loop.replace(ORIGINAL_CONTEXT_CODE, PATCHED_CONTEXT_CODE)
-    train_loop = train_loop.replace(
-        "def _inner_training_loop(",
-        "def _fixed_inner_training_loop(",
-        1,
-    )
-
-    # load imports necessary
-    import transformers.trainer
-
-    items_to_import = []
-    for item in dir(transformers.trainer):
-        if item in train_loop:
-            items_to_import.append(item)
-
-    exec(  # pylint: disable=exec-used  # nosec B102
-        "from transformers.trainer import ("
-        + ", ".join(x for x in items_to_import)
-        + ")",
-        globals(),
-    )
-    exec(train_loop, globals())  # pylint: disable=exec-used  # nosec B102
-    LOG.info("patching _inner_training_loop", main_process_only=True)
-    Trainer._inner_training_loop = (  # pylint: disable=protected-access
-        _fixed_inner_training_loop  # pylint: disable=undefined-variable  # noqa: F821
-    )
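The removed monkeypatch only flipped `==` to `!=` in the context selection. With gradient accumulation, `no_sync` should wrap every micro-batch except the last, so gradients are all-reduced once per optimizer step. Commit d9b71edf84 bumps transformers to pick up this fix upstream and drops the patch. The stdlib sketch below contrasts the two conditions. `fake_no_sync` is a hypothetical stand-in for `functools.partial(self.accelerator.no_sync, model=model)`, and no real gradients are involved.

```python
import contextlib
from typing import Callable, ContextManager, List

ContextFactory = Callable[[], ContextManager[None]]


def pick_context(
    i: int, num_micro_batches: int, no_sync: ContextFactory, patched: bool
) -> ContextFactory:
    is_last = i == num_micro_batches - 1
    # Patched: suppress sync on every micro-batch except the last.
    # Unpatched: suppress sync only on the last micro-batch (inverted).
    use_no_sync = (not is_last) if patched else is_last
    return no_sync if use_no_sync else contextlib.nullcontext


def sync_pattern(num_micro_batches: int, patched: bool = True) -> List[bool]:
    """Per micro-batch, True if gradients would be synchronized after backward."""
    synced: List[bool] = []

    @contextlib.contextmanager
    def fake_no_sync():
        yield
        # Leaving the block without an all-reduce.
        synced.append(False)

    for i in range(num_micro_batches):
        context = pick_context(i, num_micro_batches, fake_no_sync, patched)
        with context():
            pass  # loss.backward() would run here
        if context is contextlib.nullcontext:
            synced.append(True)
    return synced


if __name__ == "__main__":
    print(sync_pattern(4, patched=True))   # [False, False, False, True]
    print(sync_pattern(4, patched=False))  # [True, True, True, False]
```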
--- a/src/axolotl/utils/config/models/input/v0_4_1/init.py
+++ b/src/axolotl/utils/config/models/input/v0_4_1/init.py
@@ -250,8 +250,10 @@ class KTODataset(BaseModel):
 class LoftQConfig(BaseModel):
    """LoftQ configuration subset"""

-    loftq_bits: int = Field(default=4, metadata={"help": "Quantization bits for LoftQ"})
-    # loftq_iter: int = Field(default=1, metadata={"help": "Alternating iterations for LoftQ"})
+    loftq_bits: int = Field(
+        default=4, json_schema_extra={"description": "Quantization bits for LoftQ"}
+    )
+    # loftq_iter: int = Field(default=1, json_schema_extra={"description": "Alternating iterations for LoftQ"})


 class PeftConfig(BaseModel):
@@ -294,8 +296,8 @@ class LoraConfig(BaseModel):

    qlora_sharded_model_loading: Optional[bool] = Field(
        default=False,
-        metadata={
-            "help": "load qlora model in sharded format for FSDP using answer.ai technique."
+        json_schema_extra={
+            "description": "load qlora model in sharded format for FSDP using answer.ai technique."
        },
    )
    lora_on_cpu: Optional[bool] = None
@@ -304,13 +306,15 @@ class LoraConfig(BaseModel):

    loraplus_lr_ratio: Optional[float] = Field(
        default=None,
-        metadata={
-            "help": "loraplus learning rate ratio lr_B / lr_A. Recommended value is 2^4."
+        json_schema_extra={
+            "description": "loraplus learning rate ratio lr_B / lr_A. Recommended value is 2^4."
        },
    )
    loraplus_lr_embedding: Optional[float] = Field(
        default=1e-6,
-        metadata={"help": "loraplus learning rate for lora embedding layers."},
+        json_schema_extra={
+            "description": "loraplus learning rate for lora embedding layers."
+        },
    )

    merge_lora: Optional[bool] = None
@@ -380,10 +384,10 @@ class ModelInputConfig(BaseModel):
    tokenizer_use_fast: Optional[bool] = None
    tokenizer_legacy: Optional[bool] = None
    tokenizer_type: Optional[str] = Field(
-        default=None, metadata={"help": "transformers tokenizer class"}
+        default=None, json_schema_extra={"description": "transformers tokenizer class"}
    )
    processor_type: Optional[str] = Field(
-        default=None, metadata={"help": "transformers processor class"}
+        default=None, json_schema_extra={"description": "transformers processor class"}
    )
    trust_remote_code: Optional[bool] = None

@@ -405,18 +409,18 @@ class HyperparametersConfig(BaseModel):
    gradient_accumulation_steps: Optional[int] = Field(default=1)
    micro_batch_size: Optional[int] = Field(
        default=1,
-        metadata={"help": "per gpu micro batch size for training"},
+        json_schema_extra={"description": "per gpu micro batch size for training"},
    )
    batch_size: Optional[int] = Field(
        default=None,
-        metadata={
-            "help": "Total batch size, we do not recommended setting this manually"
+        json_schema_extra={
+            "description": "Total batch size, we do not recommended setting this manually"
        },
    )
    eval_batch_size: Optional[int] = Field(
        default=None,
-        metadata={
-            "help": "per gpu micro batch size for evals, defaults to value of micro_batch_size"
+        json_schema_extra={
+            "description": "per gpu micro batch size for evals, defaults to value of micro_batch_size"
        },
    )

@@ -441,12 +445,13 @@ class HyperparametersConfig(BaseModel):
        ]
    ] = OptimizerNames.ADAMW_HF.value
    optim_args: Optional[Union[str, Dict[str, Any]]] = Field(
-        default=None, metadata={"help": "Optional arguments to supply to optimizer."}
+        default=None,
+        json_schema_extra={"description": "Optional arguments to supply to optimizer."},
    )
    optim_target_modules: Optional[Union[List[str], Literal["all_linear"]]] = Field(
        default=None,
-        metadata={
-            "help": "The target modules to optimize, i.e. the module names that you would like to train."
+        json_schema_extra={
+            "description": "The target modules to optimize, i.e. the module names that you would like to train."
        },
    )
    torchdistx_path: Optional[str] = None
@@ -506,15 +511,15 @@ class LISAConfig(BaseModel):

    lisa_n_layers: Optional[int] = Field(
        default=None,
-        metadata={"help": "the number of activate layers in LISA"},
+        json_schema_extra={"description": "the number of activate layers in LISA"},
    )
    lisa_step_interval: Optional[int] = Field(
        default=None,
-        metadata={"help": "how often to switch layers in LISA"},
+        json_schema_extra={"description": "how often to switch layers in LISA"},
    )
    lisa_layers_attribute: Optional[str] = Field(
        default="model.layers",
-        metadata={"help": "path under the model to access the layers"},
+        json_schema_extra={"description": "path under the model to access the layers"},
    )


@@ -613,7 +618,8 @@ class AxolotlInputConfig(
    pretraining_dataset: Optional[  # type: ignore
        conlist(Union[PretrainingDataset, SFTDataset], min_length=1)
    ] = Field(
-        default=None, metadata={"help": {"streaming dataset to use for pretraining"}}
+        default=None,
+        json_schema_extra={"description": "streaming dataset to use for pretraining"},
    )
    dataset_processes: Optional[int] = Field(default=os.cpu_count())
    dataset_keep_in_memory: Optional[bool] = None
@@ -673,7 +679,8 @@ class AxolotlInputConfig(
    sequence_len: int = Field(default=512)
    min_sample_len: Optional[int] = None
    max_prompt_len: int = Field(
-        default=512, metadata={"help": "maximum prompt length for RL training"}
+        default=512,
+        json_schema_extra={"description": "maximum prompt length for RL training"},
    )
    sample_packing: Optional[bool] = None
    sample_packing_group_size: Optional[int] = 100_000
@@ -692,8 +699,8 @@ class AxolotlInputConfig(
    pretrain_multipack_buffer_size: Optional[int] = 10_000
    pretrain_multipack_attn: Optional[bool] = Field(
        default=True,
-        metadata={
-            "help": "whether to prevent cross attention for packed sequences during pretraining",
+        json_schema_extra={
+            "description": "whether to prevent cross attention for packed sequences during pretraining",
        },
    )

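Passing unknown keyword arguments such as `metadata=` to pydantic v2's `Field` is deprecated. The diff moves each help string into `json_schema_extra`, which pydantic merges into that field's JSON schema. It also fixes `pretraining_dataset`, whose old help value was a set literal (`{"..."}`) rather than a string. A minimal sketch, assuming pydantic 2.x as pinned in `requirements.txt`:

```python
from pydantic import BaseModel, Field


class LoftQConfig(BaseModel):
    """LoftQ configuration subset, using the field definition from the diff."""

    loftq_bits: int = Field(
        default=4, json_schema_extra={"description": "Quantization bits for LoftQ"}
    )


if __name__ == "__main__":
    # The json_schema_extra keys are merged into this property's JSON schema.
    print(LoftQConfig.model_json_schema()["properties"]["loftq_bits"])
    print(LoftQConfig().loftq_bits)  # 4
```

pydantic also accepts `description=` directly on `Field`, which yields the same `description` key in the schema.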
--- a/src/axolotl/utils/data/rl.py
+++ b/src/axolotl/utils/data/rl.py
@@ -64,15 +64,57 @@ def map_dataset(cfg, data_set, ds_transform_fn, tokenizer):
            tokenizer = load_tokenizer(cfg)
        ds_transform_fn = partial(ds_transform_fn, tokenizer=tokenizer)

+    if isinstance(data_set, DatasetDict):
+        data_set = data_set["train"]
+
    data_set = data_set.map(
        ds_transform_fn,
        desc="Mapping RL Dataset",
    )
-    if isinstance(data_set, DatasetDict):
-        data_set = data_set["train"]
+
    return data_set


+def drop_long_rl_seq(
+    sample, rl, tokenizer, sequence_len  # pylint: disable=invalid-name
+):
+    if rl in ("dpo", "ipo", "orpo", "simpo"):
+        if not (
+            sample.get("prompt") and sample.get("chosen") and sample.get("rejected")
+        ):
+            raise ValueError(
+                "Prompt, chosen and rejected keys are required for DPO/ORPO datasets"
+            )
+
+        prompt = sample["prompt"]
+        chosen = sample["chosen"]
+        rejected = sample["rejected"]
+
+        len_prompt = len(tokenizer(prompt, add_special_tokens=False)["input_ids"])
+        len_chosen = len(tokenizer(chosen, add_special_tokens=False)["input_ids"])
+        len_rejected = len(tokenizer(rejected, add_special_tokens=False)["input_ids"])
+
+        return (len_prompt + len_chosen) <= sequence_len and (
+            len_prompt + len_rejected
+        ) <= sequence_len
+
+    if rl == "kto":
+        if not (sample.get("prompt") and sample.get("completion")):
+            raise ValueError("Prompt and completion keys are required for KTO datasets")
+
+        prompt = sample["prompt"]
+        completion = sample["completion"]
+
+        len_prompt = len(tokenizer(prompt, add_special_tokens=False)["input_ids"])
+        len_completion = len(
+            tokenizer(completion, add_special_tokens=False)["input_ids"]
+        )
+
+        return (len_prompt + len_completion) <= sequence_len
+
+    raise ValueError("Unknown RL type")
+
+
 def load_prepare_dpo_datasets(cfg):
    def load_split(dataset_cfgs, _cfg):
        split_datasets: List[Any] = []
@@ -94,7 +136,7 @@ def load_prepare_dpo_datasets(cfg):
                )
                split_datasets.insert(i, ds)

-        tokenizer = None
+        tokenizer = load_tokenizer(cfg)

        for i, data_set in enumerate(split_datasets):
            _type = dataset_cfgs[i]["type"]
@@ -121,7 +163,28 @@ def load_prepare_dpo_datasets(cfg):
                # "prompt", "chosen" and "rejected" already preprocessed
                split_datasets[i] = data_set

-        return concatenate_datasets(split_datasets)
+            drop_long = partial(
+                drop_long_rl_seq,
+                rl=_cfg.rl,
+                tokenizer=tokenizer,
+                sequence_len=cfg.sequence_len,
+            )
+
+            prior_len = len(split_datasets[i])
+            split_datasets[i] = split_datasets[i].filter(
+                drop_long,
+                num_proc=cfg.dataset_processes,
+                load_from_cache_file=not cfg.is_preprocess,
+                desc="Dropping Long Sequences",
+            )
+            dropped = prior_len - len(split_datasets[i])
+            if dropped:
+                LOG.warning(f"Dropped {dropped} long samples from dataset index {i}")
+
+        combined_datasets = concatenate_datasets(split_datasets)
+        combined_datasets = combined_datasets.shuffle(seed=cfg.seed)
+
+        return combined_datasets

    with zero_first(is_main_process()):
        train_is_preprocessed = False
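`drop_long_rl_seq` keeps a preference pair only if both prompt+chosen and prompt+rejected fit in `sequence_len`. It keeps a KTO sample only if prompt+completion fits. The sketch below implements the same predicate. It assumes a toy whitespace tokenizer in place of `tokenizer(text, add_special_tokens=False)["input_ids"]`, and `fits_sequence_len` and `whitespace_len` are hypothetical names.

```python
from typing import Callable, Dict

CountTokens = Callable[[str], int]


def whitespace_len(text: str) -> int:
    # Toy stand-in for len(tokenizer(text, add_special_tokens=False)["input_ids"]).
    return len(text.split())


def fits_sequence_len(
    sample: Dict[str, str], rl: str, count: CountTokens, sequence_len: int
) -> bool:
    if rl in ("dpo", "ipo", "orpo", "simpo"):
        if not all(sample.get(k) for k in ("prompt", "chosen", "rejected")):
            raise ValueError("Prompt, chosen and rejected keys are required")
        len_prompt = count(sample["prompt"])
        # Both the chosen and the rejected continuation must fit.
        return (
            len_prompt + count(sample["chosen"]) <= sequence_len
            and len_prompt + count(sample["rejected"]) <= sequence_len
        )
    if rl == "kto":
        if not all(sample.get(k) for k in ("prompt", "completion")):
            raise ValueError("Prompt and completion keys are required")
        return count(sample["prompt"]) + count(sample["completion"]) <= sequence_len
    raise ValueError(f"Unknown RL type: {rl}")


if __name__ == "__main__":
    samples = [
        {"prompt": "a b", "chosen": "c", "rejected": "d e f"},  # 2+1=3 fits, 2+3=5 does not
        {"prompt": "a", "chosen": "b", "rejected": "c"},  # 2 and 2 both fit
    ]
    kept = [s for s in samples if fits_sequence_len(s, "dpo", whitespace_len, 4)]
    print(len(samples) - len(kept))  # 1
```

In the diff, the predicate runs through `Dataset.filter` on each split. The combined dataset is then shuffled with `cfg.seed`.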
--- a/src/axolotl/utils/tokenization.py
+++ b/src/axolotl/utils/tokenization.py
@@ -66,28 +66,47 @@ def process_tokens_for_rl_debug(tokens, color, tokenizer, text_only):


 def check_rl_example_labels(example, tokenizer, text_only=False):
-    field_prompt, field_chosen, field_rejected = "prompt", "chosen", "rejected"
+    field_prompt, field_chosen, field_rejected, field_completion = (
+        "prompt",
+        "chosen",
+        "rejected",
+        "completion",
+    )

    input_tokens = example[field_prompt]
-    labels_chosen, labels_rejected = example[field_chosen], example[field_rejected]
+
+    labels_chosen = example.get(field_chosen)
+    labels_rejected = example.get(field_rejected)
+    labels_completion = example.get(field_completion)
+
+    # Create a delimiter based on text_only flag
+    delimiter = "" if text_only else " "

    # Process and color each type of token
    colored_tokens = process_tokens_for_rl_debug(
        input_tokens, "yellow", tokenizer, text_only
    )
-    colored_chosens = process_tokens_for_rl_debug(
-        labels_chosen, "green", tokenizer, text_only
-    )
-    colored_rejecteds = process_tokens_for_rl_debug(
-        labels_rejected, "red", tokenizer, text_only
-    )

-    # Create a delimiter based on text_only flag
-    delimiter = "" if text_only else " "
+    # Process tokens
+    if labels_completion is None:
+        colored_chosens = process_tokens_for_rl_debug(
+            labels_chosen, "green", tokenizer, text_only
+        )
+        colored_rejecteds = process_tokens_for_rl_debug(
+            labels_rejected, "red", tokenizer, text_only
+        )
+    else:
+        colored_completion = process_tokens_for_rl_debug(
+            labels_completion, "green", tokenizer, text_only
+        )

    # Logging information
    LOG.info(f"INPUT PROMPT: {delimiter.join(colored_tokens)}\n\n")
-    LOG.info(f"CHOSEN RESPONSE: {delimiter.join(colored_chosens)}\n\n")
-    LOG.info(f"REJECTED RESPONSE: {delimiter.join(colored_rejecteds)}\n\n\n")
+
+    if labels_completion is None:
+        LOG.info(f"CHOSEN RESPONSE: {delimiter.join(colored_chosens)}\n\n")
+        LOG.info(f"REJECTED RESPONSE: {delimiter.join(colored_rejecteds)}\n\n\n")
+    else:
+        LOG.info(f"COMPLETION RESPONSE: {delimiter.join(colored_completion)}\n\n\n")

    return delimiter.join(colored_tokens)
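After this change, `check_rl_example_labels` logs either chosen/rejected tokens (DPO-style data) or a single completion (KTO), depending on whether `completion` is present. The branch reduces to the hypothetical helper below, which is sketched with plain dicts and performs no tokenizer decoding.

```python
from typing import Any, Dict, List, Tuple


def rl_label_fields(example: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (field, color) pairs that would be logged after the yellow prompt."""
    if example.get("completion") is None:
        # Preference data: chosen in green, rejected in red.
        return [("chosen", "green"), ("rejected", "red")]
    # KTO data: a single completion in green.
    return [("completion", "green")]


if __name__ == "__main__":
    print(rl_label_fields({"prompt": [1], "completion": [2]}))  # [('completion', 'green')]
```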
--- a/src/axolotl/utils/trainer.py
+++ b/src/axolotl/utils/trainer.py
@@ -16,9 +16,6 @@ from torch.utils.data import DataLoader, RandomSampler
 from transformers.utils import is_torch_bf16_gpu_available

 from axolotl.core.trainer_builder import HFCausalTrainerBuilder, HFRLTrainerBuilder
-from axolotl.monkeypatch.trainer_fsdp_grad_accum import (
-    patch_training_loop_for_fsdp_grad_accum,
-)
 from axolotl.utils.distributed import reduce_and_broadcast
 from axolotl.utils.environment import check_cuda_p2p_ib_support
 from axolotl.utils.samplers import MultipackBatchSampler, get_dataset_lengths
@@ -206,37 +203,59 @@ def process_datasets_for_packing(cfg, train_dataset, eval_dataset):
        if eval_dataset and "token_type_ids" in eval_dataset.column_names:
            eval_dataset = eval_dataset.remove_columns("token_type_ids")

+    prior_len = len(train_dataset)
    train_dataset = train_dataset.filter(
        drop_long,
        num_proc=cfg.dataset_processes,
        load_from_cache_file=not cfg.is_preprocess,
        desc="Dropping Long Sequences",
    )
+    dropped = prior_len - len(train_dataset)
+    if dropped:
+        LOG.warning(f"Dropped {dropped} long samples from train dataset")
+
    if eval_dataset:
+        prior_len = len(eval_dataset)
        eval_dataset = eval_dataset.filter(
            drop_long,
            num_proc=cfg.dataset_processes,
            load_from_cache_file=not cfg.is_preprocess,
            desc="Dropping Long Sequences",
        )
+        dropped = prior_len - len(eval_dataset)
+        if dropped:
+            LOG.warning(f"Dropped {dropped} long samples from eval dataset")

    # drop samples with where the number of elements with labels not equal to -100 is zero
    def drop_no_trainable_tokens(sample):
        return np.sum(np.array(sample["labels"]) != -100) > 0

+    prior_len = len(train_dataset)
    train_dataset = train_dataset.filter(
        drop_no_trainable_tokens,
        num_proc=cfg.dataset_processes,
        load_from_cache_file=not cfg.is_preprocess,
        desc="Drop Samples with Zero Trainable Tokens",
    )
+    dropped = prior_len - len(train_dataset)
+    if dropped:
+        LOG.warning(
+            f"Dropped {dropped} samples with no trainable tokens from train dataset"
+        )
+
    if eval_dataset:
+        prior_len = len(eval_dataset)
        eval_dataset = eval_dataset.filter(
            drop_no_trainable_tokens,
            num_proc=cfg.dataset_processes,
            load_from_cache_file=not cfg.is_preprocess,
            desc="Drop Samples with Zero Trainable Tokens",
        )
+        dropped = prior_len - len(eval_dataset)
+        if dropped:
+            LOG.warning(
+                f"Dropped {dropped} samples with no trainable tokens from eval dataset"
+            )

    if cfg.group_by_length:
        train_dataset = train_dataset.map(
@@ -496,12 +515,7 @@ def prepare_opinionated_env(cfg):
 def setup_trainer(
    cfg, train_dataset, eval_dataset, model, tokenizer, processor, total_num_steps
 ):
-    if cfg.fsdp:
-        try:
-            patch_training_loop_for_fsdp_grad_accum()
-        except AssertionError:
-            pass
-    if cfg.rl in ["dpo", "ipo", "orpo", "kto", "simpo"]:
+    if cfg.rl in ("dpo", "ipo", "orpo", "kto", "simpo"):
        trainer_builder = HFRLTrainerBuilder(cfg, model[0], tokenizer, processor)
        trainer_builder.model_ref = model[1]
        trainer_builder.peft_config = model[2]
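Each filter in `process_datasets_for_packing` is now wrapped in a length-before/length-after check, so dropped samples show up as warnings instead of disappearing silently. The sketch below uses plain lists in place of `datasets.Dataset.filter`. `filter_and_warn` is a hypothetical helper, and `has_trainable_tokens` mirrors `drop_no_trainable_tokens` without NumPy.

```python
import logging
from typing import Callable, Dict, List, Tuple

LOG = logging.getLogger(__name__)

Sample = Dict[str, List[int]]


def filter_and_warn(
    samples: List[Sample], keep: Callable[[Sample], bool], what: str
) -> Tuple[List[Sample], int]:
    prior_len = len(samples)
    kept = [s for s in samples if keep(s)]
    dropped = prior_len - len(kept)
    if dropped:
        LOG.warning("Dropped %d %s", dropped, what)
    return kept, dropped


def has_trainable_tokens(sample: Sample) -> bool:
    # -100 is the ignore index, so a sample needs at least one other label.
    return any(label != -100 for label in sample["labels"])


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    train = [{"labels": [-100, -100]}, {"labels": [-100, 42]}]
    train, dropped = filter_and_warn(
        train, has_trainable_tokens, "samples with no trainable tokens from train dataset"
    )
    print(dropped)  # 1
```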
--- a/tests/e2e/patched/test_trainer_fsdp.py
+++ b/tests/e2e/patched/test_trainer_fsdp.py
@@ -1,15 +0,0 @@
-"""Test module for checking whether the integration of Unsloth with Hugging Face Transformers is working as expected."""
-import unittest
-
-from axolotl.monkeypatch.trainer_fsdp_grad_accum import check_training_loop_is_patchable
-
-
-class TestTrainerFSDPIntegration(unittest.TestCase):
-    """Unsloth monkeypatch integration tests."""
-
-    def test_train_loop_patchable(self):
-        # ensures the current version of transformers has loss code that matches our patching code
-        self.assertTrue(
-            check_training_loop_is_patchable(),
-            "HF transformers _inner_training_loop has changed and isn't patchable",
-        )
--- a/tests/test_schedulers.py
+++ b/tests/test_schedulers.py
@@ -32,16 +32,19 @@ class TestCosineConstantLr(unittest.TestCase):
    def test_schedulers(self):
        self.assertEqual(self.lr_scheduler.get_last_lr()[0], 0)
        for _ in range(self.warmup_steps):
+            self.optimizer.step()
            self.lr_scheduler.step()
        self.assertEqual(self.lr_scheduler.get_last_lr()[0], self._lr)
        constant_step = int(self.train_steps * self.constant_lr_ratio)
        remaining_step = self.train_steps - constant_step
        for _ in range(constant_step):
+            self.optimizer.step()
            self.lr_scheduler.step()
        self.assertEqual(
            self.lr_scheduler.get_last_lr()[0], self._lr * self.min_lr_ratio
        )
        for _ in range(remaining_step):
+            self.optimizer.step()
            self.lr_scheduler.step()
        self.assertEqual(
            self.lr_scheduler.get_last_lr()[0], self._lr * self.min_lr_ratio
Author	SHA1	Message	Date
Wing Lian	afc0dab0f1	make sure action has permission to create release	2024-11-19 10:41:19 -05:00
Wing Lian	6679e20f47	release version 0.5.1 (#2082)	2024-11-19 10:35:59 -05:00
Wing Lian	ec59d4cb83	remove deprecated extra metadata kwarg from pydantic Field (#2081) [skip ci]	2024-11-19 10:30:10 -05:00
Wing Lian	a77c8a71cf	fix brackets on docker ci builds, add option to skip e2e builds [skip e2e] (#2080) [skip ci]	2024-11-19 10:29:31 -05:00
Wing Lian	775311f98f	add optimizer step to prevent warning in tests (#1502) [skip ci] * add optimizer step to prevent warning in tests * add optimizer step to warmup as well	2024-11-19 10:19:03 -05:00
NanoCode012	f007c38e49	Feat: Drop long samples and shuffle rl samples (#2040) [skip ci] * feat: LOG warn if samples are dropped due to seq length * feat: add drop long samples for RL * feat: add ipo * fix: remove num_proc for map as subprocesses are prone to die * feat: shuffle rl dataset * fix: support preprocess for kto * chore: use set instead of list * feat: add simpo	2024-11-19 10:18:24 -05:00
Wing Lian	d9b71edf84	bump transformers for fsdp-grad-accum fix, remove patch (#2079)	2024-11-19 02:23:09 -05:00