remove direct dependency on fused dense lib (#2027 )

make publish to pypi manually dispatchable as a workflow (#2026 ) [skip ci]
increment version to 0.5.0 for next release (#2025 ) [skip ci]
2024-11-08 14:48:04 -05:00 · 2024-11-08 14:18:16 -05:00 · 2024-11-08 14:02:25 -05:00 · 2024-11-08 13:45:49 -05:00 · 2024-11-08 11:29:11 -05:00 · 2024-11-08 10:46:24 -05:00
40 changed files with 250 additions and 1941 deletions
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -32,7 +32,7 @@ jobs:
          - cuda: 124
            cuda_version: 12.4.1
            python_version: "3.11"
-            pytorch: 2.5.0
+            pytorch: 2.5.1
            axolotl_extras:
    runs-on: axolotl-gpu-runner
    steps:
@@ -94,7 +94,7 @@ jobs:
          - cuda: 124
            cuda_version: 12.4.1
            python_version: "3.11"
-            pytorch: 2.5.0
+            pytorch: 2.5.1
            axolotl_extras:
    runs-on: axolotl-gpu-runner
    steps:
--- a/.github/workflows/multi-gpu-e2e.yml
+++ b/.github/workflows/multi-gpu-e2e.yml
@@ -31,7 +31,7 @@ jobs:
          - cuda: 124
            cuda_version: 12.4.1
            python_version: "3.11"
-            pytorch: 2.5.0
+            pytorch: 2.5.1
            axolotl_extras:
            num_gpus: 2
            nightly_build: "true"
--- a/.github/workflows/nightlies.yml
+++ b/.github/workflows/nightlies.yml
@@ -31,7 +31,7 @@ jobs:
          - cuda: 124
            cuda_version: 12.4.1
            python_version: "3.11"
-            pytorch: 2.5.0
+            pytorch: 2.5.1
            axolotl_extras:
    runs-on: axolotl-gpu-runner
    steps:
@@ -93,7 +93,7 @@ jobs:
          - cuda: 124
            cuda_version: 12.4.1
            python_version: "3.11"
-            pytorch: 2.5.0
+            pytorch: 2.5.1
            axolotl_extras:
    runs-on: axolotl-gpu-runner
    steps:
--- a/.github/workflows/pypi.yml
+++ b/.github/workflows/pypi.yml
@@ -4,6 +4,7 @@ on:
  push:
    tags:
      - '*'
+  workflow_dispatch:

 jobs:
  pypi-publish:
--- a/.github/workflows/tests-nightly.yml
+++ b/.github/workflows/tests-nightly.yml
@@ -25,7 +25,7 @@ jobs:
      fail-fast: false
      matrix:
        python_version: ["3.10", "3.11"]
-        pytorch_version: ["2.3.1", "2.4.1", "2.5.0"]
+        pytorch_version: ["2.3.1", "2.4.1", "2.5.1"]
    timeout-minutes: 20

    steps:
@@ -92,7 +92,7 @@ jobs:
          - cuda: 124
            cuda_version: 12.4.1
            python_version: "3.11"
-            pytorch: 2.5.0
+            pytorch: 2.5.1
            num_gpus: 1
            axolotl_extras:
            nightly_build: "true"
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -36,7 +36,7 @@ jobs:
      fail-fast: false
      matrix:
        python_version: ["3.10", "3.11"]
-        pytorch_version: ["2.3.1", "2.4.1", "2.5.0"]
+        pytorch_version: ["2.3.1", "2.4.1", "2.5.1"]
    timeout-minutes: 20

    steps:
@@ -132,7 +132,7 @@ jobs:
          - cuda: 124
            cuda_version: 12.4.1
            python_version: "3.11"
-            pytorch: 2.5.0
+            pytorch: 2.5.1
            num_gpus: 1
            axolotl_extras:
    steps:
--- a/README.md
+++ b/README.md
@@ -383,11 +383,10 @@ See [examples](examples) for quick start. It is recommended to duplicate and mod
        - typescript
      type: ... # unimplemented custom format

-      # fastchat conversation (deprecation soon, use chat_template https://axolotl-ai-cloud.github.io/axolotl/docs/dataset-formats/conversation.html#chat_template)
-      # See 'conversation' options: https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
+      # chat_template https://axolotl-ai-cloud.github.io/axolotl/docs/dataset-formats/conversation.html#chat_template
    - path: ...
-      type: sharegpt
-      conversation: chatml # default: vicuna_v1.1
+      type: chat_template
+      chat_template: chatml # defaults to tokenizer's chat_template

      # local
    - path: data.jsonl # or json
@@ -562,7 +561,8 @@ plugins:
  - axolotl.integrations.liger.LigerPlugin
 liger_rope: true
 liger_rms_norm: true
-liger_swiglu: true
+liger_glu_activation: true
+liger_layer_norm: true
 liger_fused_linear_cross_entropy: true
 ```

--- a/devtools/dev_chat_template.yml
+++ b/devtools/dev_chat_template.yml
@@ -1,4 +1,4 @@
-# Example config for debugging the sharegpt prompt format
+# Example config for debugging the chat_template prompt format
 base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
 model_type: LlamaForCausalLM
 tokenizer_type: LlamaTokenizer
--- a/docker/Dockerfile-base
+++ b/docker/Dockerfile-base
@@ -35,3 +35,7 @@ RUN git lfs install --skip-repo && \
    pip3 install awscli && \
    # The base image ships with `pydantic==1.8.2` which is not working
    pip3 install -U --no-cache-dir pydantic==1.10.10
+
+RUN if [ "$PYTHON_VERSION" != "2.5.1" ] ; then \
+        pip3 install flash-attn==2.6.3; \
+    fi
--- a/docs/config.qmd
+++ b/docs/config.qmd
@@ -83,7 +83,7 @@ lora_on_cpu: true
 datasets:
  # HuggingFace dataset repo | s3://,gs:// path | "json" for local dataset, make sure to fill data_files
  - path: vicgalle/alpaca-gpt4
-    # The type of prompt to use for training. [alpaca, sharegpt, gpteacher, oasst, reflection]
+    # The type of prompt to use for training. [alpaca, gpteacher, oasst, reflection]
    type: alpaca # format | format:<prompt_style> (chat/instruct) | <prompt_strategies>.load_<load_fn>
    ds_type: # Optional[str] (json|arrow|parquet|text|csv) defines the datatype when path is a file
    data_files: # Optional[str] path to source data files
@@ -92,15 +92,6 @@ datasets:
    train_on_split: train # Optional[str] name of dataset split to load from
    revision: # Optional[str] The specific revision of the dataset to use when loading from the Hugging Face Hub. This can be a commit hash, tag, or branch name. If not specified, the latest version will be used. This parameter is ignored for local datasets.

-    # Optional[str] fastchat conversation type, only used with type: sharegpt
-    conversation: # Options (see Conversation 'name'): https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
-    field_human: # Optional[str]. Human key to use for conversation.
-    field_model: # Optional[str]. Assistant key to use for conversation.
-    # Add additional keys from your dataset as input or output roles
-    roles:
-      input: # Optional[List[str]]. These will be masked based on train_on_input
-      output: # Optional[List[str]].
-
  # Custom user instruction prompt
  - path: repo
    type:
@@ -183,6 +174,8 @@ test_datasets:

 # use RL training: 'dpo', 'ipo', 'kto'
 rl:
+# whether to perform weighting if doing DPO training. Boolean.
+dpo_use_weighting:

 # The name of the chat template to use for training, following values are supported:
 # - tokenizer_default: Uses the chat template that is available in the tokenizer_config.json. If the chat template is not available in the tokenizer, it will raise an error. This is the default value.
--- a/docs/dataset-formats/conversation.qmd
+++ b/docs/dataset-formats/conversation.qmd
@@ -6,33 +6,8 @@ order: 3

 ## sharegpt

-UPDATE: ShareGPT is being deprecated in the next release. Please see `chat_template` section below.
+IMPORTANT: ShareGPT is deprecated!. Please see `chat_template` section below.

-conversations where `from` is `human`/`gpt`. (optional: first row with role `system` to override default system prompt)
-
-```{.json filename="data.jsonl"}
-{"conversations": [{"from": "...", "value": "..."}]}
-```
-
-Note: `type: sharegpt` opens special configs:
- `conversation`: enables conversions to many Conversation types. Refer to the 'name' [here](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py) for options.
- `roles`: allows you to specify the roles for input and output. This is useful for datasets with custom roles such as `tool` etc to support masking.
- `field_human`: specify the key to use instead of `human` in the conversation.
- `field_model`: specify the key to use instead of `gpt` in the conversation.
-
-```yaml
-datasets:
-    path: ...
-    type: sharegpt
-
-    conversation: # Options (see Conversation 'name'): https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
-    field_human: # Optional[str]. Human key to use for conversation.
-    field_model: # Optional[str]. Assistant key to use for conversation.
-    # Add additional keys from your dataset as input or output roles
-    roles:
-      input: # Optional[List[str]]. These will be masked based on train_on_input
-      output: # Optional[List[str]].
-```

 ## pygmalion

@@ -40,38 +15,6 @@ datasets:
 {"conversations": [{"role": "...", "value": "..."}]}
 ```

-## sharegpt.load_role
-
-conversations where `role` is used instead of `from`
-
-```{.json filename="data.jsonl"}
-{"conversations": [{"role": "...", "value": "..."}]}
-```
-
-## sharegpt.load_guanaco
-
-conversations where `from` is `prompter` `assistant` instead of default sharegpt
-
-```{.json filename="data.jsonl"}
-{"conversations": [{"from": "...", "value": "..."}]}
-```
-
-## sharegpt.load_ultrachat
-
-conversations where the turns field is 'messages', human is 'user' and gpt is 'assistant'.
-
-```{.json filename="data.jsonl"}
-{"messages": [{"user": "...", "assistant": "..."}]}
-```
-
-## sharegpt_jokes
-
-creates a chat where bot is asked to tell a joke, then explain why the joke is funny
-
-```{.json filename="data.jsonl"}
-{"conversations": [{"title": "...", "text": "...", "explanation": "..."}]}
-```
-

 ## chat_template

--- a/examples/deepseek-v2/qlora-fsdp-2_5.yaml
+++ b/examples/deepseek-v2/qlora-fsdp-2_5.yaml
@@ -9,7 +9,7 @@ strict: false
 plugins:
  - axolotl.integrations.liger.LigerPlugin
 liger_rms_norm: true
-liger_swiglu: true
+liger_glu_activation: true
 liger_fused_linear_cross_entropy: true

 chat_template: deepseek_v2
--- a/examples/llama-3/fft-8b-liger-fsdp.yaml
+++ b/examples/llama-3/fft-8b-liger-fsdp.yaml
@@ -4,7 +4,7 @@ plugins:
  - axolotl.integrations.liger.LigerPlugin
 liger_rope: true
 liger_rms_norm: true
-liger_swiglu: true
+liger_glu_activation: true
 liger_fused_linear_cross_entropy: true

 strict: false
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,10 +1,10 @@
 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
 packaging==23.2
 peft==0.13.2
-transformers==4.46.0
+transformers==4.46.1
 tokenizers>=0.20.1
 bitsandbytes==0.44.1
-accelerate==1.0.1
+accelerate==1.1.0
 datasets==3.0.1
 deepspeed==0.15.3
 pydantic==2.6.3
@@ -28,13 +28,12 @@ scipy
 scikit-learn==1.4.2
 pynvml
 art
-fschat @ git+https://github.com/lm-sys/FastChat.git@27a05b04a35510afb1d767ae7e5990cbd278f8fe
 gradio==3.50.2
 tensorboard
 python-dotenv==1.0.1
 autoawq>=0.2.5
 triton>=2.3.0
-liger-kernel==0.3.0
+liger-kernel==0.4.0

 mamba-ssm==1.2.0.post1

@@ -43,7 +42,7 @@ s3fs>=2024.5.0
 gcsfs>=2024.5.0
 # adlfs

-trl @ git+https://github.com/huggingface/trl.git@31d02cfb795284591a084416b9dcb7bef5d08924
+trl==0.12.0
 zstandard==0.22.0
 fastcore

--- a/setup.py
+++ b/setup.py
@@ -89,7 +89,7 @@ install_requires, dependency_links = parse_requirements()

 setup(
    name="axolotl",
-    version="0.4.1",
+    version="0.5.0",
    description="LLM Trainer",
    long_description="Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.",
    package_dir={"": "src"},
@@ -100,9 +100,6 @@ setup(
        "flash-attn": [
            "flash-attn==2.6.3",
        ],
-        "fused-dense-lib": [
-            "fused-dense-lib  @ git+https://github.com/Dao-AILab/flash-attention@v2.6.2#subdirectory=csrc/fused_dense_lib",
-        ],
        "deepspeed": [
            "deepspeed==0.14.4",
            "deepspeed-kernels",
--- a/src/axolotl/cli/preprocess.py
+++ b/src/axolotl/cli/preprocess.py
@@ -23,10 +23,6 @@ from axolotl.cli import (
 )
 from axolotl.common.cli import PreprocessCliArgs
 from axolotl.common.const import DEFAULT_DATASET_PREPARED_PATH
-from axolotl.prompt_strategies.sharegpt import (
-    register_chatml_template,
-    register_llama3_template,
-)
 from axolotl.utils.trainer import disable_datasets_caching

 LOG = logging.getLogger("axolotl.cli.preprocess")
@@ -44,23 +40,6 @@ def do_cli(config: Union[Path, str] = Path("examples/"), **kwargs):
        return_remaining_strings=True
    )

-    if parsed_cfg.chat_template == "chatml":
-        if parsed_cfg.default_system_message:
-            LOG.info(
-                f"ChatML set. Adding default system message: {parsed_cfg.default_system_message}"
-            )
-            register_chatml_template(parsed_cfg.default_system_message)
-        else:
-            register_chatml_template()
-    elif parsed_cfg.chat_template == "llama3":
-        if parsed_cfg.default_system_message:
-            LOG.info(
-                f"LLaMA-3 set. Adding default system message: {parsed_cfg.default_system_message}"
-            )
-            register_llama3_template(parsed_cfg.default_system_message)
-        else:
-            register_llama3_template()
-
    if not parsed_cfg.dataset_prepared_path:
        msg = (
            Fore.RED
--- a/src/axolotl/cli/train.py
+++ b/src/axolotl/cli/train.py
@@ -19,10 +19,6 @@ from axolotl.cli import (
 )
 from axolotl.common.cli import TrainerCliArgs
 from axolotl.integrations.base import PluginManager
-from axolotl.prompt_strategies.sharegpt import (
-    register_chatml_template,
-    register_llama3_template,
-)
 from axolotl.train import train

 LOG = logging.getLogger("axolotl.cli.train")
@@ -42,21 +38,6 @@ def do_train(cfg, cli_args) -> None:
    print_axolotl_text_art()
    check_accelerate_default_config()
    check_user_token()
-    if cfg.chat_template == "chatml" and cfg.default_system_message:
-        LOG.info(
-            f"ChatML set. Adding default system message: {cfg.default_system_message}"
-        )
-        register_chatml_template(cfg.default_system_message)
-    else:
-        register_chatml_template()
-
-    if cfg.chat_template == "llama3" and cfg.default_system_message:
-        LOG.info(
-            f"LLaMA-3 set. Adding default system message: {cfg.default_system_message}"
-        )
-        register_llama3_template(cfg.default_system_message)
-    else:
-        register_llama3_template()

    if cfg.rl:  # and cfg.rl != "orpo":
        dataset_meta = load_rl_datasets(cfg=cfg, cli_args=cli_args)
--- a/src/axolotl/core/trainer_builder.py
+++ b/src/axolotl/core/trainer_builder.py
@@ -896,13 +896,13 @@ class AxolotlTrainer(SchedulerMixin, Trainer):
        for key, value in metrics.items():
            self._stored_metrics[train_eval][key].append(value)

-    def _save_checkpoint(self, model, trial, metrics=None):
+    def _save_checkpoint(self, model, trial, **kwargs):
        # make sure the checkpoint dir exists, since trainer is flakey
        checkpoint_folder = f"{PREFIX_CHECKPOINT_DIR}-{self.state.global_step}"
        run_dir = self._get_output_dir(trial=trial)
        output_dir = os.path.join(run_dir, checkpoint_folder)
        os.makedirs(output_dir, exist_ok=True)
-        return super()._save_checkpoint(model, trial, metrics=metrics)
+        return super()._save_checkpoint(model, trial, **kwargs)


 class AxolotlMambaTrainer(AxolotlTrainer):
@@ -1890,17 +1890,18 @@ class HFRLTrainerBuilder(TrainerBuilderBase):
            # default to saving each epoch if not defined
            training_args_kwargs["save_strategy"] = "epoch"

+        training_args_kwargs["dataset_num_proc"] = self.cfg.dataset_processes
+
        if self.cfg.rl_beta:
            training_args_kwargs["beta"] = self.cfg.rl_beta
        if self.cfg.orpo_alpha:
            # trl does some odd mapping of alpha to beta to reuse the beta parameter ???
            training_args_kwargs["beta"] = self.cfg.orpo_alpha

-        training_args_kwargs["dataset_num_proc"] = self.cfg.dataset_processes
-        training_args_cls = AxolotlDPOConfig
        if self.cfg.rpo_alpha is not None:
            training_args_kwargs["rpo_alpha"] = self.cfg.rpo_alpha

+        training_args_cls = None
        if self.cfg.rl == "simpo":
            training_args_cls = AxolotlCPOConfig
            training_args_kwargs["loss_type"] = "simpo"
@@ -1909,13 +1910,13 @@ class HFRLTrainerBuilder(TrainerBuilderBase):
            if self.cfg.cpo_alpha is not None:
                training_args_kwargs["cpo_alpha"] = self.cfg.cpo_alpha

-        if self.cfg.rl == "orpo":
+        elif self.cfg.rl == "orpo":
            training_args_cls = AxolotlORPOConfig
            training_args_kwargs["max_length"] = self.cfg.sequence_len
            if self.cfg.max_prompt_len:
                training_args_kwargs["max_prompt_length"] = self.cfg.max_prompt_len

-        if self.cfg.rl == "kto":
+        elif self.cfg.rl == "kto":
            training_args_cls = AxolotlKTOConfig

            training_args_kwargs["desirable_weight"] = (
@@ -1930,6 +1931,11 @@ class HFRLTrainerBuilder(TrainerBuilderBase):
            if self.cfg.max_prompt_len:
                training_args_kwargs["max_prompt_length"] = self.cfg.max_prompt_len

+        else:
+            training_args_cls = AxolotlDPOConfig
+            if self.cfg.dpo_use_weighting is not None:
+                training_args_kwargs["use_weighting"] = self.cfg.dpo_use_weighting
+
        training_args = training_args_cls(  # pylint: disable=unexpected-keyword-arg
            output_dir=self.cfg.output_dir,
            per_device_train_batch_size=self.cfg.micro_batch_size,
--- a/src/axolotl/integrations/liger/init.py
+++ b/src/axolotl/integrations/liger/init.py
@@ -18,20 +18,23 @@ Module for the Plugin for LIGER integraton with Axolotl.
 Liger Kernel is the collection of Triton-native kernels for LLM Training.
 It is designed to be performant, correct, and light-weight.
 """
+import inspect
 import logging
 import sys
-from functools import partial

 from liger_kernel.transformers.cross_entropy import LigerCrossEntropyLoss
-from liger_kernel.transformers.geglu import LigerGEGLUMLP
+from liger_kernel.transformers.monkey_patch import MODEL_TYPE_TO_APPLY_LIGER_FN
 from liger_kernel.transformers.rms_norm import LigerRMSNorm
 from liger_kernel.transformers.rope import liger_rotary_pos_emb
 from liger_kernel.transformers.swiglu import LigerSwiGLUMLP

 from axolotl.integrations.base import BasePlugin

+from ...utils.distributed import zero_only
 from .args import LigerArgs  # pylint: disable=unused-import. # noqa: F401

+LOG = logging.getLogger("axolotl.integrations.liger")
+

 class LigerPlugin(BasePlugin):
    """
@@ -42,59 +45,31 @@ class LigerPlugin(BasePlugin):
        return "axolotl.integrations.liger.LigerArgs"

    def pre_model_load(self, cfg):
-        if cfg.model_config_type == "llama":
-            from liger_kernel.transformers.model.llama import (
-                lce_forward as llama_lce_forward,
-            )
-            from transformers.models.llama import modeling_llama
-
-            if cfg.liger_rope:
-                modeling_llama.apply_rotary_pos_emb = liger_rotary_pos_emb
-            if cfg.liger_rms_norm:
-                modeling_llama.LlamaRMSNorm = LigerRMSNorm
-            if cfg.liger_swiglu:
-                modeling_llama.LlamaMLP = LigerSwiGLUMLP
-            if cfg.liger_cross_entropy:
-                modeling_llama.CrossEntropyLoss = LigerCrossEntropyLoss
-            elif cfg.liger_fused_linear_cross_entropy:
-                modeling_llama.LlamaForCausalLM.forward = llama_lce_forward
-
-        elif cfg.model_config_type == "mistral":
-            from liger_kernel.transformers.model.mistral import (
-                lce_forward as mistral_lce_forward,
-            )
-            from transformers.models.mistral import modeling_mistral
-
-            if cfg.liger_rope:
-                modeling_mistral.apply_rotary_pos_emb = liger_rotary_pos_emb
-            if cfg.liger_rms_norm:
-                modeling_mistral.MistralRMSNorm = LigerRMSNorm
-            if cfg.liger_swiglu:
-                modeling_mistral.MistralMLP = LigerSwiGLUMLP
-            if cfg.liger_cross_entropy:
-                modeling_mistral.CrossEntropyLoss = LigerCrossEntropyLoss
-            if cfg.liger_fused_linear_cross_entropy:
-                modeling_mistral.MistralForCausalLM.forward = mistral_lce_forward
-
-        elif cfg.model_config_type == "gemma":
-            from liger_kernel.transformers.model.gemma import (
-                lce_forward as gemma_lce_forward,
-            )
-            from transformers.models.gemma import modeling_gemma
-
-            if cfg.liger_rope:
-                modeling_gemma.apply_rotary_pos_emb = liger_rotary_pos_emb
-            if cfg.liger_rms_norm:
-                modeling_gemma.GemmaRMSNorm = partial(
-                    LigerRMSNorm, offset=1.0, init_fn="zeros", casting_mode="gemma"
+        if cfg.model_config_type in MODEL_TYPE_TO_APPLY_LIGER_FN:
+            apply_liger_fn = MODEL_TYPE_TO_APPLY_LIGER_FN[cfg.model_config_type]
+            liger_fn_sig = inspect.signature(apply_liger_fn)
+            kwargs = {}
+            if "rope" in liger_fn_sig.parameters:
+                kwargs["rope"] = cfg.liger_rope
+            if "cross_entropy" in liger_fn_sig.parameters:
+                kwargs["cross_entropy"] = cfg.liger_cross_entropy
+            if "fused_linear_cross_entropy" in liger_fn_sig.parameters:
+                kwargs[
+                    "fused_linear_cross_entropy"
+                ] = cfg.liger_fused_linear_cross_entropy
+            if "rms_norm" in liger_fn_sig.parameters:
+                kwargs["rms_norm"] = cfg.liger_rms_norm
+            if "layer_norm" in liger_fn_sig.parameters:
+                kwargs["layer_norm"] = cfg.liger_layer_norm
+            if "geglu" in liger_fn_sig.parameters:
+                kwargs["geglu"] = cfg.liger_glu_activation
+            elif "swiglu" in liger_fn_sig.parameters:
+                kwargs["swiglu"] = cfg.liger_glu_activation
+            with zero_only():
+                LOG.info(
+                    f"Applying LIGER to {cfg.model_config_type} with kwargs: {kwargs}"
                )
-            if cfg.liger_swiglu:
-                modeling_gemma.GemmaMLP = LigerGEGLUMLP
-            if cfg.liger_cross_entropy:
-                modeling_gemma.CrossEntropyLoss = LigerCrossEntropyLoss
-            if cfg.liger_fused_linear_cross_entropy:
-                modeling_gemma.GemmaForCausalLM.forward = gemma_lce_forward
-
+            apply_liger_fn(**kwargs)
        elif cfg.model_config_type == "jamba":
            from transformers.models.jamba import modeling_jamba

@@ -104,30 +79,12 @@ class LigerPlugin(BasePlugin):
                modeling_jamba.apply_rotary_pos_emb = liger_rotary_pos_emb
            if cfg.liger_rms_norm:
                modeling_jamba.JambaRMSNorm = LigerRMSNorm
-            if cfg.liger_swiglu:
+            if cfg.liger_glu_activation:
                modeling_jamba.JambaMLP = LigerSwiGLUMLP
            if cfg.liger_cross_entropy:
                modeling_jamba.CrossEntropyLoss = LigerCrossEntropyLoss
            if cfg.liger_fused_linear_cross_entropy:
                modeling_jamba.JambaForCausalLM.forward = jamba_lce_forward
-
-        elif cfg.model_config_type == "qwen2":
-            from liger_kernel.transformers.model.qwen2 import (
-                lce_forward as qwen2_lce_forward,
-            )
-            from transformers.models.qwen2 import modeling_qwen2
-
-            if cfg.liger_rope:
-                modeling_qwen2.apply_rotary_pos_emb = liger_rotary_pos_emb
-            if cfg.liger_rms_norm:
-                modeling_qwen2.Qwen2RMSNorm = LigerRMSNorm
-            if cfg.liger_swiglu:
-                modeling_qwen2.Qwen2MLP = LigerSwiGLUMLP
-            if cfg.liger_cross_entropy:
-                modeling_qwen2.CrossEntropyLoss = LigerCrossEntropyLoss
-            if cfg.liger_fused_linear_cross_entropy:
-                modeling_qwen2.Qwen2ForCausalLM.forward = qwen2_lce_forward
-
        elif cfg.model_config_type == "deepseek_v2":
            from accelerate import init_empty_weights
            from transformers import AutoModelForCausalLM
@@ -146,44 +103,9 @@ class LigerPlugin(BasePlugin):
                logging.warning("Fused liger_rope is not supported for DeepseekV2.")
            if cfg.liger_rms_norm:
                modeling_mod.DeepseekV2RMSNorm = LigerRMSNorm
-            if cfg.liger_swiglu:
+            if cfg.liger_glu_activation:
                modeling_mod.DeepseekV2MLP.forward = LigerSwiGLUMLP.forward
            if cfg.liger_cross_entropy:
                modeling_mod.CrossEntropyLoss = LigerCrossEntropyLoss
            if cfg.liger_fused_linear_cross_entropy:
                modeling_mod.DeepseekV2ForCausalLM.forward = deepseekv2_lce_forward
-
-        elif cfg.model_config_type == "gemma2":
-            from transformers.models.gemma2 import modeling_gemma2
-
-            if cfg.liger_rope:
-                modeling_gemma2.apply_rotary_pos_emb = liger_rotary_pos_emb
-            if cfg.liger_rms_norm:
-                modeling_gemma2.Gemma2RMSNorm = partial(
-                    LigerRMSNorm, offset=1.0, init_fn="zeros", casting_mode="gemma"
-                )
-            if cfg.liger_swiglu:
-                modeling_gemma2.Gemma2MLP = LigerGEGLUMLP
-            if cfg.liger_cross_entropy:
-                modeling_gemma2.CrossEntropyLoss = LigerCrossEntropyLoss
-            if cfg.liger_fused_linear_cross_entropy:
-                logging.warning(
-                    "Fused linear cross entropy is not supported for Gemma 2."
-                )
-
-        elif cfg.model_config_type == "phi3":
-            from liger_kernel.transformers.model.phi3 import (
-                lce_forward as phi3_lce_forward,
-            )
-            from transformers.models.phi3 import modeling_phi3
-
-            if cfg.liger_rope:
-                modeling_phi3.apply_rotary_pos_emb = liger_rotary_pos_emb
-            if cfg.liger_rms_norm:
-                modeling_phi3.Phi3RMSNorm = LigerRMSNorm
-            if cfg.liger_swiglu:
-                modeling_phi3.Phi3MLP = LigerSwiGLUMLP
-            if cfg.liger_cross_entropy:
-                modeling_phi3.CrossEntropyLoss = LigerCrossEntropyLoss
-            if cfg.liger_fused_linear_cross_entropy:
-                modeling_phi3.Phi3ForCausalLM.forward = phi3_lce_forward
--- a/src/axolotl/integrations/liger/args.py
+++ b/src/axolotl/integrations/liger/args.py
@@ -15,9 +15,12 @@
 """
 Module for handling LIGER input arguments.
 """
+import logging
 from typing import Optional

-from pydantic import BaseModel
+from pydantic import BaseModel, model_validator
+
+LOG = logging.getLogger("axolotl.integrations.liger.args")


 class LigerArgs(BaseModel):
@@ -27,6 +30,24 @@ class LigerArgs(BaseModel):

    liger_rope: Optional[bool] = None
    liger_rms_norm: Optional[bool] = None
+    liger_layer_norm: Optional[bool] = None
    liger_swiglu: Optional[bool] = None
+    liger_glu_activation: Optional[bool] = None
    liger_cross_entropy: Optional[bool] = None
    liger_fused_linear_cross_entropy: Optional[bool] = None
+
+    @model_validator(mode="before")
+    @classmethod
+    def check_deprecated_swiglu(cls, data):
+        if data.get("liger_swiglu") is not None:
+            if data.get("liger_glu_activation") is not None:
+                raise ValueError(
+                    "You cannot have both `liger_swiglu` and `liger_glu_activation` set."
+                )
+
+            LOG.warning(
+                "The 'liger_swiglu' argument is deprecated and will be removed in a future release. "
+                "Please use 'liger_glu_activation' instead."
+            )
+            data["liger_glu_activation"] = data.pop("liger_swiglu")
+        return data
--- a/src/axolotl/monkeypatch/fastchat_conversation_turns.py
+++ b/src/axolotl/monkeypatch/fastchat_conversation_turns.py
@@ -1,231 +0,0 @@
-"""
-monkeypatch to add a get_turns method
-"""
-
-import logging
-from typing import Generator, Tuple
-
-from fastchat.conversation import SeparatorStyle
-
-LOG = logging.getLogger("axolotl.monkeypatch.fastchat_conversation_turns")
-
-
-def get_prompt(self) -> str:
-    ret = ""
-    for role, msg in self.get_turns():
-        ret += role + msg
-    return ret
-
-
-def get_turns(  # pylint: disable=too-many-return-statements
-    self,
-) -> Generator[Tuple[str, str], None, None]:
-    """Get the prompt for generation."""
-    system_prompt = self.system_template.format(system_message=self.system_message)
-    if self.sep_style == SeparatorStyle.ADD_COLON_SINGLE:
-        yield "", system_prompt + self.sep
-        for role, message in self.messages:
-            if message:
-                yield role + ": ", message + self.sep
-            else:
-                yield role + ":", ""
-        return
-    if self.sep_style == SeparatorStyle.ADD_COLON_TWO:
-        seps = [self.sep, self.sep2]
-        yield "", system_prompt + seps[0]
-        for i, (role, message) in enumerate(self.messages):
-            if message:
-                yield role + ": ", message + seps[i % 2]
-            else:
-                yield role + ":", ""
-        return
-    if self.sep_style == SeparatorStyle.ADD_COLON_SPACE_SINGLE:
-        yield "", system_prompt + self.sep
-        for role, message in self.messages:
-            if message:
-                yield role + ": ", message + self.sep
-            else:
-                yield role + ": ", ""  # must be end with a space
-        return
-    if self.sep_style == SeparatorStyle.ADD_NEW_LINE_SINGLE:
-        yield "", "" if system_prompt == "" else system_prompt + self.sep
-        for role, message in self.messages:
-            if message:
-                yield role + "\n", message + self.sep
-            else:
-                yield role + "\n", ""
-        return
-    if self.sep_style == SeparatorStyle.NO_COLON_SINGLE:
-        yield "", system_prompt
-        for role, message in self.messages:
-            if message:
-                yield role, message + self.sep
-            else:
-                yield role, ""
-        return
-    if self.sep_style == SeparatorStyle.NO_COLON_TWO:
-        seps = [self.sep, self.sep2]
-        yield "", system_prompt
-        for i, (role, message) in enumerate(self.messages):
-            if message:
-                yield role, message + seps[i % 2]
-            else:
-                yield role, ""
-        return
-    if self.sep_style == SeparatorStyle.RWKV:
-        yield "", system_prompt
-        for i, (role, message) in enumerate(self.messages):
-            if message:
-                yield role + ": ", message.replace("\r\n", "\n").replace(
-                    "\n\n", "\n"
-                ) + "\n\n"
-            else:
-                yield role + ":", ""
-        return
-    if self.sep_style == SeparatorStyle.LLAMA2 and self.name != "mistral":
-        if self.system_message:
-            if self.messages:
-                # For llama, the system message is incorporated into the first human instruction
-                first_role, first_msg = self.messages[0]
-                if first_role == self.roles[0]:
-                    system_prompt += first_msg
-                    self.messages.pop(0)
-            yield "", system_prompt
-        for i, (role, message) in enumerate(self.messages):
-            if message:
-                if (i % 2 == 0 and not self.system_message) or (
-                    i % 2 != 0 and self.system_message
-                ):
-                    role = "<s> " + role
-                yield role + " ", message
-            else:
-                yield role, ""
-        return
-    if self.sep_style == SeparatorStyle.LLAMA2 and self.name == "mistral":
-        contains_sys_msg = False
-        if self.system_message:
-            contains_sys_msg = True
-            if self.messages:
-                # There is no clear guidance on how to handle system messages in Mistral so we just prepend it to the first human instruction separated by a newline
-                first_role, first_msg = self.messages[0]
-                if first_role == self.roles[0]:
-                    system_prompt = self.system_template.format(
-                        system_message=" " + self.system_message
-                    )
-                    system_prompt += first_msg
-                    self.messages.pop(0)
-            yield "", system_prompt
-        for i, (role, message) in enumerate(self.messages):
-            if message and i == 0 and not contains_sys_msg:
-                yield "", system_prompt.strip() + " " + message  # if there is no system message, we need to make sure there is the a `<s> [INST]` at the beginning of the first instruction.
-            elif message:
-                yield role + " ", message
-            else:
-                yield role, ""
-        return
-    if self.sep_style == SeparatorStyle.LLAMA3:
-        if self.system_message:
-            # For llama3, the system message is NOT incorporated into the first human instruction
-            # All messages follow <|start_header_id|>' + role + '<|end_header_id|>\n\n'+ message + '<|eot_id|>
-            yield "", system_prompt
-        for i, (role, message) in enumerate(self.messages):
-            if message:
-                yield f"<|start_header_id|>{role}<|end_header_id|>\n\n", f"{message.strip()}<|eot_id|>"
-            else:
-                yield f"<|start_header_id|>{role}<|end_header_id|>\n\n", ""
-        return
-    if self.sep_style == SeparatorStyle.GEMMA:
-        if self.system_message:
-            raise ValueError("Gemma chat template does not support system messages")
-        for i, (role, message) in enumerate(self.messages):
-            prefix = "<bos>" if i == 0 else ""
-            message_str = message if message else ""
-            yield prefix + "<start_of_turn>" + role + "\n", message_str + "<end_of_turn>\n"
-        return
-    if self.sep_style == SeparatorStyle.CHATGLM:
-        # source: https://huggingface.co/THUDM/chatglm-6b/blob/1d240ba371910e9282298d4592532d7f0f3e9f3e/modeling_chatglm.py#L1302-L1308
-        # source2: https://huggingface.co/THUDM/chatglm2-6b/blob/e186c891cf64310ac66ef10a87e6635fa6c2a579/modeling_chatglm.py#L926
-        round_add_n = 1 if self.name == "chatglm2" else 0
-        if system_prompt:
-            yield "", system_prompt + self.sep
-
-        for i, (role, message) in enumerate(self.messages):
-            if i % 2 == 0:
-                yield "", f"[Round {i//2 + round_add_n}]{self.sep}"
-
-            if message:
-                yield f"{role}：", f"{message}{self.sep}"
-            else:
-                yield f"{role}：", ""
-        return
-    if self.sep_style == SeparatorStyle.CHATML:
-        yield "", "" if system_prompt == "" else system_prompt + self.sep + "\n"
-        for role, message in self.messages:
-            if message:
-                yield role + "\n", message + self.sep + "\n"
-            else:
-                yield role + "\n", ""
-        return
-    if self.sep_style == SeparatorStyle.CHATGLM3:
-        if self.system_message:
-            yield "", system_prompt
-        for role, message in self.messages:
-            if message:
-                yield role + "\n", " " + message
-            else:
-                yield role
-        return
-    if self.sep_style == SeparatorStyle.CHATINTERN:
-        # source: https://huggingface.co/internlm/internlm-chat-7b-8k/blob/bd546fa984b4b0b86958f56bf37f94aa75ab8831/modeling_internlm.py#L771
-        seps = [self.sep, self.sep2]
-        yield "", system_prompt
-        for i, (role, message) in enumerate(self.messages):
-            prefix = "<s>" if i % 2 == 0 else ""
-            if message:
-                yield prefix + role + ":", message + seps[i % 2] + "\n"
-            else:
-                yield role + ":", ""
-        return
-    if self.sep_style == SeparatorStyle.DOLLY:
-        seps = [self.sep, self.sep2]
-        yield "", system_prompt
-        for i, (role, message) in enumerate(self.messages):
-            if message:
-                suffix = "\n\n" if i % 2 == 1 else ""
-                yield role + ":\n", message + seps[i % 2] + suffix
-            else:
-                yield role + ":\n", ""
-        return
-    if self.sep_style == SeparatorStyle.PHOENIX:
-        yield "", system_prompt
-        for role, message in self.messages:
-            if message:
-                yield role + ": ", "<s>" + message + "</s>"
-            else:
-                yield role + ": " + "<s>", ""
-        return
-    if self.sep_style == SeparatorStyle.ROBIN:
-        yield "", system_prompt + self.sep
-        for role, message in self.messages:
-            if message:
-                yield role + ":\n", message + self.sep
-            else:
-                yield role + ":\n", ""
-        return
-    if self.sep_style == SeparatorStyle.FALCON_CHAT:
-        if self.system_message:
-            yield "", system_prompt + self.sep
-        for role, message in self.messages:
-            if message:
-                yield role + ": ", message + self.sep
-            else:
-                yield role + ":", ""
-    else:
-        raise ValueError(f"Invalid style: {self.sep_style}")
-
-
-def add_get_turns_to_conversation():
-    import fastchat.conversation
-
-    fastchat.conversation.Conversation.get_turns = get_turns
-    fastchat.conversation.Conversation.get_prompt = get_prompt
--- a/src/axolotl/prompt_strategies/instruct.py
+++ b/src/axolotl/prompt_strategies/instruct.py
@@ -1,33 +0,0 @@
-"""Module containing the InstructShareGPTPromptTokenizingStrategy class"""
-from typing import Any, Dict, Optional
-
-from axolotl.prompt_tokenizers import ShareGPTPromptTokenizingStrategy
-from axolotl.prompters import ShareGPTPrompterV2
-
-
-def load(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None):
-    conversation = (
-        ds_cfg["conversation"] if ds_cfg and "conversation" in ds_cfg else None
-    )
-    strategy = InstructShareGPTPromptTokenizingStrategy(
-        # pylint: disable=duplicate-code
-        ShareGPTPrompterV2(
-            conversation=conversation,
-        ),
-        tokenizer,
-        cfg.train_on_inputs,
-        cfg.sequence_len,
-    )
-    return strategy
-
-
-class InstructShareGPTPromptTokenizingStrategy(ShareGPTPromptTokenizingStrategy):
-    """
-    basic sharegpt strategy to grab conversations from the sample row
-    """
-
-    def get_conversation_thread(self, prompt):
-        return [
-            {"from": "human", "value": prompt["instruction"]},
-            {"from": "gpt", "value": prompt["output"]},
-        ]
--- a/src/axolotl/prompt_strategies/llama2_chat.py
+++ b/src/axolotl/prompt_strategies/llama2_chat.py
@@ -29,7 +29,7 @@ from dataclasses import dataclass, field
 from typing import Generator, List, Sequence

 from axolotl.prompt_tokenizers import PromptTokenizingStrategy
-from axolotl.prompters import IGNORE_TOKEN_ID, SHAREGPT_ASSERTION_FAILED_ROLE
+from axolotl.prompters import ALTERNATING_ASSERTION_FAILED_ROLE, IGNORE_TOKEN_ID


@dataclass
@@ -75,7 +75,7 @@ class Llama2ChatConversation:

 class LLama2ChatTokenizingStrategy(PromptTokenizingStrategy):
    """
-    Tokenizing strategy for ShareGPT prompts.
+    Tokenizing strategy for Llama2 prompts.
    adapted from https://github.com/lm-sys/FastChat/blob/main/fastchat/train/train.py
    """

@@ -191,7 +191,7 @@ class Llama2ChatPrompter:  # pylint: disable=too-few-public-methods
        conv.messages = []  # pylint: disable=R0801
        for j, sentence in enumerate(source):
            role = roles[sentence["from"]]
-            assert role == conv.roles[j % 2], SHAREGPT_ASSERTION_FAILED_ROLE
+            assert role == conv.roles[j % 2], ALTERNATING_ASSERTION_FAILED_ROLE
            if sentence["value"]:
                conv.append_message(role, sentence["value"])
        yield conv
--- a/src/axolotl/prompt_strategies/sharegpt.py
+++ b/src/axolotl/prompt_strategies/sharegpt.py
@@ -1,223 +0,0 @@
-"""Module containing the SimpleShareGPTPromptTokenizingStrategy class"""
-
-import logging
-from typing import Any, Dict, Optional, Type
-
-from fastchat.conversation import Conversation, SeparatorStyle, register_conv_template
-
-from axolotl.prompt_tokenizers import ShareGPTPromptTokenizingStrategy
-from axolotl.prompters import ShareGPTPrompterV2
-from axolotl.utils.tokenization import (
-    chatml_to_conversation,
-    merge_consecutive_messages,
-)
-
-LOG = logging.getLogger("axolotl")
-
-
-def register_chatml_template(system_message=None):
-    system_message = system_message or "You are a helpful assistant."
-    register_conv_template(
-        Conversation(
-            name="chatml",
-            system_template="<|im_start|>system\n{system_message}",
-            system_message=system_message,
-            roles=("<|im_start|>user", "<|im_start|>assistant"),
-            sep_style=SeparatorStyle.CHATML,
-            sep="<|im_end|>",
-        )
-    )
-    register_conv_template(
-        Conversation(
-            name="chatml_glaive",
-            system_template="<|im_start|>system\n{system_message}",
-            system_message=system_message,
-            roles=("<|im_start|>user", "<|im_start|>assistant", "<|im_start|>tool"),
-            sep_style=SeparatorStyle.CHATML,
-            sep="<|im_end|>",
-        )
-    )
-
-
-def register_llama3_template(system_message=None):
-    system_message = system_message or "You are a helpful assistant."
-    register_conv_template(
-        Conversation(
-            name="llama3",
-            system_template="<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>",
-            system_message=system_message,
-            roles=("user", "assistant"),
-            sep_style=SeparatorStyle.LLAMA3,
-            sep="",
-            stop_str="<|eot_id|>",
-            stop_token_ids=[128001, 128009],
-        )
-    )
-
-
-def build_loader(
-    tokenization_strategy_cls: Type["ShareGPTPromptTokenizingStrategy"],
-    prompter_cls: Type["ShareGPTPrompterV2"],
-    default_conversation: Optional[str] = None,
-):
-    def _load(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None):
-        LOG.warning(
-            "sharegpt type support will be deprecated in the next release of Axolotl. Please use chat_template instead. https://axolotl-ai-cloud.github.io/axolotl/docs/dataset-formats/conversation.html#chat_template",
-        )
-        conversation = (
-            ds_cfg["conversation"]
-            if ds_cfg and "conversation" in ds_cfg
-            else default_conversation
-        )
-        field_human = (
-            ds_cfg["field_human"] if ds_cfg and "field_human" in ds_cfg else None
-        )
-        field_model = (
-            ds_cfg["field_model"] if ds_cfg and "field_model" in ds_cfg else None
-        )
-        roles = ds_cfg["roles"].to_dict() if ds_cfg and "roles" in ds_cfg else None
-        strategy = tokenization_strategy_cls(
-            prompter_cls(
-                conversation=conversation,
-                role_key_model=field_model,
-                role_key_human=field_human,
-                roles=roles,
-            ),
-            tokenizer,
-            cfg.train_on_inputs,
-            cfg.sequence_len,
-        )
-        if ds_cfg and "strict" in ds_cfg and hasattr(strategy, "strict"):
-            strategy.strict = ds_cfg["strict"]
-        if ds_cfg and "field_messages" in ds_cfg and hasattr(strategy, "messages"):
-            strategy.messages = ds_cfg["field_messages"]
-        return strategy
-
-    return _load
-
-
-class SimpleShareGPTPromptTokenizingStrategy(ShareGPTPromptTokenizingStrategy):
-    """
-    basic sharegpt strategy to grab conversations from the sample row
-    """
-
-    _strict = False
-    _messages = "conversations"
-
-    @property
-    def strict(self):
-        return self._strict
-
-    @strict.setter
-    def strict(self, strict):
-        self._strict = strict
-
-    @property
-    def messages(self):
-        return self._messages
-
-    @messages.setter
-    def messages(self, messages):
-        self._messages = messages
-
-    def get_conversation_thread(self, prompt):
-        conversations = prompt[self.messages]
-        if self.strict:
-            return conversations
-        role_key = "from"
-        if "role" in conversations[0].keys():
-            role_key = "role"
-        value_key = "value"
-        if "text" in conversations[0].keys():
-            value_key = "text"
-        elif "content" in conversations[0].keys():
-            value_key = "content"
-        # remap roles - allow for assistant turn"
-        role_map = {
-            "user": "human",
-            "human": "human",
-            "assistant": "gpt",
-            "gpt": "gpt",
-            "system": "system",
-        }
-        turns = [
-            {
-                "from": (
-                    role_map[t[role_key]] if t[role_key] in role_map else t[role_key]
-                ),
-                "value": t[value_key],
-                "weight": 1
-                if "weight" not in t or t["weight"] is None
-                else t["weight"],
-            }
-            for t in conversations
-        ]
-        return turns
-
-
-class SimpleRoleShareGPTPromptTokenizingStrategy(
-    SimpleShareGPTPromptTokenizingStrategy
-):
-    """
-    basic sharegpt strategy to grab conversations from the sample row, but uses role instead of from
-    """
-
-    def get_conversation_thread(self, prompt):
-        conversations = prompt["conversations"]
-        # remap role: prompter/assistant, text: ... => from: human/gpt, value: ...
-        turns = [{"from": t["role"], "value": t["value"]} for t in conversations]
-        return turns
-
-
-class GuanacoShareGPTPromptTokenizingStrategy(ShareGPTPromptTokenizingStrategy):
-    """
-    sharegpt strategy that remaps oasst data to sharegpt format
-    """
-
-    def get_conversation_thread(self, prompt):
-        conversations = prompt["conversations"]
-        # remap role: prompter/assistant, text: ... => from: human/gpt, value: ...
-        role_map = {"prompter": "human", "assistant": "gpt"}
-        turns = [
-            {"from": role_map[t["role"]], "value": t["text"]} for t in conversations
-        ]
-        return turns
-
-
-class UltrachatShareGPTPromptTokenizingStrategy(SimpleShareGPTPromptTokenizingStrategy):
-    """
-    sharegpt strategy that remaps ultrachat data to sharegpt format
-    """
-
-    def get_conversation_thread(self, prompt):
-        conversations = prompt["messages"]
-        role_map = {"user": "human", "assistant": "gpt"}
-        turns = [
-            {"from": role_map[t["role"]], "value": t["content"]} for t in conversations
-        ]
-        return turns
-
-
-class GlaiveShareGPTPromptTokenizingStrategy(SimpleShareGPTPromptTokenizingStrategy):
-    """
-    sharegpt strategy that remaps glaive data to sharegpt format
-    """
-
-    def get_conversation_thread(self, prompt):
-        conversation = chatml_to_conversation(prompt)
-        conversation = merge_consecutive_messages(conversation)
-
-        return conversation
-
-
-load = build_loader(SimpleShareGPTPromptTokenizingStrategy, ShareGPTPrompterV2)
-load_role = build_loader(SimpleRoleShareGPTPromptTokenizingStrategy, ShareGPTPrompterV2)
-load_ultrachat = build_loader(
-    UltrachatShareGPTPromptTokenizingStrategy, ShareGPTPrompterV2
-)
-load_guanaco = build_loader(GuanacoShareGPTPromptTokenizingStrategy, ShareGPTPrompterV2)
-load_glaive = build_loader(
-    GlaiveShareGPTPromptTokenizingStrategy,
-    ShareGPTPrompterV2,
-    default_conversation="chatml_glaive",
-)
--- a/src/axolotl/prompt_strategies/sharegpt_jokes.py
+++ b/src/axolotl/prompt_strategies/sharegpt_jokes.py
@@ -1,28 +0,0 @@
-"""Module for Jokes prompts using sharegpt style """
-from axolotl.prompt_tokenizers import ShareGPTPromptTokenizingStrategy
-from axolotl.prompters import ShareGPTPrompterV2
-
-
-def load(tokenizer, cfg):
-    return SimpleJokesShareGPTPromptTokenizingStrategy(
-        ShareGPTPrompterV2(),
-        tokenizer,
-        cfg.train_on_inputs,
-        cfg.sequence_len,
-    )
-
-
-class SimpleJokesShareGPTPromptTokenizingStrategy(ShareGPTPromptTokenizingStrategy):
-    """
-    Tokenization strategy for asking bot to tell a joke and then explain why its funny
-    """
-
-    # title, text, explanation
-    def get_conversation_thread(self, prompt):
-        title = "" if not prompt["title"] else prompt["title"] + " "
-        return [
-            {"from": "human", "value": "Tell me a joke."},
-            {"from": "gpt", "value": title + prompt["text"]},
-            {"from": "human", "value": "Why is that joke funny?"},
-            {"from": "gpt", "value": prompt["explanation"]},
-        ]
--- a/src/axolotl/prompt_tokenizers.py
+++ b/src/axolotl/prompt_tokenizers.py
@@ -1,17 +1,12 @@
 """Module containing PromptTokenizingStrategy and Prompter classes"""

 import abc
-import copy
 import logging
 from typing import Dict, List, Tuple, Union

-from fastchat.conversation import Conversation
 from transformers import BatchEncoding, PreTrainedTokenizer

-from axolotl.monkeypatch.fastchat_conversation_turns import (
-    add_get_turns_to_conversation,
-)
-from axolotl.prompters import IGNORE_TOKEN_ID, Prompter
+from axolotl.prompters import Prompter

 LOG = logging.getLogger("axolotl")

@@ -21,8 +16,6 @@ LLAMA_DEFAULT_EOS_TOKEN = "</s>"  # nosec
 LLAMA_DEFAULT_BOS_TOKEN = "<s>"  # nosec
 LLAMA_DEFAULT_UNK_TOKEN = "<unk>"  # nosec

-add_get_turns_to_conversation()
-

 class InvalidDataException(Exception):
    """
@@ -331,154 +324,6 @@ class AlpacaReflectionPTStrategy(ReflectionPromptTokenizingStrategy):
        )


-class ShareGPTPromptTokenizingStrategy(PromptTokenizingStrategy):
-    """
-    Tokenizing strategy for ShareGPT prompts.
-    """
-
-    def get_conversation_thread(self, prompt):
-        return prompt["conversations"]
-
-    def tokenize_prompt(self, prompt):
-        # Initial values. We will append to these as we go through the conversation.
-        result, current_len = tokenize_prompt_default()
-        conversation: Conversation = (
-            self.prompter._conversation.copy()  # pylint: disable=protected-access
-        )
-
-        input_roles = {conversation.roles[0]}
-        output_roles = {conversation.roles[1]}
-
-        if len(conversation.roles) == 3:
-            tool_role_label = conversation.roles[2]
-            input_roles.add(tool_role_label)
-
-        # Add roles from the config
-        if self.prompter.roles:
-            if "input" in self.prompter.roles and self.prompter.roles["input"]:
-                for role in self.prompter.roles["input"]:
-                    input_roles.add(role)
-
-            if "output" in self.prompter.roles and self.prompter.roles["output"]:
-                for role in self.prompter.roles["output"]:
-                    output_roles.add(role)
-
-        # support for custom roles from the dataset, only useful for vicuna style prompts/roles
-        role_remap = []
-        if (
-            conversation.name == "vicuna_v1.1"
-            and "roles" in prompt
-            and len(prompt["roles"]) >= 2
-        ):
-            role_remap = [
-                {"from": conversation.roles[0], "to": prompt["roles"][0]},
-                {"from": conversation.roles[1], "to": prompt["roles"][1]},
-            ]
-
-        try:
-            for _, part in enumerate(
-                self.prompter.build_prompt(self.get_conversation_thread(prompt))
-            ):
-                if not isinstance(part, tuple):
-                    LOG.warning(f"expected tuple, got {part}")
-                    continue
-
-                if len(part) <= 2:
-                    role, content = part
-                    weight = 1
-                else:
-                    role, content, weight = part
-
-                # Uses "in" because role contains extra characters
-                input_turn = any(r.lower() in role.lower() for r in input_roles)
-                output_turn = any(r.lower() in role.lower() for r in output_roles)
-                empty_role = role.strip() == ""
-
-                if not any([input_turn, output_turn, empty_role]):
-                    LOG.warning(f"unhandled role: {role}")
-                    continue
-
-                if input_turn:
-                    role = (
-                        role.replace(role_remap[0]["from"], role_remap[0]["to"])
-                        if role_remap
-                        else role
-                    )
-                    turn = role + content
-                    # this is still the user query, we should
-                    if not content.strip():
-                        LOG.warning(f"user turn has empty text: {prompt}")
-                    res = self._tokenize(
-                        turn,
-                        add_eos_token=False,
-                        strip_bos_token=True,
-                    )
-                    if self.train_on_inputs and weight == 1:
-                        labels = copy.deepcopy(res["input_ids"])
-                    else:
-                        # everything from this is masked out from the labels
-                        labels = [IGNORE_TOKEN_ID] * len(res["input_ids"])
-                elif output_turn:
-                    role = (
-                        role.replace(role_remap[1]["from"], role_remap[1]["to"])
-                        if role_remap
-                        else role
-                    )
-                    turn = role + content
-                    # this should be the assistant response, should end with an eos token
-                    if not content.strip():
-                        LOG.warning(f"assistant turn has empty text: {prompt}")
-                    add_eos_token = not (
-                        conversation.name == "chatml"
-                        and conversation.sep == self.tokenizer.eos_token
-                    )
-                    res = self._tokenize(
-                        turn,
-                        add_eos_token=add_eos_token,
-                        strip_bos_token=True,
-                    )
-                    role_res = self._tokenize(
-                        role.rstrip(),
-                        add_eos_token=False,
-                        strip_bos_token=True,
-                    )
-                    labels = copy.deepcopy(res["input_ids"])
-                    if not self.train_on_inputs:
-                        # mask out role tokens from the labels
-                        len_role = len(role_res["input_ids"])
-                        labels[:len_role] = [IGNORE_TOKEN_ID] * min(
-                            len_role, len(labels)
-                        )
-                    if weight == 0:
-                        # everything from this is masked out from the labels
-                        # (role is masked out too because it makes no sense if contents is masked out)
-                        labels = [IGNORE_TOKEN_ID] * len(res["input_ids"])
-
-                elif empty_role:
-                    turn = content
-                    # this is only ever the first part, should include the bos token and the user query
-                    res = self._tokenize(
-                        turn, add_eos_token=False, strip_bos_token=False
-                    )
-                    if self.train_on_inputs and weight == 1:
-                        labels = copy.deepcopy(res["input_ids"])
-                    else:
-                        # everything from this is masked out from the labels
-                        labels = [IGNORE_TOKEN_ID] * len(res["input_ids"])
-
-                # pylint: disable=duplicate-code
-                result, current_len = parse_tokenized_to_result(
-                    result,
-                    current_len,
-                    res,
-                    labels,
-                    pad_token_id=self.tokenizer.pad_token_id,
-                )
-            return result
-        except (KeyError, AssertionError, IndexError) as err:
-            raise InvalidDataException(str(err)) from err
-
-
 def tokenize_prompt_default() -> Tuple[Dict[str, List[int]], int]:
    """
    Returns the default values for the tokenize prompt function
--- a/src/axolotl/prompters.py
+++ b/src/axolotl/prompters.py
@@ -5,7 +5,6 @@ from enum import Enum
 from typing import Generator, Optional, Union

 from colorama import Fore
-from fastchat.conversation import Conversation, get_conv_template

 LOG = logging.getLogger("axolotl")
 IGNORE_TOKEN_ID = -100
@@ -262,166 +261,10 @@ class ReflectAlpacaPrompter(Prompter):
        )


-SHAREGPT_ASSERTION_FAILED_ROLE = (
+ALTERNATING_ASSERTION_FAILED_ROLE = (
    "Role did not alternate between turns (gpt and human). Please check your data."
 )

-CONVERSATION_ROLE_FORMAT = {
-    "chatml": "<|im_start|>{ROLE}",
-    "zephyr": "<|{ROLE}|>",
-    "vicuna_v1.1": "{ROLE}",
-    "llama3": "<|start_header_id|>{ROLE}<|end_header_id|>",
-}
-
-
-class ShareGPTPrompter(Prompter):  # pylint: disable=too-few-public-methods
-    """
-    A prompter that generates prompts for the ShareGPT
-    """
-
-    role_key_human = "human"
-    role_key_model = "gpt"
-    # Optional, only used for tool usage datasets.
-    role_key_tool: Optional[str] = None
-    # Optional, role input/output mapping
-    roles: Optional[dict] = None
-
-    def __init__(
-        self,
-        prompt_style=None,  # pylint: disable=unused-argument
-        conversation: Optional[Union[str, Conversation]] = None,
-        role_key_human: Optional[str] = None,
-        role_key_model: Optional[str] = None,
-        role_key_tool: Optional[str] = None,
-        roles: Optional[dict] = None,
-    ):
-        if conversation:
-            if isinstance(conversation, Conversation):
-                self._conversation = conversation
-            else:
-                self._conversation = get_conv_template(conversation)
-        else:
-            self._conversation = get_conv_template("vicuna_v1.1")
-        if role_key_human:
-            self.role_key_human = role_key_human
-        if role_key_model:
-            self.role_key_model = role_key_model
-        if role_key_tool:
-            self.role_key_tool = role_key_tool
-        if roles:
-            self.roles = roles
-
-    def _build_result(self, source):
-        if len(source) < 2:
-            # If there isn't a back and forth conversation, ignore it
-            # also happens on the data splitting leaving empty conversations
-            raise IndexError(
-                f"A conversation entry has less than 2 messages :\n{source}"
-            )
-
-        conv = self._conversation.copy()
-
-        original_source = source.copy()
-        # Add the conversation system prompt if provided, otherwise use the default one
-        if source[0]["from"] == "system":
-            conv.set_system_message(source[0]["value"])
-            source.pop(0)
-
-        roles = {self.role_key_human: conv.roles[0], self.role_key_model: conv.roles[1]}
-        if self.role_key_tool:
-            roles[self.role_key_tool] = conv.roles[2]
-
-        try:
-            # Apply prompt templates
-            if source[0]["from"] not in roles:
-                # Skip the first one if it is not from human
-                source = source[1:]
-        except IndexError as err:
-            # sometimes there is a bing or system chat
-            raise err
-
-        conv.messages = []
-        for _, sentence in enumerate(source):
-            from_role = sentence["from"]
-            if from_role in roles:
-                role = roles[from_role]
-            else:
-                if self._conversation.name not in CONVERSATION_ROLE_FORMAT:
-                    raise NotImplementedError(
-                        f"Role ({role}) not in default roles, and {self._conversation.name} does not support role remapping yet."
-                        "Please help us by creating an Issue to add support for this conversation type."
-                    )
-
-                if self._conversation.name in ["llama3"]:
-                    role = from_role
-                else:
-                    role = CONVERSATION_ROLE_FORMAT[self._conversation.name].format(
-                        ROLE=from_role
-                    )
-
-            if len(conv.messages) > 0 and ((role == conv.messages[-1][0])):
-                if (
-                    role != "assistant"
-                ):  # back to back assistant calls may be okay for tool calls
-                    LOG.warning(f"{SHAREGPT_ASSERTION_FAILED_ROLE}: {sentence}")
-
-            conv.append_message(role, sentence["value"])
-        turns = list(conv.get_turns())
-        original_source_length = len(original_source)
-        assert len(turns) in [
-            original_source_length - 1,
-            original_source_length,
-            original_source_length + 1,
-        ]
-        if len(turns) == original_source_length + 1:
-            original_source = [{"weight": None}] + original_source
-        elif len(turns) == original_source_length - 1:
-            original_source = original_source[1:]
-        return [
-            (*turn, weight)
-            for turn, weight in zip(
-                turns,
-                [
-                    1 if "weight" not in e or e["weight"] is None else e["weight"]
-                    for e in original_source
-                ],
-            )
-        ]
-
-    def build_prompt(self, source) -> Generator[str, None, None]:
-        turns = self._build_result(source)
-
-        for part in turns:
-            if part[0] and not part[1]:
-                LOG.warning(f"role with empty message: {part[0]}")
-            yield part
-
-    def __repr__(self) -> str:
-        turns = self._build_result([{"from": "{from}", "value": "{value}"}])
-        return "\n".join([REPR_TEMPLATE.format(full_prompt=part) for part in turns])
-
-
-class ShareGPTPrompterV2(ShareGPTPrompter):
-    """
-    A V2 prompter that generates prompts for the ShareGPT
-    """
-
-    def __init__(
-        self,
-        conversation: Optional[Union[str, Conversation]] = None,
-        role_key_human: Optional[str] = None,
-        role_key_model: Optional[str] = None,
-        role_key_tool: Optional[str] = None,
-        roles: Optional[dict] = None,
-    ):
-        super().__init__(
-            conversation=conversation,
-            role_key_human=role_key_human,
-            role_key_model=role_key_model,
-            role_key_tool=role_key_tool,
-            roles=roles,
-        )
-

 class UnsupportedPrompter(Prompter):
    """
--- a/src/axolotl/utils/config/init.py
+++ b/src/axolotl/utils/config/init.py
@@ -215,11 +215,6 @@ def normalize_cfg_datasets(cfg):
    if cfg.chat_template:
        if cfg.datasets:
            for idx, ds_cfg in enumerate(cfg.datasets):
-                if ds_cfg.type == "sharegpt" and not ds_cfg.conversation:
-                    LOG.info(
-                        f"updating dataset {ds_cfg.path} with `conversation: {cfg.chat_template}` to match your chat_template"
-                    )
-                    cfg.datasets[idx].conversation = cfg.chat_template
                if (
                    ds_cfg.type in ["orpo.chat_template", "chat_template"]
                    and not ds_cfg.chat_template
@@ -461,27 +456,6 @@ def legacy_validate_config(cfg):
                "`early_stopping_patience` requires that eval_steps should evenly divide save_steps."
            )

-    if cfg.datasets:
-        for idx, ds_cfg in enumerate(cfg.datasets):
-            if not ds_cfg.type:
-                continue
-            if ds_cfg.type == "sharegpt:chat":
-                LOG.warning(
-                    PendingDeprecationWarning(
-                        "`type: sharegpt:chat` will soon be deprecated. simply use `type: sharegpt` instead."
-                    )
-                )
-                cfg.datasets[idx].type = "sharegpt"
-            if "sharegpt_simple" in ds_cfg.type:
-                LOG.warning(
-                    PendingDeprecationWarning(
-                        "`type: sharegpt_simple` will soon be deprecated. simply use `type: sharegpt` instead."
-                    )
-                )
-                cfg.datasets[idx].type = cfg.datasets[idx].type.replace(
-                    "sharegpt_simple", "sharegpt"
-                )
-
    if cfg.saves_per_epoch and cfg.save_steps:
        raise ValueError(
            "save_steps and saves_per_epoch are mutually exclusive and cannot be used together."
--- a/src/axolotl/utils/config/models/input/v0_4_1/init.py
+++ b/src/axolotl/utils/config/models/input/v0_4_1/init.py
@@ -588,6 +588,9 @@ class AxolotlInputConfig(

    rl: Optional[RLType] = None
    reward_model: Optional[bool] = None
+    dpo_use_weighting: Optional[
+        bool
+    ] = None  # whether to use weighting in DPO trainer. If none, default is false in the trainer.

    datasets: Optional[conlist(Union[SFTDataset, DPODataset, KTODataset], min_length=1)] = None  # type: ignore
    test_datasets: Optional[conlist(Union[SFTDataset, DPODataset, KTODataset], min_length=1)] = None  # type: ignore
@@ -780,26 +783,16 @@ class AxolotlInputConfig(

    @field_validator("datasets", mode="before")
    @classmethod
-    def fix_sharegpt_datasets(cls, datasets):
-        for idx, ds_cfg in enumerate(datasets):
-            if not ds_cfg["type"]:
+    def deprecate_sharegpt_datasets(cls, datasets):
+        for _, ds_cfg in enumerate(datasets):
+            if not ds_cfg.get("type"):
                continue
-            if ds_cfg["type"] == "sharegpt:chat":
-                LOG.warning(
-                    PendingDeprecationWarning(
-                        "`type: sharegpt:chat` will soon be deprecated. simply use `type: sharegpt` instead."
-                    )
-                )
-                datasets[idx]["type"] = "sharegpt"
-            if "sharegpt_simple" in ds_cfg["type"]:
-                LOG.warning(
-                    PendingDeprecationWarning(
-                        "`type: sharegpt_simple` will soon be deprecated. simply use `type: sharegpt` instead."
-                    )
-                )
-                datasets[idx]["type"] = datasets[idx]["type"].replace(
-                    "sharegpt_simple", "sharegpt"
+
+            if ds_cfg["type"].startswith("sharegpt"):
+                raise ValueError(
+                    "`type: sharegpt.*` is deprecated. Please use `type: chat_template` instead."
                )
+
        return datasets

    @model_validator(mode="before")
--- a/src/axolotl/utils/tokenization.py
+++ b/src/axolotl/utils/tokenization.py
@@ -1,8 +1,6 @@
 """Module for tokenization utilities"""

 import logging
-import re
-from typing import Dict, List

 from termcolor import colored

@@ -93,65 +91,3 @@ def check_rl_example_labels(example, tokenizer, text_only=False):
    LOG.info(f"REJECTED RESPONSE: {delimiter.join(colored_rejecteds)}\n\n\n")

    return delimiter.join(colored_tokens)
-
-
-GLAIVE_ROLES = ["USER", "ASSISTANT", "FUNCTION RESPONSE"]
-GLAIVE_TO_SHAREGPT_ROLE = {
-    "SYSTEM": "system",
-    "USER": "human",
-    "ASSISTANT": "gpt",
-    "FUNCTION RESPONSE": "tool",
-}
-
-GLAIVE_MSG_REGEX = re.compile(rf"({'|'.join(GLAIVE_ROLES)}): ")
-
-
-def chatml_to_conversation(row: Dict[str, str]) -> List[Dict[str, str]]:
-    """
-    Converts a ChatML formatted row to a list of messages in ShareGPT format.
-    Initially based off https://github.com/lilacai/lilac/blob/main/notebooks/GlaiveToShareGPT.ipynb.
-    """
-
-    system_prompt = row.get("system")
-    if system_prompt:
-        system_prompt = system_prompt.removeprefix("SYSTEM: ")
-
-    chat_str = row["chat"]
-    chat_msgs = [s.strip() for s in GLAIVE_MSG_REGEX.split(chat_str) if s]
-
-    chat_msg_dicts = [
-        {"from": GLAIVE_TO_SHAREGPT_ROLE[role], "value": value}
-        for role, value in zip(chat_msgs[::2], chat_msgs[1::2])
-    ]
-
-    if system_prompt:
-        chat_msg_dicts = [
-            {"from": GLAIVE_TO_SHAREGPT_ROLE["SYSTEM"], "value": system_prompt}
-        ] + chat_msg_dicts
-
-    return chat_msg_dicts
-
-
-def merge_consecutive_messages(messages):
-    """
-    Merge consecutive messages from the same sender into a single message.
-    This can be useful with datasets that contain multiple consecutive tool calls.
-    """
-
-    merged_messages = []
-    current_from = None
-    current_message = ""
-
-    for msg in messages:
-        if current_from == msg["from"]:
-            current_message += msg["value"]
-        else:
-            if current_from is not None:
-                merged_messages.append({"from": current_from, "value": current_message})
-            current_from = msg["from"]
-            current_message = msg["value"]
-
-    if current_from is not None:
-        merged_messages.append({"from": current_from, "value": current_message})
-
-    return merged_messages
--- a/tests/e2e/integrations/liger.py
+++ b/tests/e2e/integrations/liger.py
@@ -1,7 +1,6 @@
 """
 Simple end-to-end test for Liger integration
 """
-
 import unittest
 from pathlib import Path

--- a/tests/e2e/test_dpo.py
+++ b/tests/e2e/test_dpo.py
@@ -115,6 +115,51 @@ class TestDPOLlamaLora(unittest.TestCase):
        train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
        assert (Path(temp_dir) / "checkpoint-20/adapter_model.safetensors").exists()

+    @with_temp_dir
+    def test_dpo_use_weighting(self, temp_dir):
+        # pylint: disable=duplicate-code
+        cfg = DictDefault(
+            {
+                "base_model": "JackFram/llama-68m",
+                "tokenizer_type": "LlamaTokenizer",
+                "sequence_len": 1024,
+                "load_in_8bit": True,
+                "adapter": "lora",
+                "lora_r": 64,
+                "lora_alpha": 32,
+                "lora_dropout": 0.1,
+                "lora_target_linear": True,
+                "special_tokens": {},
+                "rl": "dpo",
+                "dpo_use_weighting": True,
+                "datasets": [
+                    {
+                        "path": "arcee-ai/distilabel-intel-orca-dpo-pairs-binarized",
+                        "type": "chatml.ultra",
+                        "split": "train",
+                    },
+                ],
+                "num_epochs": 1,
+                "micro_batch_size": 4,
+                "gradient_accumulation_steps": 1,
+                "output_dir": temp_dir,
+                "learning_rate": 0.00001,
+                "optimizer": "paged_adamw_8bit",
+                "lr_scheduler": "cosine",
+                "max_steps": 20,
+                "save_steps": 10,
+                "warmup_steps": 5,
+                "gradient_checkpointing": True,
+                "gradient_checkpointing_kwargs": {"use_reentrant": True},
+            }
+        )
+        normalize_config(cfg)
+        cli_args = TrainerCliArgs()
+        dataset_meta = load_rl_datasets(cfg=cfg, cli_args=cli_args)
+
+        train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
+        assert (Path(temp_dir) / "checkpoint-20/adapter_model.safetensors").exists()
+
    @pytest.mark.skip("kto_pair no longer supported in trl")
    @with_temp_dir
    def test_kto_pair_lora(self, temp_dir):
--- a/tests/integrations/init.py
+++ b/tests/integrations/init.py
--- a/tests/integrations/liger.py
+++ b/tests/integrations/liger.py
@@ -0,0 +1,80 @@
+"""
+config validation tests for swiglu args
+"""
+# pylint: disable=duplicate-code
+import logging
+from typing import Optional
+
+import pytest
+
+from axolotl.utils.config import validate_config
+from axolotl.utils.dict import DictDefault
+
+
+@pytest.fixture(name="minimal_base_cfg")
+def fixture_cfg():
+    return DictDefault(
+        {
+            "base_model": "TinyLlama/TinyLlama-1.1B-Chat-v0.6",
+            "learning_rate": 0.000001,
+            "datasets": [
+                {
+                    "path": "mhenrichsen/alpaca_2k_test",
+                    "type": "alpaca",
+                }
+            ],
+            "micro_batch_size": 1,
+            "gradient_accumulation_steps": 1,
+        }
+    )
+
+
+class BaseValidation:
+    """
+    Base validation module to setup the log capture
+    """
+
+    _caplog: Optional[pytest.LogCaptureFixture] = None
+
+    @pytest.fixture(autouse=True)
+    def inject_fixtures(self, caplog):
+        self._caplog = caplog
+
+
+# pylint: disable=too-many-public-methods
+class TestValidation(BaseValidation):
+    """
+    Test the validation module for liger
+    """
+
+    def test_deprecated_swiglu(self, minimal_cfg):
+        test_cfg = DictDefault(
+            {
+                "liger_swiglu": False,
+            }
+            | minimal_cfg
+        )
+
+        with self._caplog.at_level(logging.WARNING):
+            updated_cfg = validate_config(test_cfg)
+            assert (
+                "The 'liger_swiglu' argument is deprecated"
+                in self._caplog.records[0].message
+            )
+            assert updated_cfg.liger_swiglu is None
+            assert updated_cfg.liger_glu_activations is False
+
+    def test_conflict_swiglu_ligergluactivation(self, minimal_cfg):
+        test_cfg = DictDefault(
+            {
+                "liger_swiglu": False,
+                "liger_glu_activations": True,
+            }
+            | minimal_cfg
+        )
+
+        with pytest.raises(
+            ValueError,
+            match=r".*You cannot have both `liger_swiglu` and `liger_glu_activation` set.*",
+        ):
+            validate_config(test_cfg)
--- a/tests/prompt_strategies/test_sharegpt.py
+++ b/tests/prompt_strategies/test_sharegpt.py
@@ -1,500 +0,0 @@
-"""
-Test module for sharegpt integration w chatml
-"""
-
-import pytest
-from datasets import Dataset
-from tokenizers import AddedToken
-from transformers import AutoTokenizer
-
-from axolotl.datasets import TokenizedPromptDataset
-from axolotl.prompt_strategies.sharegpt import (
-    GlaiveShareGPTPromptTokenizingStrategy,
-    SimpleShareGPTPromptTokenizingStrategy,
-    register_chatml_template,
-    register_llama3_template,
-)
-from axolotl.prompters import ShareGPTPrompterV2
-
-register_chatml_template()
-register_llama3_template()
-
-
-@pytest.fixture(name="sharegpt_dataset")
-def fixture_sharegpt_dataset():
-    return Dataset.from_list(
-        [
-            {
-                "conversations": [
-                    {
-                        "from": "system",
-                        "value": "repeat",
-                    },
-                    {
-                        "from": "human",
-                        "value": "hello",
-                    },
-                    {
-                        "from": "gpt",
-                        "value": "hello",
-                    },
-                    {
-                        "from": "human",
-                        "value": "goodbye",
-                    },
-                    {
-                        "from": "gpt",
-                        "value": "goodbye",
-                    },
-                ]
-            }
-        ]
-    )
-
-
-@pytest.fixture(name="sharegpt_dataset_with_weights")
-def fixture_sharegpt_dataset_with_weights():
-    return Dataset.from_list(
-        [
-            {
-                "conversations": [
-                    {
-                        "from": "system",
-                        "value": "repeat",
-                    },
-                    {
-                        "from": "human",
-                        "value": "hello",
-                        "weight": 1,
-                    },
-                    {
-                        "from": "gpt",
-                        "value": "hello",
-                        "weight": 0,
-                    },
-                    {
-                        "from": "human",
-                        "value": "rehello",
-                        "weight": 0,
-                    },
-                    {
-                        "from": "gpt",
-                        "value": "rehello",
-                        "weight": 1,
-                    },
-                    {
-                        "from": "human",
-                        "value": "goodbye",
-                    },
-                    {
-                        "from": "gpt",
-                        "value": "goodbye",
-                        "weight": 0,
-                    },
-                ]
-            }
-        ]
-    )
-
-
-@pytest.fixture(name="glaive_dataset")
-def fixture_sharegpt_glaive_dataset():
-    return Dataset.from_list(
-        [
-            {
-                "system": "SYSTEM: This is a system prompt",
-                "chat": "USER: Can you book a flight for me from New York to London? ASSISTANT: I'm sorry, but I don't have the capability to book flights.  <|endoftext|>",
-            }
-        ]
-    )
-
-
-@pytest.fixture(name="multi_role_dataset")
-def fixture_multi_role_dataset():
-    return Dataset.from_list(
-        [
-            {
-                "conversations": [
-                    {
-                        "from": "system",
-                        "value": "use get_weather(city) to get the weather for a city",
-                    },
-                    {
-                        "from": "human",
-                        "value": "hello, what's the weather in New York?",
-                    },
-                    {
-                        "from": "gpt",
-                        "value": "let me get that for you",
-                    },
-                    {
-                        "from": "tool",
-                        "value": "get_weather(New York)",
-                    },
-                    {
-                        "from": "gpt",
-                        "value": "the weather in New York is 70 degrees and sunny",
-                    },
-                ]
-            }
-        ]
-    )
-
-
-@pytest.fixture(name="tokenizer")
-def fixture_tokenizer():
-    tokenizer = AutoTokenizer.from_pretrained(
-        "casperhansen/mistral-7b-instruct-v0.1-awq"
-    )
-    tokenizer.add_special_tokens(
-        {
-            "eos_token": AddedToken(
-                "<|im_end|>", rstrip=False, lstrip=False, normalized=False
-            )
-        }
-    )
-    tokenizer.add_tokens(
-        [
-            AddedToken("<|im_start|>", rstrip=False, lstrip=False, normalized=False),
-        ]
-    )
-
-    return tokenizer
-
-
-@pytest.fixture(name="llama3_tokenizer")
-def fixture_llama3_tokenizer():
-    tokenizer = AutoTokenizer.from_pretrained("NousResearch/Meta-Llama-3-8B")
-    tokenizer.eos_token = "<|eot_id|>"
-
-    return tokenizer
-
-
-class TestSharegptLlama3:
-    """Test class for ShareGPT style datasets with llama-3 prompts"""
-
-    def test_tokenization(self, sharegpt_dataset, llama3_tokenizer):
-        strategy = SimpleShareGPTPromptTokenizingStrategy(
-            ShareGPTPrompterV2(
-                conversation="llama3",
-                role_key_model=None,
-                role_key_human=None,
-            ),
-            llama3_tokenizer,
-            False,  # train_on_inputs
-            2048,  # sequence_len
-        )
-
-        dataset_wrapper = TokenizedPromptDataset(
-            strategy, sharegpt_dataset, process_count=1
-        )
-
-        input_ids = dataset_wrapper[0]["input_ids"]
-
-        # fmt: off
-        # pylint: disable=duplicate-code
-        assert input_ids == [
-            128000,  # bos
-            128006, 9125, 128007,  # system header
-            271, 31724, 128009,  # sys prompt, eot
-            128006, 882, 128007,  # user header
-            271, 15339, 128009,  # user prompt eot
-            128006, 78191, 128007,  # assistant header
-            271, 15339, 128009,   # assistant response eot
-            128006, 882, 128007,
-            271, 19045, 29474, 128009,
-            128006, 78191, 128007,
-            271, 19045, 29474, 128009,
-        ]
-        # fmt: on
-
-    def test_tokenization_with_weights(
-        self, sharegpt_dataset_with_weights, llama3_tokenizer
-    ):
-        strategy = SimpleShareGPTPromptTokenizingStrategy(
-            ShareGPTPrompterV2(
-                conversation="llama3",
-                role_key_model=None,
-                role_key_human=None,
-            ),
-            llama3_tokenizer,
-            False,  # train_on_inputs
-            2048,  # sequence_len
-        )
-
-        dataset_wrapper = TokenizedPromptDataset(
-            strategy, sharegpt_dataset_with_weights, process_count=1
-        )
-
-        input_ids = dataset_wrapper[0]["input_ids"]
-
-        # fmt: off
-        # pylint: disable=duplicate-code
-        assert input_ids == [
-            128000,  # bos
-            128006, 9125, 128007,  # system header
-            271, 31724, 128009,  # sys prompt, eot
-            128006, 882, 128007,  # user header
-            271, 15339, 128009,  # user prompt eot
-            128006, 78191, 128007,  # assistant header
-            271, 15339, 128009,   # assistant response eot
-            128006, 882, 128007,
-            271, 11310, 4896, 128009,
-            128006, 78191, 128007,
-            271, 11310, 4896, 128009,
-            128006, 882, 128007,
-            271, 19045, 29474, 128009,
-            128006, 78191, 128007,
-            271, 19045, 29474, 128009,
-        ]
-        # fmt: on
-
-
-class TestSharegptChatML:
-    """
-    Test class for sharegpt prompter
-    """
-
-    def test_no_double_im_end(self, sharegpt_dataset, tokenizer):
-        strategy = SimpleShareGPTPromptTokenizingStrategy(
-            ShareGPTPrompterV2(
-                conversation="chatml",
-                role_key_model=None,
-                role_key_human=None,
-            ),
-            tokenizer,
-            False,  # train_on_inputs
-            2048,  # sequence_len
-        )
-
-        dataset_wrapper = TokenizedPromptDataset(
-            strategy, sharegpt_dataset, process_count=1
-        )
-
-        input_ids = dataset_wrapper[0]["input_ids"]
-        # fmt: off
-        assert input_ids == [
-            #  28705, 13, is " \n"
-            1,   # bos
-            32001, 1587, 13, 25997, 32000, 28705, 13,  # system
-            32001, 2188, 13, 21558, 32000, 28705, 13,  # human
-            32001, 13892, 13, 21558, 32000, 28705, 13,  # gpt
-            32001, 2188, 13, 12684, 17664, 32000, 28705, 13,   # human
-            32001, 13892, 13, 12684, 17664, 32000, 28705, 13,  # gpt
-        ]
-        # fmt: on
-
-    def test_no_double_im_end_with_weights(
-        self, sharegpt_dataset_with_weights, tokenizer
-    ):
-        strategy = SimpleShareGPTPromptTokenizingStrategy(
-            ShareGPTPrompterV2(
-                conversation="chatml",
-                role_key_model=None,
-                role_key_human=None,
-            ),
-            tokenizer,
-            False,  # train_on_inputs
-            2048,  # sequence_len
-        )
-
-        dataset_wrapper = TokenizedPromptDataset(
-            strategy, sharegpt_dataset_with_weights, process_count=1
-        )
-
-        input_ids = dataset_wrapper[0]["input_ids"]
-        # fmt: off
-        assert input_ids == [
-            #  28705, 13, is " \n"
-            1,   # bos
-            32001, 1587, 13, 25997, 32000, 28705, 13,  # system
-            32001, 2188, 13, 21558, 32000, 28705, 13,  # human
-            32001, 13892, 13, 21558, 32000, 28705, 13,  # gpt
-            32001, 2188, 13, 267, 21558, 32000, 28705, 13,  # human
-            32001, 13892, 13, 267, 21558, 32000, 28705, 13,  # gpt
-            32001, 2188, 13, 12684, 17664, 32000, 28705, 13,   # human
-            32001, 13892, 13, 12684, 17664, 32000, 28705, 13,  # gpt
-        ]
-        # fmt: on
-
-    def test_no_train_on_input(self, sharegpt_dataset, tokenizer):
-        strategy = SimpleShareGPTPromptTokenizingStrategy(
-            ShareGPTPrompterV2(
-                conversation="chatml",
-                role_key_model=None,
-                role_key_human=None,
-            ),
-            tokenizer,
-            False,  # train_on_inputs
-            2048,  # sequence_len
-        )
-
-        dataset_wrapper = TokenizedPromptDataset(
-            strategy, sharegpt_dataset, process_count=1
-        )
-
-        labels = dataset_wrapper[0]["labels"]
-        # fmt: off
-        assert labels == [
-            -100,   # bos
-            -100, -100, -100, -100, -100, -100, -100,  # system
-            -100, -100, -100, -100, -100, -100, -100,  # human
-            -100, -100, 13, 21558, 32000, 28705, 13,  # gpt
-            -100, -100, -100, -100, -100, -100, -100, -100,   # human
-            -100, -100, 13, 12684, 17664, 32000, 28705, 13,  # gpt
-        ]
-        # fmt: on
-
-    def test_no_train_on_input_with_weights(
-        self, sharegpt_dataset_with_weights, tokenizer
-    ):
-        strategy = SimpleShareGPTPromptTokenizingStrategy(
-            ShareGPTPrompterV2(
-                conversation="chatml",
-                role_key_model=None,
-                role_key_human=None,
-            ),
-            tokenizer,
-            False,  # train_on_inputs
-            2048,  # sequence_len
-        )
-
-        dataset_wrapper = TokenizedPromptDataset(
-            strategy, sharegpt_dataset_with_weights, process_count=1
-        )
-
-        labels = dataset_wrapper[0]["labels"]
-        # fmt: off
-        assert labels == [
-            -100,   # bos
-            -100, -100, -100, -100, -100, -100, -100,  # system
-            -100, -100, -100, -100, -100, -100, -100,  # human
-            -100, -100, -100, -100, -100, -100, -100,  # gpt with weight zero
-            -100, -100, -100, -100, -100, -100, -100, -100,  # human
-            -100, -100, 13, 267, 21558, 32000, 28705, 13,  # gpt
-            -100, -100, -100, -100, -100, -100, -100, -100,   # human
-            -100, -100, -100, -100, -100, -100, -100, -100  # gpt with weight zero
-        ]
-        # fmt: on
-
-    def test_w_train_on_input(self, sharegpt_dataset, tokenizer):
-        strategy = SimpleShareGPTPromptTokenizingStrategy(
-            ShareGPTPrompterV2(
-                conversation="chatml",
-                role_key_model=None,
-                role_key_human=None,
-            ),
-            tokenizer,
-            True,  # train_on_inputs
-            2048,  # sequence_len
-        )
-
-        dataset_wrapper = TokenizedPromptDataset(
-            strategy, sharegpt_dataset, process_count=1
-        )
-
-        labels = dataset_wrapper[0]["labels"]
-        # fmt: off
-        assert labels == [
-            1,   # bos
-            32001, 1587, 13, 25997, 32000, 28705, 13,  # system
-            32001, 2188, 13, 21558, 32000, 28705, 13,  # human
-            32001, 13892, 13, 21558, 32000, 28705, 13,  # gpt
-            32001, 2188, 13, 12684, 17664, 32000, 28705, 13,   # human
-            32001, 13892, 13, 12684, 17664, 32000, 28705, 13,  # gpt
-        ]
-        # fmt: on
-
-    def test_w_train_on_input_with_weights(
-        self, sharegpt_dataset_with_weights, tokenizer
-    ):
-        strategy = SimpleShareGPTPromptTokenizingStrategy(
-            ShareGPTPrompterV2(
-                conversation="chatml",
-                role_key_model=None,
-                role_key_human=None,
-            ),
-            tokenizer,
-            True,  # train_on_inputs
-            2048,  # sequence_len
-        )
-
-        dataset_wrapper = TokenizedPromptDataset(
-            strategy, sharegpt_dataset_with_weights, process_count=1
-        )
-
-        labels = dataset_wrapper[0]["labels"]
-        # fmt: off
-        assert labels == [
-            1,   # bos
-            32001, 1587, 13, 25997, 32000, 28705, 13,  # system
-            32001, 2188, 13, 21558, 32000, 28705, 13,  # human
-            -100, -100, -100, -100, -100, -100, -100,  # gpt with weight 0
-            -100, -100, -100, -100, -100, -100, -100, -100,  # human with weight 0
-            32001, 13892, 13, 267, 21558, 32000, 28705, 13,  # gpt
-            32001, 2188, 13, 12684, 17664, 32000, 28705, 13,  # human
-            -100, -100, -100, -100, -100, -100, -100, -100  # gpt with weight 0
-        ]
-        # fmt: on
-
-    def test_chatml_glaive(self, glaive_dataset, tokenizer):
-        strategy = GlaiveShareGPTPromptTokenizingStrategy(
-            ShareGPTPrompterV2(
-                conversation="chatml",
-                role_key_model=None,
-                role_key_human=None,
-            ),
-            tokenizer,
-            True,  # train_on_inputs
-            2048,  # sequence_len
-        )
-
-        dataset_wrapper = TokenizedPromptDataset(
-            strategy, glaive_dataset, process_count=1
-        )
-
-        labels = dataset_wrapper[0]["labels"]
-        # fmt: off
-        assert labels == [
-            1,  # bos
-            32001, 1587, 13, 3260, 349, 264, 1587, 11510, 32000, 28705, 13,  # system
-            32001, 2188, 13, 6325, 368, 1820, 264, 9314, 354, 528, 477, 1450, 2726, 298, 4222, 28804, 32000, 28705, 13,  # human
-            32001, 13892, 13, 28737, 28742, 28719, 7371, 28725, 562, 315, 949, 28742, 28707, 506, 272, 21368, 298, 1820, 22447, 28723, 28705, 523, 28766, 416, 1009, 772, 28766, 28767, 32000, 28705, 13  # gpt
-        ]
-        # fmt: on
-
-    def test_multi_role_dataset(self, multi_role_dataset, tokenizer):
-        strategy = SimpleShareGPTPromptTokenizingStrategy(
-            ShareGPTPrompterV2(conversation="chatml", roles={"input": ["tool"]}),
-            tokenizer,
-            False,  # train_on_inputs
-            2048,  # sequence_len
-        )
-
-        dataset_wrapper = TokenizedPromptDataset(
-            strategy, multi_role_dataset, process_count=1
-        )
-
-        input_ids = dataset_wrapper[0]["input_ids"]
-        # fmt: off
-        assert input_ids == [
-            1,   # bos
-            32001, 1587, 13, 1730, 625, 28730, 769, 1223, 28732, 18373, 28731, 298, 625, 272, 8086, 354, 264, 2990, 32000, 28705, 13,  # system
-            32001, 2188, 13, 21558, 28725, 767, 28742, 28713, 272, 8086, 297, 1450, 2726, 28804, 32000, 28705, 13,  # human
-            32001, 13892, 13, 895, 528, 625, 369, 354, 368, 32000, 28705, 13,  # gpt
-            32001, 3921, 13, 527, 28730, 769, 1223, 28732, 2972, 2726, 28731, 32000, 28705, 13,  # tool
-            32001, 13892, 13, 1237, 8086, 297, 1450, 2726, 349, 28705, 28787, 28734, 11182, 304, 4376, 1780, 32000, 28705, 13  # gpt
-        ]
-        # fmt: on
-
-        labels = dataset_wrapper[0]["labels"]
-        # fmt: off
-        assert labels == [
-            -100,  # bos
-            -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,  # system
-            -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,  # human
-            -100, -100, 13, 895, 528, 625, 369, 354, 368, 32000, 28705, 13,  # gpt
-            -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,  # tool
-            -100, -100, 13, 1237, 8086, 297, 1450, 2726, 349, 28705, 28787, 28734, 11182, 304, 4376, 1780, 32000, 28705, 13  # gpt
-        ]
-        # fmt: on
--- a/tests/test_datasets.py
+++ b/tests/test_datasets.py
@@ -306,6 +306,10 @@ class TestDatasetPreparation(unittest.TestCase):
        """Verify that processing data from the hub works with a specific revision"""
        with tempfile.TemporaryDirectory() as tmp_dir:
            prepared_path = Path(tmp_dir) / "prepared"
+
+            # make sure prepared_path is empty
+            shutil.rmtree(prepared_path, ignore_errors=True)
+
            cfg = DictDefault(
                {
                    "tokenizer_config": "huggyllama/llama-7b",
--- a/tests/test_normalize_config.py
+++ b/tests/test_normalize_config.py
@@ -39,12 +39,12 @@ class NormalizeConfigTestCase(unittest.TestCase):
                "datasets": [
                    {
                        "path": "lorem/ipsum",
-                        "type": "sharegpt",
-                        "conversation": "vicuna_v1.1",
+                        "type": "chat_template",
+                        "chat_template": "gemma",
                    },
                    {
                        "path": "sit/amet",
-                        "type": "sharegpt",
+                        "type": "chat_template",
                    },
                ],
            }
@@ -52,8 +52,8 @@ class NormalizeConfigTestCase(unittest.TestCase):

        normalize_cfg_datasets(cfg)

-        assert cfg.datasets[0].conversation == "vicuna_v1.1"
-        assert cfg.datasets[1].conversation == "chatml"
+        assert cfg.datasets[0].chat_template == "gemma"
+        assert cfg.datasets[1].chat_template == "chatml"

    @patch("axolotl.utils.config.is_torch_bf16_gpu_available")
    def test_bf16_auto_setter_available(self, mock_bf16_avail):
--- a/tests/test_prompt_tokenizers.py
+++ b/tests/test_prompt_tokenizers.py
@@ -3,7 +3,6 @@
 import json
 import logging
 import unittest
-from copy import deepcopy
 from pathlib import Path
 from typing import Optional

@@ -21,12 +20,8 @@ from axolotl.prompt_strategies.llama2_chat import (
    LLama2ChatTokenizingStrategy,
 )
 from axolotl.prompt_strategies.orpo.chat_template import load
-from axolotl.prompt_strategies.sharegpt import GlaiveShareGPTPromptTokenizingStrategy
-from axolotl.prompt_tokenizers import (
-    AlpacaPromptTokenizingStrategy,
-    ShareGPTPromptTokenizingStrategy,
-)
-from axolotl.prompters import AlpacaPrompter, PromptStyle, ShareGPTPrompterV2
+from axolotl.prompt_tokenizers import AlpacaPromptTokenizingStrategy
+from axolotl.prompters import AlpacaPrompter, PromptStyle
 from axolotl.utils.dict import DictDefault

 LOG = logging.getLogger("axolotl")
@@ -65,17 +60,6 @@ test_data = {
 }


-def prompt_strat(conversation, tokenizer):
-    "Helper function to create a prompt strategy for testing."
-    prompter = ShareGPTPrompterV2(conversation=conversation)
-    return ShareGPTPromptTokenizingStrategy(
-        prompter,
-        tokenizer,
-        False,
-        2048,
-    )
-
-
 class TestPromptTokenizationStrategies(unittest.TestCase):
    """
    Test class for prompt tokenization strategies.
@@ -98,196 +82,6 @@ class TestPromptTokenizationStrategies(unittest.TestCase):
            }
        )

-    def test_sharegpt_integration(self):
-        with open(
-            Path(__file__).parent / "fixtures/conversation.json", encoding="utf-8"
-        ) as fin:
-            data = fin.read()
-            conversation = json.loads(data)
-        with open(
-            Path(__file__).parent / "fixtures/conversation.tokenized.json",
-            encoding="utf-8",
-        ) as fin:
-            data = fin.read()
-            tokenized_conversation = json.loads(data)
-        prompter = ShareGPTPrompterV2()
-        strat = ShareGPTPromptTokenizingStrategy(
-            prompter,
-            self.tokenizer,
-            False,
-            2048,
-        )
-        example = strat.tokenize_prompt(conversation)
-        for fields in ["input_ids", "attention_mask", "labels"]:
-            self.assertEqual(len(example[fields]), len(tokenized_conversation[fields]))
-            self.assertEqual(example[fields], tokenized_conversation[fields])
-
-    def test_sharegpt_warnings_integration(self):
-        with open(
-            Path(__file__).parent / "fixtures/conversation.missingturns.json",
-            encoding="utf-8",
-        ) as fin:
-            data = fin.read()
-            conversation = json.loads(data)
-        prompter = ShareGPTPrompterV2()
-        strat = ShareGPTPromptTokenizingStrategy(
-            prompter,
-            self.tokenizer,
-            False,
-            2048,
-        )
-        with self._caplog.at_level(logging.WARNING):
-            strat.tokenize_prompt(conversation)
-            assert "assistant turn has empty text" in self._caplog.records[1].message
-
-    def test_sharegpt_warnings_turns(self):
-        conversation = {
-            "conversations": [
-                {"from": "system", "value": "lorem"},
-                {"from": "gpt", "value": "ipsum"},
-                {"from": "human", "value": "dolor"},
-                {"from": "human", "value": "dolor"},
-                {"from": "gpt", "value": "sit"},
-            ]
-        }
-        prompter = ShareGPTPrompterV2()
-        strat = ShareGPTPromptTokenizingStrategy(
-            prompter,
-            self.tokenizer,
-            False,
-            2048,
-        )
-        with self._caplog.at_level(logging.WARNING):
-            strat.tokenize_prompt(conversation)
-            assert (
-                "Role did not alternate between turns (gpt and human)"
-                in self._caplog.records[0].message
-            )
-
-    def test_sharegpt_llama(self):
-        "Make sure the sharegpt/llama is tokenized and formatted correctly."
-        strat = prompt_strat("llama-2", self.tokenizer)
-
-        def tokenize(conv):
-            return strat.tokenize_prompt(deepcopy(conv))["input_ids"]
-
-        def decode(ids):
-            return strat.tokenizer.decode(ids)
-
-        # fmt: off
-        # System message, multi-turn conversations
-        mt_ids = tokenize(test_data['multi_turn_sys'])
-        assert decode(mt_ids) == '<s> [INST] <<SYS>>\nlorem\n<</SYS>>\n\nabc [/INST] ipsum</s><s> [INST] 123 [/INST] sit</s>'
-        assert mt_ids == [1, 518, 25580, 29962, 3532, 14816, 29903, 6778, 13, 29880, 3668, 13, 29966, 829, 14816, 29903, 6778, 13, 13, 10736, 518, 29914, 25580, 29962, 23421, 2, 1, 518, 25580, 29962, 29871, 29896, 29906, 29941, 518, 29914, 25580, 29962, 7845, 2]
-
-        # System message, single-turn conversations
-        st_ids = tokenize(test_data['single_turn_sys'])
-        assert decode(st_ids) == '<s> [INST] <<SYS>>\nlorem\n<</SYS>>\n\nabc [/INST] ipsum</s>'
-        assert st_ids == [1, 518, 25580, 29962, 3532, 14816, 29903, 6778, 13, 29880, 3668, 13, 29966, 829, 14816, 29903, 6778, 13, 13, 10736, 518, 29914, 25580, 29962, 23421, 2]
-
-        # No system message, single-turn
-        ns_ids = tokenize(test_data['single_turn_no_sys'])
-        assert decode(ns_ids) == '<s> [INST] abc [/INST] ipsum</s>'
-        assert ns_ids == [1, 518, 25580, 29962, 25638, 518, 29914, 25580, 29962, 23421, 2]
-
-        # No system message, multi-turn
-        ns_mt_ids = tokenize(test_data['multi_turn_no_sys'])
-        assert decode(ns_mt_ids) == '<s> [INST] abc [/INST] ipsum</s><s> [INST] 123 [/INST] sit</s>'
-        assert ns_mt_ids == [1, 518, 25580, 29962, 25638, 518, 29914, 25580, 29962, 23421, 2, 1, 518, 25580, 29962, 29871, 29896, 29906, 29941, 518, 29914, 25580, 29962, 7845, 2]
-        # fmt: on
-
-    def test_sharegpt_mistral(self):
-        "Make sure the sharegpt/mistral is tokenized and formatted correctly."
-        strat = prompt_strat("mistral", self.tokenizer)
-
-        def tokenize(conv):
-            return strat.tokenize_prompt(deepcopy(conv))["input_ids"]
-
-        def decode(ids):
-            return strat.tokenizer.decode(ids)
-
-        # fmt: off
-        # System message, multi-turn conversations
-        mt_ids = tokenize(test_data['multi_turn_sys'])
-        assert decode(mt_ids) == '<s> [INST]  lorem\nabc [/INST] ipsum</s> [INST] 123 [/INST] sit</s>'
-        assert mt_ids == [1, 518, 25580, 29962, 29871, 301, 3668, 13, 10736, 518, 29914, 25580, 29962, 23421, 2, 518, 25580, 29962, 29871, 29896, 29906, 29941, 518, 29914, 25580, 29962, 7845, 2]
-
-        # System message, single-turn conversations
-        st_ids = tokenize(test_data['single_turn_sys'])
-        assert decode(st_ids) == '<s> [INST]  lorem\nabc [/INST] ipsum</s>'
-        assert st_ids == [1, 518, 25580, 29962, 29871, 301, 3668, 13, 10736, 518, 29914, 25580, 29962, 23421, 2]
-
-        # No system message, single-turn
-        ns_ids = tokenize(test_data['single_turn_no_sys'])
-        assert decode(ns_ids) == '<s> [INST] abc [/INST] ipsum</s>'
-        assert ns_ids == [1, 518, 25580, 29962, 25638, 518, 29914, 25580, 29962, 23421, 2]
-
-        # No system message, multi-turn
-        ns_mt_ids = tokenize(test_data['multi_turn_no_sys'])
-        assert decode(ns_mt_ids) == '<s> [INST] abc [/INST] ipsum</s> [INST] 123 [/INST] sit</s>'
-        assert ns_mt_ids == [1, 518, 25580, 29962, 25638, 518, 29914, 25580, 29962, 23421, 2, 518, 25580, 29962, 29871, 29896, 29906, 29941, 518, 29914, 25580, 29962, 7845, 2]
-        # fmt: on
-
-    def test_sharegpt_changes_roles(self):
-        conversation = {
-            "roles": ["USER", "CHARACTER"],
-            "conversations": [
-                {"from": "system", "value": "lorem"},
-                {"from": "gpt", "value": "ipsum"},
-                {"from": "human", "value": "dolor"},
-                {"from": "gpt", "value": "sit"},
-            ],
-        }
-        prompter = ShareGPTPrompterV2()
-        strat = ShareGPTPromptTokenizingStrategy(
-            prompter,
-            self.tokenizer,
-            False,
-            2048,
-        )
-        with self._caplog.at_level(logging.WARNING):
-            res = strat.tokenize_prompt(conversation)
-            assert "CHARACTER" in self.tokenizer.decode(res["input_ids"])
-
-    def test_sharegpt_assistant_label_ignore(self):
-        conversation = {
-            "roles": ["user", "assistant"],
-            "conversations": [
-                {"from": "system", "value": "lorem"},
-                {"from": "gpt", "value": "ipsum"},
-                {"from": "human", "value": "dolor"},
-                {"from": "gpt", "value": "sit"},
-            ],
-        }
-        prompter = ShareGPTPrompterV2()
-        strat = ShareGPTPromptTokenizingStrategy(
-            prompter,
-            self.tokenizer,
-            False,
-            2048,
-        )
-        with self._caplog.at_level(logging.WARNING):
-            res = strat.tokenize_prompt(conversation)
-            idx = res["input_ids"].index(20255)  # assistant token
-            assert res["labels"][idx] == -100
-
-    def test_glaive_tool_label_ignore(self):
-        conversation = {
-            "system": "SYSTEM: This is a system prompt",
-            "chat": "USER: Can you book a flight for me from New York to London? ASSISTANT: I'm sorry, but I don't have the capability to book flights.  <|endoftext|>",
-        }
-        prompter = ShareGPTPrompterV2()
-        strat = GlaiveShareGPTPromptTokenizingStrategy(
-            prompter,
-            self.tokenizer,
-            False,
-            2048,
-        )
-        with self._caplog.at_level(logging.WARNING):
-            res = strat.tokenize_prompt(conversation)
-            idx = res["input_ids"].index(13566)  # assistant token
-            assert res["labels"][idx] == -100
-
    def test_no_sys_prompt(self):
        """
        tests the interface between the user and assistant parts
--- a/tests/test_validation.py
+++ b/tests/test_validation.py
@@ -646,39 +646,6 @@ class TestValidation(BaseValidation):

        validate_config(cfg)

-    def test_sharegpt_deprecation(self, minimal_cfg):
-        cfg = (
-            DictDefault(
-                {"datasets": [{"path": "lorem/ipsum", "type": "sharegpt:chat"}]}
-            )
-            | minimal_cfg
-        )
-        with self._caplog.at_level(logging.WARNING):
-            new_cfg = validate_config(cfg)
-            assert any(
-                "`type: sharegpt:chat` will soon be deprecated." in record.message
-                for record in self._caplog.records
-            )
-        assert new_cfg.datasets[0].type == "sharegpt"
-
-        cfg = (
-            DictDefault(
-                {
-                    "datasets": [
-                        {"path": "lorem/ipsum", "type": "sharegpt_simple:load_role"}
-                    ]
-                }
-            )
-            | minimal_cfg
-        )
-        with self._caplog.at_level(logging.WARNING):
-            new_cfg = validate_config(cfg)
-            assert any(
-                "`type: sharegpt_simple` will soon be deprecated." in record.message
-                for record in self._caplog.records
-            )
-        assert new_cfg.datasets[0].type == "sharegpt:load_role"
-
    def test_no_conflict_save_strategy(self, minimal_cfg):
        cfg = (
            DictDefault(
--- a/tests/test_validation_dataset.py
+++ b/tests/test_validation_dataset.py
@@ -48,9 +48,8 @@ class TestValidationCheckDatasetConfig(BaseValidation):
            | {
                "datasets": [
                    {
-                        "path": "LDJnr/Puffin",
-                        "type": "sharegpt",
-                        "conversation": "chatml",
+                        "path": "mhenrichsen/alpaca_2k_test",
+                        "type": "alpaca",
                        "shards": 10,
                    }
                ]
@@ -62,7 +61,6 @@ class TestValidationCheckDatasetConfig(BaseValidation):
        def _check_config():
            assert checked_cfg.datasets[0].path == cfg.datasets[0].path
            assert checked_cfg.datasets[0].type == cfg.datasets[0].type
-            assert checked_cfg.datasets[0].conversation == cfg.datasets[0].conversation
            assert checked_cfg.datasets[0].shards == cfg.datasets[0].shards

        _check_config()
Author	SHA1	Message	Date
Wing Lian	e4af51eb66	remove direct dependency on fused dense lib (#2027 ) Some checks failed publish pypi / Upload release to PyPI (push) Has been cancelled Details	2024-11-08 14:48:04 -05:00
Wing Lian	e20b15bee3	make publish to pypi manually dispatchable as a workflow (#2026 ) [skip ci]	2024-11-08 14:18:16 -05:00
Wing Lian	d4796cb645	increment version to 0.5.0 for next release (#2025 ) [skip ci]	2024-11-08 14:02:25 -05:00
Wing Lian	fd3b80716a	remove fastchat and sharegpt (#2021 ) * remove fastchat and sharegpt * remove imports * remove more fastchat imports * chore: remove unused functions * feat: remove sharegpt and deprecate from docs * chore: remove unused sharegpt checks * fix: remove sharegpt type from tests * feat: add sharegpt deprecation error * feat: update readme --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2024-11-08 13:45:49 -05:00
Sunny Liu	3265b7095e	Add weighted optimisation support for trl DPO trainer integration (#2016 ) * trlv0.12.0 integration * update trl version requirements * linting * commenting out * trl version requirement	2024-11-08 11:29:11 -05:00
Wing Lian	3cb2d75de1	upgrade pytorch to 2.5.1 (#2024 )	2024-11-08 10:46:24 -05:00
Wing Lian	035e9f9dd7	janky workaround to install FA2 on torch 2.5.1 base image since it takes forever to build (#2022 )	2024-11-07 17:54:29 -05:00
Wing Lian	02ce520b7e	upgrade liger to 0.4.0 (#1973 ) * upgrade liger to 0.3.1 * update docs and example * skip duplicate code check * Update src/axolotl/integrations/liger/args.py Co-authored-by: NanoCode012 <nano@axolotl.ai> * Update README.md Co-authored-by: NanoCode012 <nano@axolotl.ai> * add logging * chore: lint * add test case * upgrade liger and transformers * also upgrade accelerate * use kwargs to support patch release * make sure prepared path is empty for test * use transfromers 4.46.1 since 4.46.2 breaks fsdp --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2024-11-07 12:53:34 -05:00