Merge branch 'main' into feat/glm45

6 changed files with 184 additions and 0 deletions
--- a/examples/glm45/README.md
+++ b/examples/glm45/README.md
@@ -0,0 +1,48 @@
 # Finetune GLM4.5 with Axolotl
 [UNSTABLE]
 ```bash
 # LoRA SFT (4xH200 @ 84GB/GPU)
 axolotl train examples/glm45/glm4.5-lora-fsdp2.yaml
 # FFT SFT (4xH200)
 # Known issue: enabling checkpointing errors on the backward pass,
 # and disabling it runs out of memory (OOM)
 axolotl train examples/glm45/glm4.5-fft-fsdp2.yaml
 ```
 ## Dataset
 In addition to the standard OpenAI Messages format, GLM4.5 supports an extra field for thinking in assistant messages.
 ```json
 {
    "role": "assistant",
    "reasoning_content": "...",  // or put <think>...</think> in `content`
    "content": "...",
 }
 ```
 Note:
 - The role name for tools in this template is `tool`.
 - You will see the Axolotl warning below. This is expected, as the template does not use EOS.
 ```bash
 EOS token '<|endoftext|>' not found in chat_template. Please check if your template/EOS token is correct.
 ```
 - Set the extra attributes below if needed:
 ```yaml
 datasets:
  - path: ...
    type: chat_template
    message_property_mappings:
      role: role
      content: content
    #   tool_calls: tool_calls  # uncomment if using tools
    #   reasoning_content: reasoning_content  # uncomment if your data includes reasoning
 # Uncomment if training on tool role (you would rarely if ever need this)
 # eot_tokens:
 #   - <|observation|>
 ```
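
The README accepts reasoning either as a separate `reasoning_content` field or inline in `content`. If a dataset mixes both forms, a small preprocessing step can normalize them. The following is a minimal sketch, not part of Axolotl: `split_reasoning` and `THINK_RE` are hypothetical names, and it assumes the inline form uses a single `<think>...</think>` block.

```python
import re

# Hypothetical helper, not part of Axolotl: moves an inline <think>...</think>
# block from `content` into a separate `reasoning_content` field.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message: dict) -> dict:
    """Return a copy of `message` with inline reasoning split out, if any."""
    # Only assistant messages carry reasoning; leave explicit fields untouched.
    if message.get("role") != "assistant" or "reasoning_content" in message:
        return dict(message)
    content = message.get("content") or ""
    match = THINK_RE.search(content)
    if match is None:
        return dict(message)
    out = dict(message)
    out["reasoning_content"] = match.group(1).strip()
    # Keep whatever surrounds the <think> block as the visible answer.
    out["content"] = (content[: match.start()] + content[match.end():]).strip()
    return out


example = {
    "role": "assistant",
    "content": "<think>2 + 2 is 4.</think>The answer is 4.",
}
normalized = split_reasoning(example)
```

After normalizing this way, the `reasoning_content: reasoning_content` mapping in the dataset config would need to be uncommented so the field reaches the template.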
--- a/examples/glm45/glm4.5-fft-fsdp2.yaml
+++ b/examples/glm45/glm4.5-fft-fsdp2.yaml
@@ -0,0 +1,59 @@
 base_model: zai-org/GLM-4.5-Air
 # Automatically upload checkpoint and final model to HF
 # hub_model_id: username/custom_model_name
 plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
 experimental_skip_move_to_device: true  # prevent OOM by NOT putting model to GPU before sharding
 datasets:
  - path: winglian/pirate-ultrachat-10k
    type: chat_template
 dataset_prepared_path: last_run_prepared
 val_set_size: 0
 output_dir: ./outputs/qlora-out
 sequence_len: 2048
 sample_packing: true
 eval_sample_packing: true
 wandb_project:
 wandb_entity:
 wandb_watch:
 wandb_name:
 wandb_log_model:
 gradient_accumulation_steps: 1
 micro_batch_size: 1
 num_epochs: 1
 optimizer: adamw_torch_4bit
 lr_scheduler: cosine
 learning_rate: 0.0002
 bf16: auto
 tf32: false
 # gradient_checkpointing: true
 resume_from_checkpoint:
 logging_steps: 1
 flash_attention: true
 loss_watchdog_threshold: 5.0
 loss_watchdog_patience: 3
 warmup_ratio: 0.1
 evals_per_epoch: 1
 saves_per_epoch: 1
 weight_decay: 0.0
 special_tokens:
 fsdp_version: 2
 fsdp_config:
  offload_params: false
  cpu_ram_efficient_loading: true
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: Glm4MoeDecoderLayer
  state_dict_type: SHARDED_STATE_DICT
  reshard_after_forward: true
  activation_checkpointing: true
--- a/examples/glm45/glm4.5-lora-fsdp2.yaml
+++ b/examples/glm45/glm4.5-lora-fsdp2.yaml
@@ -0,0 +1,74 @@
 base_model: zai-org/GLM-4.5-Air
 # Automatically upload checkpoint and final model to HF
 # hub_model_id: username/custom_model_name
 plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
 experimental_skip_move_to_device: true  # prevent OOM by NOT putting model to GPU before sharding
 datasets:
  - path: winglian/pirate-ultrachat-10k
    type: chat_template
 dataset_prepared_path: last_run_prepared
 val_set_size: 0
 output_dir: ./outputs/qlora-out
 adapter: lora
 lora_model_dir:
 lora_r: 16
 lora_alpha: 32
 lora_dropout: 0.05
 lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
 sequence_len: 2048
 sample_packing: true
 eval_sample_packing: true
 wandb_project:
 wandb_entity:
 wandb_watch:
 wandb_name:
 wandb_log_model:
 gradient_accumulation_steps: 1
 micro_batch_size: 1
 num_epochs: 1
 optimizer: adamw_torch_4bit
 lr_scheduler: cosine
 learning_rate: 0.0002
 bf16: auto
 tf32: false
 # gradient_checkpointing: true
 resume_from_checkpoint:
 logging_steps: 1
 flash_attention: true
 loss_watchdog_threshold: 5.0
 loss_watchdog_patience: 3
 warmup_ratio: 0.1
 evals_per_epoch: 1
 saves_per_epoch: 1
 weight_decay: 0.0
 special_tokens:
 fsdp_version: 2
 fsdp_config:
  offload_params: false
  cpu_ram_efficient_loading: true
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: Glm4MoeDecoderLayer
  state_dict_type: SHARDED_STATE_DICT
  reshard_after_forward: true
  # activation_checkpointing: false
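
For orientation, the batch settings in this config imply a small effective batch. The arithmetic below is a rough sketch: the first three values are copied from the YAML, while `num_gpus = 4` is taken from the "4xH200" comment in the README and is an assumption about the hardware, as is the assumption that every GPU processes its own micro-batch.

```python
# Values copied from glm4.5-lora-fsdp2.yaml.
micro_batch_size = 1
gradient_accumulation_steps = 1
sequence_len = 2048

# Assumption: 4 GPUs, per the "4xH200" comment in the README.
num_gpus = 4

# Packed sequences consumed per optimizer step across all GPUs.
sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per optimizer step; with sample_packing each
# sequence holds at most sequence_len tokens.
max_tokens_per_step = sequences_per_step * sequence_len
```

Raising `gradient_accumulation_steps` increases the effective batch without increasing per-GPU activation memory, which may matter given the memory limits noted in the README.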
--- a/src/axolotl/common/architectures.py
+++ b/src/axolotl/common/architectures.py
@@ -14,6 +14,7 @@ MOE_ARCH_BLOCK = {
    "qwen3_moe": "Qwen3MoeSparseMoeBlock",
    "qwen3_vl_moe": "Qwen3VLMoeTextSparseMoeBlock",
    "deepseek_v2": "DeepseekV2MoE",
    "glm4_moe": "Glm4MoeMoE",
    "deepseek_v3": "DeepseekV3MoE",
    "gpt_oss": "GptOssDecoderLayer",
    "lfm2_moe": "Lfm2MoeSparseMoeBlock",
--- a/src/axolotl/integrations/cut_cross_entropy/README.md
+++ b/src/axolotl/integrations/cut_cross_entropy/README.md
@@ -44,6 +44,7 @@ plugins:
 - gemma3n_text
 - glm
 - glm4
 - glm_moe
 - glm4_moe
 - glm4v
 - glm4v_moe
--- a/src/axolotl/monkeypatch/multipack.py
+++ b/src/axolotl/monkeypatch/multipack.py
@@ -37,6 +37,7 @@ SUPPORTED_MULTIPACK_MODEL_TYPES = [
    "deepseek_v3",
    "glm",
    "glm4",
    "glm4_moe",
    "smollm3",
    "granite",
    "granitemoe",
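
Both registries touched above are keyed by the model type string. The sketch below shows how such a lookup reads. It uses only entries visible in the hunks, `describe_support` is a hypothetical helper rather than an Axolotl function, and it assumes GLM-4.5-Air reports `glm4_moe` as its model type, which the `Glm4MoeDecoderLayer` class name in the configs suggests.

```python
# Excerpts copied from the hunks above; the real registries in Axolotl
# contain more entries than shown here.
MOE_ARCH_BLOCK = {
    "qwen3_moe": "Qwen3MoeSparseMoeBlock",
    "deepseek_v2": "DeepseekV2MoE",
    "glm4_moe": "Glm4MoeMoE",
    "deepseek_v3": "DeepseekV3MoE",
}
SUPPORTED_MULTIPACK_MODEL_TYPES = [
    "deepseek_v3",
    "glm",
    "glm4",
    "glm4_moe",
    "smollm3",
]


def describe_support(model_type: str) -> dict:
    """Hypothetical helper: summarize what the excerpts say about a model type."""
    return {
        # Name of the sparse MoE block class, or None if not registered.
        "moe_block": MOE_ARCH_BLOCK.get(model_type),
        # Whether sample packing is listed as supported for this model type.
        "multipack": model_type in SUPPORTED_MULTIPACK_MODEL_TYPES,
    }


glm45 = describe_support("glm4_moe")
dense = describe_support("glm4")
```

A dense model type such as `glm4` appears in the multipack list but has no MoE block entry, so the lookup returns `None` for it.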
Author	SHA1	Message	Date
NanoCode012	be1f8db913	Merge branch 'main' into feat/glm45	2025-12-25 17:50:09 +07:00
NanoCode012	a526647b31	Merge branch 'main' into feat/glm45	2025-11-28 13:41:25 +07:00
NanoCode012	8069177284	Merge branch 'main' into feat/glm45	2025-11-10 21:41:05 +07:00
NanoCode012	a28eb600e9	feat: add readme and better examples	2025-08-13 13:57:15 +07:00
NanoCode012	4b16f363bc	fix: move	2025-08-13 10:46:42 +07:00
NanoCode012	272a456ec0	fix: remove lora in fft config	2025-08-12 20:34:47 +07:00
NanoCode012	7e83268662	feat: add wip fft offload config	2025-08-12 20:34:47 +07:00
NanoCode012	b2a8c37a27	fix: use smaller model	2025-08-12 20:34:47 +07:00
NanoCode012	603166d9c5	feat: add example config	2025-08-12 20:34:47 +07:00
NanoCode012	e8c9517ac8	feat: add to multipack	2025-08-12 20:34:47 +07:00
NanoCode012	0bbad9202c	feat: add glm4moemoe to z3	2025-08-12 20:34:47 +07:00
NanoCode012	cb042e9775	feat: add cce for glm4_moe & deepseek v3	2025-08-12 20:32:46 +07:00