Compare commits
merge into: tocmo0nlord:streaming-on-the-fly-preprocess
tocmo0nlord:activeblue/main
tocmo0nlord:smol-ci
tocmo0nlord:attn-implementation-refactor
tocmo0nlord:gh-pages
tocmo0nlord:main
tocmo0nlord:torch-211-base
tocmo0nlord:kernelize-scattermoe-lora
tocmo0nlord:swe-rebench-rl-rebase
tocmo0nlord:vllm-0191
tocmo0nlord:weight-scale-norm
tocmo0nlord:fix/issue-1-build-deps
tocmo0nlord:fix/issue-2-flash-attn-install
tocmo0nlord:fix/issue-3-telemetry-whitelist
tocmo0nlord:fix/issue-4-deepspeed-optional
tocmo0nlord:fix/issue-5-8-docs
tocmo0nlord:fix/issue-6-default-attention
tocmo0nlord:fix/issue-7-hf-token-check
tocmo0nlord:scattermoe-nanotron
tocmo0nlord:textui
tocmo0nlord:lhl-moe-aux-loss-free
tocmo0nlord:tensorboard-loss-check
tocmo0nlord:fix/cp-waste
tocmo0nlord:scattermoe-lora-optim-dtypestest
tocmo0nlord:fix/merge-lora-fp32
tocmo0nlord:async-grpo-patched-v2
tocmo0nlord:uv-fixup
tocmo0nlord:tool-mpm
tocmo0nlord:transformers-itl-refactor
tocmo0nlord:feat/torchao-qlora
tocmo0nlord:accelerator-args-builder
tocmo0nlord:fix/gemma3-text-only
tocmo0nlord:liger-065
tocmo0nlord:feat/glmflash-other
tocmo0nlord:online-topk-kd
tocmo0nlord:release-v0.13.x
tocmo0nlord:version-dev
tocmo0nlord:dft
tocmo0nlord:dynamic-sft
tocmo0nlord:upgrade-torchao-0.15
tocmo0nlord:feat/glm45
tocmo0nlord:coderabbitai/docstrings/3e51a68
tocmo0nlord:transformers-4573
tocmo0nlord:fix/diffusion
tocmo0nlord:coderabbitai/docstrings/b234532
tocmo0nlord:fix/hpc-root
tocmo0nlord:liger-063
tocmo0nlord:uv-first
tocmo0nlord:fsdp2_fp32
tocmo0nlord:vendor-moe
tocmo0nlord:lora-fsdp2-doc
tocmo0nlord:cp-sdpa
tocmo0nlord:3181
tocmo0nlord:moekernels
tocmo0nlord:reentrant-w-offloading
tocmo0nlord:lora_bf16
tocmo0nlord:feat/lmeval-baseten
tocmo0nlord:streaming-v2
tocmo0nlord:squash_position_ids
tocmo0nlord:streaming
tocmo0nlord:tui
tocmo0nlord:streaming-on-the-fly-preprocess
tocmo0nlord:no-seq-len
tocmo0nlord:diffusion-next-token-trainer
tocmo0nlord:diffusion-custom-models
tocmo0nlord:diffusion-custom-loss
tocmo0nlord:release-v0.12.x
tocmo0nlord:split-batches-sizes
tocmo0nlord:fix-preview
tocmo0nlord:775-option-to-drop-vs-truncate-on-rows-longer-than-context-length
tocmo0nlord:fa-check
tocmo0nlord:chat-template-granite
tocmo0nlord:custom-modeling
tocmo0nlord:lora_kernels_fsdp
tocmo0nlord:testingci
tocmo0nlord:nd_parallel
tocmo0nlord:quantize-ptq-cli
tocmo0nlord:fix/granite-speech
tocmo0nlord:kwargs-refactor
tocmo0nlord:revert-2906-checkpoint-on-step-1
tocmo0nlord:torch_tensor_parallel
tocmo0nlord:fused-mlp-ez
tocmo0nlord:release-v0.11.x
tocmo0nlord:fix/rl-trainer-arg
tocmo0nlord:update-vllm
tocmo0nlord:print_venv
tocmo0nlord:shared-prepared-ci
tocmo0nlord:feat/phi_35_vision
tocmo0nlord:fix/gemma3n-text-attention
tocmo0nlord:map-dataset-fetcher-fix
tocmo0nlord:sp-restore-buffers
tocmo0nlord:dump-config
tocmo0nlord:fix/eval-accu
tocmo0nlord:chore/docstring-distributed
tocmo0nlord:release-0.10.x
tocmo0nlord:codecov-pulls-only
tocmo0nlord:feat/beautiful-readme
tocmo0nlord:sdpa-cp
tocmo0nlord:optimizer-compile
tocmo0nlord:mistral-support
tocmo0nlord:kd-fix-20250519-v2
tocmo0nlord:telemetry-opt-in
tocmo0nlord:devstral-support
tocmo0nlord:sac
tocmo0nlord:no-zero-ds-train
tocmo0nlord:axolotl-ci-hf
tocmo0nlord:fix/kd-trainer-num-items
tocmo0nlord:fa3-hopper
tocmo0nlord:rl-trainers-sp
tocmo0nlord:jagged-restart-lr-scheduler-v3
tocmo0nlord:wait-distributed-close
tocmo0nlord:coderabbitai/docstrings/QVUilv72ojQNaYsCLVNpUpfo2rK1ZU5x90oPNXYz0ZfsWzWSHca36pjgaU5JOtZOA4gNjbjVYxShdRmkm7fGSlW
tocmo0nlord:feat/wizard
tocmo0nlord:release-v0.9.x
tocmo0nlord:model-loader-refactor
tocmo0nlord:offload-activations-disk
tocmo0nlord:revert-multipack-changes
tocmo0nlord:xformers-wo-packing
tocmo0nlord:attention_enum
tocmo0nlord:datasets-351
tocmo0nlord:activations
tocmo0nlord:colab-misc-fixes
tocmo0nlord:colab-misc-fixes-test
tocmo0nlord:fix/vllm-version
tocmo0nlord:fix/dpo-labels
tocmo0nlord:lora-quant-state-offset
tocmo0nlord:llmcompressor-sft-v2
tocmo0nlord:llmcompressor-sft-wing
tocmo0nlord:runpod-sls
tocmo0nlord:llmcompressor-sft
tocmo0nlord:sp-rl-v3
tocmo0nlord:merged-2554
tocmo0nlord:feat_hqq
tocmo0nlord:smaller-rand-model
tocmo0nlord:preprocess_grpo-fix
tocmo0nlord:transformers-4513
tocmo0nlord:flex_patching_update
tocmo0nlord:maverick-example
tocmo0nlord:fix/doc-key
tocmo0nlord:fix/cce-linear
tocmo0nlord:llama-4-examples
tocmo0nlord:transformers-4511
tocmo0nlord:feat/liger-deepseekv3
tocmo0nlord:release-0.8.x
tocmo0nlord:sp-fix-masking
tocmo0nlord:llama4
tocmo0nlord:llama4-patches
tocmo0nlord:peft-update
tocmo0nlord:fsdp2
tocmo0nlord:llama-4-z3
tocmo0nlord:sp-rl
tocmo0nlord:lora-kernels-deepspeed
tocmo0nlord:muon-validation
tocmo0nlord:destroy-pg
tocmo0nlord:feat/soap-optim-v2
tocmo0nlord:fix/xformers
tocmo0nlord:mm_mc_chat
tocmo0nlord:quartodoc-fix
tocmo0nlord:quartodoc
tocmo0nlord:sequence-parallelism
tocmo0nlord:pre-commit-update
tocmo0nlord:cuda-12.8.1
tocmo0nlord:kd-logprob-data
tocmo0nlord:fix_kto
tocmo0nlord:kto_fix
tocmo0nlord:update-lgpl
tocmo0nlord:optimizers-refactor
tocmo0nlord:train-refactor
tocmo0nlord:fix/replace_jackllama
tocmo0nlord:seq-parallel-ring
tocmo0nlord:topk-logprobs-triton
tocmo0nlord:tp_support
tocmo0nlord:telemetry
tocmo0nlord:flx_attn_support
tocmo0nlord:grpo-ref-model-cleanup
tocmo0nlord:revert-2332-fix_sample_packing
tocmo0nlord:grpo_liger
tocmo0nlord:lora-kernels-doc-fix
tocmo0nlord:patch_lora_post_model_load
tocmo0nlord:pixtral_integration
tocmo0nlord:docs-lint-20250212
tocmo0nlord:bursteratom-doc-faq-update
tocmo0nlord:grpo-path-v2
tocmo0nlord:grpo-path
tocmo0nlord:feat/linearize
tocmo0nlord:kd-logits-view
tocmo0nlord:modal-upgrade-builder
tocmo0nlord:kd-trainer
tocmo0nlord:kd-trainer-zscore
tocmo0nlord:autodoc
tocmo0nlord:iterable-optional
tocmo0nlord:diff-transformer
tocmo0nlord:relaxed-recursive-transformers
tocmo0nlord:eos-hell
tocmo0nlord:hf-trainer-refactor
tocmo0nlord:chat-dataset-tool
tocmo0nlord:rala-v2
tocmo0nlord:rala
tocmo0nlord:kd-trainer-v2
tocmo0nlord:kd-trainer-pre
tocmo0nlord:kd-trainer-2
tocmo0nlord:fix-merge-lint-issue
tocmo0nlord:cli-refactor
tocmo0nlord:cli-cloud-modal-math-hard
tocmo0nlord:kd-trainer-rebased
tocmo0nlord:debug-hf-home-cache
tocmo0nlord:hymba_multipack2
tocmo0nlord:optimizer-checkpoint
tocmo0nlord:grouped_lr_squashed
tocmo0nlord:djsaunde-patch-1
tocmo0nlord:liger-dpo
tocmo0nlord:feat/pref_liger
tocmo0nlord:enable_tp
tocmo0nlord:pretrain-dataset
tocmo0nlord:pytest-each-flakey
tocmo0nlord:activation-offloading-torchtune
tocmo0nlord:base-model-readme-update
tocmo0nlord:e2e-fsdp-trainer
tocmo0nlord:docker-base-nvcr-pytorch
tocmo0nlord:transformers-4_47_0_v2
tocmo0nlord:sageattention
tocmo0nlord:zero3-8bit-lora
tocmo0nlord:phi-moe
tocmo0nlord:transformers-fsdp-check
tocmo0nlord:upgrade-trl-v0.12.0_2
tocmo0nlord:shampoo-low_bit
tocmo0nlord:upgrade-liger-test
tocmo0nlord:upgrade_liger-tr4.46.1
tocmo0nlord:soap-optim
tocmo0nlord:1991test
tocmo0nlord:1947fix
tocmo0nlord:cj_tokenizer_default_prompt_template
tocmo0nlord:feature/enable-huggingface-dataset-revision
tocmo0nlord:mm3
tocmo0nlord:mm2
tocmo0nlord:shampoo
tocmo0nlord:device-mesh
tocmo0nlord:remove-gptq-warn
tocmo0nlord:fixtypo
tocmo0nlord:fsdp-fft
tocmo0nlord:dpo-spawn-fix
tocmo0nlord:update-examples-llama3-ez
tocmo0nlord:q-galore
tocmo0nlord:fa-261
tocmo0nlord:deepspeed_0_14_4
tocmo0nlord:llama-multipack
tocmo0nlord:mora
tocmo0nlord:custom-trainer-cls
tocmo0nlord:olmo-no-position_ids
tocmo0nlord:nca-pair
tocmo0nlord:sppo
tocmo0nlord:fsdp-qdora
tocmo0nlord:fix-l3-lora
tocmo0nlord:merge-lora-tests
tocmo0nlord:save_only_model
tocmo0nlord:pytest-skip-s2
tocmo0nlord:fsdp-fix
tocmo0nlord:20240404-lisa-determinism
tocmo0nlord:lisa
tocmo0nlord:main-base
tocmo0nlord:4bit-optimizers
tocmo0nlord:scatter_moe
tocmo0nlord:scatter_moe_eric
tocmo0nlord:fix-ddp_find_unused_parameters
tocmo0nlord:llama-flash-attn-fix
tocmo0nlord:sharegpt-field-conversations
tocmo0nlord:20240307-updates
tocmo0nlord:flash-attn-2_5_5
tocmo0nlord:20240216-updates
tocmo0nlord:streaming-remote-dataset
tocmo0nlord:feat/spaces-ui
tocmo0nlord:multipack-dpo
tocmo0nlord:sdpa-multipack
tocmo0nlord:flash-attn-fix-patches-wo-sample-packing
tocmo0nlord:deepspeed-low-cpu-mem
tocmo0nlord:keep_in_memory
tocmo0nlord:NanoCode012-patch-1
tocmo0nlord:yayi2
tocmo0nlord:hamelsmu-patch-1
tocmo0nlord:mixtral_optimized
tocmo0nlord:20231212-fixes
tocmo0nlord:mixtral_swiglu
tocmo0nlord:refactor-flash-attention
tocmo0nlord:unsloth_modules
tocmo0nlord:multipack-pretraining
tocmo0nlord:completion-json
tocmo0nlord:tinyllama-example
tocmo0nlord:fp8
tocmo0nlord:tensor-parallel
tocmo0nlord:llava
tocmo0nlord:docker-cleanup-20231029
tocmo0nlord:llava-train
tocmo0nlord:ia3-peft
tocmo0nlord:neft-v2
tocmo0nlord:sharegpt-batched
tocmo0nlord:llama-dropout
tocmo0nlord:20230920-btlm
tocmo0nlord:datasets-refactor
tocmo0nlord:multi-gpu-state
tocmo0nlord:autogptq-tests
tocmo0nlord:fsdp-defaults
tocmo0nlord:benchmark-callbacks-next
tocmo0nlord:merge-lora-on-complete
tocmo0nlord:latent-space
tocmo0nlord:attn-patches
tocmo0nlord:embeddings-resize
tocmo0nlord:feature/attn-patches
tocmo0nlord:feature/relora-rebased
tocmo0nlord:packing-attn-limit-fa2-rebased
tocmo0nlord:ssmi-main
tocmo0nlord:openorca-fix-mask
tocmo0nlord:multipack
tocmo0nlord:openorca
tocmo0nlord:openorca-v2
tocmo0nlord:compute-perplexity-metrics
tocmo0nlord:dev-base
tocmo0nlord:flan-no-bos
tocmo0nlord:no-bos-tokens-packing
tocmo0nlord:exp-expand-len
tocmo0nlord:stable
tocmo0nlord:v0.16.1
tocmo0nlord:v0.16.0
tocmo0nlord:v0.15.0
tocmo0nlord:v0.14.0
tocmo0nlord:v0.13.2
tocmo0nlord:v0.13.1
tocmo0nlord:v0.13.0
tocmo0nlord:v0.12.2
tocmo0nlord:v0.12.1
tocmo0nlord:v0.12.0
tocmo0nlord:v0.11.0.post1
tocmo0nlord:v0.11.0
tocmo0nlord:v0.10.1
tocmo0nlord:v0.10.0
tocmo0nlord:v0.9.2
tocmo0nlord:v0.9.1.post1
tocmo0nlord:v0.9.1
tocmo0nlord:v0.9.0
tocmo0nlord:v0.8.1
tocmo0nlord:v0.8.0
tocmo0nlord:v0.7.1
tocmo0nlord:v0.7.0
tocmo0nlord:v0.6.0
tocmo0nlord:v0.5.2
tocmo0nlord:v0.5.1.post1
tocmo0nlord:v0.5.1
tocmo0nlord:v0.5.0
tocmo0nlord:v0.4.0
tocmo0nlord:v0.3.0
tocmo0nlord:v0.2.1
tocmo0nlord:v0.2.0
tocmo0nlord:v0.1.0
...
pull from: tocmo0nlord:sharegpt-field-conversations
tocmo0nlord:activeblue/main
tocmo0nlord:smol-ci
tocmo0nlord:attn-implementation-refactor
tocmo0nlord:gh-pages
tocmo0nlord:main
tocmo0nlord:torch-211-base
tocmo0nlord:kernelize-scattermoe-lora
tocmo0nlord:swe-rebench-rl-rebase
tocmo0nlord:vllm-0191
tocmo0nlord:weight-scale-norm
tocmo0nlord:fix/issue-1-build-deps
tocmo0nlord:fix/issue-2-flash-attn-install
tocmo0nlord:fix/issue-3-telemetry-whitelist
tocmo0nlord:fix/issue-4-deepspeed-optional
tocmo0nlord:fix/issue-5-8-docs
tocmo0nlord:fix/issue-6-default-attention
tocmo0nlord:fix/issue-7-hf-token-check
tocmo0nlord:scattermoe-nanotron
tocmo0nlord:textui
tocmo0nlord:lhl-moe-aux-loss-free
tocmo0nlord:tensorboard-loss-check
tocmo0nlord:fix/cp-waste
tocmo0nlord:scattermoe-lora-optim-dtypestest
tocmo0nlord:fix/merge-lora-fp32
tocmo0nlord:async-grpo-patched-v2
tocmo0nlord:uv-fixup
tocmo0nlord:tool-mpm
tocmo0nlord:transformers-itl-refactor
tocmo0nlord:feat/torchao-qlora
tocmo0nlord:accelerator-args-builder
tocmo0nlord:fix/gemma3-text-only
tocmo0nlord:liger-065
tocmo0nlord:feat/glmflash-other
tocmo0nlord:online-topk-kd
tocmo0nlord:release-v0.13.x
tocmo0nlord:version-dev
tocmo0nlord:dft
tocmo0nlord:dynamic-sft
tocmo0nlord:upgrade-torchao-0.15
tocmo0nlord:feat/glm45
tocmo0nlord:coderabbitai/docstrings/3e51a68
tocmo0nlord:transformers-4573
tocmo0nlord:fix/diffusion
tocmo0nlord:coderabbitai/docstrings/b234532
tocmo0nlord:fix/hpc-root
tocmo0nlord:liger-063
tocmo0nlord:uv-first
tocmo0nlord:fsdp2_fp32
tocmo0nlord:vendor-moe
tocmo0nlord:lora-fsdp2-doc
tocmo0nlord:cp-sdpa
tocmo0nlord:3181
tocmo0nlord:moekernels
tocmo0nlord:reentrant-w-offloading
tocmo0nlord:lora_bf16
tocmo0nlord:feat/lmeval-baseten
tocmo0nlord:streaming-v2
tocmo0nlord:squash_position_ids
tocmo0nlord:streaming
tocmo0nlord:tui
tocmo0nlord:streaming-on-the-fly-preprocess
tocmo0nlord:no-seq-len
tocmo0nlord:diffusion-next-token-trainer
tocmo0nlord:diffusion-custom-models
tocmo0nlord:diffusion-custom-loss
tocmo0nlord:release-v0.12.x
tocmo0nlord:split-batches-sizes
tocmo0nlord:fix-preview
tocmo0nlord:775-option-to-drop-vs-truncate-on-rows-longer-than-context-length
tocmo0nlord:fa-check
tocmo0nlord:chat-template-granite
tocmo0nlord:custom-modeling
tocmo0nlord:lora_kernels_fsdp
tocmo0nlord:testingci
tocmo0nlord:nd_parallel
tocmo0nlord:quantize-ptq-cli
tocmo0nlord:fix/granite-speech
tocmo0nlord:kwargs-refactor
tocmo0nlord:revert-2906-checkpoint-on-step-1
tocmo0nlord:torch_tensor_parallel
tocmo0nlord:fused-mlp-ez
tocmo0nlord:release-v0.11.x
tocmo0nlord:fix/rl-trainer-arg
tocmo0nlord:update-vllm
tocmo0nlord:print_venv
tocmo0nlord:shared-prepared-ci
tocmo0nlord:feat/phi_35_vision
tocmo0nlord:fix/gemma3n-text-attention
tocmo0nlord:map-dataset-fetcher-fix
tocmo0nlord:sp-restore-buffers
tocmo0nlord:dump-config
tocmo0nlord:fix/eval-accu
tocmo0nlord:chore/docstring-distributed
tocmo0nlord:release-0.10.x
tocmo0nlord:codecov-pulls-only
tocmo0nlord:feat/beautiful-readme
tocmo0nlord:sdpa-cp
tocmo0nlord:optimizer-compile
tocmo0nlord:mistral-support
tocmo0nlord:kd-fix-20250519-v2
tocmo0nlord:telemetry-opt-in
tocmo0nlord:devstral-support
tocmo0nlord:sac
tocmo0nlord:no-zero-ds-train
tocmo0nlord:axolotl-ci-hf
tocmo0nlord:fix/kd-trainer-num-items
tocmo0nlord:fa3-hopper
tocmo0nlord:rl-trainers-sp
tocmo0nlord:jagged-restart-lr-scheduler-v3
tocmo0nlord:wait-distributed-close
tocmo0nlord:coderabbitai/docstrings/QVUilv72ojQNaYsCLVNpUpfo2rK1ZU5x90oPNXYz0ZfsWzWSHca36pjgaU5JOtZOA4gNjbjVYxShdRmkm7fGSlW
tocmo0nlord:feat/wizard
tocmo0nlord:release-v0.9.x
tocmo0nlord:model-loader-refactor
tocmo0nlord:offload-activations-disk
tocmo0nlord:revert-multipack-changes
tocmo0nlord:xformers-wo-packing
tocmo0nlord:attention_enum
tocmo0nlord:datasets-351
tocmo0nlord:activations
tocmo0nlord:colab-misc-fixes
tocmo0nlord:colab-misc-fixes-test
tocmo0nlord:fix/vllm-version
tocmo0nlord:fix/dpo-labels
tocmo0nlord:lora-quant-state-offset
tocmo0nlord:llmcompressor-sft-v2
tocmo0nlord:llmcompressor-sft-wing
tocmo0nlord:runpod-sls
tocmo0nlord:llmcompressor-sft
tocmo0nlord:sp-rl-v3
tocmo0nlord:merged-2554
tocmo0nlord:feat_hqq
tocmo0nlord:smaller-rand-model
tocmo0nlord:preprocess_grpo-fix
tocmo0nlord:transformers-4513
tocmo0nlord:flex_patching_update
tocmo0nlord:maverick-example
tocmo0nlord:fix/doc-key
tocmo0nlord:fix/cce-linear
tocmo0nlord:llama-4-examples
tocmo0nlord:transformers-4511
tocmo0nlord:feat/liger-deepseekv3
tocmo0nlord:release-0.8.x
tocmo0nlord:sp-fix-masking
tocmo0nlord:llama4
tocmo0nlord:llama4-patches
tocmo0nlord:peft-update
tocmo0nlord:fsdp2
tocmo0nlord:llama-4-z3
tocmo0nlord:sp-rl
tocmo0nlord:lora-kernels-deepspeed
tocmo0nlord:muon-validation
tocmo0nlord:destroy-pg
tocmo0nlord:feat/soap-optim-v2
tocmo0nlord:fix/xformers
tocmo0nlord:mm_mc_chat
tocmo0nlord:quartodoc-fix
tocmo0nlord:quartodoc
tocmo0nlord:sequence-parallelism
tocmo0nlord:pre-commit-update
tocmo0nlord:cuda-12.8.1
tocmo0nlord:kd-logprob-data
tocmo0nlord:fix_kto
tocmo0nlord:kto_fix
tocmo0nlord:update-lgpl
tocmo0nlord:optimizers-refactor
tocmo0nlord:train-refactor
tocmo0nlord:fix/replace_jackllama
tocmo0nlord:seq-parallel-ring
tocmo0nlord:topk-logprobs-triton
tocmo0nlord:tp_support
tocmo0nlord:telemetry
tocmo0nlord:flx_attn_support
tocmo0nlord:grpo-ref-model-cleanup
tocmo0nlord:revert-2332-fix_sample_packing
tocmo0nlord:grpo_liger
tocmo0nlord:lora-kernels-doc-fix
tocmo0nlord:patch_lora_post_model_load
tocmo0nlord:pixtral_integration
tocmo0nlord:docs-lint-20250212
tocmo0nlord:bursteratom-doc-faq-update
tocmo0nlord:grpo-path-v2
tocmo0nlord:grpo-path
tocmo0nlord:feat/linearize
tocmo0nlord:kd-logits-view
tocmo0nlord:modal-upgrade-builder
tocmo0nlord:kd-trainer
tocmo0nlord:kd-trainer-zscore
tocmo0nlord:autodoc
tocmo0nlord:iterable-optional
tocmo0nlord:diff-transformer
tocmo0nlord:relaxed-recursive-transformers
tocmo0nlord:eos-hell
tocmo0nlord:hf-trainer-refactor
tocmo0nlord:chat-dataset-tool
tocmo0nlord:rala-v2
tocmo0nlord:rala
tocmo0nlord:kd-trainer-v2
tocmo0nlord:kd-trainer-pre
tocmo0nlord:kd-trainer-2
tocmo0nlord:fix-merge-lint-issue
tocmo0nlord:cli-refactor
tocmo0nlord:cli-cloud-modal-math-hard
tocmo0nlord:kd-trainer-rebased
tocmo0nlord:debug-hf-home-cache
tocmo0nlord:hymba_multipack2
tocmo0nlord:optimizer-checkpoint
tocmo0nlord:grouped_lr_squashed
tocmo0nlord:djsaunde-patch-1
tocmo0nlord:liger-dpo
tocmo0nlord:feat/pref_liger
tocmo0nlord:enable_tp
tocmo0nlord:pretrain-dataset
tocmo0nlord:pytest-each-flakey
tocmo0nlord:activation-offloading-torchtune
tocmo0nlord:base-model-readme-update
tocmo0nlord:e2e-fsdp-trainer
tocmo0nlord:docker-base-nvcr-pytorch
tocmo0nlord:transformers-4_47_0_v2
tocmo0nlord:sageattention
tocmo0nlord:zero3-8bit-lora
tocmo0nlord:phi-moe
tocmo0nlord:transformers-fsdp-check
tocmo0nlord:upgrade-trl-v0.12.0_2
tocmo0nlord:shampoo-low_bit
tocmo0nlord:upgrade-liger-test
tocmo0nlord:upgrade_liger-tr4.46.1
tocmo0nlord:soap-optim
tocmo0nlord:1991test
tocmo0nlord:1947fix
tocmo0nlord:cj_tokenizer_default_prompt_template
tocmo0nlord:feature/enable-huggingface-dataset-revision
tocmo0nlord:mm3
tocmo0nlord:mm2
tocmo0nlord:shampoo
tocmo0nlord:device-mesh
tocmo0nlord:remove-gptq-warn
tocmo0nlord:fixtypo
tocmo0nlord:fsdp-fft
tocmo0nlord:dpo-spawn-fix
tocmo0nlord:update-examples-llama3-ez
tocmo0nlord:q-galore
tocmo0nlord:fa-261
tocmo0nlord:deepspeed_0_14_4
tocmo0nlord:llama-multipack
tocmo0nlord:mora
tocmo0nlord:custom-trainer-cls
tocmo0nlord:olmo-no-position_ids
tocmo0nlord:nca-pair
tocmo0nlord:sppo
tocmo0nlord:fsdp-qdora
tocmo0nlord:fix-l3-lora
tocmo0nlord:merge-lora-tests
tocmo0nlord:save_only_model
tocmo0nlord:pytest-skip-s2
tocmo0nlord:fsdp-fix
tocmo0nlord:20240404-lisa-determinism
tocmo0nlord:lisa
tocmo0nlord:main-base
tocmo0nlord:4bit-optimizers
tocmo0nlord:scatter_moe
tocmo0nlord:scatter_moe_eric
tocmo0nlord:fix-ddp_find_unused_parameters
tocmo0nlord:llama-flash-attn-fix
tocmo0nlord:sharegpt-field-conversations
tocmo0nlord:20240307-updates
tocmo0nlord:flash-attn-2_5_5
tocmo0nlord:20240216-updates
tocmo0nlord:streaming-remote-dataset
tocmo0nlord:feat/spaces-ui
tocmo0nlord:multipack-dpo
tocmo0nlord:sdpa-multipack
tocmo0nlord:flash-attn-fix-patches-wo-sample-packing
tocmo0nlord:deepspeed-low-cpu-mem
tocmo0nlord:keep_in_memory
tocmo0nlord:NanoCode012-patch-1
tocmo0nlord:yayi2
tocmo0nlord:hamelsmu-patch-1
tocmo0nlord:mixtral_optimized
tocmo0nlord:20231212-fixes
tocmo0nlord:mixtral_swiglu
tocmo0nlord:refactor-flash-attention
tocmo0nlord:unsloth_modules
tocmo0nlord:multipack-pretraining
tocmo0nlord:completion-json
tocmo0nlord:tinyllama-example
tocmo0nlord:fp8
tocmo0nlord:tensor-parallel
tocmo0nlord:llava
tocmo0nlord:docker-cleanup-20231029
tocmo0nlord:llava-train
tocmo0nlord:ia3-peft
tocmo0nlord:neft-v2
tocmo0nlord:sharegpt-batched
tocmo0nlord:llama-dropout
tocmo0nlord:20230920-btlm
tocmo0nlord:datasets-refactor
tocmo0nlord:multi-gpu-state
tocmo0nlord:autogptq-tests
tocmo0nlord:fsdp-defaults
tocmo0nlord:benchmark-callbacks-next
tocmo0nlord:merge-lora-on-complete
tocmo0nlord:latent-space
tocmo0nlord:attn-patches
tocmo0nlord:embeddings-resize
tocmo0nlord:feature/attn-patches
tocmo0nlord:feature/relora-rebased
tocmo0nlord:packing-attn-limit-fa2-rebased
tocmo0nlord:ssmi-main
tocmo0nlord:openorca-fix-mask
tocmo0nlord:multipack
tocmo0nlord:openorca
tocmo0nlord:openorca-v2
tocmo0nlord:compute-perplexity-metrics
tocmo0nlord:dev-base
tocmo0nlord:flan-no-bos
tocmo0nlord:no-bos-tokens-packing
tocmo0nlord:exp-expand-len
tocmo0nlord:stable
tocmo0nlord:v0.16.1
tocmo0nlord:v0.16.0
tocmo0nlord:v0.15.0
tocmo0nlord:v0.14.0
tocmo0nlord:v0.13.2
tocmo0nlord:v0.13.1
tocmo0nlord:v0.13.0
tocmo0nlord:v0.12.2
tocmo0nlord:v0.12.1
tocmo0nlord:v0.12.0
tocmo0nlord:v0.11.0.post1
tocmo0nlord:v0.11.0
tocmo0nlord:v0.10.1
tocmo0nlord:v0.10.0
tocmo0nlord:v0.9.2
tocmo0nlord:v0.9.1.post1
tocmo0nlord:v0.9.1
tocmo0nlord:v0.9.0
tocmo0nlord:v0.8.1
tocmo0nlord:v0.8.0
tocmo0nlord:v0.7.1
tocmo0nlord:v0.7.0
tocmo0nlord:v0.6.0
tocmo0nlord:v0.5.2
tocmo0nlord:v0.5.1.post1
tocmo0nlord:v0.5.1
tocmo0nlord:v0.5.0
tocmo0nlord:v0.4.0
tocmo0nlord:v0.3.0
tocmo0nlord:v0.2.1
tocmo0nlord:v0.2.0
tocmo0nlord:v0.1.0
1 Commits
streaming-
...
sharegpt-f
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
b7fe46579d | make the conversations/messages field configurable for sharegpt |
1 changed files with 12 additions and 1 deletions
|
|
@@ -39,6 +39,8 @@ def load(tokenizer, cfg, ds_cfg: Optional[Dict[str, Any]] = None):
|
|||
)
|
||||
if ds_cfg and "strict" in ds_cfg:
|
||||
strategy.strict = ds_cfg["strict"]
|
||||
if ds_cfg and "field_messages" in ds_cfg:
|
||||
strategy.field_messages = ds_cfg["field_messages"]
|
||||
return strategy
|
||||
|
||||
|
||||
|
|
@@ -83,6 +85,7 @@ class SimpleShareGPTPromptTokenizingStrategy(ShareGPTPromptTokenizingStrategy):
|
|||
"""
|
||||
|
||||
_strict = False
|
||||
_field_messages = "conversations"
|
||||
|
||||
@property
|
||||
def strict(self):
|
||||
|
|
@@ -92,8 +95,16 @@ class SimpleShareGPTPromptTokenizingStrategy(ShareGPTPromptTokenizingStrategy):
|
|||
def strict(self, strict):
|
||||
self._strict = strict
|
||||
|
||||
@property
|
||||
def field_messages(self):
|
||||
return self._strict
|
||||
|
||||
@field_messages.setter
|
||||
def field_messages(self, field_messages):
|
||||
self._field_messages = field_messages
|
||||
|
||||
def get_conversation_thread(self, prompt):
|
||||
conversations = prompt["conversations"]
|
||||
conversations = prompt[self.field_messages]
|
||||
if self.strict:
|
||||
return conversations
|
||||
role_key = "from"
|
||||
|
|
|
|||
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.