axolotl/docs at efa1da52d500ae0dfa4cc192523a0a587611f020 - axolotl - Gitea

tocmo0nlord/axolotl

yardenhoch efa1da52d5 Center rewards coefficient (#3124)

* feat: add center_rewards_coefficient for reward modeling

- Add center_rewards_coefficient parameter to Pydantic schema with paper reference
- Pass parameter through base builder and causal builder to training args
- Add documentation section with usage examples and theoretical background
- Enable parameter in reward modeling example configs with recommended value
- Enables reward centering for improved training stability in RLHF workflows

Implements auxiliary loss from Eisenstein et al. 2023 (https://huggingface.co/papers/2312.09244)
to incentivize mean-zero reward outputs without post-training normalization.

* Update description

* test: add unit tests for center_rewards_coefficient integration

* Update src/axolotl/core/builders/base.py

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update docs/reward_modelling.qmd

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* Update docs/reward_modelling.qmd

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* reference to TRL documentation.

* add new reward model configuration for qwen3 with comprehensive parameters

* Verified center_rewards_coefficient is correctly passed through the trainer builder to training arguments.

* Refactor reward modeling documentation to consolidate information on center_rewards_coefficient

* Remove unit tests for center_rewards_coefficient integration as part of codebase cleanup.

* linting

* nit

* Apply suggestions from code review

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>

* lint

---------

Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>

2025-09-03 16:22:37 -04:00
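The auxiliary loss named in the commit message can be illustrated with a minimal, dependency-free Python sketch. It assumes the formulation used by TRL's `RewardTrainer`, which the commit points to: the Bradley-Terry pairwise loss `-mean(log sigmoid(r_chosen - r_rejected))` plus `center_rewards_coefficient * mean((r_chosen + r_rejected)^2)`. The helpers `log_sigmoid` and `reward_loss` are hypothetical names, any margin term is omitted, and this is not axolotl's or TRL's actual code.

```python
import math
from typing import Optional, Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_loss(
    rewards_chosen: Sequence[float],
    rewards_rejected: Sequence[float],
    center_rewards_coefficient: Optional[float] = None,
) -> float:
    """Pairwise reward-model loss with optional reward centering (hypothetical helper)."""
    if len(rewards_chosen) != len(rewards_rejected) or not rewards_chosen:
        raise ValueError("expected equal-length, non-empty reward sequences")
    pairs = list(zip(rewards_chosen, rewards_rejected))
    n = len(pairs)
    # Bradley-Terry term: -mean(log sigmoid(r_chosen - r_rejected)).
    loss = -sum(log_sigmoid(c - r) for c, r in pairs) / n
    if center_rewards_coefficient is not None:
        # Centering term: coefficient * mean((r_chosen + r_rejected)^2),
        # which is zero only when each pair's rewards sum to zero.
        loss += center_rewards_coefficient * sum((c + r) ** 2 for c, r in pairs) / n
    return loss


# Same reward gaps (2 and 1), but the second batch is shifted by +10.
centered = ([1.0, 0.5], [-1.0, -0.5])
shifted = ([11.0, 10.5], [9.0, 9.5])
print(reward_loss(*centered), reward_loss(*shifted))              # equal
print(reward_loss(*centered, 0.01), reward_loss(*shifted, 0.01))  # shifted is about 4.0 higher
```

Because the pairwise term depends only on the difference between the two rewards, shifting every reward by a constant leaves it unchanged; only the centering term penalizes the shift, which is what pushes the model toward mean-zero outputs without post-training normalization.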

dataset-formats

feat(doc): update gpt-oss readme (#3029) [skip ci]

2025-08-07 09:26:42 -04:00

Ray Train Axolotl Integration (#2251)

2025-01-29 00:10:19 -05:00

Add ruff, remove black, isort, flake8, pylint (#3092)

2025-08-23 23:37:33 -04:00

.gitignore

Config doc autogen (#2718)

2025-06-18 15:36:53 -04:00

amd_hpc.qmd

Feat(doc): Reorganize documentation, fix broken syntax, update notes (#2348)

2025-02-25 16:09:37 +07:00

batch_vs_grad.qmd

Feat: update doc (#1475) [skip ci]

2024-04-04 13:43:40 +09:00

cli.qmd

CLI: add --launcher option, support launcher args, cleanup, refactor (#2924)

2025-07-30 15:46:56 -04:00

custom_integrations.qmd

densemixer plugin integration (#2868)

2025-07-07 17:05:19 -04:00

dataset_loading.qmd

Config doc autogen (#2718)

2025-06-18 15:36:53 -04:00

dataset_preprocessing.qmd

Autodoc generation with quartodoc (#2419)

2025-03-21 12:26:47 -04:00

debugging.qmd

Feat(doc): Reorganize documentation, fix broken syntax, update notes (#2348)

2025-02-25 16:09:37 +07:00

docker.qmd

feat(doc): re-add docker 2.7.0 tag back (#2902) [skip ci]

2025-07-12 11:40:01 -04:00

faq.qmd

Act offload lora fix (#2928) [skip ci]

2025-07-24 16:10:04 -04:00

fsdp_qlora.qmd

Fix link in FSDP + QLoRA docs. (#2879) [skip ci]

2025-07-08 09:19:09 -04:00

getting-started.qmd

Config doc autogen (#2718)

2025-06-18 15:36:53 -04:00

gradient_checkpointing.qmd

Activation Offloading w CUDA Streams (#2900) [skip ci]

2025-07-14 20:10:20 -04:00

inference.qmd

Feat: minor docs improvements for RLHF and faq on embeddings (#2401) [skip ci]

2025-03-17 08:39:04 -04:00

input_output.qmd

Feat(doc): Reorganize documentation, fix broken syntax, update notes (#2348)

2025-02-25 16:09:37 +07:00

installation.qmd

feat(doc): improve visibility for colab notebooks (#3110) [skip ci]

2025-09-03 01:40:53 -04:00

lora_optims.qmd

feat(doc): note lora kernel incompat with RLHF (#2706) [skip ci]

2025-05-28 15:48:40 +07:00

lr_groups.qmd

support for custom lr groups for non-embedding modules (#2213)

2025-01-24 12:56:28 -05:00

mac.qmd

Feat(doc): Reorganize documentation, fix broken syntax, update notes (#2348)

2025-02-25 16:09:37 +07:00

mixed_precision.qmd

basic torchao fp8 mixed precision training (#2926)

2025-07-22 16:27:47 -04:00

multi-gpu.qmd

automatically set env vars for single gpu deepspeed zero3 (#3118) [skip ci]

2025-08-29 13:36:47 -04:00

multi-node.qmd

CLI: add --launcher option, support launcher args, cleanup, refactor (#2924)

2025-07-30 15:46:56 -04:00

multimodal.qmd

Various fixes for VLMs (#3063)

2025-08-15 10:52:57 -04:00

multipack.qmd

Bootstrap Hosted Axolotl Docs w/Quarto (#1429)

2024-03-21 22:28:36 -07:00

nccl.qmd

Feat(doc): Reorganize documentation, fix broken syntax, update notes (#2348)

2025-02-25 16:09:37 +07:00

nd_parallelism.qmd

feat: update nd parallelism readme (#3039)

2025-08-08 12:45:36 +01:00

optimizers.qmd

feat: add complete optimizer docs (#3017) [skip ci]

2025-08-06 08:01:51 -04:00

qat.qmd

QAT docfix (#2778) [skip ci]

2025-06-12 13:22:40 -04:00

quantize.qmd

Config doc autogen (#2718)

2025-06-18 15:36:53 -04:00

ray-integration.qmd

Feat(doc): Reorganize documentation, fix broken syntax, update notes (#2348)

2025-02-25 16:09:37 +07:00

reward_modelling.qmd

Center rewards coefficient (#3124)

2025-09-03 16:22:37 -04:00

rlhf.qmd

fix: customized dataset with simpo (#2894) [skip ci]

2025-07-12 11:40:30 -04:00

sequence_parallelism.qmd

Distributed/ND-Parallel (#2977)

2025-07-31 15:25:02 -04:00

streaming.qmd

Streaming SFT support (#3101)

2025-09-02 12:08:44 -04:00

torchao.qmd

Feat(doc): Reorganize documentation, fix broken syntax, update notes (#2348)

2025-02-25 16:09:37 +07:00

unsloth.qmd

Feat(doc): Reorganize documentation, fix broken syntax, update notes (#2348)

2025-02-25 16:09:37 +07:00