diff --git a/docs/rlhf.qmd b/docs/rlhf.qmd
index 2033649cc..1eea42036 100644
--- a/docs/rlhf.qmd
+++ b/docs/rlhf.qmd
@@ -598,8 +598,6 @@ To see other examples of custom reward functions, please see [TRL GRPO Docs](htt
 To see all configs, please see [TRLConfig](https://github.com/axolotl-ai-cloud/axolotl/blob/v0.9.2/src/axolotl/utils/schemas/trl.py).
 
 #### OpenEnv Rollout Functions
-```bash
-pip insatll openenv-core```
 
 GRPO supports custom rollout functions for OpenEnv-style environments, enabling interactive tasks like web browsing, code execution, or tool use. This allows you to implement custom generation logic that interacts with external environments.
 
diff --git a/requirements.txt b/requirements.txt
index 96b185197..a12a3941b 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -64,6 +64,7 @@ immutabledict==4.2.0
 antlr4-python3-runtime==4.13.2
 
 torchao==0.13.0
+openenv-core==0.1.0
 schedulefree==1.4.1
 
 axolotl-contribs-lgpl==0.0.7