* current not clean working version; move torch trainer to do_cli; update code with config changes and clean up; edit config cleanup; add run name to trainer
* address comments
* use axolotl train in multigpu tests and add ray tests for multi-gpu
* accelerate uses underscores for main_process_port arg
* chore: lint
* fix order of accelerate args
* include ray train in docker images
* fix bf16 resolution behavior
* move dtype logic
* x
* rename
* add to sidebar
* Apply suggestions from code review
* Update docs/ray-integration.qmd
* pre-commit fixes
* use output_dir instead of hardcoded saves path
* bugfix storage dir
* change type for resources_per_worker

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
FROM axolotlai/axolotl-base:{{ BASE_TAG }}

ENV TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6+PTX"
ENV AXOLOTL_EXTRAS="{{ AXOLOTL_EXTRAS }}"
ENV AXOLOTL_ARGS="{{ AXOLOTL_ARGS }}"
ENV CUDA="{{ CUDA }}"
ENV PYTORCH_VERSION="{{ PYTORCH_VERSION }}"
ENV GITHUB_REF="{{ GITHUB_REF }}"
ENV GITHUB_SHA="{{ GITHUB_SHA }}"
ENV NIGHTLY_BUILD="{{ NIGHTLY_BUILD }}"
ENV HF_HOME="{{ HF_HOME }}"
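The `{{ … }}` placeholders above are Jinja template variables that get filled in before `docker build` ever runs. As a minimal sketch of that substitution step (the tag value and the use of `sed` as the renderer are illustrative assumptions, not the project's actual CI render):

```shell
# Hypothetical stand-in for the Jinja render: substitute one placeholder.
line='FROM axolotlai/axolotl-base:{{ BASE_TAG }}'
echo "$line" | sed 's/{{ *BASE_TAG *}}/main-py3.11-cu124-2.5.1/'
# -> FROM axolotlai/axolotl-base:main-py3.11-cu124-2.5.1
```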
RUN apt-get update && \
    apt-get install -y --allow-change-held-packages vim curl nano libnccl2 libnccl-dev
WORKDIR /workspace

RUN git clone --depth=1 https://github.com/axolotl-ai-cloud/axolotl.git

WORKDIR /workspace/axolotl

RUN git fetch origin +$GITHUB_REF && \
    git checkout FETCH_HEAD
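`$GITHUB_REF` can be any fetchable ref (a branch, tag, or PR head), and `FETCH_HEAD` records whatever the last fetch brought in. A self-contained sketch of that pattern, with throwaway local repos standing in for the real remote (all paths and the branch name are made up):

```shell
# Build a tiny "origin" repo with one commit and a feature branch.
tmp=$(mktemp -d)
git init -q "$tmp/origin"
(
  cd "$tmp/origin"
  git config user.email ci@example.com
  git config user.name ci
  echo hello > file.txt
  git add file.txt
  git commit -qm "init"
  git branch feature/pr-123
)
# Clone it, then fetch an explicit ref and check out FETCH_HEAD,
# mirroring the Dockerfile's `git fetch origin +$GITHUB_REF`.
git clone -q "$tmp/origin" "$tmp/clone"
cd "$tmp/clone"
git fetch -q origin +refs/heads/feature/pr-123
git checkout -q FETCH_HEAD
git log -1 --format=%s   # prints: init
```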
# If NIGHTLY_BUILD is set, swap the pinned HF-ecosystem packages for their main branches
RUN if [ "$NIGHTLY_BUILD" = "true" ] ; then \
        sed -i 's#^transformers.*#transformers @ git+https://github.com/huggingface/transformers.git@main#' requirements.txt; \
        sed -i 's#^peft.*#peft @ git+https://github.com/huggingface/peft.git@main#' requirements.txt; \
        sed -i 's#^accelerate.*#accelerate @ git+https://github.com/huggingface/accelerate.git@main#' requirements.txt; \
        sed -i 's#^trl.*#trl @ git+https://github.com/huggingface/trl.git@main#' requirements.txt; \
        sed -i 's#^datasets.*#datasets @ git+https://github.com/huggingface/datasets.git@main#' requirements.txt; \
    fi
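The nightly rewrite works line-by-line on requirements.txt: each `sed` replaces a pinned release with a VCS reference to the library's main branch. A sketch of the transformation on one line (the pinned version number is illustrative):

```shell
# One requirements line before the nightly rewrite (version is made up):
req='transformers==4.48.1'
# Apply the same sed the Dockerfile runs:
echo "$req" | sed 's#^transformers.*#transformers @ git+https://github.com/huggingface/transformers.git@main#'
# -> transformers @ git+https://github.com/huggingface/transformers.git@main
```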

# If AXOLOTL_EXTRAS is set, append it in brackets
RUN if [ "$AXOLOTL_EXTRAS" != "" ] ; then \
        pip install --no-build-isolation -e .[deepspeed,flash-attn,optimizers,ray,$AXOLOTL_EXTRAS] $AXOLOTL_ARGS; \
    else \
        pip install --no-build-isolation -e .[deepspeed,flash-attn,optimizers,ray] $AXOLOTL_ARGS; \
    fi
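The conditional only changes the pip extras string: the base extras are always installed, and `$AXOLOTL_EXTRAS` is appended inside the brackets when non-empty. A sketch of that expansion (the extra name is illustrative):

```shell
# Illustrative value; any comma-separated extras string would work the same way.
AXOLOTL_EXTRAS="mamba-ssm"
if [ "$AXOLOTL_EXTRAS" != "" ] ; then
  echo ".[deepspeed,flash-attn,optimizers,ray,$AXOLOTL_EXTRAS]"
else
  echo ".[deepspeed,flash-attn,optimizers,ray]"
fi
# -> .[deepspeed,flash-attn,optimizers,ray,mamba-ssm]
```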
RUN python scripts/unsloth_install.py | sh
RUN python scripts/cutcrossentropy_install.py | sh
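These two steps use a `python … | sh` pattern: the Python script prints an install command suited to the detected environment, and `sh` executes it. A toy sketch of the mechanism (the printed command is a stand-in, not what the real installer scripts emit):

```shell
# Stand-in for scripts/unsloth_install.py: print a shell command, pipe to sh.
python3 -c "print('echo would-install-unsloth-here')" | sh
# -> would-install-unsloth-here
```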
# So we can test the Docker image
RUN pip install -r requirements-dev.txt -r requirements-tests.txt
# fix so that git fetch/pull from remote works
RUN git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*" && \
    git config --get remote.origin.fetch
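This fix is needed because a `--depth=1` clone records a narrow fetch refspec that tracks only the cloned branch; widening it to `+refs/heads/*:refs/remotes/origin/*` lets later `git fetch`/`git pull` see every remote branch. A throwaway sketch of the widening (the temp dir and remote URL are made up, and the remote is never contacted):

```shell
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git remote add origin https://example.com/axolotl.git
# Narrow refspec, as a single-branch clone would record it:
git config remote.origin.fetch "+refs/heads/main:refs/remotes/origin/main"
# Widen to all branches, mirroring the Dockerfile:
git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*"
git config --get remote.origin.fetch
# -> +refs/heads/*:refs/remotes/origin/*
```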
# helper for huggingface-login cli
RUN git config --global credential.helper store
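With `credential.helper store`, git persists credentials in plain text in `~/.git-credentials`, one `https://user:token@host` entry per line, which lets a Hugging Face login survive across git operations in the container. A sketch using a throwaway HOME and a fake token:

```shell
# Throwaway HOME so nothing real is touched.
export HOME=$(mktemp -d)
git config --global credential.helper store
# The store helper reads/writes ~/.git-credentials; token below is fake.
printf 'https://user:FAKE_TOKEN@huggingface.co\n' > "$HOME/.git-credentials"
git config --global --get credential.helper
# -> store
```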