* current not clean working version; move torch trainer to do_cli; update code with config changes and clean up; edit config cleanup; add run name to trainer
* address comments
* use axolotl train in multigpu tests and add ray tests for multi-gpu
* accelerate uses underscores for main_process_port arg
* chore: lint
* fix order of accelerate args
* include ray train in docker images
* fix bf16 resolution behavior
* move dtype logic
* x
* rename
* add to sidebar
* Apply suggestions from code review
* Update docs/ray-integration.qmd
* pre-commit fixes
* use output_dir instead of hardcoded saves path
* bugfix storage dir
* change type for resources_per_worker

---------

Signed-off-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: SumanthRH <sumanthrh@anyscale.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: Eric Tang <46737979+erictang000@users.noreply.github.com>
40 lines
1.2 KiB
Docker
ARG BASE_TAG=main-base
FROM axolotlai/axolotl-base:$BASE_TAG

ARG TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6+PTX"
ARG AXOLOTL_EXTRAS=""
ARG AXOLOTL_ARGS=""
ARG CUDA="118"
ARG PYTORCH_VERSION="2.1.2"

ENV PYTORCH_VERSION=$PYTORCH_VERSION

RUN apt-get update && \
    apt-get install -y --allow-change-held-packages vim curl nano libnccl2 libnccl-dev rsync s3fs

WORKDIR /workspace

RUN git clone --depth=1 https://github.com/axolotl-ai-cloud/axolotl.git

WORKDIR /workspace/axolotl

# If AXOLOTL_EXTRAS is set, append it in brackets
RUN if [ "$AXOLOTL_EXTRAS" != "" ] ; then \
        pip install --no-build-isolation -e .[deepspeed,flash-attn,optimizers,ray,$AXOLOTL_EXTRAS] $AXOLOTL_ARGS; \
    else \
        pip install --no-build-isolation -e .[deepspeed,flash-attn,optimizers,ray] $AXOLOTL_ARGS; \
    fi

RUN python scripts/unsloth_install.py | sh
RUN python scripts/cutcrossentropy_install.py | sh

# So we can test the Docker image
RUN pip install pytest

# fix so that git fetch/pull from remote works
RUN git config remote.origin.fetch "+refs/heads/*:refs/remotes/origin/*" && \
    git config --get remote.origin.fetch

# helper for huggingface-login cli
RUN git config --global credential.helper store
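
# Usage sketch (an illustration, not taken from the repository's docs): the ARGs
# declared above can be overridden at build time with --build-arg. The image tag
# "axolotl:local", the extras value, and the Dockerfile path are placeholders
# chosen for this example; substitute values that exist in your setup.
#
#   docker build \
#     --build-arg BASE_TAG=main-base \
#     --build-arg AXOLOTL_EXTRAS="<extra-name>" \
#     --build-arg AXOLOTL_ARGS="<extra pip arguments>" \
#     -t axolotl:local \
#     -f <path-to-this-Dockerfile> .
#
# When AXOLOTL_EXTRAS is left empty, the else branch of the pip install step runs
# and only the deepspeed, flash-attn, optimizers, and ray extras are installed.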