* checkpoint model on first step callback * remove debug * add test cases; update existing tests not to save on first step * move test out of solo * delete * default to False * typo
31 lines
619 B
YAML
31 lines
619 B
YAML
project_name:
|
|
volumes:
|
|
- name: axolotl-data
|
|
mount: /workspace/data
|
|
- name: axolotl-artifacts
|
|
mount: /workspace/artifacts
|
|
|
|
# environment variables from local to set as secrets
|
|
secrets:
|
|
- HF_TOKEN
|
|
- WANDB_API_KEY
|
|
|
|
# Which branch of axolotl to use remotely
|
|
branch:
|
|
|
|
# additional custom commands when building the image
|
|
dockerfile_commands:
|
|
|
|
gpu: h100
|
|
gpu_count: 1
|
|
|
|
# Train specific configurations
|
|
memory: 128
|
|
timeout: 86400
|
|
|
|
# Preprocess specific configurations
|
|
memory_preprocess: 32
|
|
timeout_preprocess: 14400
|
|
|
|
# save_first_step: true # uncomment this to validate checkpoint saving works with your config
|