* initial telemetry manager impl
* adding todo
* updates
* updates
* progress on telemetry: config load, process, model load, train start / end, error tracking
* update error file path sanitization function; adding more error tracking
* updated sanitization logic, tests
* adding runtime metrics (cpu + gpu memory, steps/s, etc.)
* tests for runtime metrics telemetry and assoc. callback
* small update / fix
* simplifying path redaction
* sleep on all ranks in distributed setting
* adding back in base_model redaction w/ whitelist
* fix
* doc update
* improved redaction, send system info during model config load telemetry, etc.
* adding runtime metrics / system info additional accelerator support, etc.
* adding runtime metrics / system info additional accelerator support, etc.
* remove duplicate info
* fixes
* fix issue with tests in ci
* distributed fix
* opt-in version of telemetry
* enable / disable logic update
* docs fix
* doc update
* minor fixes
* simplifying
* slight changes
* fix
* lint
* update posthog dep
* coderabbit comments
* fix: opt-in model
* fix: increase time since last
* fix: increase whitelist orgs
* fix: posthog init and shutdown
* fix: imports
* fix: also check grad norm
* fix: duplicate plugin_manager calls
* fix: bad merge
* chore: update docs
* fix: cache process per comment
* fix: error handling
* fix: tests
* Revert "fix: error handling"
This reverts commit 22d1ea5755.
* fix: test telemetry error_handled bool
* fix: revert test
* chore: final doc fixes
---------
Co-authored-by: Dan Saunders <danjsaund@gmail.com>
Co-authored-by: Dan Saunders <dan@axolotl.ai>
77 lines
1.1 KiB
Plaintext
--extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/

# START section of dependencies that don't install on Darwin/MacOS
bitsandbytes==0.48.2
triton>=3.0.0
mamba-ssm==1.2.0.post1
xformers>=0.0.23.post1
liger-kernel==0.6.3
# END section

packaging==23.2

huggingface_hub>=0.36.0
peft>=0.17.1
tokenizers>=0.22.1
transformers==4.57.1
accelerate==1.11.0
datasets==4.4.1
deepspeed>=0.17.0
trl==0.25.0
hf_xet==1.2.0
kernels>=0.9.0
trackio

optimum==1.16.2
hf_transfer
sentencepiece
gradio==5.49.1

modal==1.0.2
pydantic>=2.10.6
addict
fire
PyYAML>=6.0
requests
wandb
einops
colorama
numba>=0.61.2
numpy>=2.2.6

# qlora things
evaluate==0.4.1
scipy
scikit-learn==1.4.2
nvidia-ml-py==12.560.30
art
tensorboard
python-dotenv==1.0.1

# remote filesystems
s3fs>=2024.5.0
gcsfs>=2025.3.0
adlfs>=2024.5.0
ocifs==1.3.2

zstandard==0.22.0
fastcore

# lm eval harness
lm_eval==0.4.7
langdetect==1.0.9
immutabledict==4.2.0
antlr4-python3-runtime==4.13.2

torchao==0.13.0
openenv-core==0.1.0
schedulefree==1.4.1

axolotl-contribs-lgpl==0.0.7
axolotl-contribs-mit==0.0.5

# telemetry
posthog==6.7.11

mistral-common==1.8.5