axolotl/docs/gradient_checkpointing.qmd at 970b2a6f2fcd5c364d5e8a9a6bf63ed3671569b4

Wing Lian 99187cd208 Activation Offloading w CUDA Streams (#2900) [skip ci]

* use cuda streams for activation offloading

* use torch native ops

* update cfg schema for streams

* fix literal constructor for set

* use context for training step so it doesn't affect evals

* disable streams

* auto gc on eval steps

* use activation_offloading config arg

* add docs for gradient checkpointing

* handle validation for gc/ao

* use cuda streams for act offloading

* add more validation for AC w/o GC

* fix docs

* move activation_offloading lower in definition so it doesn't break args/kwargs

* fix kd due to import order

2025-07-14 20:10:20 -04:00

997 B
