axolotl/docs/gradient_checkpointing.qmd at bc2bc688d80f2d39ad21f06bf2e68f768e99db31

Wing Lian 99187cd208 Activation Offloading w CUDA Streams (#2900) [skip ci]

* use cuda streams for activation offloading

* use torch native ops

* update cfg schema for streams

* fix literal constructor for set

* use context for training step so it doesn't affect evals

* disable streams

* auto gc on eval steps

* use activation_offloading config arg

* add docs for gradient checkpointing

* handle validation for gc/ao

* use cuda streams for act offloading

* add more validation for AC w/o GC

* fix docs

* move activation_offloading lower in definition so it doesn't break args/kwargs

* fix kd due to import order

2025-07-14 20:10:20 -04:00

997 B
