Wing Lian 99187cd208 Activation Offloading w CUDA Streams (#2900) [skip ci]
* use cuda streams for activation offloading

* use torch native ops

* update cfg schema for streams

* fix literal constructor for set

* use context for training step so it doesn't affect evals

* disable streams

* auto gc on eval steps

* use activation_offloading config arg

* add docs for gradient checkpointing

* handle validation for gc/ao

* use cuda streams for act offloading

* add more validation for AC w/o GC

* fix docs

* move activation_offloading lower in definition so it doesn't break args/kwargs

* fix kd due to import order
2025-07-14 20:10:20 -04:00