* add e2e smoke test for using activation/gradient checkpointing with offload
* disable duplicate code check for the test
* fix relative import
* seq len too small to test this dataset with packing
* Fix checkpoint patching for tests