add e2e tests for checking functionality of resume from checkpoint (#865)
* use tensorboard to see if resume from checkpoint works
* make sure e2e test is either fp16 or bf16
* set max_steps and save limit so we have the checkpoint when testing resuming
* fix test parameters
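A minimal sketch of the kind of check the first bullet describes, not the test code from this commit: read the scalar steps that the resumed run wrote to its TensorBoard log and confirm that none of them precede the checkpoint step. The helper names `load_scalar_steps` and `resumed_after`, and the tag and paths in the usage note, are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def load_scalar_steps(log_dir: str, tag: str) -> List[Tuple[int, float]]:
    """Read (step, value) pairs for one scalar tag from a TensorBoard log dir."""
    # Imported lazily so the pure check below works without tensorboard installed.
    from tensorboard.backend.event_processing.event_accumulator import (
        EventAccumulator,
    )

    acc = EventAccumulator(log_dir)
    acc.Reload()  # parse the event files on disk
    return [(event.step, event.value) for event in acc.Scalars(tag)]


def resumed_after(scalars: Iterable[Tuple[int, float]], checkpoint_step: int) -> bool:
    """Return True if at least one step was logged and all steps follow the checkpoint."""
    steps = [step for step, _ in scalars]
    # A run that restarted from scratch would log steps at or before the checkpoint.
    return bool(steps) and min(steps) > checkpoint_step
```

In a test this might be used as `assert resumed_after(load_scalar_steps(run_dir, "train/loss"), checkpoint_step)`, where `run_dir`, the tag name, and `checkpoint_step` depend on how the run was configured.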
@@ -32,3 +32,4 @@ pynvml
 art
 fschat==0.2.29
 gradio
+tensorboard