* run nightly CI builds against upstream main
* add test badges
* run the multigpu tests against nightly main builds too