* add PyTorch profiling
* kick off the profiler ASAP, since allocations may happen before training starts
* document the feature
* add URL for the visualizer [skip ci]