* phi sequence packing
* sample packing fixes
* fix linting
* fix inference and phi e2e tests
* update phi example now that sample packing works
* wandb import keeps getting moved around
* use torch.cuda.current_device() instead of local_rank
* ignore NVML errors for gpu stats
* llama lora packing e2e tests
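The last two device-related items can be illustrated together. This is a hedged sketch, not the repo's actual code: it resolves the device index via `torch.cuda.current_device()` rather than the `LOCAL_RANK` env var (which may be unset outside a `torchrun` launch), and it swallows NVML errors so GPU-stat logging doesn't crash on CPU-only or driver-less machines. The function names are illustrative.

```python
"""Sketch: per-process GPU stats with a torch-based device index
and NVML errors ignored. Names here are hypothetical."""
import os


def current_device_index():
    # Prefer torch's view of the current device: it reflects any earlier
    # torch.cuda.set_device() call, while LOCAL_RANK may be unset or stale
    # outside a torchrun launch. Fall back to LOCAL_RANK, then 0.
    try:
        import torch
        if torch.cuda.is_available():
            return torch.cuda.current_device()
    except ImportError:
        pass
    return int(os.environ.get("LOCAL_RANK", 0))


def gpu_memory_used_mb():
    """Used GPU memory in MB, or None when NVML is unavailable."""
    try:
        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(current_device_index())
        info = pynvml.nvmlDeviceGetMemoryInfo(handle)
        return info.used / 1024 ** 2
    except Exception:  # NVMLError, ImportError, missing driver, etc.
        return None
```

Returning `None` instead of raising lets a training loop log stats opportunistically and simply skip the GPU fields when NVML is absent.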