mrCUDA: Low-Overhead Middleware for Transparently Migrating CUDA Execution from Remote to Local GPUs

Authors: Pak Markthub (Tokyo Institute of Technology), Akihiro Nomura (Tokyo Institute of Technology), Satoshi Matsuoka (Tokyo Institute of Technology)

Abstract: rCUDA is a state-of-the-art remote CUDA execution middleware that enables CUDA applications running on one node to transparently use GPUs on other nodes. With this capability, applications can run on nodes that do not have enough unoccupied GPUs by using rCUDA to borrow idle GPUs from other nodes. However, those applications may suffer from rCUDA's overhead; for applications that frequently call CUDA kernels or transfer large amounts of data, this overhead can be detrimentally large. We propose mrCUDA, a middleware for transparently live-migrating CUDA execution from remote to local GPUs, and show that mrCUDA's overhead is negligibly small compared with rCUDA's overhead. Hence, mrCUDA enables applications to start running on nodes that do not have enough unoccupied GPUs (by using rCUDA) and later migrate the work to local GPUs when they become available, thereby eliminating rCUDA's overhead.

Poster: pdf
Two-page extended abstract: pdf
