mrCUDA: Low-Overhead Middleware for Transparently Migrating CUDA Execution from Remote to Local GPUs
Authors: Pak Markthub (Tokyo Institute of Technology), Akihiro Nomura (Tokyo Institute of Technology), Satoshi Matsuoka (Tokyo Institute of Technology)
Abstract: rCUDA is a state-of-the-art remote CUDA execution middleware that enables CUDA
applications running on one node to transparently use GPUs on other nodes. With
this capability, applications can use nodes that do not have enough unoccupied
GPUs by using rCUDA to borrow idle GPUs from other nodes. However, those
applications may suffer from rCUDA's overhead, which can be detrimentally
large for applications that frequently launch CUDA kernels or transfer large
amounts of data. We propose mrCUDA, a middleware for
transparently live-migrating CUDA execution from remote to local GPUs, and show
that mrCUDA's overhead is negligibly small compared with rCUDA's overhead.
Hence, mrCUDA enables applications to run on nodes that do not have enough
unoccupied GPUs (by using rCUDA) and later migrate the work to local GPUs
(thus eliminating rCUDA's overhead) when they become available.
Poster: pdf
Two-page extended abstract: pdf