SCHEDULE: NOV 15-20, 2015
Fault Tolerant MapReduce-MPI for HPC Clusters
SESSION: Cloud Resource Management
EVENT TYPE: Papers
EVENT TAG(S): Clouds and Distributed Computing
TIME: 11:30AM - 12:00PM
SESSION CHAIR(S): Brent Welch
AUTHOR(S): Yanfei Guo, Wesley Bland, Pavan Balaji, Xiaobo Zhou
ROOM: 19AB
ABSTRACT:
Building MapReduce applications using the Message-Passing Interface (MPI) enables us to exploit the performance of large HPC clusters for big data analytics. However, because MPI lacks native fault tolerance support and the MapReduce fault tolerance model is incompatible with HPC schedulers, it is very hard to provide a fault tolerant MapReduce runtime for HPC clusters. We propose and develop FT-MRMPI, the first fault tolerant MapReduce framework on MPI for HPC clusters. We discover a unique way to perform failure detection and recovery by exploiting the current MPI semantics and the new user-level failure mitigation proposal. We design and develop the checkpoint/restart model for fault tolerant MapReduce in MPI. We further tailor the detect/resume model to conserve work for more efficient fault tolerance. Experimental results on a 256-node HPC cluster show that FT-MRMPI effectively masks failures and reduces the job completion time by 39%.
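As a rough illustration of the checkpoint/restart idea mentioned in the abstract, the sketch below runs a word-count map phase that saves its progress after every record and resumes from the saved state after a simulated failure. It is not FT-MRMPI's API and does not use MPI: the function names, the JSON checkpoint format, and the `fail_at` parameter are all assumptions made up for this example.

```python
import json
import os
import tempfile
from collections import Counter


def run_map_with_checkpoint(records, ckpt_path, fail_at=None):
    """Count words record by record, checkpointing after each record.

    Hypothetical helper for illustration only; not part of FT-MRMPI.
    """
    # Resume from an earlier checkpoint if one exists.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"next": 0, "counts": {}}

    counts = Counter(state["counts"])
    for i in range(state["next"], len(records)):
        if fail_at is not None and i == fail_at:
            # Stand-in for a process failure in the middle of the job.
            raise RuntimeError("simulated process failure")
        counts.update(records[i].split())
        # Write to a temporary file, then rename, so a crash never
        # leaves a half-written checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next": i + 1, "counts": dict(counts)}, f)
        os.replace(tmp_path, ckpt_path)
    return dict(counts)


def demo():
    records = ["a b a", "b c", "a c c"]
    with tempfile.TemporaryDirectory() as workdir:
        ckpt_path = os.path.join(workdir, "map.ckpt")
        try:
            # First attempt fails before processing record 2.
            run_map_with_checkpoint(records, ckpt_path, fail_at=2)
        except RuntimeError:
            pass
        # The restart skips records 0 and 1 and finishes the job.
        return run_map_with_checkpoint(records, ckpt_path)
```

Calling `demo()` returns the same counts as a failure-free run, because the restarted run only redoes the work that was not yet checkpointed. Checkpointing after every record is chosen here for simplicity; a real runtime would balance checkpoint frequency against I/O cost.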
Chair/Author Details:
Brent Welch (Chair) - Google
Yanfei Guo - University of Colorado Colorado Springs
Wesley Bland - Argonne National Laboratory
Pavan Balaji - Argonne National Laboratory
Xiaobo Zhou - University of Colorado Colorado Springs