SCHEDULE: NOV 15-20, 2015
Accelerating Big Data Processing with Hadoop, Spark, and Memcached on Modern Clusters
SESSION: Accelerating Big Data Processing with Hadoop, Spark, and Memcached on Modern Clusters
EVENT TYPE: Tutorials
EVENT TAG(S): Clouds and Distributed Computing
TIME: 1:30PM - 5:00PM
Presenter(s): Dhabaleswar K. (DK) Panda, Xiaoyi Lu, Hari Subramoni
ROOM: 16AB
ABSTRACT:
Apache Hadoop and Spark are gaining prominence in handling Big Data and analytics. Similarly, Memcached is becoming important for large-scale query processing in Web 2.0 environments. Recent studies have shown that default Hadoop, Spark, and Memcached cannot efficiently leverage the features of modern high-performance computing clusters, such as Remote Direct Memory Access (RDMA)-enabled high-performance interconnects and high-throughput, large-capacity parallel storage systems (e.g., Lustre). These middleware systems are traditionally written with sockets and do not deliver the best performance on modern high-performance networks. In this tutorial, we will provide an in-depth overview of the architecture of Hadoop components (HDFS, MapReduce, RPC, HBase, etc.), Spark, and Memcached. We will examine the challenges in redesigning the networking and I/O components of these middleware systems for modern interconnects, protocols with RDMA (such as InfiniBand, iWARP, RoCE, and RSocket), and modern storage architectures. Using the publicly available software packages from the High-Performance Big Data (HiBD, http://hibd.cse.ohio-state.edu) project, we will present case studies of the new designs for several Hadoop, Spark, and Memcached components and their associated benefits. Through these case studies, we will also examine the interplay among high-performance interconnects, storage systems (HDD and SSD), and multi-core platforms to achieve the best solutions for these components and for Big Data applications on modern HPC clusters.
Chair/Presenter Details:
Dhabaleswar K. (DK) Panda - Ohio State University
Xiaoyi Lu - Ohio State University
Hari Subramoni - Ohio State University
