BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:2.0
BEGIN:VEVENT
DTSTART:20151119T200000Z
DTEND:20151119T203000Z
LOCATION:18CD
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: The Breadth-First Search (BFS) algorithm serves as the foundation for many big data applications and analytics workloads. While Graphics Processing Units (GPUs) offer massive parallelism, achieving high-performance BFS on GPUs entails efficient scheduling of a large number of GPU threads and effective utilization of the GPU memory hierarchy. In this paper, we present a new BFS system, Enterprise, which utilizes three novel techniques to eliminate the performance bottlenecks: (1) streamlined GPU thread scheduling; (2) GPU workload balancing; and (3) GPU-based BFS direction optimization. Enterprise achieves up to 76 billion traversed edges per second (TEPS) on a single NVIDIA Kepler K40, and up to 122 billion TEPS on two GPUs, which ranked No. 45 in the Graph 500 in November 2014. Enterprise is also very energy-efficient, ranking No. 1 in the GreenGraph 500 (small data category) and delivering 446 million TEPS per watt.
SUMMARY:Enterprise: Breadth-First Graph Traversal on GPUs
PRIORITY:3
END:VEVENT
END:VCALENDAR