Workshop on Clusters, Clouds and Grids for Life Sciences

In conjunction with CCGrid 2015 - 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 4-7, 2015, Shenzhen, Guangdong, China

A Comparative Analysis of Scheduling Mechanisms for Virtual Screening Workflow in a Shared Resource Environment

Abstract

Traditional High-Throughput Computing (HTC) consists of running many loosely coupled, independent tasks that together require a large amount of computing power over a relatively long period of time. However, emerging applications that require millions or even billions of tasks to be processed within a relatively short period of time have extended traditional HTC into Many-Task Computing (MTC). In silico drug discovery offers an efficient way to reduce the cost of the drug discovery and development process. For this purpose, virtual screening is used to select the most promising drug candidates for in vitro testing from millions of chemical compounds. Large-scale virtual screening requires a substantial amount of computing resources and high-performance processing of docking simulations, and thus exhibits the main characteristics of MTC applications. In this paper, we present a comparative analysis of scheduling mechanisms for the virtual screening workflow in a setting where multiple users share a common service infrastructure. To effectively support multiple users who submit varying numbers of tasks, the underlying system should consider fairness, user response time, and overall system throughput. We have implemented two scheduling algorithms, one addressing fairness and the other addressing user response time, in a common middleware stack called HTCaaS, a pilot-job-based multilevel scheduling system running on top of a dedicated production-level cluster. Through our comparative analysis of two scheduling mechanisms that target different metrics on a single hardware and software system, we offer the research community insight into the design and implementation of scheduling mechanisms that trade off user fairness against overall system performance, which is crucial for supporting challenging MTC applications.
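
To make the tension between fairness and user response time concrete, the following toy sketch compares two dispatch orders for users who submit different numbers of tasks. It is an illustration only and is not the HTCaaS implementation: the two policies (round-robin across users and smallest-request-first), the function names, the unit task duration, and the `slots` parameter are all assumptions introduced here.

```python
from collections import deque


def round_robin_finish_times(task_counts, slots=1):
    """Hypothetical fairness-oriented policy: dispatch one task per user per turn.

    Assumes every task takes one time unit and `slots` tasks run per time unit.
    Returns the time at which each user's last task completes.
    """
    remaining = dict(task_counts)
    queue = deque(user for user, n in task_counts.items() if n > 0)
    finish = {}
    dispatched = 0
    while queue:
        user = queue.popleft()
        remaining[user] -= 1
        dispatched += 1
        if remaining[user] == 0:
            # The k-th dispatched task completes at time ceil(k / slots).
            finish[user] = -(-dispatched // slots)
        else:
            queue.append(user)
    return finish


def smallest_first_finish_times(task_counts, slots=1):
    """Hypothetical response-time-oriented policy: serve the smallest request first.

    Same timing assumptions as above; each user's tasks run back to back.
    """
    finish = {}
    dispatched = 0
    for user, n in sorted(task_counts.items(), key=lambda item: item[1]):
        if n == 0:
            continue
        dispatched += n
        finish[user] = -(-dispatched // slots)
    return finish


if __name__ == "__main__":
    demand = {"a": 6, "b": 2, "c": 1}  # made-up task counts per user
    for policy in (round_robin_finish_times, smallest_first_finish_times):
        times = policy(demand)
        mean = sum(times.values()) / len(times)
        print(policy.__name__, times, "mean completion:", round(mean, 2))
```

With the made-up demand above and a single slot, round-robin completes users c, b, and a at times 3, 5, and 9, while smallest-first completes them at times 1, 3, and 9. The second policy lowers the mean completion time, but it does so by letting small requests run ahead of the largest user's tasks, which is the kind of trade-off a shared MTC infrastructure has to weigh.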