Toward an autonomic engine for scientific workflows and elastic Cloud infrastructure
The constant development of scientific and industrial computation infrastructures requires the concurrent development of scheduling and deployment mechanisms to manage such infrastructures. Throughout the last decade, the emergence of the Cloud paradigm raised many hopes, but achieving full platform autonomicity remains a challenge.
Work undertaken during this PhD aimed at building a workflow engine that integrates the logic needed to manage workflow execution and Cloud deployment on its own. More precisely, we focused on Cloud solutions with a dedicated Data as a Service (DaaS) data management component. Our objective was to automate the execution of workflows submitted by many users on elastic Cloud resources.
This contribution proposes a modular middleware infrastructure and details the implementation of the underlying modules:
• A workflow clustering algorithm that optimises data locality in the context of DaaS-centered communications;
• A dynamic scheduler that executes clustered workflows on Cloud resources;
• A deployment manager that handles the allocation and deallocation of Cloud resources according to the workload characteristics and users’ requirements.
All these modules have been implemented in a simulator to analyse their behaviour and measure their effectiveness when running both synthetic and real scientific workflows. We also implemented these modules in the Diet middleware to give it new features and demonstrate the versatility of this approach. Simulations running the WASABI workflow (waves analysis based inference, a framework for the reconstruction of gene regulatory networks) showed that our approach can decrease the deployment cost by up to 44% while meeting the required deadlines.