Cache-aware scheduling of scientific workflows in a multisite cloud

Many scientific experiments today are performed using scientific workflows, which are becoming increasingly data-intensive. We consider the efficient execution of such workflows in a multisite cloud, leveraging heterogeneous resources available at multiple geo-distributed data centers. Since it is common for workflow users to reuse code or data from previous workflows, a promising approach for efficient workflow execution is to cache intermediate data in order to avoid re-executing entire workflows. However, caching intermediate data and scheduling workflows to exploit such caching in a multisite cloud is complex. In particular, workflow scheduling must be cache-aware in order to decide whether to reuse cached data or re-execute workflows entirely. In this paper, we propose a solution for cache-aware scheduling of scientific workflows in a multisite cloud. Our solution includes a distributed and parallel architecture and new algorithms for adaptive caching, cache site selection, and dynamic workflow scheduling. We implemented our solution in the OpenAlea workflow system, together with cache-aware distributed scheduling algorithms. Our experimental evaluation in a three-site cloud with a real application in plant phenotyping shows that our solution can yield major performance gains, reducing total time by up to 42% when 60% of the input data is the same for each new execution.
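
The reuse-or-re-execute decision mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the algorithm from the paper: the cost model (transfer time compared with recomputation time), the names `CachedItem` and `should_reuse_cache`, and all parameters are assumptions chosen only to show what a cache-aware choice might compare.

```python
from dataclasses import dataclass


@dataclass
class CachedItem:
    """Hypothetical description of one cached intermediate dataset."""
    size_mb: float      # size of the cached data, in megabytes
    recompute_s: float  # estimated time to regenerate it, in seconds


def transfer_time_s(size_mb: float, bandwidth_mb_s: float) -> float:
    """Time to move cached data between sites at the given bandwidth."""
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_mb / bandwidth_mb_s


def should_reuse_cache(item: CachedItem, bandwidth_mb_s: float) -> bool:
    """Assumed rule: reuse the cache only if fetching beats recomputing."""
    return transfer_time_s(item.size_mb, bandwidth_mb_s) < item.recompute_s


if __name__ == "__main__":
    item = CachedItem(size_mb=500.0, recompute_s=20.0)
    print(should_reuse_cache(item, bandwidth_mb_s=100.0))  # fast inter-site link
    print(should_reuse_cache(item, bandwidth_mb_s=10.0))   # slow inter-site link
```

With these illustrative numbers, a fast inter-site link favors fetching the cached data, while a slow link makes re-execution the cheaper option under this simplified model.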

Bibliographic Details
Main Authors: Heidsieck, Gaëtan; De Oliveira, Daniel; Pacitti, Esther; Pradal, Christophe; Tardieu, François; Valduriez, Patrick
Format: article biblioteca
Language: English
Subjects: U10 - Computer science, mathematics and statistics; computer science; processes; http://aims.fao.org/aos/agrovoc/c_27769; http://aims.fao.org/aos/agrovoc/c_13586
Online Access: http://agritrop.cirad.fr/597996/
http://agritrop.cirad.fr/597996/1/FGCS_2021.pdf
id dig-cirad-fr-597996
record_format koha
spelling dig-cirad-fr-597996 2024-01-29T03:27:05Z http://agritrop.cirad.fr/597996/ Cache-aware scheduling of scientific workflows in a multisite cloud. Heidsieck Gaëtan, De Oliveira Daniel, Pacitti Esther, Pradal Christophe, Tardieu François, Valduriez Patrick. 2021. Future Generation Computer Systems, 122: 172-186. https://doi.org/10.1016/j.future.2021.03.012 eng article info:eu-repo/semantics/article Journal Article info:eu-repo/semantics/acceptedVersion http://agritrop.cirad.fr/597996/1/FGCS_2021.pdf text Cirad license info:eu-repo/semantics/openAccess https://agritrop.cirad.fr/mention_legale.html 10.1016/j.future.2021.03.012 info:eu-repo/semantics/altIdentifier/doi/10.1016/j.future.2021.03.012 info:eu-repo/semantics/altIdentifier/purl/https://doi.org/10.1016/j.future.2021.03.012 info:eu-repo/semantics/reference/purl/https://doi.org/10.5281/zenodo.1436634
institution CIRAD FR
collection DSpace
country France
countrycode FR
component Bibliographic
access Online
databasecode dig-cirad-fr
tag biblioteca
region Western Europe
libraryname CIRAD Library, France
language eng
topic U10 - Computer science, mathematics and statistics
computer science
processes
http://aims.fao.org/aos/agrovoc/c_27769
http://aims.fao.org/aos/agrovoc/c_13586
format article
author Heidsieck, Gaëtan
De Oliveira, Daniel
Pacitti, Esther
Pradal, Christophe
Tardieu, François
Valduriez, Patrick
title Cache-aware scheduling of scientific workflows in a multisite cloud
url http://agritrop.cirad.fr/597996/
http://agritrop.cirad.fr/597996/1/FGCS_2021.pdf