ROMA

ROMA working groups — year 2020/2021

How to join:
The working groups take place online, using the video-conference tool provided by ENS-Lyon.

Next working groups:

November 26, 2020, 3pm.

Lucas Pérotin: Resilient Scheduling of Moldable Jobs on Failure-Prone Platforms.

This paper focuses on the resilient scheduling of moldable parallel jobs on high-performance computing (HPC) platforms. Moldable jobs allow for choosing a processor allocation before execution, and their execution time obeys various speedup models. The objective is to minimize the overall completion time of the jobs, or makespan, assuming that jobs are subject to arbitrary failure scenarios, and hence need to be re-executed each time they fail until successful completion. This work generalizes the classical framework where jobs are known offline and do not fail. We introduce a list-based algorithm, and prove new approximation ratios for three prominent speedup models (roofline, communication, Amdahl). We also introduce a batch-based algorithm, where each job is allowed a restricted number of failures per batch, and prove a new approximation ratio for the arbitrary speedup model. We conduct an extensive set of simulations to evaluate and compare different variants of the two algorithms. The results show that they consistently outperform some baseline heuristics. In particular, the list algorithm performs better for the roofline and communication models, while the batch algorithm has better performance for Amdahl's model. Overall, our best algorithm is within a factor of 1.47 of a lower bound on average over the whole set of experiments, and within a factor of 1.8 in the worst case.
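For readers unfamiliar with the three speedup models cited above (roofline, communication, Amdahl), here is a minimal Python sketch of their standard textbook forms; the parameter names (w, p_max, c, alpha) are illustrative assumptions and are not taken from the paper.

    # Illustrative execution-time models for a job of sequential work w
    # running on p processors (hypothetical parameter names).

    def roofline_time(w, p, p_max):
        # Roofline model: perfect speedup up to a maximal degree of parallelism p_max.
        return w / min(p, p_max)

    def communication_time(w, p, c):
        # Communication model: perfect speedup plus an overhead c per extra processor.
        return w / p + (p - 1) * c

    def amdahl_time(w, p, alpha):
        # Amdahl's model: a fraction alpha of the work is inherently sequential.
        return w * (alpha + (1 - alpha) / p)

    if __name__ == "__main__":
        w = 100.0
        for p in (1, 4, 16, 64):
            print(p, roofline_time(w, p, p_max=32),
                  communication_time(w, p, c=0.5),
                  amdahl_time(w, p, alpha=0.1))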

December 10, 2020, 3pm.

Yiqin Gao: Energy-aware strategies for reliability-oriented real-time task allocation on heterogeneous platforms

Low energy consumption and high reliability are widely identified as increasingly relevant issues in real-time systems on heterogeneous platforms. In this paper, we propose a multi-criteria optimization strategy to minimize the expected energy consumption while enforcing the reliability threshold and meeting all task deadlines. The tasks are replicated to ensure a prescribed reliability threshold. The platforms are composed of processors with different (and possibly unrelated) characteristics, including speed profile, energy cost and failure rate. We provide several mapping and scheduling heuristics towards this challenging optimization problem. Specifically, a novel approach is designed to control (i) how many replicas to use for each task, (ii) on which processor to map each replica and (iii) when to schedule each replica on its assigned processor. Different mappings achieve different levels of reliability and consume different amounts of energy. Scheduling matters because once a task replica is successful, the other replicas of that task are cancelled, which calls for minimizing the amount of temporal overlap between any replica pair. The experiments are conducted for a comprehensive set of execution scenarios, with a wide range of processor speed profiles and failure rates. The comparison results reveal that our strategies perform better than the random baseline, with a gain of 40% in energy consumption, for nearly all cases. The absolute performance of the heuristics is assessed by a comparison with a lower bound; the best heuristics achieve an excellent performance, with an average value only 4% higher than the lower bound.
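As a rough illustration of the reliability side of the problem, the following Python sketch checks a replicated task against a reliability threshold, assuming failures follow an exponential law; the parameter names are hypothetical and the mapping and scheduling heuristics of the paper are not reproduced here.

    import math

    # Minimal sketch: reliability of a replicated task under exponential failures
    # (hypothetical parameter names; not the heuristics of the paper).

    def replica_reliability(exec_time, failure_rate):
        # Probability that one replica completes without a failure.
        return math.exp(-failure_rate * exec_time)

    def task_reliability(replicas):
        # replicas: list of (exec_time, failure_rate) pairs, one per processor.
        # The task succeeds as soon as at least one of its replicas succeeds.
        prob_all_fail = 1.0
        for exec_time, failure_rate in replicas:
            prob_all_fail *= 1.0 - replica_reliability(exec_time, failure_rate)
        return 1.0 - prob_all_fail

    if __name__ == "__main__":
        replicas = [(2.0, 1e-3), (3.0, 5e-4)]   # two replicas on unrelated processors
        print(task_reliability(replicas) >= 0.999)  # enforce a reliability threshold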

December 17, 2020, 3pm.

Yishu Du: Robustness of the Young/Daly formula for stochastic iterative applications

The Young/Daly formula for periodic checkpointing is known to hold for a divisible-load application, where one can checkpoint at any time-step. In a nutshell, the optimal period is P_YD = √(2 μ_f C), where μ_f is the Mean Time Between Failures (MTBF) and C is the checkpoint time. This paper assesses the accuracy of the formula for applications decomposed into computational iterations where: (i) the duration of an iteration is stochastic, i.e., obeys a probability distribution law D of mean μ_D; and (ii) one can checkpoint only at the end of an iteration. We first consider static strategies where checkpoints are taken after a given number of iterations k, and provide a closed-form, asymptotically optimal, formula for k, valid for any distribution D. We then show that using the Young/Daly formula to compute k (as k · μ_D = P_YD) is a first-order approximation of this formula. We also consider dynamic strategies where one decides to checkpoint at the end of an iteration only if the total amount of work since the last checkpoint exceeds a threshold W_th, and otherwise proceeds to the next iteration. Similarly, we provide a closed-form formula for this threshold and show that P_YD is a first-order approximation of W_th. Finally, we provide an extensive set of simulations where D is either Uniform, Gamma or truncated Normal, which shows the global accuracy of the Young/Daly formula, even when the distribution D has a large standard deviation (and when one cannot use a first-order approximation). Hence we establish that the relevance of the formula goes well beyond its original framework.
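As a quick illustration of the formula discussed in the abstract, the Python sketch below computes the Young/Daly period P_YD = √(2 μ_f C) and the first-order number of iterations k between checkpoints (k · μ_D = P_YD); the numerical values are illustrative only.

    import math

    # Young/Daly period and its first-order use for iterative applications
    # (mu_f: MTBF, C: checkpoint time, mu_D: mean iteration length).

    def young_daly_period(mtbf, checkpoint_time):
        # Optimal checkpointing period for a divisible-load application: sqrt(2 * mu_f * C).
        return math.sqrt(2.0 * mtbf * checkpoint_time)

    def iterations_between_checkpoints(mtbf, checkpoint_time, mean_iteration_time):
        # First-order approximation: checkpoint every k iterations with k * mu_D = P_YD.
        return max(1, round(young_daly_period(mtbf, checkpoint_time) / mean_iteration_time))

    if __name__ == "__main__":
        mtbf = 24 * 3600.0        # one day (illustrative value)
        checkpoint_time = 60.0    # one minute (illustrative value)
        mean_iteration_time = 120.0
        print(young_daly_period(mtbf, checkpoint_time))                                    # ~3220 s
        print(iterations_between_checkpoints(mtbf, checkpoint_time, mean_iteration_time))  # 27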


Previous working groups:

November 19, 2020, 3pm.

Grégoire Pichon: Trading Performance for Memory in Sparse Direct Solvers using Low-rank Compression

Sparse direct solvers using Block Low-Rank compression have been proven efficient to solve problems arising in many real-life applications. Improving those solvers is crucial for being able to 1) solve larger problems and 2) speed up computations. A key characteristic of a sparse direct solver using low-rank compression is when compression is performed. There are two distinct approaches: (1) all blocks are compressed before starting the factorization, which reduces the memory footprint as much as possible, or (2) each block is compressed as late as possible, which usually leads to better speedup. The objective of this paper is to design a composite approach, to speed up computations while staying under a given memory limit. This should make it possible to solve large problems that cannot be solved with Approach 2, while reducing the execution time compared to Approach 1. We propose a memory-aware strategy where each block can be compressed either at the beginning or as late as possible. We first consider the problem of choosing when to compress each block, under the assumption that all information on blocks is perfectly known, i.e., the memory requirement and execution time of a block whether compressed or not. We show that this problem is a variant of the NP-complete Knapsack problem, and adapt an existing 2-approximation algorithm to our problem. Unfortunately, the required information on blocks depends on numerical properties and cannot be known in advance in practice. We thus introduce models to estimate those values. Experiments on the PaStiX solver demonstrate that our new approach can achieve an excellent trade-off between memory consumption and computational cost. For instance, on matrix Geo1438, Approach 2 uses three times as much memory as Approach 1 while being three times faster. Our new approach leads to an execution time only 30% larger than Approach 2 when given a memory limit only 30% larger than the memory needed by Approach 1.
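As an illustration of the Knapsack flavour of the selection problem described above, here is a simplified greedy Python sketch that picks which blocks keep the faster "compress late" treatment under a memory budget; the block fields and the greedy rule are illustrative assumptions, not the 2-approximation algorithm adapted in the paper.

    from dataclasses import dataclass

    # Simplified greedy sketch of a memory-aware selection (hypothetical fields).

    @dataclass
    class Block:
        name: str
        extra_memory: float   # extra memory used if the block is compressed late
        time_saving: float    # execution-time saving if the block is compressed late

    def choose_late_compressed(blocks, memory_budget):
        # Greedy by time saving per unit of extra memory (knapsack-style heuristic).
        selected, used = [], 0.0
        for b in sorted(blocks, key=lambda b: b.time_saving / b.extra_memory, reverse=True):
            if used + b.extra_memory <= memory_budget:
                selected.append(b.name)
                used += b.extra_memory
        return selected, used

    if __name__ == "__main__":
        blocks = [Block("A", 4.0, 10.0), Block("B", 2.0, 8.0), Block("C", 3.0, 3.0)]
        print(choose_late_compressed(blocks, memory_budget=6.0))  # (['B', 'A'], 6.0)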