### Evaluation des performances réseau dans le contexte de la vistualisation XEN.

• By: Olivier Mornard, Pascale Primet, Jean-patrick Gelas
• Number: RR2008-01
• Date: January 2008
• Abstract:
Virtualization technics of the Operating System improve security, isolation, reliability and flexibility of the environments. These technics become a must in large scale distributed system and begin to be used in data transport networks. However, virtualization introduced an overhead which must be integrated to virtualized system performance models in order to forecast their behavior. In this article, we analyse this overhead and give an experimental evaluation of the virtualization impact with XEN over the network performance in high speed network context. We show that the throughput reached with TCP is barely affected by the virtulization at the expense of a growing CPU consumption. Moreover, we study the conflicts between protocol process and standard process to access CPU resources. Finally, we show the impact of the TCP's congestion window size in case of several virtual machines running.
• Keywords:
Network performances, virtualization, Xen.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+12p
• Format: Portable Document Format
• Get it
• Revisions: none

### Bi-criteria Pipeline Mappings for Parallel Image Processing.

• By: Anne Benoit , Harald Kosch , Veronika Rehn-Sonigo , Yves Robert
• Number: RR2008-02
• Date: January 2008
• Abstract:
Mapping workflow applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline graphs. Several antagonistic criteria should be optimized, such as throughput and latency (or a combination). Typical applications include digital image processing, where images are processed in steady-state mode. In this paper, we study the mapping of a particular image processing application, the JPEG encoding. Mapping pipelined JPEG encoding onto parallel platforms is useful for instance for encoding Motion JPEG images. As the bi-criteria mapping problem is NP-complete, we concentrate on the evaluation and performance of polynomial heuristics.
• Keywords:
pipeline, workflow application, multi-criteria optimization, JPEG encoding.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+11p
• Format: Portable Document Format
• Get it
• Revisions: none

### Fault Tolerant Scheduling of Precedence Task Graphs on Heterogeneous Platforms.

• By: Anne Benoit, Mourad Hakem, Yves Robert
• Number: RR2008-03
• Date: January 2008
• Abstract:
Fault tolerance and latency are important requirements in several applications which are time critical in nature: such applications require guaranties in terms of latency, even when processors are subject to failures. In this paper, we propose a fault tolerant scheduling heuristic for mapping precedence task graphs on heterogeneous systems. Our approach is based on an active replication scheme, capable of supporting $\varepsilon$ arbitrary fail-silent (fail-stop) processor failures, hence valid results will be provided even if $\varepsilon$ processors fail. We focus on a bi-criteria approach, where we aim at minimizing the latency given a fixed number of failures supported in the system, or the other way round. Major achievements include a low complexity, and a drastic reduction of the number of additional communications induced by the replication mechanism. Experimental results demonstrate that our heuristics, despite their lower complexity, outperform their direct competitor, the FTBAR scheduling algorithm \citeGirLav03.
• Keywords:
Fault tolerance, reliability, scheduling, precedence task graphs, multi-criteria scheduling, heterogeneous systems.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+18p
• Format: Portable Document Format
• Get it
• Revisions: none

### On the Continuity Set of an omega Rational Function.

• By: Olivier Carton, Olivier Finkel, et Pierre Simonnet
• Number: RR2008-04
• Get it (Web Archive)

### On the expressive power of permanents and perfect matchings of matrices of bounded pathwidth/cliquewidth.

• By: Uffe Flarup, Laurent Lyaudet
• Number: RR2008-05
• Get it (Web Archive)

### Generic deployment of applications on heterogeneous distributed platforms.

• By: Benjamin Depardon
• Number: RR2008-06
• Date: January 2008
• Abstract:
From user's point of view Grid computing requires an easy, automatic and efficient resource selection as well as application installation and execution. This problem refers to the deployment process. A major lack of today's deployment tools lies in their inefficient planning algorithms to select and allocate resources for the applications. %% Therefore, it becomes important to provide efficient %% ones that support sequential, parallel and hierarchical applications. % This paper presents several planning heuristics which aims at minimizing the communication cost between the applications, balancing the load among the processors, and maximizing the number of applications instances on the platform, for sequential and hierarchical applications. % These heuristics have been implemented into the deployment tool \adage at the price of extending its generic model to support hierarchical applications. % Finally, the paper presents some simulations and experimental results.
• Keywords:
Grid computing, deployment, mapping, middleware, heterogeneous platform.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+10p
• Format: Portable Document Format
• Get it
• Revisions: none

### Pictures worth a thousand tiles, a geometrical programming language for self-assembly.

• By: Florent Becker
• Number: RR2008-07
• Get it (Web Archive)

### Minimizing the stretch when scheduling flows of divisible requests.

• By: Arnaud Legrand, Alan Su, Frédéric Vivien
• Number: RR2008-08
• Date: February 2008
• Abstract:
In this paper, we consider the problem of scheduling distributed biological sequence comparison applications. This problem lies in the divisible load framework with negligible communication costs. Thus far, very few results have been proposed in this model. We discuss and select relevant metrics for this framework: namely max-stretch and sum-stretch. We explain the relationship between our model and the preemptive uni-processor case, and we show how to extend algorithms that have been proposed in the literature for the uni-processor model to the divisible multi-processor problem domain. We recall known results on closely related problems, we show how to minimize the max-stretch on unrelated machines either in the divisible load model or with preemption, we derive new lower bounds on the competitive ratio of any on-line algorithm, we present new competitiveness results for existing algorithms, and we develop several new on-line heuristics. We also address the Pareto optimization of max-stretch. Then, we extensively study the performance of these algorithms and heuristics in realistic scenarios. Our study shows that all previously proposed guaranteed heuristics for max-stretch for the uni-processor model prove to be inefficient in practice. In contrast, we show our on-line algorithms based on linear programming to be near-optimal solutions for max-stretch. Our study also clearly suggests heuristics that are efficient for both metrics, although a combined optimization is in theory not possible in the general case.
• Keywords:
Bioinformatics, heterogeneous computing, scheduling, divisible load, linear programming, stretch.
• Availability: Electronic copy only.
• Citation: To appear in the Journal of Scheduling.
• Size: 2+60p
• Format: Portable Document Format
• Get it
• Revisions: This article is a new version of this one.

### Realistic Models and Efficient Algorithms for Fault Tolerant Scheduling on Heterogeneous Platforms.

• By: Anne Benoit, Mourad Hakem, Yves Robert
• Number: RR2008-09
• Date: February 2008
• Abstract:
Most list scheduling heuristics rely on a simple platform model where communication contention is not taken into account. In addition, it is generally assumed that processors in the systems are completely safe. To schedule precedence graphs in a more realistic framework, we introduce an efficient fault tolerant scheduling algorithm that is both contention-aware and capable of supporting $\varepsilon$ arbitrary fail-silent (fail-stop) processor failures. We focus on a bi-criteria approach, where we aim at minimizing the total execution time, or latency, given a fixed number of failures supported in the system. Our algorithm has a low time complexity, and drastically reduces the number of additional communications induced by the replication mechanism. Experimental results fully demonstrate the usefulness of the proposed algorithm, which leads to efficient execution schemes while guaranteeing a prescribed level of fault tolerance.
• Keywords:
Communication contention, fault tolerance, multi-criteria scheduling, heterogeneous systems.
• Availability: Electronic copy only.
• Citation: To appear in the Journal of Scheduling.
• Size: 2+21p
• Format: Portable Document Format
• Get it
• Revisions: none

### Adversary lower bounds for nonadaptive quantum algorithms.

• By: Pascal Koiran, Jürgen Landes, Natacha Portier, Penghui Yao
• Number: RR2008-10
• Get it (Web Archive)

### Optimizing polynomials for floating-point implementation.

• By: Florent de Dinechin, Christoph Lauter
• Number: RR2008-11
• Get it (Web Archive)

### Optimizing Latency and Reliability of Pipeline Workflow Applications.

• By: Anne Benoit , Veronika Rehn-Sonigo , Yves Robert
• Number: RR2008-12
• Date: March 2008
• Abstract:
Mapping applications onto heterogeneous platforms is a difficult challenge, even for simple application patterns such as pipeline graphs. The problem is even more complex when processors are subject to failure during the execution of the application. In this paper, we study the complexity of a bi-criteria mapping which aims at optimizing the latency (i.e., the response time) and the reliability (i.e., the probability that the computation will be successful) of the application. Latency is minimized by using faster processors, while reliability is increased by replicating computations on a set of processors. However, replication increases latency (additional communications, slower processors). The application fails to be executed only if all the processors fail during execution. While simple polynomial algorithms can be found for fully homogeneous platforms, the problem becomes NP-hard when tackling heterogeneous platforms. This is yet another illustration of the additional complexity added by heterogeneity.
• Keywords:
Heterogeneity, scheduling, complexity results, reliability, response time.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+15p
• Format: Portable Document Format
• Get it
• Revisions: This article is a new version of this one.

### Directed Percolation arising in Stochastic Cellular Automata Analysis.

• By: Damien REGNAULT
• Number: RR2008-13
• Get it (Web Archive)

### Wadge Degrees of Infinitary Rational Relations.

• By: Olivier Finkel
• Number: RR2008-14
• Get it (Web Archive)

### Computing Correctly Rounded Integer Powers in Floating-Point Arithmetic.

• By: Peter Kornerup, Christoph Lauter, Nicolas Louvet, Vincent Lefèvre and Jean-Michel Muller
• Number: RR2008-15
• Get it (Web Archive)

### (Mechanical) Reasoning on Infinite Extensive Games.

• By: Pierre Lescanne
• Number: RR2008-16
• Get it (Web Archive)

### Topological Complexity of Context-Free omega-Languages : A Survey.

• By: Olivier Finkel
• Number: RR2008-17
• Get it (Web Archive)

### Efficiency of Tree-Structured Peer-to-peer Service Discovery Systems.

• By: Eddy Caron , Frédéric Desprez , Cédric Tedeschi
• Number: RR2008-18
• Date: June 2008
• Abstract:
The efficiency of service discovery is a crucial point in the development of fully decentralized middlewares intended to manage large scale computational grids. The work conducted on this issue led to the design of many peer-to-peer fashioned approaches. More specifically, the need for flexibility and complexity in the service discovery has seen the emergence of a new kind of overlays, based on tries, also known as lexicographic trees. Although these overlays are efficient and well designed, they require a costly maintenance and do not accurately take into account the heterogeneity of nodes and the changing popularity of the services requested by users. In this paper, we focus on reducing the cost of the maintenance of a particular architecture, based on a dynamic prefix tree, while enhancing it with some load balancing techniques that dynamically adapt the load of the nodes in order to maximize the throughput of the system. The algorithms developed couple a self-organizing prefix tree overlay with load balancing techniques inspired by similar previous works undertaken for distributed hash tables. After some simulation results showing how our load balancing heuristics perform in such an overlay and compare to other heuristics, we provide a fair comparison of this architecture and similar overlays recently proposed.
• Keywords:
Service discovery, computational grids, peer-to-peer, prefix trees, mapping, load balancing.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+12p
• Format: Portable Document Format
• Get it
• Revisions: none.

### Mapping filter services on heterogeneous platforms.

• By: Anne Benoit, Fanny Dufossé and Yves Robert
• Number: RR2008-19
• Date: June 2008
• Abstract:
In this paper, we explore the problem of mapping filtering web services on large-scale heterogeneous platforms. Two important optimization criteria should be considered in such a framework. The period, which is the inverse of the throughput, measures the rate at which data sets can enter the system. The latency measures the response time of the system in order to process one single data set entirely. Both criteria are antagonistic. For homogeneous platforms, the complexity of period minimization is already known~\cite{sriv2}; we derive an algorithm to solve the latency minimization problem, and we provide a bi-criteria algorithm which minimizes latency without exceeding a prescribed value for the period. However, when adding heterogeneity to the platform, we prove that minimizing the period or the latency becomes NP-hard. We provide an integer linear program to solve both problems in the heterogeneous case. For period minimization on heterogeneous platforms, we design some efficient polynomial time heuristics and we assess their relative and absolute performance through a set of experiments. For small problem instances, the results are very close to the optimal solution returned by the integer linear program.
• Keywords:
web services, filters, scheduling, mapping, period, latency, complexity results, heuristics.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 26p
• Format: Portable Document Format
• Get it
• Revisions: none.

### Resource Allocation Strategies for In-Network Stream Processing.

• By: Anne Benoit, Henri Casanova, Veronika Rehn-Sonigo, and Yves Robert
• Number: RR2008-20
• Date: June 2008
• Abstract:
In this paper we consider the operator mapping problem for in-network stream processing applications. In-network stream processing consists in applying a tree of operators in steady-state to multiple data objects that are continually updated at various locations on a network. Examples of in-network stream processing include the processing of data in a sensor network, or of continuous queries on distributed relational databases. We study the operator mapping problem in a constructive'' scenario, i.e., a scenario in which one builds a platform dedicated to the application buy purchasing processing servers with various costs and capabilities. The objective is to minimize the cost of the platform while ensuring that the application achieves a minimum steady-state throughput. The first contribution of this paper is the formalization of a set of relevant operator-placement problems as linear programs, and a proof that even simple versions of the problem are NP-complete. Our second contribution is the design of several polynomial time heuristics, which are evaluated via extensive simulations and compared to theoretical bounds for optimal solutions.
• Keywords:
in-network stream processing, trees of operators, operator mapping, optimization, complexity results, polynomial heuristics.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+24p
• Format: Portable Document Format
• Get it
• Revisions: none.

### Mapping linear workflows with computation/communication overlap.

• By: Kunal Agrawal, Anne Benoit and Yves Robert
• Number: RR2008-21
• Date: June 2008
• Abstract:
This paper presents theoretical results related to mapping and scheduling linear workflows onto heterogeneous platforms. We use a realistic architectural model with bounded communication capabilities and full computation/communication overlap. This model is representative of current multi-threaded systems. In these workflow applications, the goal is often to maximize throughput or to minimize latency. We present several complexity results related to both these criteria. To be precise, we prove that maximizing the period is NP-complete even for homogeneous platforms and minimizing the latency is NP-complete for heterogeneous platforms. Moreover, we present an approximation algorithm for throughput maximization for linear chain applications on homogeneous platforms, and an approximation algorithm for latency minimization for linear chain applications on all platforms where communication is homogeneous (the processor speeds can differ). In addition, we present algorithms for several important special cases for linear chain applications. Finally, we consider the implications of adding feedback loops to linear chain applications.
• Keywords:
pipeline graphs, workflow, scheduling, mapping, period, latenc y, feedback loop, complexity results.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 23p
• Format: Portable Document Format
• Get it
• Revisions: none.

### An FPGA-specific Approach to Floating-Point Accumulation and Sum-of-Products.

• By: Florent de Dinechin, Bogdan Pasca, Octavian Cret, Radu Tudoran
• Number: RR2008-22
• Get it (Web Archive)

### Automatic Middleware Deployment Planning on Heterogeneous Platforms.

• By: Eddy Caron, Pushpinder Kaur Chouhan, Frédéric Desprez
• Number: RR2008-23
• Date: June 2008
• Abstract:
The use of many distributed, heterogeneous resources as a large collective platform offers great potential. A key issue for these grid platforms is middleware scalability and how middleware services can be mapped on the available resources. Optimizing deployment is a difficult problem with no existing general solutions. In this paper, we address the following problem: how to perform out an adapted deployment for a hierarchy of servers and resource brokers on a heterogeneous system? Our objective is to generate a best platform from the available nodes so as to fulfill the clients demands. However, finding the best deployment among heterogeneous resources is a hard problem since it is close to find the best broadcast tree in a general graph, which is known to be NP-complete. Thus, in this paper, we present a heuristic for middleware deployment on heterogeneous resources. We apply our heuristic to automatically deploy a distributed Problem Solving Environment on a large scale grid. We present experiments comparing the automatically generated deployment against a number of other reasonable deployments.
• Keywords:
Deployment, Grid computing, Network Enabled Server, Scheduling, Resource localization and selection.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+14p
• Format: Portable Document Format
• Get it
• Revisions: none.

### All in one Graphical Tool for the management of DIET a GridRPC Middleware.

• By: Eddy Caron , Frédéric Desprez , David Loureiro
• Number: RR2008-24
• Date: July 2008
• Abstract:
Grid Middleware are the link between large scale (and distributed) platforms and applications. Managing such a software system and the Grid environment itself can be a hard task when no dedicated (and integrated) tool exist. Some can be used through nice graphical interfaces, but they are usually dedicated to one or some limited tasks. They do not fulfill all the needs of a Grid end-user who wants to deploy Grid applications easily and rapidly. The aim of this paper is to present the case study of an all-in-one software system, designed for the management of a Grid Middleware and gathering user-friendly graphical interfaces answering to the various needs of end-users. Moreover the software system eases the use of the Grid by avoiding the scripting layer under a nice GUI enabling the user a faster and more efficient use of the Grid environment. By this means we demonstrate how the DIETDashboard fulfills all the needs of a unified tool for Grid management. This paper gives a comparison with existing and well-known tools dedicated to some specific tasks such as Grid resources management, Grid monitoring, or Middleware management.
• Keywords:
Grid Middleware, Grid management, Grid monitoring, Deployment, Workflow management.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+14p
• Format: Portable Document Format
• Get it
• Revisions: none.

### Iso-Level CAFT: How to Tackle the Combination of Communication Overhead Reduction and Fault Tolerance Schedulingg.

• By: Anne Benoit, Mourad Hakem, Yves Robert
• Number: RR2008-25
• Date: July 2008
• Abstract:
To schedule precedence task graphs in a more realistic framework, we introduce an efficient fault tolerant scheduling algorithm that is both contention-aware and capable of supporting $\varepsilon$ arbitrary fail-silent (fail-stop) processor failures. The design of the proposed algorithm which we call Iso-Level CAFT, is motivated by (i) the search for a better load-balance and (ii) the generation of fewer communications. These goals are achieved by scheduling a chunk of ready tasks simultaneously, which enables for a global view of the potential communications. Our goal is to minimize the total execution time, or latency, while tolerating an arbitrary number of processor failures. Our approach is based on an active replication scheme to mask failures, so that there is no need for detecting and handling such failures. Major achievements include a low complexity, and a drastic reduction of the number of additional communications induced by the replication mechanism. The experimental results fully demonstrate the usefulness of Iso-Level CAFT.
• Keywords:
Communication contention, fault tolerance, multi-criteria scheduling, heterogeneous systems.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+16p
• Format: Portable Document Format
• Get it
• Revisions: none.

### Ocean-Atmosphere Modelization over the Grid.

• By: Yves Caniou, Eddy Caron, Ghislain Charrier, Andreea Chis, Frédéric Desprez, Éric Maisonnave
• Number: RR2008-26
• Date: July 2008
• Abstract:
In this report, we tackle the problem of scheduling an Ocean-Atmosphere application used for climate prediction on the grid. An experiment is composed of several 1D-meshes of identical \dags composed of parallel tasks. To obtain a good completion time, we divide groups of processors into sets each working on parallel tasks. The group sizes are chosen by computing the best makespan for several grouping possibilities. We improved this heuristic method by different means. The improvement yielding to the best makespan is the representation of the problem as an instance of the Knapsack problem. As this heuristic is firstly designed for homogeneous platforms, we present its adaptation to heterogeneous platforms. Simulations show improvements of the makespan up to 12\%.
• Keywords:
Grid computing, Ocean-Atmosphere application, Scheduling.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+11p
• Format: Portable Document Format
• Get it
• Revisions: none.

### Topological Complexity of omega-Powers : Extended Abstract.

• By: Olivier Finkel, Dominique Lecomte
• Number: RR2008-27
• Get it (Web Archive)

### Generating high-performance arithmetic operators for FPGAs.

• By: Florent de Dinechin, Cristian Klein and Bogdan Pascae
• Number: RR2008-28
• Get it (Web Archive)

### Static Strategies for Worksharing with Unrecoverable Interruptions.

• By: Anne Benoit , Yves Robert , Arnold L. Rosenberg , Fr´ed´eric Vivien
• Number: RR2008-29
• Date: October 2008
• Abstract:
One has a large workload that is divisible''---its constituent work's granularity can be adjusted arbitrarily---and one has access to p remote computers that can assist in computing the workload. The problem is that the remote computers are subject to interruptions of known likelihood that kill all work in progress. One wishes to orchestrate sharing the workload with the remote computers in a way that maximizes the expected amount of work completed. Strategies for achieving this goal, by balancing the desire to checkpoint often, in order to decrease the amount of vulnerable work at any point, vs. the desire to avoid the context-switching required to checkpoint, are studied. Strategies are devised that provably maximize the expected amount of work when there is only one remote computer (the case p=1). Results are presented that suggest the intractability of such maximization for higher values of p, which motivates the development of heuristic approaches. Heuristics are developed that replicate work on several remote computers, in the hope of thereby decreasing the impact of work-killing interruptions.
• Keywords:
Fault-tolerance, scheduling, divisible loads, probabilities.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+57p
• Format: Portable Document Format
• Get it
• Revisions: none.

### On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms.

• By: Anne Benoit, Fanny Dufossé and Yves Robert
• Number: RR2008-30
• Date: October 2008
• Abstract:
In this paper, we explore the problem of mapping filtering services on large-scale heterogeneous platforms. Two important optimization criteria should be considered in such a framework. The period, which is the inverse of the throughput, measures the rate at which data sets can enter the system. The latency measures the response time of the system in order to process one single data set entirely. Both criteria are antagonistic. For homogeneous platforms, the complexity of period minimization is already known; we derive an algorithm to solve the latency minimization problem in the general case with service precedence constraints; for independent services we also show that the bi-criteria problem (latency minimization without exceeding a prescribed value for the period) is of polynomial complexity. However, when adding heterogeneity to the platform, we prove that minimizing the period or the latency becomes NP-hard, and that these problems cannot be approximated by any constant factor (unless P=NP). The latter results hold true even for independent services. We provide an integer linear program to solve both problems in the heterogeneous case with independent services. For period minimization on heterogeneous platforms, we design some efficient polynomial time heuristics and we assess their relative and absolute performance through a set of experiments. For small problem instances, the results are very close to the optimal solution returned by the integer linear program.
• Keywords:
query optimization, web service, filter, workflow, period, latency, complexity results.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 28p
• Format: Portable Document Format
• Get it
• Revisions: none.

### A Self-Stabilizing K-clustering algorithm using an arbitrary metric

• By: E. Caron, A. K. Datta, B. Depardon and L. L. Larmore.
• Number: RR2008-31
• Date: December 2008
• Abstract:
Mobile ad hoc networks as well as grid platforms are distributed, changing and error prone environments. Communication costs within such infrastructure can be improved, or at least bounded, by using k-clustering. A k-clustering of a graph, is a partition of the nodes into disjoint sets, called clusters, in which every node is distance at most k from a designated node in its cluster, called the clusterhead. A self-stabilizing asynchronous distributed algorithm is given for constructing a k-clustering of a connected network of processes with unique IDs and weighted edges. The algorithm is comparison-based, takes O(nk) time, and uses O(log(n) + log(k)) space per process, where n is the size of the network. This is the first distributed solution to the k-clustering problem on weighted graphs.
• Keywords:
K-Clustering, Self-Stabilization, Weighted Graph.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+36p
• Format: Portable Document Format
• Get it
• Revisions: revision of the october 2008 version.

### On the Complexity of Mapping Linear Chain Applications onto Heterogeneous Platforms.

• By: Anne Benoit, Yves Robert and Eric Thierry
• Number: RR2008-32
• Date: October 2008
• Abstract:
In this paper, we explore the problem of mapping simple application patterns onto large-scale heterogeneous platforms. An important optimization criteria that should be considered in such a framework is the latency, or makespan, which measures the response time of the system in order to process one single data set entirely. We focus in this work on linear chain applications, which are representative of a broad class of real-life applications. For such applications, we can consider one-to-one mappings, in which each stage is mapped onto a single processor. However, in order to reduce the communication cost, it seems natural to group stages into intervals. The interval mapping problem can be solved in a straightforward way if the platform has homogeneous communications: the whole chain is grouped into a single interval, which in turn is mapped onto the fastest processor. But the problem becomes harder when considering a fully heterogeneous platform. Indeed, we prove the NP-completeness of this problem. Furthermore, we prove that neither the interval mapping problem nor the similar one-to-one mapping problem can be approximated by any constant factor (unless P=NP).
• Keywords:
pipeline graphs, interval mappings, latency, makespan, complexity results, NP-hardness, approximation.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 12p
• Format: Portable Document Format
• Get it
• Revisions: none.

### Resource Allocation using Virtual Clusters

• By: Henri Casanova , David Schanzenbach , Mark Stillwell , Frédéric Vivien.
• Number: RR2008-33
• Date: October 2008
• Abstract:
In this report we demonstrate the potential utility of resource allocation management systems that use virtual machine technology for sharing parallel computing resources among competing jobs. We formalize the resource allocation problem with a number of underlying assumptions, determine its complexity, propose several heuristic algorithms to find near-optimal solutions, and evaluate these algorithms in simulation. We find that among our algorithms one is very efficient and also leads to the best resource allocations. We then describe how our approach can be made more general by removing several of the underlying assumptions.
• Keywords:
Virtualization, virtual cluster, scheduling, resource management, fairness.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+30p
• Format: Portable Document Format
• Get it
• Revisions: none.

### Who needs a scheduler?

• By: Anne Benoit, Loris Marchal and Yves Robert.
• Number: RR2008-34
• Date: October 2008
• Abstract:
This position paper advocates the need for scheduling. Even if resources at our disposal would become abundant and cheap, not to say unlimited and free (a~perspective that is not granted), we would still need to assign the right task to the right device. We give several simple examples of such situations where resource selection and allocation is mandatory. Finally we expose our views on the important algorithmic challenges that need be addressed in the future.
• Keywords:
scheduling, algorithm design, resource selection.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 11p
• Format: Portable Document Format
• Get it
• Revisions: none.

### On the computation of correctly-rounded sums.

• By: P. Kornerup, V. Lefèvre, N. Louvet et J.-M. Muller
• Number: RR2008-35
• Get it (Web Archive)

### Energy-aware scheduling of flow applications on master-worker platforms

• By: Jean-François Pineau, Yves Robert, and Frédéric Vivien.
• Number: RR2008-36
• Date: October 2008
• Abstract:
In this report, we consider the problem of scheduling an application composed of independent tasks on a fully heterogeneous master-worker platform with communication costs. We introduce a bi-criteria approach aiming at maximizing the throughput of the application while minimizing the energy consumed by participating resources. Assuming arbitrary super-linear power consumption laws, we investigate different models for energy consumption, with and without start-up overheads. Building upon closed-form expressions for the uniprocessor case, we are able to derive optimal or asymptotically optimal solutions for both models.
• Keywords:
Scheduling, energy, master-worker platforms, communication.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+21p
• Format: Portable Document Format
• Get it
• Revisions: none.

### Certified and fast computation of supremum norms of approximation errors.

• By: Sylvain Chevillard , Mioara Joldes , Christoph Lauter
• Number: RR2008-37
• Date: October 2008
• Abstract:
In many numerical programs there is a need for a high-quality floating-point approximation of useful functions f, such as exp, sin, erf. In the actual implementation, the function is replaced by a polynomial p, leading to an approximation error (absolute or relative) epsilon = p-f or epsilon = p/f-1. The tight yet certain bounding of this error is an important step towards safe implementations. The main difficulty of this problem is due to the fact that this approximation error is very small and the difference p-f is highly cancellating. In consequence, previous approaches for computing the supremum norm in this degenerate case, have proven to be either unsafe, not sufficiently tight or too tedious in manual work. We present a safe and fast algorithm that computes a tight lower and upper bound for the supremum norms of approximation errors. The algorithm is based on a combination of several techniques, including enhanced interval arithmetic, automatic differentiation and isolation of the roots of a polynomial. We have implemented our algorithm and timings on several examples are given.
• Keywords:
supremum norm, approximation error, certified computation, elementary function, interval arithmetic, automatic differentiation, roots isolation technique.
• Citation: Not published yet.
• Size: 2+11p
• Format: Portable Document Format
• Get it
• Get it (Web Archive)
• Revisions: none

### Computing floating-point square roots via bivariate polynomial evaluation.

• By: Claude-Pierre Jeannerod, Hervé Knochel, Christophe Monat, Guillaume Revy
• Number: RR2008-38
• Get it (Web Archive)

### A new binary floating-point division algorithm and its software implementation on the ST231 processor.

• By: Claude-Pierre Jeannerod, Hervé Knochel, Christophe Monat, Guillaume Revy, Gilles Villard
• Number: RR2008-39
• Get it (Web Archive)

### Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency.

• By: Benoit Boissinot, Alain Darte, Benoît Dupont de Dinechin, Christophe Guillon, Fabrice Rastello
• Number: RR2008-40
• Get it (Web Archive)

### Complexity analysis of matrix product on multicore architectures.

• By: Mathias Jacquelin, Loris Marchal, Yves Robert
• Number: RR2008-41
• Date: December 2008
• Abstract:
The multicore revolution is underway. Classical algorithms have to be revisited in order to take hierarchical memory layout into account. In this paper, we aim at minimizing the number of cache misses paid during the execution of the matrix product kernel on a multicore processor, and we show how to achieve the best possible trade-off between shared and distributed caches.
• Keywords:
multicore processors, matrix product, complexity.
• Citation: Not published yet.
• Size: 2+16p
• Format: Portable Document Format
• Get it
• Revisions: none