Automatic Generation of Modular Multipliers for FPGA Applications.

  • By: Jean-Luc Beuchat et Jean-Michel Muller
  • Number: RR2007-01
  • Get it (Web Archive)

Solving Systems of Linear Equations in Complex Domain: Complex E-Method.

  • By: Milos D. Ercegovac and Jean-Michel Muller
  • Number: RR2007-02
  • Get it (Web Archive)

Certification of the QR factor R, and of lattice basis reducedness.

  • By: Gilles Villard
  • Number: RR2007-03
  • Get it (Web Archive)

Mapping pipeline skeletons onto heterogeneous platforms.

  • By: Anne Benoit, Yves Robert
  • Number: RR2007-05
  • Date: January 2007
  • Abstract:

    Mapping applications onto parallel platforms is a challenging problem, that becomes even more difficult when platforms are heterogeneous --nowadays a standard assumption. A high-level approach to parallel programming not only eases the application developer's task, but it also provides additional information which can help realize an efficient mapping of the application. In this paper, we discuss the mapping of pipeline skeletons onto different types of platforms: fully homogeneous platforms with identical processors and interconnection links; communication homogeneous platforms, with identical links but different speed processors; and finally, heterogeneous platforms. We assume that a pipeline stage must be mapped on a single processor, and we establish new theoretical complexity results for two different mapping policies: a mapping can be either one-to-one (a processor is assigned at most one stage), or interval-based (a processor is assigned an interval of consecutive stages). We provide several efficient polynomial heuristics for the most important policy/platform combination, namely interval-based mappings on communication homogeneous platforms.
  • Keywords:

    Algorithmic skeletons, pipeline, scheduling, complexity results, heuristics, heterogeneous clusters.
  • Availability: Electronic copy only.
  • Citation: Not published yet.
  • Size: 2+31p
  • Format: Portable Document Format
  • Get it
  • Revisions: This article is a new version of this one.

Towards a Parallel Out-of-core Multifrontal Solver: Preliminary Study.

  • By: Emmanuel Agullo , Abdou Guermouche , Jean-Yves L'Excellent
  • Number: RR2007-06
  • Date: February 2007
  • Abstract:

    The memory usage of sparse direct solvers can be the bottleneck to solve large-scale problems involving sparse systems of linear equations of the form A x = b. This report describes a prototype implementation of an out-of-core extension to a parallel multifrontal solver (MUMPS), where disk is used to store data that cannot fit in memory. We show that, by storing the factors to disk, larger problems can be solved on limited-memory machines with reasonable performance. We illustrate the impact of low-level IO mechanisms on the behaviour of our parallel out-of-core factorization. Then we use simulations to analyze the gains that can be expected when also storing the so called active memory on disk. We discuss both the minimum memory requirements and the minimum volume of I/O in a limited memory environment. Finally we summarize the main points that we identified to be critical when designing parallel sparse direct solvers in an out-of-core environment.
  • Keywords:

    Sparse direct solver; large matrices; multifrontal method; out-of-core; parallel factorization; IO volume; direct IO; performance study.
  • Availability: Electronic copy only.
  • Citation: Not published yet.
  • Size: 2+43p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Comments on ``Design and performance evaluation of load distribution strategies for multiple loads on heterogeneous linear daisy chain networks''.

  • By: Matthieu Gallet, Yves Robert, Frédéric Vivien
  • Number: RR2007-07
  • Date: February 2007
  • Abstract:

    Min, Veeravalli, and Barlas proposed strategies to minimize the overall execution time of one or several divisible loads on a heterogeneous linear network, using one or more installments. We show on a very simple example that the proposed approach does not always produce a solution and that, when it does, the solution is often suboptimal. We also show how to find an optimal scheduling for any instance, once the number of installments per load is given. Finally, we formally prove that under a linear cost model, as in the original paper, an optimal schedule has an infinite number of installments. Such a cost model can therefore not be used to design practical multi-installment strategies.
  • Keywords:

    scheduling, heterogeneous processors, divisible loads, single-installment, multiple-installments..
  • Availability: Electronic copy only.
  • Citation: Not published yet.
  • Size: 2+15p
  • Format: Portable Document Format
  • Get it
  • Get it (Web Archive)
  • Get it (Web Archive)
  • Revisions: none

A First Step Towards Automatically Building Network Representations.

  • By: Lionel Eyraud-Dubois, Arnaud Legrand, Martin Quinson, and Frédéric Vivien
  • Number: RR2007-08
  • Date: February 2007
  • Abstract:

    To fully harness Grids, users or middlewares must have some knowledge on the topology of the platform interconnection network. As such knowledge is usually not available, one must uses tools which automatically build a topological network model through some measurements. In this article, we define a methodology to assess the quality of these network model building tools, and we apply this methodology to representatives of the main classes of model builders and to two new algorithms. We show that none of the main existing techniques build models that enable to accurately predict the running time of simple application kernels for actual platforms. However some of the new algorithms we propose give excellent results in a wide range of situations.
  • Keywords:

    Network model, topology reconstruction, Grids.
  • Availability: Electronic copy only.
  • Citation: Not published yet.
  • Size: 2+9p
  • Format: Portable Document Format
  • Get it
  • Get it (Web Archive)
  • Get it (Web Archive)
  • Revisions: none

An example of $\Pi0_3$-complete infinitary rational relation.

  • By: Olivier Finkel
  • Number: RR2007-09
  • Get it (Web Archive)

Optimal Closest Policy with QoS and Bandwidth Constraints for Placing Replicas in Tree Networks.

  • By: Veronika Rehn
  • Number: RR2007-10
  • Date: March 2007
  • Abstract:

    This paper deals with the replica placement problem on fully homogeneous tree networks known as the Replica Placement optimization problem. The client requests are known beforehand, while the number and location of the servers are to be determined. We investigate the latter problem using the Closest access policy when adding QoS and bandwidth constraints. We propose an optimal algorithm in two passes using dynamic programming.
  • Keywords:

    Replica placement, tree networks, Closest policy, quality of service, bandwidth constraints.
  • Availability: Electronic copy only.
  • Citation: Not published yet.
  • Size: 2+11p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Cosmological Simulations using Grid Middleware.

  • By: Yves Caniou , Eddy Caron , Benjamin Depardon , Hélène Courtois , Romain Teyssier
  • Number: RR2007-11
  • Date: March 2007
  • Abstract:

    Large problems ranging from numerical simulation can now be solved through the Internet using grid middleware. This report describes the different steps involved to make available a service in the \diet grid middleware. The cosmological \ramses application is taken as an example to detail the implementation. Furthermore, several results are given in order to show the benefits of using \diet, among which the transparent usage of numerous clusters and a low overhead (finding the right resource and submitting the computing task).
  • Keywords:

    Grid computing, cosmological simulations, DIET.
  • Availability: Electronic copy only.
  • Citation: Not published yet.
  • Size: 2+15p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Complexity results for throughput and latency optimization of replicated and data-parallel workflows.

  • By: Anne Benoit, Yves Robert
  • Number: RR2007-12
  • Date: March 2007
  • Abstract:

    Mapping applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline or fork graphs. Several antagonist criteria should be optimized for workflow applications, such as throughput and latency (or a combination). In this paper, we consider a simplified model with no communication cost, and we provide an exhaustive list of complexity results for different problem instances. Pipeline or fork stages can be replicated in order to increase the throughput of the workflow, by sending consecutive data sets onto different processors. In some cases, stages can also be data-parallelized, i.e. the computation of one single data set is shared between several processors. This leads to a decrease of the latency and an increase of the throughput. Some instances of this simple model are shown to be NP-hard, thereby exposing the inherent complexity of the mapping problem. We provide polynomial algorithms for other problem instances. Altogether, we provide solid theoretical foundations for the study of mono-criterion or bi-criteria mapping optimization problems.
  • Keywords:

    pipeline graphs, fork graphs, scheduling algorithms, throughput maximization, latency minimization, bi-criteria optimization, heterogeneous platforms, complexity results.
  • Citation: Not published yet.
  • Size: 2+32p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Arithmetic Operators for Pairing-Based Cryptography.

  • By: Jean-Luc Beuchat, Nicolas Brisebarre, Jérémie Detrey, and Eiji Okamoto
  • Number: RR2007-13
  • Date: March 2007
  • Abstract:

    Since their introduction in constructive cryptographic applications, pairings over (hyper)elliptic curves are at the heart of an ever increasing number of protocols. Software implementations being rather slow, the study of hardware architectures became an active research area. In this paper, we first study an accelerator for the ηT pairing over F3[x]/(x97 + x12 + 2). Our architecture is based on a unified arithmetic operator which performs addition, multiplication, and cubing over F397. This design methodology allows us to design a compact coprocessor (1888 slices on a Virtex-II Pro 4 FPGA) which compares favorably with other solutions described in the open literature. We then describe ways to extend our approach to any characteristic and any extension field.
  • Keywords:

    pairing, finite field arithmetic, elliptic curve, hardware accelerator, FPGA.
  • Citation: Not published yet.
  • Size: 2+16p
  • Format: Portable Document Format
  • Get it
  • Get it (Web Archive)
  • Revisions: none

Acyclicity and Finite Linear Extendability: a Formal and Constructive Equivalence.

  • By: Stéphane Le Roux
  • Number: RR2007-14
  • Date: March 2007
  • Abstract:

    Linear extension of partial orders emerged in the late 1920's. Its computer-oriented version, \emph{i.e.}, topological sorting of finite partial orders, arose in the late 1950's. However, those issues have not yet been considered from a viewpoint that is both formal and constructive; this paper discusses a few related claims formally proved with the constructive proof assistant Coq. For instance, it states that a given decidable binary relation is acyclic and equality is decidable on its domain \emph{iff} an irreflexive linear extension can be computed uniformly for any of its finite restriction. A detailed introduction and proofs written in plain English shall help readers who are not familiar with constructive issues or Coq formalism.
  • Keywords:

    Binary relation, finite restriction, linear extension, (non-)uniform computability, topological sorting, constructivism, induction, proof assistant.
  • Citation: Not published yet.
  • Size: 2+22p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Markov chain analysis of an agent based growth model.

  • By: Bruno Gaujal, Laszlo Gulyas, Yuri Mansury, Eric Thierry
  • Number: RR2007-15
  • Get it (Web Archive)

MA study of stochastic 2D Minority CA: would wearing stripes be a fatality for snob people ?

  • By: Damien Regnault, Nicolas Schabanel, Eric Thierry
  • Number: RR2007-16
  • Get it (Web Archive)

An omega-power of a context-free language which is Borel above Delta0_omega

  • By: Jacques Duparc and Olivier Finkel
  • Number: RR2007-17
  • Get it (Web Archive)

Acyclicity of Preferences, Nash Equilibria, and Subgame Perfect Equilibria: a Formal and Constructive Equivalence

  • By: Stéphane Le Roux
  • Number: RR2007-18
  • Citation: Not published yet.
  • Size: 2+40p
  • Format: Portable Document Format
  • Get it
  • Get it (Web Archive)
  • Revisions: none

Horner's Rule-Based Multiplication over GF(p) and GF(p^n): A Survey

  • By: Jean-Luc Beuchat, Takanori Miyoshi, Jean-Michel Muller, Eiji Okamoto
  • Number: RR2007-19
  • Get it (Web Archive)

On the expressive power of planar perfect matching and permanents of bounded treewidth matrices

  • By: Uffe Flarup,Pascal Koiran, Laurent Lyaudet
  • Number: RR2007-20
  • Get it (Web Archive)

Comparison and tuning of MPI implementations in a grid context.

  • By: Ludovic Hablot
  • Number: RR2007-21
  • Date: May 2007
  • Abstract:

    Today, clusters are often interconnected by long distance networks within grids to offer a huge number of available ressources to a range of users. MPI, the standard communication library used to write parallel applications, has been implemented for clusters. Two main features of grids: long distance networks and technological heterogeneity, raise the question of MPI efficiency in grids. This report presents an evaluation of four recent MPI implementations (MPICH2, MPICH-Madeleine, OpenMPI and GridMPI) in the french research grid: Grid'5000. The comparison is based on the execution of pingpong, NAS Parallel Benchmarks and a real application in geophysics. We show that this implementations present performance differences. Executing MPI applications on the grid can be beneficial if the parameters are well tuned. The paper details the tuning required on each implementation to get the best performances.
  • Keywords:

    MPI, MPI implementations, Grid'5000, TCP tuning, MPI optimizations.
  • Citation: Not published yet.
  • Size: 2+19p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Reducing the I/O Volume in an Out-of-core Sparse Multifrontal Solver.

  • By: Emmanuel Agullo, Abdou Guermouche, Jean-Yves L'Excellent
  • Number: RR2007-22
  • Date: May 2007
  • Abstract:

    High performance sparse direct solvers are often a method of choice in various simulation problems. However, they require a large amount of memory compared to iterative methods. In this context, out-of-core solvers must be employed, where disks are used when the storage requirements are too large with respect to the physical memory available. In this paper, we study how to minimize the I/O requirements in the multifrontal method, a particular direct method to solve large-scale problems efficiently. From a theoretical point of view, we show that minimizing the storage requirement can lead to a huge volume of I/O compared to directly minimizing the I/O volume. Then experiments on large real-life problems also show that the volume of I/O obtained when minimizing the storage requirement can be significantly reduced by applying algorithms designed to reduce the I/O volume. We finally propose efficient memory management algorithms that can be applied to all the variants proposed.
  • Keywords:

    Sparse direct solver; out-of-core; large matrices; multifrontal method; IO volume minimization; memory management schemes.
  • Citation: Not published yet.
  • Size: 2+24p
  • Format: Compressed Postscript
  • Get it
  • Revisions: none

Computing Integer Powers in Floating-Point Arithmeti

  • By:Peter Kornerup, Vincent Lefèvre and Jean-Michel Muller
  • Number: RR2007-23
  • Get it (Web Archive)

End-host based mechanisms for implementing Flow Scheduling in GridNetworks.

  • By:Sebastien Soudan, Romaric Guillier, Pascale Primet
  • Number: RR2007-24
  • Date: May 2007
  • Abstract:

    In Grids, transfer jobs and network resources need to be managed in amore deterministic way than in the Internet. New approaches like flow scheduling are proposed and studied as alternatives to traditional QoS and reservation proposals. To enable such flow scheduling approaches, run-time mechanisms controlling flow sending time and rate have to be implemented in the data plane. This paper quantify and compares in a range of latency conditions, such end-host based mechanisms combined with transport protocols to instantiate different scheduling strategies. We show that, in high speed network, a single-rate scheduling strategy implemented by an AIMD-based protocol with packet pacing mechanism offers predictable performanceand is insensitive to latency. This paper also highlights the limits of other strategies and rate limitation mechanisms like token bucket which may present unpredictability and other drawbacks.
  • Keywords:

    Grid networks, flow scheduling, bulk data transfers, rate limitation, pacing.
  • Citation: Not published yet.
  • Size: 2+16p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Optimal routing for end-to-end guarantees: the price of multiplexing.

  • By: Anne Bouillard, Bruno Gaujal, Sebastien Lagrange, Eric Thierry
  • Number: RR2007-25
  • Get it (Web Archive)

A certified infinite norm for the implementation of elementary functions.

  • By:Sylvain Chevillard, Christoph Quirin Lauter
  • Number: RR2007-26
  • Date: May 2007
  • Abstract:

    The high-quality floating-point implementation of useful functions f : R -> R, such as exp, sin, erf requires bounding the error eps = (p-f)/f of an approximation p with regard to the function f. This involves bounding the infinite norm ||eps|| of the error function. Its value must not be underestimated when implementations must be safe. Previous approaches for computing infinite norm are shown to be either unsafe, not sufficiently tight or too tedious in manual work. We present a safe and self-validating algorithm for automatically upper- and lower-bounding infinite norms of error functions. The algorithm is based on enhanced interval arithmetic. It can overcome high cancellation and high condition number around points where the error function is defined only by continuous extension. The given algorithm is implemented in a software tool. It can generate a proof of correctness for each instance on which it is run.
  • Keywords:

    Certified infinite norm, interval arithmetic, elementary functions.
  • Citation: Not published yet.
  • Size: 2+11p
  • Format: Portable Document Format
  • Get it
  • Get it (Web Archive)
  • Revisions: This article is a new version of this one.

VPSPACE and a transfer theorem over the complex field

  • By: Pascal Koiran et Sylvain Perifel
  • Number: RR2007-27
  • Get it (Web Archive)

Elementary transformation analysis for Array-OL.

  • By:Paul Feautrier
  • Number: RR2007-28
  • Date: June 2007
  • Abstract:

    Array-OL is a high-level specification language dedicated to the definition of intensive signal processing applications. Several tools exist for implementing an Array-OL specification as a data parallel program. While Array-OL can be used directly, it is often convenient to be able to deduce part of the specification from a sequential version of the application. This paper proposes such an analysis and examines its feasibility and its limits.
  • Keywords:

    Array-OL, multidimensional signal processing, program analysis.
  • Citation: Not published yet.
  • Size: 2+7p
  • Format: Compressed Postscript
  • Get it
  • Get it (Web Archive)
  • Revisions: none

Scheduling multiple divisible loads on a linear processor network.

  • By:Matthieu Gallet, Yves Robert, Frédéric Vivien
  • Number: RR2007-29
  • Date: June 2007
  • Abstract:

    Min, Veeravalli, and Barlas have recently proposed strategies to minimize the overall execution time of one or several divisible loads on a heterogeneous linear network, using one or more installments. We show on a very simple example that their approach does not always produce a solution and that, when it does, the solution is often suboptimal. We also show how to find an optimal schedule for any instance, once the number of installments per load is given. Then, we formally state that any optimal schedule has an infinite number of installments under a linear cost model as the one assumed in the original papers. Therefore, such a cost model cannot be used to design practical multi-installment strategies. Finally, through extensive simulations we confirmed that the best solution is always produced by the linear programming approach, while solutions of the original papers can be far away from the optimal..
  • Keywords:

    scheduling, heterogeneous processors, divisible loads, single-installment, multiple-installments.
  • Citation: Not published yet.
  • Size: 2+11p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Complete Lattices and Up-to Techniques (extended version, with proofs)

  • By: Damien Pous
  • Number: RR2007-30
  • Get it (Web Archive)

On Bisimulation Proofs for the Analysis of Distributed Abstract Machines

  • By: Damien Pous
  • Number: RR2007-31
  • Get it (Web Archive)

Multi-criteria scheduling of pipeline workflows.

  • By:Anne Benoit, Veronika Rehn-Sonigo, Yves Robert
  • Number: RR2007-32
  • Date: June 2007
  • Abstract:

    Mapping workflow applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline graphs. Several antagonist criteria should be optimized, such as throughput and latency (or a combination). In this paper, we study the complexity of the bi-criteria mapping problem for pipeline graphs on communication homogeneous platforms. In particular, we assess the complexity of the well-known chains-to-chains problem for different-speed processors, which turns out to be NP-hard. We provide several efficient polynomial bi-criteria heuristics, and their relative performance is evaluated through extensive simulations.
  • Keywords:

    algorithmic skeletons, pipeline, multi-criteria optimization, complexity results, heuristics, heterogeneous platforms.
  • Citation: Not published yet.
  • Size: 2+17p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

There exist some omega-powers of any Borel rank.

  • By: Olivier Finkel, Dominique Lecomte
  • Number: RR2007-33
  • Get it (Web Archive)

Topological complexity of locally finite omega-languages.

  • By: Olivier Finkel
  • Number: RR2007-34
  • Get it (Web Archive)

Towards a User-Oriented Benchmark for Transport Protocols Comparison in very High Speed Networks.

  • By: Romaric Guillier, Ludovic Hablot, Pascale Vicat-Blanc Primet
  • Number: RR2007-35
  • Date: July 2007
  • Abstract:

    Standard TCP faces some performance limitations in very high speed wide area networks, mainly due to a long end-to-end feedback loop and a conservative behaviour with respect to congestion. Many TCP variants have been proposed to overcome these limitations. However, TCP is a complex protocol with many user-configurable parameters and a range of different implementations. It is then important to define measurement methods so that the transport services and protocols can evolve guided by scientific principles and compared quantitatively. The goal of this report is to present some steps towards a user-oriented benchmark, called ITB, for high speed transport protocols comparison. We first present and analyse some results reported in the literature. From this study we identify classes of representative applications and useful metrics. We then isolate infrastructure parameters and traffic factors which influence the protocol behaviour. This enable us to define scenario capturing and synthesising comprehensive and useful properties. We finally illustrate this proposal by preliminary results obtained on our experimental environment, Grid'5000, we have built and are using for contributing in this benchmark design.
  • Keywords:

    Protocol Benchmark, TCP, Performance evaluation, High Speed transport, High Speed networks.
  • Citation: Not published yet.
  • Size: 2+28p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

An efficient rounding boundary test for pow(x,y) in double precision.

  • By: Christoph Lauter, Vincent Lefèvre
  • Number: RR2007-36
  • Get it (Web Archive)

TCP Variants and Transfer Time Predictability in Very High Speed Networks.

  • By: Romaric Guillier, Sébastien Soudan, Pascale Vicat-Blanc Primet
  • Number: RR2007-37
  • Date: August 2007
  • Abstract:

    In high performance distributed computing applications, data movements have demanding performance requirements such as reliable and predictable delivery. Predicting the throughput of large transfers is very difficult in paths that are heavily loaded with just a few big flows. This report explores how current high speed transport protocols behave and may improve transfer time predictability of gigabits of data among endpoints in a range of conditions. In a fully controlled long distance 10~Gbps network testbed, we compare several TCP variants behaviour in presence of diverse congestion level and reverse traffic situations. We show that these factors have a very strong impact on transfer time predictability of several transport protocols.
  • Keywords:

    Bulk Data Transfers, Bandwidth Sharing, Transfer Delay Predictability, Transport Protocol Experimentation.
  • Citation: Not published yet.
  • Size: 2+30p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Classical and Effective Descriptive Complexities of omega-Powers.

  • By: Olivier Finkel and Dominique Lecomte
  • Number: RR2007-38
  • Get it (Web Archive)

Snap-stabilizing Prefix Tree for Peer-to-peer Systems.

  • By: Eddy Caron , Frédéric Desprez , Franck Petit, Cédric Tedeschi
  • Number: RR2007-39
  • Date: September 2007
  • Abstract:

    Resource Discovery is a crucial issue in the deployment of computational grids over large scale peer-to-peer platforms. Because they efficiently allow range queries, Tries, also knonw as Prefix Trees appear to be among promising ways in the design of distributed data structures indexing resources. Self-stabilization is an efficient approach in the design of reliable solutions for dynamic systems. A snap-stabilizing algorithm guarantees that it always behaves according to its specification. In other words, a snap-stabilizing algorithm is also a self-stabilizing algorithm which stabilizes in 0 steps. In this paper, we provide the first snap-stabilizing protocol for trie construction. We design particular tries called Proper Greatest Common Prefix (PGCP) Tree. The proposed algorithm arranges the n label values stored in the tree, in average, in O(h+h') rounds, where h and h' are the initial and final heights of the tree, respectively. In the worst case, the algorithm requires an O(n) extra space on each node, O(n) rounds and O(n^2) actions. However, simulations show that, using relevant data sets, this worst case is far from being reached and confirm the average complexities, making this algorithm efficient in practice.
  • Keywords:

    Peer-to-peer systems, Fault-tolerance, Self-stabilization, Snap-stabilization, Grid computing.
  • Citation: Not published yet.
  • Size: 2+14p
  • Format: Portable Document Format
  • Get it
  • Get it (Web Archive)
  • Revisions: none

When FPGAs are better at floating-point than microprocessors.

  • By: Florent de Dinechin, Jérémie Detrey, Octavian Cret, Radu Tudoran
  • Number: RR2007-40
  • Get it (Web Archive)

Improvements to Conservative and Optimistic Register Coalescing.

  • By: Florent Bouchez, Alain Darte, Fabrice Rastello
  • Number: RR2007-41
  • Get it (Web Archive)

On the Complexity of Spill Everywhere under SSA Form.

  • By: Florent Bouchez, Alain Darte, Fabrice Rastello/li>
  • Number: RR2007-42
  • Get it (Web Archive)

Optimizing Latency and Reliability of Pipeline Workflow Applications.

  • By: Anne Benoit, Veronika Rehn-Sonigo, Yves Robert
  • Number: RR2007-43
  • Date: November 2007
  • Abstract:

    Mapping applications onto heterogeneous platforms is a difficult challenge, even for simple application patterns such as pipeline graphs. The problem is even more complex when processors are subject to failure during the execution of the application. In this paper, we study the complexity of a bi-criteria mapping which aims at optimizing the latency (\em i.e., the response time) and the reliability (\em i.e., the probability that the computation will be successful) of the application. Latency is minimized by using faster processors, while reliability is increased by replicating computations on a set of processors. However, replication increases latency (additional communications, slower processors). The application fails to be executed only if all the processors fail during execution. While simple polynomial algorithms can be found for fully homogeneous platforms, the problem becomes NP-hard when tackling heterogeneous platforms. This is yet another illustration of the additional complexity added by heterogeneity.
  • Keywords:

    Heterogeneity, scheduling, complexity results, reliability, response time.
  • Citation: Not published yet.
  • Size: 2+11p
  • Format: Portable Document Format
  • Get it
  • Format: A new version of this article is available here.

Lattice-Based Array Contraction: From Theory to Practice.

  • By: Christophe Alias, Fabrice Baray, and Alain Darte
  • Number: RR2007-44
  • Date: November 2007
  • Abstract:

    We build on prior work on intra-array memory reuse, for which a general theoretical framework was proposed based on lattice theory. Intra-array memory reuse is a way of reducing the size of a temporary array by folding, thanks to affine mappings and modulo operations, reusing memory locations when they contain a value not used later. We describe the algorithms needed to implement such a strategy. Our implementation has two parts. The first part, Bee, uses the source-to-source transformer ROSE to extract from the program all necessary information on the lifetime of array elements and to generate the code after memory reduction. The second part, Cl@k, is a stand-alone mathematical tool dedicated to optimizations on polyhedra, in particular the computation of successive minima and the computation of good admissible lattices, which are the basis for lattice-based memory reuse. Both tools are developed in C++ and use linear programming and polyhedra manipulations. They can be used either for program optimizations - e.g, to limit memory expansion introduced for parallelization - or in high-level synthesis - e.g., to design memories between communicating hardware accelerators.
  • Keywords:

    Memory Reduction, Program Analysis and Source Transformations, Integer Lattices.
  • Citation: Not published yet.
  • Size: 2+26p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Fast Liveness Checking for SSA-Form Programs.

  • By: Benoit Boissinot, Sebastian Hack, Daniel Grund, Benoît Dupond de Dinechin, Fabrice Rastello
  • Number: RR2007-45
  • Get it (Web Archive)

Fairness Issues When Transferring Large Volumes of Data on High Speed Networks With Router-Assisted Transport Protocols.

  • By: Dino M. Lopez Pacheco , Laurent Lefèvre , Congduc Pham
  • Number: RR2007-46
  • Date: December 2007
  • Abstract:

    This report presents improvements of the eXplicit Control Protocol (XCP) in order to support fairness between XCP and TCP flows. Our mechanisms are based on two main components: estimations of the resources needed by XCP, and limitation of the resources taken by TCP. The set of simulations shown in this report proves that we can guarantee a fairness level between XCP and TCP. We think that our TCP-XCP fairness mechanism can be perfectly applied for Long Distance Data Grids, where senders need to transfer large volumes of data.
  • Keywords:

    TCP, XCP, routers-assisted protocols, inter-protocol fairness, active flow number estimations.
  • Citation: Not published yet.
  • Size: 2+11p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

XCP-i: ``eXplicit Control Protocol'' pour l'interconnexion de réseaux haut-débit hétérogènes.

  • By: Dino M. Lopez Pacheco, Laurent Lefèvre, Congduc Pham
  • Number: RR2007-47
  • Date: December 2007
  • Abstract:

    eXplicit Control Protocol (XCP) est un protocole de transport qui contrôle efficacement l'évolution de la fenêtre de congestion de l'émetteur, évitant ainsi la phase de slow-start et d'évitement de congestion. XCP nécessite cependant la collaboration de tous les routeurs sur le chemin de la source vers le récepteur, ce qui est pratiquement impossible à réaliser en réalité, sinon ses performances peuvent être beaucoup moins bonnes que celles de TCP. Cette forte dépendance de XCP en des routeurs spécialisés limite considérablement l'intérêt de déployer des routeurs XCP sur une portion du réseau. Nous proposons dans cet article une extension de XCP, appelée XCP-i, qui permet d'interconnecter des nuages XCP avec des nuages non-XCP sans perdre le bénéfice du contrôle précis de XCP sur la fenêtre de congestion. Les résultats de simulation sur des topologies correspondant typiquement à des scénarios de déploiement incrémental montrent que les performances de XCP-i sont bien supérieures à celles de TCP sur des liens à haut-débit.
  • Keywords:

    Réseaux haut-débit hétérogènes, nuage non-XCP, routeur virtuel XCP.
  • Citation: Not published yet.
  • Size: 2+11p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Offline and online scheduling of concurrent bags-of-tasks on heterogeneous platforms.

  • By: Anne Benoit, Loris Marchal, Jean-François Pineau, Yves Robert, Frédéric Vivien
  • Number: RR2007-48
  • Date: December 2007
  • Abstract:

    Scheduling problems are already difficult on traditional parallel machines. They become extremely challenging on heterogeneous clusters, even when embarrassingly parallel applications are considered. In this paper we deal with the problem of scheduling multiple applications, made of collections of independent and identical tasks, on a heterogeneous master-worker platform. The applications are submitted online, which means that there is no a priori (static) knowledge of the workload distribution at the beginning of the execution. The objective is to minimize the maximum stretch, i.e. the maximum ratio between the actual time an application has spent in the system and the time this application would have spent if executed alone. On the theoretical side, we design an optimal algorithm for the offline version of the problem (when all release dates and application characteristics are known beforehand). We also introduce several heuristics for the general case of online applications. On the practical side, we have conducted extensive simulations and MPI experiments, showing that we are able to deal with very large problem instances in a few seconds. Also, the solution that we compute totally outperforms classical heuristics from the literature, thereby fully assessing the usefulness of our approach.
  • Keywords:

    Heterogeneous master-worker platform, online scheduling, multiple applications.
  • Citation: Not published yet.
  • Size: 2+36p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Large Scale Execution of a Bioinformatic Application on a Volunteer Grid.

  • By: Viktors Bertis , Raphaël Bolze , Frédéric Desprez , Kevin Reed
  • Number: RR2007-49
  • Date: December 2007
  • Abstract:

    Large volunteer desktop platforms are now available for several applications. This paper presents the work that we did to prepare the first phase of the Help Cure Muscular Dystrophy project to run on \wcg. The project was launched on December 19, 2006, and took 26 weeks to complete. We present performance evaluation of the overall execution and compare a volunteer grid with a dedicated one.
  • Keywords:

    desktop computing, docking application, \wcg, grid performance evaluation, grids comparison.
  • Citation: Not published yet.
  • Size: 2+16p
  • Format: Portable Document Format
  • Get it
  • Revisions: none

Dynamic Logic of Common Knowledge in a Proof Assistant.

  • By: Pierre Lescanne et Jérôme Puisségur
  • Number: RR2007-50
  • Get it (Web Archive)

Common knowledge logic in a higher order proof assistant?.

  • By: Pierre Lescanne
  • Number: RR2007-51
  • Get it (Web Archive)