### Simultaneous Scheduling of Replication and Computation for Data-Intensive Applications on the Grid.

• By: Frédéric Desprez, Antoine Vernois
• Number: RR2005-01
• Date: January 2005
• Abstract:

One of the first motivations of using grids comes from applications managing large data sets like for example in High Energy Physic or Life Sciences. To improve the global throughput of software environments, replicas are usually put at wisely selected sites. Moreover, computation requests have to be scheduled among the available resources. To get the best performance, scheduling and data replication have to be tightly coupled which is not always the case in existing approaches. This paper presents an algorithm that combines data management and scheduling at the same time using a steady-state approach. Our theoretical results are validated using simulation and logs from a large life science application (ACI GRID GriPPS).
• Keywords:

scheduling, replication, data management, grid computing.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+17p
• Format: pdf
• Get it
• Revisions: none

### A study of various load information exchange mechanisms for a distributed application using dynamic scheduling.

• By: Abdou Guermouche, Jean-Yves L'Excellent
• Number: RR2005-02
• Date: January 2005
• Abstract:

We consider a distributed asynchronous system where processes can only communicate by message passing and need a coherent view of the load (e.g., workload, memory) of others to take dynamic decisions (scheduling). We present several mechanisms to obtain a distributed view of such information, based either on maintaining that view or demand-driven with a snapshot algorithm. We perform an experimental study in the context of a real application, an asynchronous parallel solver for large sparse systems of linear equations.
• Keywords:

Snapshot, distributed system, dynamic scheduling, load balancing, message passing.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+14p
• Format: pdf
• Get it
• Revisions: none

### Computing the rank and a small nullspace basis of a polynomial matrix.

• By: A. Storjohann (U. Waterloo), Gilles Villard
• Number: RR2005-03
• Date: January 2005
• Abstract:

We reduce the problem of computing the rank and a nullspace basis of a univariate polynomial matrix to polynomial matrix multiplication. For an input n x n matrix of degree d over a field K we give a rank and nullspace algorithm using about the same number of operations as for multiplying two matrices of dimension n and degree d. If the latter multiplication is done in MM(n,d)=softO((n^omega)d) operations, with omega the exponent of matrix multiplication over K, then the algorithm uses softO(MM(n,d)) operations in K. For m x n matrices of rank r and degree d, the cost expression is softO(nmr^(omega -2)d). The softO notation indicates some missing logarithmic factors. The method is randomized with Las Vegas certification. We achieve our results in part through a combination of matrix Hensel high-order lifting and matrix minimal fraction reconstruction, and through the computation of minimal or small degree vectors in the nullspace seen as a K[x]-module.
• Keywords:

Linear algebra, polynomial matrix, matrix multiplication, nullspace basis, minimal polynomial basis, randomized algorithm.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+24p
• Format: pdf
• Get it
• Revisions: none

### Fully asynchronous behavior of double-quiescent elementary cellular automata.

• By: Nazim Fatès , Michel Morvan , Nicolas Schabanel , Éric Thierry
• Number: RR2005-04
• Date: February 2005
• Abstract:

In this paper we propose a probabilistic analysis of the fully asynchronous behavior (i.e., two cells are never simultaneously updated, as in a continuous time process) of elementary finite cellular automata (i.e.,\0,1\ states, radius 1 and unidimensional) for which both states are quiescent (i.e., (0,0,0) \mapsto 0 and (1,1,1) \mapsto 1). It has been experimentally shown in previous works that introducing asynchronism in the global function of a cellular automata was perturbing its behavior, but as far as we know, only few theoretical work exists on the subject. The cellular automata we consider live on a ring of size n and asynchronism is introduced as follows: at each time step one cell is selected uniformly at random and the transition is made on this cell while the others stay in the same state. Among the sixty-four cellular automata belonging to the class we consider, we show that nine of them diverge almost surely on all non-trivial configurations while the fifty-five other converge almost surely to a random fixed point. We show that the exact convergence time of these fifty-five automata can only take the following values: either 0, \Theta(n \ln n), \Theta(n^2), \Theta(n^3), or \Theta(n2^n). Furthermore, the global behavior of each of these cellular automata is fully determined by reading its code.
• Keywords:

cellular automata, discrete dynamical systems, convergence, stochastic process.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+20p
• Format: pdf
• Get it
• Revisions: none

### Multiplication Algorithms for Radix-$2$ RN-Codings and Two's Complement Numbers.

• By: Jean-Luc Beuchat, Jean-Michel Muller
• Number: RR2005-05
• Date: February 2005
• Abstract:

The RN-codings are particular cases of signed-digit representations, for which rounding to the nearest is always identical to truncation. In radix $2$, Booth recoding is an RN-coding. In this paper, we suggest several multiplication algorithms able to handle RN-codings, and we analyze their properties.
• Keywords:

Computer arithmetic, RN-codings, (modified) Booth recoding, rounding to the nearest.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+11p
• Format: pdf
• Get it
• Revisions: none

### GoDIET: A tool for managing distributed hierarchies of DIET agents and servers.

• By: Eddy Caron, Holly Dail
• Number: RR2005-06
• Date: February 2005
• Abstract:

The Distributed Interactive Engineering Toolbox (DIET) is an Application Service Provider (ASP) platform providing remote execution of computational problems on distributed, heterogeneous resources. Traditional ASP toolkits are based on a single, centralized scheduling agent coordinating computation requests from clients with service offerings from servers. DIET is based on a hierarchy of agents that collaborate to perform scheduling decisions; we are exploring the benefits of hierarchical agent architectures for scalability and adaptation to heterogeneous network performance. This paper describes GoDIET, a new tool for the configuration, launch, and management of DIET on computational grids. Users of GoDIET write an XML file describing their available compute and storage resources and the desired overlay of DIET agents and servers onto those resources. GoDIET automatically generates and stages all necessary configuration files, launches agents and servers in appropriate hierarchical order, reports feedback on the status of running components, and allows shutdown of all launched software. We present a series of experiments that permit the evaluation of the performance of GoDIET for several launch and management approaches. We also evaluate the robustness of the DIET platform for a large number of servers.
• Keywords:

Deployment, PSE, Network enabled server, Grid computing.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+13p
• Format: pdf
• Get it
• Revisions: none

### Experiences with hierarchical request flow management for Network-Enabled Server Environments.

• By: Holly Dail, Frédéric Desprez
• Number: RR2005-07
• Date: February 2005
• Abstract:

DIET (Distributed Interactive Engineering Toolbox) is a toolbox for the construction of Network Enabled Server systems. Client requests for computational services are automatically matched with available resources by a distributed hierarchy of scheduling agents. Traditionally NES systems have used a centralized agent for performance prediction and resource information management; in DIET, these services are distributed by providing them at the DIET server level. DIET has traditionally offered an online scheduling model whereby all requests are scheduled immediately or refused. This approach can overload interactive servers in high-load conditions and does not allow adaptation of the schedule to task or data dependencies. In this article we consider an alternative model based on active management of the flow of requests throughout the system. We have added support for (1) limiting the number of concurrent requests on interactive servers, (2) server and agent-level queues, and (3) window-based scheduling algorithms whereby the request release rate to servers can be controlled and some re-arrangement of request to host mappings is possible. We present experiments demonstrating that these approaches can improve performance and that the overheads introduced are not significantly different from those of the standard DIET approach.
• Keywords:

Scheduling, Grid, Window, Distributed, Hierarchical.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+18p
• Format: pdf
• Get it
• Revisions: none

### Small FPGA polynomial approximations with 3-bit coefficients and low-precision estimations of the powers of x.

• By: Romain Michard, Arnaud Tisserand, Nicolas Veyrat-Charvillon
• Number: RR2005-08
• Date: February 2005
• Abstract:

This paper presents small FPGA implementations of low precision polynomial approximations of functions without multipliers. Our method uses degree-2 or degree-3 polynomial approximations with at most 3-bit coefficients and low precision estimations of the powers of x. Here, we denote by 3-bit coefficients values with at most 3 non-zero and possibly non-contiguous signed bits (e.g. 1.001000-1). This leads to very small operators by replacing the costly multipliers by a small number of additions. Our method provides approximations with very low average error and is suitable for signal processing applications.
• Keywords:

computer arithmetic, hardware arithmetic operator, polynomial evaluation, digital signal application.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+9p
• Format: Compressed postscript
• Get it
• Revisions: none

### On the definition of ulp(x).

• By: Jean-Michel Muller
• Number: RR2005-09
• Date: February 2005
• Abstract:

Function ulp (acronym for unit in the last place) is frequently used for expressing errors in floating-point computations. We present several previously suggested definitions of that function, and analyse some of their properties.
• Keywords:

Computer arithmetic, floating-point arithmetic, unit in the last place, ULP.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+11p
• Format: pdf
• Get it
• Revisions: none

### Optimizing Network Resource Sharing in Grids.

• By: Loris Marchal, Yves Robert, Pascale Vicat-Blanc Primet, Jingdi Zeng
• Number: RR2005-10
• Date: March 2005
• Abstract:

While grid computing reaches further to geographically separated clusters, data warehouses, and disks, it poses demanding requirements on end-to-end performance guarantee. Its pre-defined destinations and service criteria ease the performance control; however, expensive resources and equipments used by grid applications determine that optimal resource sharing, especially at network access points, is critical. From the resource reservation perspective, this article looks at communication resources shared by grid sites. Two resource request scenarios have been identified, aiming at optimizing the request accept rate and resource utilization. The optimization problems, proven NP-complete, are then solved by heuristic algorithms. Simulation results, aside from showing satisfying results, illustrate the pros and cons of each algorithm.
• Keywords:

grid computing, communication resource, resource sharing, optimization.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+14p
• Format: pdf
• Get it
• Revisions: none

### The language X: circuits, computations and Classical Logic.

• By: Steffen van Bakel, Stéphane Lengrand, Pierre Lescanne
• Number: RR2005-11
• Date: March 2005
• Abstract:

X is an untyped language for describing circuits by composition of basic components. This language is well suited to describe structures which we call "circuits" and which are made of parts that are connected by wires. Moreover X gives an expressive platform on which algebraic objects and many different (applicative) programming paradigms can be mapped. In this paper we will present the syntax and reduction rules for X and some its potential uses. To demonstrate the expressive power of X, we will show how, even in an untyped setting, elaborate calculi can be embedded, like the naturals, the $`l$-calculus, Bloe and Rose's calculus of explicit substitutions lambda-x, Parigot's lambda-mu and Curien and Herbelin's lambda-mu-mu~.
• Keywords:

Language design, mobility, circuits, classical logic, Curry-Howard correspondance.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+25p
• Format: pdf
• Get it
• Revisions: none

### Perfect Sampling for Fork-Join networks.

• By: Anne Bouillard, Bruno Gaujal
• Number: RR2005-12
• Date: March 2005
• Abstract:

In this paper, we show how to design a perfect simulation for Markovian fork-join networks, or equivalently, free-choice Petri nets. For pure fork-join networks and for event graphs, the simulation time can be greatly reduced by using extremal initial states, namely blocking states, although such nets do not exhibit any natural monotonicity property. Another approach for perfect simulation of pure fork-join networks is based on a (max,plus) representation of the system. For that, we show how the theory of (max,plus) stochastic systems can be used to provide perfect samplings. Finally, experimental runs show that the (max,plus) approach couples within fewer steps but needs a larger simulation time than the Markovian approach.
• Keywords:

Perfect simulation, Petri nets, fork and join.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+14p
• Format: pdf
• Get it
• Revisions: none

### A study of meta-scheduling architectures for high throughput computing.

• By: Eddy Caron , Vincent Garonne , Andreï Tsaregorodtsev
• Number: RR2005-13
• Date: March 2005
• Abstract:

In this paper we present a model and a simulator for large-scale system. Such platforms are composed of heterogeneous clusters of PCs belonging to a local network. These clusters are then connected to each other through a global network. Moreover each cluster is managed via a local scheduler and is shared by many users. We validate our simulator by comparing the experimental results and the analytical results of a M/M/4 queuing system. These studies indicate that the simulator is consistent. After that we do the comparison with a real batch system and we obtain a mean error of 10.5 \% for the response time and 12 \% for the makespan. We conclude that our simulator is realistic and describes well the behavior of a large-scale system. Thus we can study the scheduling of our system called \dirac in a high throughput context. We justify our decentralized, adaptive and opportunistic approach in comparison to a centralized approach in such a context.
• Keywords:

Metascheduler, Grid computing.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+13p
• Format: Compressed postscript
• Get it
• Revisions: none

### Extremal throughputs in free-choice nets.

• By: Anne Bouillard, Bruno Gaujal, Jean Mairesse
• Number: RR2005-14
• Date: April 2005
• Abstract:

We give a method to compute the throughput in a timed live and bounded free-choice Petri net under a total allocation (i.e. a 0-1 routing). We also characterize and compute the conflict-solving policies that achieve the smallest throughput in the special case of a 1-bounded net. They do not correspond to total allocations, but still have a small period.
• Keywords:

Free-choice Petri nets, timed and routed Petri nets, throughput.
• Availability: Electronic copy only.
• Citation: Short version without proofs: In 26th International Conference On Application and Theory of Petri Nets, LNCS, Springer-Verlag, 2005.
• Size: 2+20p
• Format: Compressed postscript
• Get it
• Revisions: none

### Hardware/Software Interface for Multi-Dimensional Processor Arrays..

• By: Alain Darte, Steven Derrien, Tanguy Risset
• Number: RR2005-15
• Date: April 2005
• Abstract:

On most recent systems on chip, the performance bottleneck is the on-chip communication medium, bus or network. Multimedia applications require a large communication bandwidth between the processor and graphic hardware accelerators, hence an efficient communication scheme using burst mode is mandatory. In the context of data-flow hardware accelerators, we approach this problem as a classical resource-constrained problem. We explain how to use recent optimization techniques so as to define a conflict-free schedule of input/output for multi-dimensional processor arrays (e.g., 2D grids). This schedule is static and allows us to perform further optimizations such as grouping successive data in packets to operate in burst mode. We also present an effective VHDL implementation on FPGA and compare our approach to a run-time congestion resolution showing important gains in hardware area.
• Keywords:

Hardware accelerators, interface, commmunications, processor arrays, bus arbiter, resource constrained scheduling.
• Availability: Electronic copy only.
• Citation: See ASAP'05.
• Size: 2+16p
• Format: pdf
• Get it
• Revisions: none

### Up-to Techniques for Weak Bisimulation.

• By: Damien Pous
• Number: RR2005-16
• Date: April 2005
• Abstract:

Up-to techniques have been introduced to enhance the bisimulation proof method for establishing bisimilarity results. While up-to techniques for strong bisimilarity are well understood, in the weak case they come as a collection of unrelated results, and lack a unified presentation.
We propose a uniform and modular theory of up-to techniques for weak bisimulation that captures existing proof technology and introduces new techniques. Some proofs rely on non trivial -- and new -- commutation results based on termination guarantees.
The flexibility and usefulness of our framework is illustrated on two examples, one of them coming from a recent study of a distributed abstract machine.
• Keywords:

behavioural equivalence, bisimulation, termination, commutation diagrams.
• Availability: Electronic copy only.
• Citation: To appear in the proceedings of ICALP 2005.
• Size: 2+20p
• Format: pdf
• Get it
• Revisions: none

### The Quantum Query Complexity of the Abelian Hidden Subgroup Problem.

• By: Pascal Koiran, Vincent Nesme, Natacha Portier
• Number: RR2005-17
• Date: April 2005
• Abstract:

Simon in his FOCS'94 paper was the first to show an exponential gap between classical and quantum computation. The problem he dealt with is now part of a well-studied class of problems, the hidden subgroup problems. We study Simon's problem from the point of view of quantum query complexity and give here a first nontrivial lower bound on the query complexity of a hidden subgroup problem, namely Simon's problem. Our bound is optimal up to a constant factor. We also show how, as a consequence, this gives us the query complexity of the Abelian hidden subgroup problem, up to a constant factor. At last we expose some elementary facts about complexity in weaker query models.
• Keywords:

quantum computation, query complexity,hidden subgroup, Simon's problem, lower bound.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+12p
• Format: Compressed postscript
• Get it

### An Efficient Network API for in-Kernel Applications in Clusters.

• By: Brice Goglin, Olivier Glück, Pascale Vicat-Blanc Primet
• Number: RR2005-18
• Date: April 2005
• Abstract:

Running parallel applications on clusters with high-speed local networks requires fast communication between computing nodes but also low latency and high bandwidth file access. However, the application programming interfaces of high-speed local networks were designed for MPI communication and do not always meet the requirements of other applications like distributed file systems. In this paper, we explore several solutions to improve the use of high-speed network for in-kernel applications. Distributed file systems implemented on top of the GM interface of Myrinet are first examined to demonstrate how hard it is to get an efficient interaction between such applications and the network. Then, we propose solutions to simplify and improve this interaction and integrate them into the kernel interface of the new Myrinet driver, MX. Performance comparisons between MX and GM, and their usage in both a distributed file system and a zero-copy protocol show nice improvements. Moreover, we are able to improve the performance of the flexible kernel API we designed in MX that allows to remove some intermediate copy.
• Keywords:

High-speed local networks, in-kernel network API, distributed file systems, zero-copy socket protocols.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 3+16p
• Format: pdf
• Get it
• Revisions: none

### Structure of spaces of rhombus tilings in the lexicograhic case.

• By: Eric Rémila
• Number: RR2005-19
• Date: May 2005
• Abstract:

We study a class of lexicographic rhombus tilings of zonotopes, which are deduced from higher Bruhat orders relaxing the unitarity condition. We prove that a space of such tilings is a graded poset with minimal and maximal element.
• Keywords:

rhombus tiling, flip, connectivity.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+26p
• Format: Compressed postscript
• Get it
• Revisions: none

### Component-oriented programming with sharing: containment is not ownership.

• By: Daniel Hirschkoff, Tom Hirschowitz, Damien Pous, Alan Schmitt, and Jean-Bernard Stefani
• Number: RR2005-20
• Date: May 2005
• Abstract:

Component-oriented programming yields a tension between higher-order features (deployment, reconfiguration, passivation), encapsulation, and component sharing. We propose a programming discipline for component-oriented programming to address this issue, and we define a process calculus whose operational semantics embodies this programming discipline. We present several examples that illustrate how the calculus supports component sharing, while allowing strong encapsulation and higher-order primitives.
• Keywords:

Component-oriented programming, higher-order process algebras, component sharing.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+12p
• Format: Compressed postscript
• Get it
• Revisions: none

### Scheduling divisible loads with return messages on heterogeneous master-worker platforms.

• By: Olivier Beaumont, Loris Marchal, Yves Robert
• Number: RR2005-21
• Date: May 2005
• Abstract:

In this paper, we consider the problem of scheduling independent tasks, or divisible loads, onto an heterogeneous star platform, with both heterogeneous computing and communication resources. We consider the case where the workers, after processing the tasks, send back some results to the master processor. This corresponds to a more general framework than the one used in many divisible load papers, where only forward communications are taken into account. To the best of our knowledge, this paper constitutes the first attempt to derive optimality results under this general framework (forward and backward communications, heterogeneous processing and communication resources). We prove that it is possible to derive the optimal solution both for LIFO and FIFO distribution schemes. Nevertheless, the complexity of the general problem remains open: we also prove in the paper that the optimal distribution scheme may be neither LIFO nor FIFO.
• Keywords:

Scheduling, divisible load, master-worker platform, heterogeneous cluster..
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+19p
• Format: pdf
• Get it
• Revisions: none

### A practical type system for generalized recursion.

• By: Tom Hirschowitz, Serguei Lenglet
• Number: RR2005-22
• Date: May 2005
• Abstract:

The ML language is equipped with a sophisticated module system, especially thanks to its notions of functor (higher-order functions on modules) and of controlled type abstraction (opaque or transparent types). Nevertheless, an important weakness of this system hinders modularization: the impossibility to define mutually recursive modules. In particular, mutually recursive functions must all reside in the same module. Recently, Leroy extended the OCaml language, a dialect of ML, with an unsafe notion of recursive modules. In this extension, one can define recursive modules, but the system does not check that they are well-founded. If not, the system throws an exception at runtime, which is annoying given the strong typing of ML. Powerful type systems have been proposed to tackle this issue, but they require rather deep modifications to the ML type system. This report presents a system requiring only local modifications to the ML type system. We prove its soundness by injection into one of the evoked more general formalisms.
• Keywords:

Programming languages, semantics, typing, modularity, recursion.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+50p
• Format: Compressed postscript
• Get it
• Revisions: none

### DIET: A Scalable Toolbox to Build Network Enabled Servers on the Grid.

• By: Eddy Caron , Frédéric Desprez
• Number: RR2005-23
• Date: June 2005
• Abstract:

Among existing grid middleware approaches, one simple, powerful, and flexible approach consists of using servers available in different administrative domains through the classical client-server or Remote Procedure Call (RPC) paradigm. Network Enabled Servers implement this model also called GridRPC. Clients submit computation requests to a scheduler whose goal is to find a server available on the grid. The aim of this paper is to give an overview of a middleware developed in the GRAAL team called DIET (http://graal.ens-lyon.fr/DIET. DIET (Distributed Interactive Engineering Toolbox) is a hierarchical set of components used for the development of applications based on computational servers on the grid.
• Keywords:

Grid Computing, Network Enabled Servers, Client-Server Computing.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+20p
• Format: Portable Document Format
• Get it
• Revisions: none

### Markov chain analysis of an agent-based growth model.

• By: Bruno Gaujal, Eric Thierry
• Number: RR2005-24
• Date: May 2005
• Abstract:

In this paper we investigate the asymptotic behavior of a discrete and probabilistic dynamical system which can be described as a growth model where autonomous agents aggregates. The aim of this paper is to give a mathematical analysis of the dynamics. The analysis uses face homogeneous Markov chains and thanks to this study we validate a conjecture set by Laszlo Gulyas concerning a growth model for cities where simulations had shown that the sizes of the cities asymptotically distribute as a Zipf's law. In light of our analysis, we discuss how the emergence of such a Zipf's law could be expected in Gulyas' model and in its variants.
• Keywords:

Discrete probabilistic dynamics, face homogeneous Markov chains, Zipf's law.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+13p
• Format: Compressed postscript
• Get it
• Revisions: none

### Statistical study of the activity due to the selection function in the E-method algorithm.

• By: Romain Michard, Arnaud Tisserand, Nicolas Veyrat-Charvillon
• Number: RR2005-25
• Date: May 2005
• Abstract:

This work is a statistical study of the activity due to the selection function in the polynomial approximation algorithm called E-method and proposed by M. Ercegovac in [Erc75,Erc77]. The latitude in the choice of the result digits in the selection function, when using a redundant representation, allows us to consider a reduced electrical activity in some cases. This article presents the begining of a study on the profits in such a situation.
• Keywords:

computer arithmetic, low-power consumption, polynomial evaluation, E-method.
• Availability: Electronic copy only (in French).
• Citation: Published in FTFC 2005.
• Size: 2+6p
• Format: Portable Document Format
• Get it
• Revisions: none

### Automatic Middleware Deployment Planning on Clusters.

• By: Eddy Caron, Pushpinder Kaur Chouhan, Holly Dail
• Number: RR2005-26
• Date: May 2005
• Abstract:

The use of many distributed, heterogeneous resources as a large collective resource offers great potential and has become an increasingly popular idea. A key issue for these Grid platforms is middleware scalability and how middleware services can best be mapped to the resource platform structure. Optimizing deployment is a difficult problem with no existing, general solutions. In this paper we address a simpler sub-problem: how to carry out an adapted deployment on a cluster with hundreds of nodes? Efficient use of clusters alone or as part of the Grid is an important issue. We present a deployment model that predicts the maximum throughput of each element of a deployment. Our deployment construction algorithm uses this model to automatically create a mapping of middleware elements onto resources with the goal of maximizing throughput. We apply our approach to automatically deploy a distributed Problem Solving Environment (PSE) on a homogeneous cluster environment. We present experiments comparing the automatically-generated deployment against a number of other reasonable deployments.
• Keywords:

Deployment, Cluster, Middleware, Modeling.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+13p
• Format: Portable Document Format
• Get it
• Revisions: none

### Evaluation of Meta-scheduler Architectures and Task Assignment Policies for High Throughput Computing.

• By: Eddy Caron, Vincent Garonne , Andreï Tsaregorodtsev
• Number: RR2005-27
• Date: May 2005
• Abstract:

In this paper we present a model and simulator for many clusters of heterogeneous PCs belonging to a local network. These clusters are assumed to be connected to each other through a global network and each cluster is managed via a local scheduler which is shared by many users. We validate our simulator by comparing the experimental and analytical results of a M/M/4 queuing system. These studies indicate that the simulator is consistent. Next, we do the comparison with a real batch system and we obtain an average error of 10.5\% for the response time and 12\% for the makespan. We conclude that the simulator is realistic and well describes the behaviour of a large-scale system. Thus we can study the scheduling of our system called \dirac in a high throughput context. We justify our decentralized, adaptive and opportunistic approach in comparison to a centralized approach in such a context.
• Keywords:

Simulation, Model, Multi-clusters platform, Meta-scheduling, Grid Computing.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+11p
• Format: Portable Document Format
• Get it
• Revisions: none

### GDS: an Architecture Proposal for a Grid Data-Sharing Service.

• By: Gabriel Antoniu, Marin Bertier, Luc Bougé, Eddy Caron, Frédéric Desprez, Mathieu Jan, Sébastien Monnet, Pierre Sens
• Number: RR2005-28
• Date: June 2005
• Abstract:

Grid computing has recently emerged as a response to the growing demand for resources (processing power, storage, etc.) exhibited by scientific applications. We address the challenge of sharing large amounts of data on such infrastructures, typically consisting of a federation of node clusters. We claim that storing, accessing, updating and sharing such data should be considered by applications as an external service. We propose an architecture for such a service, whose goal is to provide transparent access to mutable data, while enhancing data persistence and consistency despite node disconnections or failures. Our approach leverages on weaving together previous results in the areas of distributed shared memory systems, peer-to-peer systems, and fault-tolerant systems.
• Keywords:

Data sharing, grid computing, transparent access, mutable data, peer-to-peer systems, fault tolerance, consistency protocols.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+13p
• Format: Portable Document Format
• Get it
• Revisions: none

### Hiérarchies algébriques de classes d'automates cellulaires.

• By: Marianne Delorme, Jacques Mazoyer, Guillaume Theyssier
• Number: RR2005-29
• Date: June 2005
• Abstract:

Cellular automata are a formal model of locally interacting systems which is very simple but suitable to study complex systems in general. Many classifications have been proposed in the literature, often relying on the observation of dynamics. In a first part, we present more recent approaches of algebraic nature based on notions of sub-systems or embeddings. A second part, which is more technical, is dedicated to new results concerning these algebraic tools. This framework allows us to give formal definitions to several intuitive global notions (e.g., universality, particles) and to derive formal proofs of positive results and, more interestingly, of negative results. More precisely, we show that modifying local rules may be more powerful in some sense than increasing the number of states ; then, we illustrate by the construction of an infinite lattice that dynamical universality is incredibly more powerful than usual computation universality.
• Keywords:

Cellular Automata, Complexity, Computation, Universality, Algebraic Classifications, Methods.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+51p
• Format: Portable Document Format
• Get it
• Revisions: none

### Use of A Network Enabled Server System for a Sparse Linear Algebra Grid Application.

• By: Eddy Caron, Frédéric Desprez, Christophe Hamerling, Jean-Yves L'Excellent, Marc Pantel, Chiara Puglisi-Amestoy
• Number: RR2005-30
• Date: June 2005
• Abstract:

Solving systems of linear equations is one of the key operations in linear algebra. Many different algorithms are available in that purpose. These algorithms require a very accurate tuning to minimise runtime and memory consumption. The TLSE project provides, on one hand, a scenario-driven expert site to help users choose the right algorithm according to their problem and tune accurately this algorithm, and, on the other hand, a test-bed for experts in order to compare algorithms and define scenarios for the expert site. Both features require to run the available solvers a large number of times with many different values for the control parameters (and maybe with many different architectures). Currently, only the grid can provide enough computing power for this kind of application. The DIET middleware is the GRID backbone for TLSE. It manages the solver services and their scheduling in a scalable way.
• Keywords:

Grid Computing, Sparse Linear System Solvers, Expert Site Framework.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+16p
• Format: Portable Document Format
• Get it
• Revisions: none

### Off-line and on-line scheduling on heterogeneous master-slave platforms.

• By: Jean-François Pineau, Yves Robert, Frédéric Vivien
• Number: RR2005-31
• Date: July 2005
• Abstract:

In this work, we deal with the problem of scheduling independent tasks on heterogeneous master-slave platforms. We target both off-line and on-line problems, with several objective functions (makespan, maximum response time, total completion time). On the theoretical side, our results are two-fold: (i) For off-line scheduling, we prove several optimality results for problems with release dates; (ii) For on-line scheduling, we establish lower bounds on the competitive ratio of any deterministic algorithm. On the practical side, we have implemented several heuristics, some classical and some new ones derived in this paper. We studied experimentally these heuristics on a small but fully heterogeneous MPI platform. Our results show the superiority of those heuristics which fully take into account the relative capacity of the communication links.
• Keywords:

Scheduling, Master-slave platforms, Heterogeneous computing, On-line, Release dates.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+16p
• Format: Portable Document Format
• Get it
• Revisions: none

### Scheduling network requests with transmission window.

• By: Loris Marchal, Yves Robert, Pascale Vicat-Blanc Primet, Jingdi Zeng
• Number: RR2005-32
• Date: July 2005
• Abstract:

We consider the problem of bulk data transfers and bandwidth sharing in the context of grid infrastructures. Grid computing empowers high-performance computing in a large-scale distributed environment. Network bandwidth, which makes the expensive computational and storage resources work in concert, plays an active role on performance. Due to specific traffic patterns, network topology and application scenarios, bandwidth sharing encounters new challenges. From this perspective, this research report looks at bulk transfers among computing and storage elements. Referred to as short-lived, transfer requests with transmission window and volume are scheduled in the network. By manipulating the transmission window, the request accept rate and network resource utilization are to be optimized. The formulated optimization problem is proven NP-complete. Associated with proposed heuristics, simulations are carried out to study each bandwidth sharing strategy and its application scenarios. A tuning factor, that allows adaptation of performance objective, is introduced to adjust network infrastructure and workload.
• Keywords:

grid computing, network bandwidth sharing, online scheduling, optimization, transmission window, scheduling window.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+13p
• Format: Portable Document Format
• Get it
• Revisions: none

### Register allocation and spill complexity under SSA.

• By: Florent Bouchez, Alain Darte, Christophe Guillon, Fabrice Rastello
• Number: RR2005-33
• Date: August 2005
• Abstract:

This report deals with the problem of choosing which variables to spill during the register allocation phase. Spilling is used when the number of variables is higher than the number of registers, and consists of storing the value of a variable in memory and loading it when necessary. The problem is that instructions dealing with memory are time-consumming. Hence the goal is to minimize the amount of spilled variables, which is a highly studied problem for compiler design, but nevertheless NP-complete. Meanwhile, a program under SSA form has the interesting property that on cases where spill is unnecessary, the problem of register allocation is not anymore NP-complete but polynomial. The interesting question is: can we solve the spilling problem under SSA, come back from SSA by splitting live-ranges as SSA does and finaly use classical register allocator? We show in this report that unfortunately many formulations of the spilling problem are also NP-complete under SSA form. In particular, the node-deletion approach used in most compilers remains NP-complete on most cases even on basic-blocks. The only polynomial solution has a too high complexity in practice. But this first advanced study on the complexity of the spill problem under SSA greatly helps to the understanding and gives directions for polynomial approximations. In this report, we also talk about the problem of splitting variables aggressively as in SSA. This shows the weakness of the "iterated register coalescing" regarding false interferences created by the adding of move instructions. We then implemented in a production compiler for st200 an interference graph based on liveness analysis and value numbering: two variables interfere if at one's definition the other is live and carries another value. The experiments we made and present in this report show that with such an interference graph aggressive splitting is not a problem.
• Keywords:

Compilation, register allocation, spill, coalescing, SSA, NP-completeness, perfect graph.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+9p
• Format: Portable Document Format
• Get it
• Revisions: none

### Optimizing the translation out-of-SSA with renaming constraints.

• By: Fabrice Rastello, Francois de Ferriere, Christophe Guillon
• Number: RR2005-34
• Date: August 2005
• Abstract:

Static Single Assignment form is an intermediate representation that uses phi instructions to merge values at each confluent point of the control flow graph. phi instructions are not machine instructions and must be renamed back to move instructions when translating out of SSA form. Without a coalescing algorithm, the out of SSA translation generates many move instructions. Leung and George use a SSA form for programs represented as native machine instructions, including the use of machine dedicated registers. For this purpose, they handle renaming constraints thanks to a pinning mechanism. Pinning phi arguments and their corresponding definition to a common resource is also a very attractive technique for coalescing variables. In this paper, extending this idea, we propose a method to reduce the phi-related copies during the out of SSA translation, thanks to a pinning-based coalescing algorithm that is aware of renaming constraints. This report provides also a discussion about the formulation of this problem, its complexity and its motivations. We implemented our algorithm in the STMicroelectronics Linear Assembly Optimizer. Our experiments show interesting results when comparing to the existing approaches of Leung and George, Sreedhar et al., and Appel and George for register coalescing.
• Keywords:

Static Single Assignment, Coalescing, NP-complete, K-COLORABILITY, Machine code level, register allocation.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+26p
• Format: Portable Document Format
• Get it

### Non Gaussian and Long Memory Statistical Characterisations for Internet Traffic with Anomalies.

• By: Antoine Scherrer, Nicolas Larrieu, Philippe Owezarski, Pierre Borgnat, Patrice Abryn
• Number: RR2005-35
• Date: September 2005
• Abstract:

The Internet aims at providing a wide range of services for a large variety of applications. Hence, it is highly sensitive to traffic anomalies (e.g., failures, flash crowds,...) as well as to DoS attacks, which are likely to significantly reduce the Quality of Service level. Current intrusion detection systems, specially those based on anomaly detection, are not providing efficient nor satisfactory solutions for DoS attack tracking. This is mainly due to difficulties in distinguishing between strong but legitimate traffic variations and DoS attack induced changes. The goal of this work is to compare relevant statistical characteristics of regular traffic to those of traffic presenting anomalies. To do so, we introduce a non Gaussian long memory model and develop estimators for the corresponding parameters. First, we show that this model relevantly describes Internet traffic for a wide range of aggregation levels, using both a large set of data taken from public reference repositories (Bellcore, LBL, Auckland, UNC, CAIDA) and data collected by ourselves. Second, we show that the proposed model also describes meaningfully traffic with anomalies such as flash crowd and DoS attacks which we generated and collected. We show that the behaviors of the parameters of the model enables us to discriminate between regular and anomalous traffic, and between flash crowds and DoS attacks. We also derive analytically procedures to numerically synthesize realizations of stochastic processes with prescribed non Gaussian marginals and long range dependent covariances. This enables us to validate the relevance and accuracy of our characterization procedures. Finally, we describe various applications based on the proposed model.
• Keywords:

Intrusion detection, DoS attack, Flash Crowd, Non Gaussian Long Range Dependent Process.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+22p
• Format: Portable Document Format
• Get it
• Revisions: none

### Asymptotically Fast Polynomial Matrix Algorithms for Mutivariable Systems.

• By: Claude-Pierre Jeannerod, Gilles Villard
• Number: RR2005-36
• Date: September 2005
• Abstract:

We present the asymptotically fastest known algorithms for some basic problems on univariate polynomial matrices: rank, nullspace, determinant, generic inverse, reduced form. We show that they essentially can be reduced to two computer algebra techniques, minimal basis computations and matrix fraction expansion/reconstruction, and to polynomial matrix multiplication. Such reductions eventually imply that all these problems can be solved in about the same amount of time as polynomial matrix multiplication (up to logarithmic factors and the size of the output).
• Keywords:

Linear algebra, polynomial matrix, matrix rank, matrix determinant, nullspace basis, matrix inversion, matrix reduced form.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+12p
• Format: Portable Document Format
• Get it
• Revisions: none

### Fast and correctly rounded logarithms in double-precision.

• By: Florent de Dinechin, Christoph Lauter, Jean-Michel Muller
• Number: RR2005-37
• Date: September 2005
• Abstract:

This article is a case study in the implementation of a portable, proven and efficient correctly rounded elementary function in double-precision. We describe the methodology used to achieve these goals in the \textttcrlibm library. There are two novel aspects to this approach. The first is the proof framework, and in general the techniques used to balance performance and provability. The second is the introduction of processor-specific optimizations to get performance equivalent to the best current mathematical libraries, while trying to minimize the proof work. The implementation of the natural logarithm is detailed to illustrate these questions.
• Keywords:

floating-point, elementary functions, logarithm, correct rounding.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+14p
• Format: Portable Document Format
• Get it
• Revisions: none

### Basic building blocks for a triple-double intermediate format.

• By: Christoph Quirin Lauter
• Number: RR2005-38
• Date: September 2005
• Abstract:

The implementation of correctly rounded elementary functions needs high intermediate accuracy before final rounding. This accuracy can be provided by (pseudo-) expansions of size three, i.e. a triple-double format. The report presents all basic operators for such a format. Triple-double numbers can be redundant. A renormalization procedure is presented and proven. Elementary functions' implementations need addition and multiplication sequences. These operators must take operands in double, double-double and triple-double format. The results must be accordingly in one of the formats. Several procedures are presented. Proofs are given for their accuracy bounds. Intermediate triple-double results must finally be correctly rounded to double precision. Two effective rounding sequences are presented, one for round-to-nearest mode, one for the directed rounding modes. Their complete proofs constitute half of the report.
• Keywords:

elementary functions, multiple precision, expansions, correct rounding.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+65p
• Format: Portable Document Format
• Get it
• Revisions: none

### High-Radix Floating-Point Division Algorithms for Embedded VLIW Integer Processors.

• By: Claude-Pierre Jeannerod, Saurabh-Kumar Raina and Arnaud Tisserand
• Number: RR2005-39
• Date: September 2005
• Abstract:

This paper presents floating-point division algorithms and implementations for embedded VLIW integer processors. On those processors, there is no hardware floating-point unit, for cost reasons. But, for portability and/or accuracy reasons, a software FP emulation layer is sometime useful. Here, we focus on high-radix digit-recurrence algorithms for FP division on integer VLIW processors such as the ST200 from STMicroelectronics.
• Keywords:

Computer arithmetic, floating-point arithmetic, high-radix division algorithm, integer processor, VLIW processor.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+11p
• Format: Compressed postscript
• Get it
• Revisions: none

### Scheduling with Resource Constraints using Dis-Equations.

• By: Hadda Cherroun, Alain Darte, and Paul Feautrier
• Number: RR2005-40
• Date: September 2005
• Abstract:

Scheduling is an important step in high-level synthesis (HLS). We perform scheduling in two steps: (1) coarse-grain scheduling, in which we take into account the whole control structure of the program including imperfect loop nests, and (2) fine-grain scheduling, where we refine each logical step using a detailed description of the available resources. This paper focuses on the second step. We present an efficient formalism to express resource constraints, using dis-equations (i.e., negations of equations). We give an exact branch-and-bound algorithm, based on variants of Floyd's and Dijkstra's algorithms. We compare this approach with a greedy-scheduling heuristic on pieces of scientific applications from the PerfectClub and the HLSynth95 benchmarks. The results show that both methods are suitable for HLS tools.
• Keywords:

Scheduling, resource constraints, dis-equations, branch-and-bound, Dijkstra.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+20p
• Format: Gzipped postscript
• Get it
• Revisions: none

### An algorithmic toolbox for Network Calculus.

• By: Anne Bouillard, Eric Thierry
• Number: RR2005-41
• Date: September 2005
• Abstract:

Network Calculus offers powerful tools to analyze the performances in communication networks, in particular to obtain determinismic bounds. This theory is based on a strong mathematical ground, notably by the use of (min,+) algebra. However the algorithmic aspects of this theory have not been much addressed yet. This paper is an attempt to provide some efficient algorithms implementing Network Calculus operations for some classical functions. A class of functions which is often used are the piecewise linear functions which have a constant growth from a certain point. As a first step towards algorithmic design, we present a class containing these functions and closed under the Network Calculus operations: it is the piecewise linear functions which are periodic from a certain point. They can be finitely described which enables us to propose some algorithms for each of the Network Calculus operations.
• Keywords:

(min,+)-algebra, Network Calculus, algorithms.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+26p
• Format: Compressed postscript
• Get it
• Revisions: none

### Service Discovery in a peer-to-peer environment for Computational Grids.

• By: Eddy Caron, Frédéric Desprez and Cédric Tedeschi
• Number: RR2005-42
• Date: September 2005
• Abstract:

Grid computing is a form of distributed computing aiming at aggregating available geographically distributed heterogeneous resources for high performace computing. The services of a grid is the set of software components made available by servers to clients. Service discovery is today broadly considered by grid' users as inefficient on large scale dynamic platforms. Today's tools of service discovery does not reach several requirements: flexibility of the discovery, efficiency on wide-area dynamic platforms. Therefore, it becomes crucial to propose new tools coping with such platforms. Emerging peer-to-peer technologies provide algorithms allowing the distribution and the search of files taking into account the dynamicity of the underlying network. This report present a new tool merging grid computing and peer-to-peer technologies for an efficient and flexible service discovery in a dynamic and large scale environment, taking into account the topology of the underlying network.
• Keywords:

grid computing, peer-to-peer technology, service discovery.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+21p
• Format: Compressed postscript
• Get it
• Revisions: none

### Assisted verification of elementary functions.

• By: Florent de Dinechin, Christoph Lauter, Guillaume Melquiond
• Number: RR2005-43
• Date: September 2005
• Abstract:

The implementation of a correctly rounded or interval elementary function needs to be proven carefully in the very last details. The proof requires a tight bound on the overall error of the implementation with respect to the mathematical function. Such work is function specific, concerns tens of lines of code for each function, and will usually be broken by the smallest change to the code (e.g. for maintenance or optimization purpose). Therefore, it is very tedious and error-prone if done by hand. This article discusses the use of the Gappa proof assistant in this context. Gappa has two main advantages over previous approaches: Its input format is very close to the actual C code to validate, and it automates error evaluation and propagation using interval arithmetic. Besides, it can be used to incrementally prove complex mathematical properties pertaining to the C code. Yet it does not require any specific knowledge about automatic theorem proving, and thus is accessible to a wider community. Moreover, Gappa may generate a formal proof of the results that can be checked independently by a lower-level proof assistant like Coq, hence providing an even higher confidence in the certification of the numerical code.
• Keywords:

proof assistant, floating-point, elementary functions, numerical code.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+15p
• Format: Portable Document Format
• Get it
• Revisions: none

### Characterizing Valiant's algebraic complexity classes.

• By: Guillaume Malod and Natacha Portier
• Number: RR2005-44
• Date: September 2005
• Abstract:

Valiant introduced 20 years ago $VP$ and $VNP$: an algebraic analogue to the classes $P$ and $NP$. They are defined via non-uniform sequences of arithmetic circuits and are especially useful to study the complexity of polynomials families. In this paper we gather known results and new techniques under a unifying theme, namely the restrictions imposed upon the gates of the circuit, building a hierarchy from formulas to circuits. As a consequence we get a simpler proofs for known results such as the equality of the classes $VNP$ and $VNP_e$, the completeness of the determinant for $VQP$ or the equivalence of skew and weakly skew arithmetic circuits, and new results such as a characterisation of the classes $VP$ and $VQP$ or a full answer to a conjecture of Bürgisser. We also show that for circuits of polynomial depth and unbounded size these models all have the same expressive power and can be used to characterize a uniform version of $VNP$.
• Keywords:

Valiant's theory, algebraic complexity, arithmetic circuits.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+19p
• Format: Portable Document Format
• Get it
• Revisions: none

### Scheduling multiple bags of tasks on heterogeneous master-worker platforms: centralized versus distributed solutions.

• By: Olivier Beaumont, Larry Carter, Jeanne Ferrante, Arnaud Legrand, Loris Marchal, Yves Robert
• Number: RR2005-45
• Date: September 2005
• Abstract:

Multiple applications that execute concurrently on heterogeneous platforms compete for CPU and network resources. In this paper we consider the problem of scheduling applications to ensure fair and efficient execution on master-worker platforms where the communication is restricted to a tree embedded in the network. The goal of the scheduling is to obtain the best throughput while enforcing some fairness between applications. We show how to derive an asymptotically optimal periodic schedule by solving a linear program expressing all problem constraints. For single-level trees, the optimal solution can be analytically computed. For large-scale platforms, gathering the global knowledge needed by the linear programming approach might be unrealistic. One solution is to adapt the multi-commodity flow algorithm of Awerbuch and Leighton, but it still requires some global knowledge. Thus, we also investigates heuristic solutions using only local information, and test them via simulations. The best of our heuristics achieves the optimal performance on about two-thirds of our test cases, but is far worse in a few cases.
• Keywords:

Parallel computing, scheduling, divisible load, multiple applications, resource sharing.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+33p
• Format: Portable Document Format
• Get it
• Revisions: none

### Threshold effect in algorithmic complexity: NP-hard and linear variants of hypergraph partitioning.

• By: Laurent Lyaudet
• Number: RR2005-46
• Date: September 2005
• Abstract:

This report present an infinite family of combinatorial problems that shows extremely abrupt changes of complexity between neighbours problems. I will show that the problem $P_l^k$ that is a purely constraint-driven variant of hypergraph partitioning is for fixed parameters $l$ and $k$ NP-complete when $k=1$ and is linear when $k\neq1$ on the class of hypergraphs with disjoint hyperedges.
• Keywords:

Hypergraph partitioning, NP-Completeness, Threshold effect.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+14p
• Format: Compressed postscript
• Get it
• Revisions: none

### Managing Data Persistence in Network Enabled Servers.

• By: Eddy Caron , Bruno DelFabbro , Frédéric Desprez , Emmanuel Jeannot , Jean-Marc Nicod
• Number: RR2005-47
• Date: October 2005
• Abstract:

The GridRPC model \citegridrpc is an emerging standard promoted by the Global Grid Forum (GGF)\footnote\urlhttp://www.ggf.org that defines how to perform remote client-server computations on a distributed architecture. In this model data are sent back to the client at the end of every computation. This implies unnecessary communications when computed data are needed by an other server in further computations. Since, communication time is sometimes the dominant cost of remote computations, this cost has to be lowered. Several tools instantiate the GridRPC model such as NetSolve developed at the University of Tennessee, Knoxville, USA, and DIET developed at LIP laboratory, ENS Lyon, France. They are usually called Network Enabled Servers (NES). In this paper, we present a discussion of the data management solutions chosen for these two NES (NetSolve and DIET) as well as experimental results.
• Keywords:

Data peristence, Distributed computing, Network Enabled Servers.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+25p
• Format: Portable Document Format
• Get it
• Revisions: none

### Minimizing the stretch when scheduling flows of biological requests.

• By: Arnaud Legrand, Alan Su, Frédéric Vivien
• Number: RR2005-48
• Date: October 2005
• Abstract:

In this paper, we consider the problem of scheduling comparisons of motifs against biological databanks. This problem lies in the divisible load framework with negligible communication costs. Thus far, very few results have been proposed in this model. We first explain the relationship between this model and the preemptive uni-processor one. After having selected a few relevant metrics (max-stretch and sum-stretch), we show how to extend algorithms that have been proposed in the literature for the uni-processor model to our setting. Then we extensively study the performance of these algorithms in realistic scenarios. Our study clearly suggest an efficient heuristic for each of the two metrics, though a combined optimization is in theory not possible in the general case.
• Keywords:

Bioinformatics, heterogeneous computing, scheduling, divisible load, linear programming, stretch.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+20p
• Format: Portable Document Format
• Get it
• Revisions: none

### Simple Seed Architectures for Reciprocal and Square Root Reciprocal.

• By: M. Ercegovac, J.-M. Muller and A. Tisserand
• Number: RR2005-49
• Date: October 2005
• Abstract:

This report presents a simple hardware architecture for computing the seed values for reciprocal and square root reciprocal. These seeds are used in the initialization of floating-point division and square root software iterations. The proposed solution is based on polynomial approximation with specific coefficients and a table lookup. The obtained architectures lead to small and fast circuits.
• Keywords:

computer arithmetic, reciprocal, square root reciprocal, polynomial approximation, table based method, hardware arithmetic operator, Newton-Raphson iteration.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+19p
• Format: Compressed postscript
• Get it
• Revisions: none

### Automatic Middleware Deployment Planning on Clusters.

• By: Pushpinder Kaur Chouhan, Holly Dail, Eddy Caron, Frédéric Vivien
• Number: RR2005-50
• Date: October 2005
• Abstract:

The use of many distributed, heterogeneous resources as a large collective resource offers great potential and has become an increasingly popular idea. A key issue for these Grid platforms is middleware scalability and how middleware services can best be mapped to the resource platform structure. Optimizing deployment is a difficult problem with no existing general solutions. In this paper we address a simpler sub-problem: how to carry out an adapted deployment on a cluster with hundreds of nodes? Efficient use of clusters alone or as part of the Grid is an important issue. In this paper we present an approach for automatically determining an optimal deployment for hierarchically distributed middleware services on clusters where the goal is to optimize steady-state request throughput. We prove that a complete spanning d-ary tree provides an optimal deployment and we present an algorithm to construct this optimal tree. We use a distributed Problem Solving Environment called DIET to test our approach. We define a performance model for each component of DIET and validate these models in a real-world cluster environment. Additional experiments demonstrate that our approach selects a deployment that performs better than other reasonable deployments.
• Keywords:

Deployment, Cluster, Cluster computing, Network Enabled Server, Steady-state scheduling.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+23p
• Format: Portable Document Format
• Get it

### The impact of heterogeneity on master-slave on-line scheduling.

• By: Jean-François Pineau, Yves Robert, Frédéric Vivien
• Number: RR2005-51
• Date: October 2005
• Abstract:

In this paper, we assess the impact of heterogeneity for scheduling independent tasks on master-slave platforms. We assume a realistic one-port model where the master can communicate with a single slave at any time-step. We target on-line scheduling problems, and we focus on simpler instances where all tasks have the same size. While such problems can be solved in polynomial time on homogeneous platforms, we show that there does not exist any optimal deterministic algorithm for heterogeneous platforms. Whether the source of heterogeneity comes from computation speeds, or from communication bandwidths, or from both, we establish lower bounds on the competitive ratio of any deterministic algorithm. We provide such bounds for the most important objective functions: the minimization of the makespan (or total execution time), the minimization of the maximum response time (difference between completion time and release time), and the minimization of the sum of all response times. Altogether, we obtain nine theorems which nicely assess the impact of heterogeneity on on-line scheduling. These theoretical contributions are complemented on the practical side by the implementation of several heuristics on a small but fully heterogeneous MPI platform. Our (preliminary) results show the superiority of those heuristics which fully take into account the relative capacity of the communication links.
• Keywords:

Scheduling, Master-slave platforms, Heterogeneous computing, On-line.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+20p
• Format: Portable Document Format
• Get it
• Revisions: none

### FIFO scheduling of divisible loads with return messages under the one-port model.

• By: Olivier Beaumont, Loris Marchal, Veronika Rehn, Yves Robert
• Number: RR2005-52
• Date: October 2005
• Abstract:

This paper deals with scheduling divisible load applications on star networks, in presence of return messages. This work is a follow-on of RR2005-21, where the same problem was considered under the 2-port model, where a given processor can simultaneously send and receive messages. Here, we concentrate on the one-port model, where a processor can either send or receive a message at a given time step. The problem of scheduling divisible load on star platforms turns out to be very difficult as soon as return messages are involved. Unfortunately, we have not been able to assess its complexity, but we provide an optimal solution in the special (but important) case of FIFO communication schemes. We also provide an explicit formula for the optimal number of load units that can be processed by a FIFO ordering on a bus network. Finally, we provide a set of MPI experiments to assess the accuracy and usefulness of our results in a real framework.
• Keywords:

Scheduling, divisible load, master-worker platform, heterogeneous cluster, return messages, one-port model.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+23p
• Format: Portable Document Format
• Get it
• Revisions: none

### Analysis and Synthesis of Cycle-Accurate On-Chip Traffic with Long-Range-Dependence.

• By: Antoine Scherrer, Antoine Fraboulet, Tanguy Risset
• Number: RR2005-53
• Date: December 2005
• Abstract:

In this report we propose a stochastic traffic analysis and synthesis methodology adapted to on-chip network traffic. This work confirms and extends recent works (Marculescu et al.~\cite{marculescu-tvlsi}) by providing more precise (cycle-accurate) simulations. Our framework based on System-C simulations is able to model precisely the latency of each request-acknowledge transaction or aggregated throughput taking into account its bursty behavior. Our experiments show that on-chip traffic is non-stationary, with long-range-dependence property and that simulation platforms of systems-on-chip will now need to use advanced statistical models for traffic simulation.
• Keywords:

on-chip traffic, stochastic analysis, long-range-dependance, process synthesis.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+9p
• Format: Portable Document Format
• Get it
• Revisions: none