### A simple test qualifying the accuracy of Horner's rule for polynomials.

**By:**Marc Daumas, Sylvie Boldo**Number:**RR2003-01**Date:**January 2003**Abstract:**

- Polynomials are used in many applications and hidden in libraries such as libm. Whereas the accuracy of the functions used by linear algebra have long been studied, little is available to decide on one scheme to evaluate a polynomial. Common knowledge solely emphasizes that Horner's rule is a good scheme unless the indeterminate is close to one of the polynomial's roots. We propose here a criterion for one step of Horner's scheme to be faithful. A result is defined to be faithful when it was correctly rounded whereas the rounding mode (up, down or to the nearest) cannot be known by the user. Our criterion is checked against the IEEE standard for floating point arithmetic using the Coq automatic proof checker. We then present three programs in Maple, Java and C that check the criterion for a whole polynomial associated with a domain for the indeterminate and a possible truncation error. An example of use is given with the approximation of elementary functions.

**Keywords:**

- Floating Point, IEEE 754 Standard, Horner's Rule, Formal Proof, Coq.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+39p**Format:**Compressed postscript- Get it

### On the Complexity of Polynomial Matrix Computations.

**By:**Pascal Giorgi, Claude-Pierre Jeannerod, Gilles Villard**Number:**RR2003-02**Date:**January 2003**Abstract:**

- We study the link between the complexity of polynomial matrix multiplication and the complexity of solving other basic linear algebra problems on polynomial matrices. By polynomial matrices we mean n x n matrices of degree d over K[x] where K is a commutative field. Under the straight-line program model we show that multiplication is reducible to the problem of computing the coefficient of degree d of the determinant. Conversely, we propose algorithms for minimal approximant computation and column reduction that are based on polynomial matrix multiplication; for the determinant, the straight-line program we give also relies on matrix product over K[x] and provides an alternative to Storjohann's determinant algorithm. We further show that all these problems can be solved in particular in O~(n^w d) operations in K. Here the "soft Oh'' notation O~ indicates some missing log(nd) factors and w is the exponent of matrix multiplication overK.

**Keywords:**

- Matrix Polynomial, Minimal Basis, Column Reduced Form, Matrix gcd, Determinant, Polynomial Matrix Multiplication.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+15p**Format:**Compressed postscript- Get it

### A Hierarchical Resource Reservation Algorithm for Network Enabled Servers.

**By:**Eddy Caron, Frédéric Desprez, Franck Petit and Vincent Villain**Number:**RR2003-03**Date:**January 2003**Abstract:**

- This paper presents the application of the PIF algorithm to a Network Enabled Server environment. Hierarchical scheduling is applied to improve the scalability of the overall architecture and fault tolerance problems are addressed using timers. The simulation shows that gains can be obtained using such a platform over single scheduler approaches.

**Keywords:**

- Distributed computing, hierarchical scheduling, Network Enabled Servers.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+13p**Format:**Compressed postscript- Get it

### Small An optimal algorithm to generate tilings.

**By:**Sébastien Desreux, Eric Rémila**Number:**RR2003-04**Date:**January 2003**Abstract:**

- A lot of progress has been made in tiling theory in the last ten years after Thurston (\cite{Thu90}), building on previous work by Conway and Lagarias (\cite{CL90}), introduced height functions as a tool to encode and study tilings. This allowed the authors of this paper, in previous work (\cite{Rem99}, \cite{Des01}), to prove that the set of lozenge (or domino) tilings of a hole-free, general-shape domain in the plane can be endowed with a distributive lattice structure. In this paper, we see that this structure allows us in turn to construct an algorithm that is optimal with respect to both space and execution time to generate all the tilings of a domain~$D$. We first recall some results about tilings and then we describe the algorithm.

**Keywords:**

- Tiling, Height Function, Flip, Generation Virtex-II Family.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+12p**Format:**Compressed postscript- Get it

### DyRAM : a Reliable Multicast Protocol.

**By:**Moufida Maimour, Cong-Duc Pham**Number:**RR2003-05**Date:**January 2003**Abstract:**

- Group communications (multicast) are foreseen to be one of the most critical yet challenging technologies to meet the exponentially growing demands for data distribution in a large variety of applications of the Internet (grid computing, web applications, distributed simulations\ldots). When reliability is required, there is no straightforward solutions and meeting the objectives of reliable multicast is not an easy task. Active networks open a new perspective in providing more efficient solutions for the problem of reliability. In this context, routers are able to perform customized computations on the packets flowing through them. In this report, we propose a receiver-based (replier) local recovery multicast protocol with dynamic repliers elected on a per-packet basis. Designed to provide an efficient reliable multicast service without any cache facilities inside the network, our approach, uses low-overhead active services in routers. The current report addresses the design, evaluation and the implementation of an efficient and scalable reliable multicast protocol noted DyRAM standin for Dynamic Replier Active reliable Multicast.

**Keywords:**

- Active networks, Reliable multicast, Congestion avoidance, Evaluation, Analysis, Simulation, Implementation.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+31p**Format:**Compressed postscript- Get it

### An Analysis of a Router-based Loss Detection Service for Active Reliable Multicast Protocols.

**By:**Moufida Maimour, Cong-Duc Pham**Number:**RR2003-06**Date:**January 2003**Abstract:**

- Group communications (multicast) are foreseen to be one of the most critical yet challenging technologies to meet the exponentially growing demands for data distribution in a large variety of applications of the Internet (grid computing, video-conferencing, web applications, distributed simulations\ldots). When reliability is required, there is no straightforward solutions and meeting the objectives of reliable multicast is not an easy task. Active networks open a new perspective in providing more efficient solutions for the problem of reliability. In this context, routers are able to perform customized computations (services) on the packets flowing through them. In this paper, we propose a new service consisting in an early loss detection service to be deployed into routers. We also show how the loss detection service can improve the performances (in term of the recovery delays) of an active reliable multicast protocol such as DyRAM making it more suitable for applications requiring low latencies.

**Keywords:**

- Active networks, Reliable multicast, Analysis, Simulation.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+15p**Format:**Compressed postscript- Get it

### AMCA: an Active-based Multicast Congestion Avoidance Algorithm.

**By:**Moufida Maimour, Cong-Duc Pham**Number:**RR2003-07**Date:**January 2003**Abstract:**

- Many works have recently addressed the issue of congestion control for multicast communications and the problem is known to be highly complex. Scalability, responsiveness, stability and fairness with TCP are some of the required properties. In this report, we present a congestion avoidance scheme for bulk data distribution called AMCA (Active-based Multicast Congestion Avoidance algorithm) that tries to meet these properties. We use the active networking technology to perform on a per-section dialogue to probe for available bandwidth along a multicast tree. The solution uses the RTT variations experienced by every branch to estimate the congestion situation in the multicast tree. The physical multicast tree is also used to appropriately aggregate the RTT variations at intermediate nodes before they reach the source. Simulations show that AMCA converges, makes use of the available bandwidth and reacts rapidly to dynamic changes while being TCP-fair. The results also show that losses are very few demonstrating the conservative property of AMCA which reacts to congestion before a loss occurs.

**Keywords:**

- Active Networks, Reliable Multicast, Congestion Avoidance.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+20p**Format:**Compressed postscript- Get it

### Analysis and Improvments of the Memory Usage of a Multifrontal Solver.

**By:**Abdou Guermouche, Jean-Yves L'Excellent, and Gil Utard**Number:**RR2003-08**Date:**February 2003**Abstract:**

- We are concerned with the memory usage of sparse direct solvers. We particularly focus on the influence of state-of-the-art sparse matrix reordering techniques on the dynamic memory usage of a multifrontal solver, MUMPS, and present algorithms to modify the assembly tree traversal that aim at reducing the memory usage. A large number of experiments show the interest of the approach for sequential executions.

**Keywords:**

- Sparse matrices, multifrontal method, assembly tree, reordering techniques, memory, tree traversal.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+40p**Format:**Compressed postscript- Get it

### An Application-Level Network Mapper.

**By:**Arnaud Legrand, Frédéric Mazoit, Martin Quinson**Number:**RR2003-09**Date:**February 2003**Abstract:**

- This paper presents a tool to automatically discover the network topology. The goal is to evaluate the performance of concurrent transfers (for example to improve collective communications) and not to discover the physical machines interconnection scheme (for administration purposes). The problems encountered, preliminary algorithms to solve them, as well as theoretical proof of their validity (under some conditions) are presented.

**Keywords:**

- Grid Computing, Simulation, Network Topology, Communication Performance Prediction.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+15p**Format:**Compressed postscript- Get it

### Complex Division with Prescaling of Operands.

**By:**Milos D. Ercegovac and Jean-Michel Muller**Number:**RR2003-10**Date:**February 2003**Abstract:**

- We adapt the radix-$r$ digit-recurrence division algorithm to complex division. By prescaling the operands, we make the selection of quotient digits simple. This leads to a simple hardware implementation, and allows correct rounding of complex quotient. To reduce large prescaling tables required for radices greater than $4$, we adapt the bipartite-table method to multiple-operand functions.

**Keywords:**

- Computer Arithmetic, Complex Division, Recurrence Division.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+12p**Format:**Compressed postscript- Get it

### Taylor models and floating-point arithmetic: proof that arithmetic operations are validated in COSY.

**By:**Nathalie Revol, Kyoko Makino, Martin Berz**Number:**RR2003-11**Date:**February 2003**Abstract:**

- The goal of this paper is to prove that the implementation of Taylor
models in COSY, based on floating-point arithmetic, computes results
satisfying the ``containment property'', i.e. guaranteed results.

First, Taylor models are defined and their implementation in the COSY software by Makino and Berz is detailed. Afterwards IEEE-754 floating-point arithmetic is introduced. Then the core of this paper is given: the algorithms implemented in COSY for multiplying a Taylor model by a scalar, for adding or multiplying two Taylor models are given and are proven to return Taylor models satisfying the containment property.

**Keywords:**

- Taylor Model, COSY Software, Floating-Point Operation, Rounding Error, Containment Property, Validated Result.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+17p**Format:**Compressed postscript- Get it

### Static load-balancing techniques for iterative computations on heterogeneous clusters.

**By:**Hélène Renard, Yves Robert, Frédéric Vivien**Number:**RR2003-12**Date:**February 2003**Abstract:**

- This paper is devoted to static load balancing techniques for mapping iterative algorithms onto heterogeneous clusters. The application data is partitioned over the processors. At each iteration, independent calculations are carried out in parallel, and some communications take place. The question is to determine how to slice the application data into chunks, and to assign these chunks to the processors, so that the total execution time is minimized. We establish a complexity result that assesses the difficulty of this problem, and we design practical heuristics that provide efficient distribution schemes.

**Keywords:**

- Heterogeneous Clusters, Static Load-balancing Techniques, Communication, Complexity.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+16p**Format:**Compressed postscript- Get it

### A brief survey of the theory of the Pi-calculus.

**By:**Daniel Hirschkoff**Number:**RR2003-13**Date:**February 2003**Abstract:**

- This document collects some
important results about the theory of Milner's pi-calculus and
related formalisms. We present the syntax and semantics of a monadic
calculus, and discuss type systems and the most commonly used
notions of behavioural equivalences. Pi-calculus dialects are also
briefly introduced, as well as several encodings of the
lambda-calculus.

*Nota:*These notes have been used for a course in the winter school on theoretical computer science organised by the Franco-Italian University in Aussois between the 29th of january and the 1st of february, 2003.

**Keywords:**

- Process Algebra, Mobility, Name Passing, Pi-calculus, Lambda-calculus, Type System, Behavioural Equivalence.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+15p**Format:**Compressed postscript- Get it

### More on Modulo 2n -1 Addition.

**By:**Jean-Luc Beuchat**Number:**RR2003-14**Date:**February 2003**Abstract:**

- This brief paper describes an improvement of the FPGA implementation of the modulo $(2^n-1)$ adder investigated in a previous research report.

**Keywords:**

- Computer arithmetic, modulo $2^n-1$ addition, FPGA.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+2p**Format:**Compressed postscript- Get it

### Distributed computing power : from local function to global computing.

**By:**Laurent Bienvenu, Christophe Papazian**Number:**RR2003-15**Date:**March 2003**Abstract:**

- We show here a natural extension of finite graph automata, by allowing each node of a network to store in its memory some pieces of information that are only bounded by the size of the underlying network (like a unique address). Depending of the power of the new local transition function, we show results about the power of the global function computed by the graph automata. The main result is that the global power is always less than the local computing power and even with very powerful local function (non recursive) we can not compute all global functions (even some primitive recursive ones) : Distributed computing is limited by its own structure.

**Keywords:**

- Graph Automata, Computability, Distributed Algorithms.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+10p**Format:**Compressed postscript- Get it

### A RTT-based Partitioning Algorithm for a Multi-rate Reliable Multicast Protocol.

**By:**Moufida Maimour, Cong-Duc Pham**Number:**RR2003-16**Date:**March 2003**Abstract:**

- Various Internet applications involve multiple parties and usually adopt a one-to-many communication paradigm (multicast). The presence of multiple receivers in a multicast session rises the problem of {\it inter-receiver fairness}. Transmitting with a rate which matches the slowest receiver will limit the throughput of other receivers and thus their {\it satisfaction}. A multi-rate mechanism where the receivers are distributed into subgroups with similar capacities, can improve the inter-receiver fairness for multicast sessions. In this report, we deal with the problem of receivers partitioning and propose a simple algorithm based on the receivers RTT variations where an explicit estimation of the receivers capacities is avoided. Our partitioning algorithm, although simple, performs an on-the-fly partitioning depending on the receivers' feedback. We show that our partitioning algorithm approximates and in many cases, achieves the optimal solution with a minimum computation effort.

**Keywords:**

- Partitioning algorithm, Reliable multicast, Heterogeneous receivers.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+12p**Format:**Compressed postscript- Get it

### Load-Balancing Scatter Operations for Grid Computing.

**By:**Stéphane Genaud, Arnaud Giersch, Frédéric Vivien**Number:**RR2003-17**Date:**March 2003**Abstract:**

- We present solutions to statically load-balance scatter operations in parallel codes run on grids. Our load-balancing strategy is based on the modification of the data distributions used in scatter operations. We study the replacement of scatter operations with parameterized scatters, allowing custom distributions of data. The paper presents: 1) a general algorithm which finds an optimal distribution of data across processors; 2) a quicker guaranteed heuristic relying on hypotheses on communications and computations; 3) a policy on the ordering of the processors. Experimental results with an MPI scientific code illustrate the benefits obtained from our load-balancing.

**Keywords:**

- Parallel Programming, Grid Computing, Heterogeneous Computing, Load-balancing, Scatter Operation.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+19p**Format:**Compressed postscript- Get it

### Performance et dynamicité dans les réseaux : l'approche Tamanoir.

**By:**Jean-Patrick Gelas, Laurent Lefèvre**Number:**RR2003-18**Date:**March 2003**Abstract:**

- Les réseaux actifs constituent un domaine en pleine expansion. Bien que différentes approches aient déjà été explorées, privilégiant les aspects sécurité et portabilité, rares sont les projets s'attaquant aux problèmes des hautes performances dans les environnements actifs. En combinant la haute performance et le déploiement de services portables, nous explorons de nouvelles approches capables d'ouvrir des pistes pour le développement des réseaux actifs. Nous proposons ainsi une nouvelle modélisation des réseaux actifs haute performance et sa mise en \oe uvre sous la forme de l'environnement \emph{Tamanoir}. Cet environnement fournit aux utilisateurs la possibilité de déployer et de maintenir dynamiquement des routeurs actifs distribués sur un réseau à grande échelle (WAN) à l'aide d'outils spécifiques. Les premières expérimentations démontrent l'efficacité et la pertinence de notre approche.

**Keywords:**

- Réseaux Actifs, Performance, Environnement d'exécution.

**Availability:**Electronic copy only.**Citation:**Publié à JDIR 2002, Toulouse, 4-6 mars 2002.**Size:**2+10p**Format:**Compressed postscript- Get it

### Towards an IPv6-based Security Framework for Distributed Storage Resources.

**By:**Alessandro Bassi, Julien Laganier**Number:**RR2003-19**Date:**March 2003**Abstract:**

- Some security problems can be often solved through authorization rather than authentication. Furthermore, authorization approach avoids usual drawbacks of centralized systems such as bottlenecks or single point of failure. In this paper, we propose a solution that could bring an appropriate security architecture to the Internet Backplane Protocol (IBP), a distributed shared storage protocol. The three basic building blocks are IPsec, Simple Public Key Infrastructure (SPKI) certificates and Crypto-Based Identifiers (CBID). CBID allows entities to prove ownership of their identifiers, SPKI allows entities to prove that they have been authorized to performs specific actions while IPsec provides data origin authentication and confidentiality. We propose to use them to bring some level of 'opportunistic' security in the absence of any trusted central authority. This is particularly tailored to ad-hoc environments where collaborations might be very short-termed.

**Keywords:**

- IBP, IPv6, IPsec, authorization certificates, SPKI, CBID, CGA.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+9p**Format:**Compressed postscript- Get it

### Dealing with Heterogeneity in an Active-based Multicast Congestion Avoidance Protocol.

**By:**Moufida Maimour, Cong-Duc Pham**Number:**RR2003-20**Date:**March 2003**Abstract:**

- Many of the proposed multicast congestion avoidance algorithms are single-rate where heterogeneity is accommodated by adjusting the transmission rate as a response to the worst receiver in the group. Due to the Internet heterogeneity, a single-rate congestion control affects the overall satisfaction of a multicast session receivers. In this report, we propose a multi-rate replicated scheme where some receivers (instead of the source) are designated to perform data replication for other receivers with lower capacity. To be more scalable and to minimize the bandwidth consumption due to data replication, the partitioning algorithm is performed on-the-fly by the routers depending on the feedbacks they receive. Neither a prior estimation of the receivers capacity is necessary nor a complex computation is required to execute our partitioning algorithm.

**Keywords:**

- Reliable Multicast, Congestion Control, Heterogeneity, Active Networks.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+16p**Format:**Compressed postscript- Get it

### Finding the "truncated" polynomial that is closest to a function.

**By:**Nicolas Brisebarre, Jean-Michel Muller**Number:**RR2003-21**Date:**April 2003**Abstract:**

- When implementing regular enough functions (e.g., elementary or special functions) on a computing system, we frequently use polynomial approximations. In most cases, the polynomial that best approximates (for a given distance and in a given interval) a function has coefficients that are not exactly representable with a finite number of bits. And yet, the polynomial approximations that are actually implemented do have coefficients that are represented with a finite - and sometimes small - number of bits: this is due to the finiteness of the floating-point representations (for software implementations), and to the need to have small, hence fast and/or inexpensive, multipliers (for hardware implementations). We then have to consider polynomial approximations for which the degree $i$ coefficient has at most $m_i$ fractional bits (in other words, it is a rational number with denominator $2^{m_i}$).We provide a method for finding the best polynomial approximation under this constraint.

**Keywords:**

- Computer Arithmetic, Polynomial Approximations.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+10p**Format:**Compressed postscript- Get it

### Performance Analysis of Remote File System Access over High Bandwidth Local Network.

**By:**Brice Goglin, Loïc Prylli**Number:**RR2003-22**Date:**April 2003**Abstract:**

- We study the performance of file servers, comparing NFS implementation in Linux to our experimental lightweight system called ORFA. The aim is to find out NFS bottlenecks in the case of high bandwidth local network. Using a simple protocol without cache allow us to get best performance from the underlying communication subsystem. Our user-level implementation avoids several kernel-level constraints and thus provides better performance than NFS. Moreover we explore several optimization techniques to reduce the server overhead which usually is the bottleneck.

**Keywords:**

- Distributed File System, Remote Access, High Bandwidth Local Network, Cluster, Linux, NFS.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+13p**Format:**Compressed postscript- Get it

### Load-balancing iterative computations in heterogeneous clusters with shared communication links.

**By:**Arnaud Legrand, Hélène Renard, Yves Robert, Frédéric Vivien**Number:**RR2003-23**Date:**April 2003**Abstract:**

- This paper is devoted to mapping iterative algorithms onto heterogeneous clusters. The application data is partitioned over the processors, which are arranged along a virtual ring. At each iteration, independent calculations are carried out in parallel, and some communications take place between consecutive processors in the ring. The question is to determine how to slice the application data into chunks, and to assign these chunks to the processors, so that the total execution time is minimized. One major difficulty is to embed a processor ring into a network that typically is not fully connected, so that some communication links have to be shared by several processor pairs. We establish a complexity result that assesses the difficulty of this problem, and we design a practical heuristic that provides efficient mapping, routing, and data distribution schemes.

**Keywords:**

- Heterogeneous Clusters, Load-balancing, Shared Communication Links, Complexity.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+17p**Format:**Compressed postscript- Get it

### Decidable and undecidable problems about quantum automata.

**By:**Vincent D. Blondel, Emmanuel Jeandel, Pascal Koiran, Natacha Portier**Number:**RR2003-24**Date:**April 2003**Abstract:**

- We study the following decision problem: is the language recognized by a quantum finite automaton empty or non-empty? We prove that this problem is decidable or undecidable depending on whether recognition is defined by strict or non-strict thresholds. This result is in contrast with the corresponding situation for probabilistic finite automata for which it is known that strict and non-strict thresholds both lead to undecidable problems.

**Keywords:**

- Quantum Computing, Automaton, Decidability.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+11p**Format:**Compressed postscript- Get it

### Lattices of Tilings and Stability.

**By:**Nathalie Caspard, Michel Morvan, Eric Remila, Eric Thierry**Number:**RR2003-25**Date:**April 2003**Abstract:**

- Many tiling spaces such as domino tilings of fixed figures have an underlying lattice structure. This lattice structure corresponds to the dynamics induced by flips. In this paper, we further investigate the properties of these lattices of tilings. In particular, we point out a stability property: the set of all the shortest sequences of flips joining to fixed tilings also have a lattice structure close to the lattice of all tilings. We also show that some of these properties also apply to other discrete dynamical systems and more generally may be satisfied by some partially ordered sets. It gives a new perspective on the lattice structure of tiling spaces and enables to deduce some of their properties only by means of partial order theoretical tools.

**Keywords:**

- Tilings, Lattices, Stability of the Tilting Operation, Partial Order Theory.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+13p**Format:**Compressed postscript- Get it

### Dynamics of the Picking transformation on integer partitions.

**By:**Duong Phan Ti, Eric Thierry**Number:**RR2003-26**Date:**April 2003**Abstract:**

- This paper studies a conservative transformation defined on families of finite sets. It consists in removing one element from each set and adding a new set composed of the removed elements. This transformation is conservative in the sense that the union of all sets of the family always remains the same. We study the dynamical process obtained when iterating this transformation on a family of sets and we focus on the evolution of the cardinalities of the sets of the family. This point of view allows to consider the transformation as an application defined on the set of all partitions of a fixed integer (which is the total number of elements in the sets). We show that iterating this particular transformation always leads to a heterogeneous distribution of the cardinalities, where almost all integers within an interval are represented. We also tackle some issues concerning the structure of the transition graph which sums up the whole dynamics of this process for all partitions of a fixed integer.

**Keywords:**

- Discrete Dynamical System, Integer Partitions.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+13p**Format:**Compressed postscript- Get it

### On Poset Sandwich Problems.

**By:**Michel Habib, David Kelly, Emmanuelle Lebhar, Christophe Paul**Number:**RR2003-27**Date:**April 2003**Abstract:**

- A graph $G_s=(V,E_s)$ is a sandwich for a pair of graph $G_t=(V,E_t)$ and $G=(V,E)$ if $E_t\subseteq E_s\subseteq E$. Any poset, or partially ordered set, admits a unique graph representation which is directed and transitive. In this paper we introduce the notion of sandwich poset problems inspired by former sandwich problems on comparability graphs. In particular, we are interested in series-parallel and interval posets which are subclasses of 2-dimensional posets, we describe polynomial algorithms for these two classes of poset sandwich problems and then prove that the problem of deciding the existence of a 2-dimensional sandwich poset is NP-complete.

**Keywords:**

- Analysis of Algorithms, Partially Ordered Sets, Graph Sandwich Problems.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+8p**Format:**Compressed postscript- Get it

### Scheduling tasks sharing files on heterogeneous clusters.

**By:**Arnaud Giersch, Yves Robert, Frédéric Vivien**Number:**RR2003-28**Date:**May 2003**Abstract:**

- This paper is devoted to scheduling a large collection of independent tasks onto heterogeneous clusters. The tasks depend upon (input) files which initially reside on a master processor. A given file may well be shared by several tasks. The role of the master is to distribute the files to the processors, so that they can execute the tasks. The objective for the master is to select which file to send to which slave, and in which order, so as to minimize the total execution time. The contribution of this paper is twofold. On the theoretical side, we establish complexity results that assess the difficulty of the problem. On the practical side, we design several new heuristics, which are shown to perform as efficiently as the best heuristics designed by Casanova et al., although their cost is an order of magnitude lower.

**Keywords:**

- Scheduling, Heterogeneous Clusters, Grid, Independent Tasks, File-sharing, Heuristics.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+27p**Format:**Compressed postscript- Get it

### Steady-state scheduling of task graphs on heterogeneous computing platforms.

**By:**Olivier Beaumont, Arnaud Legrand, Loris Marchal, Yves Robert**Number:**2003-29**Date:**May 2003**Abstract:**

- In this paper, we consider the execution of a complex application on a heterogeneous "grid" computing platform. The complex application consists of a suite of identical, independent problems to be solved. In turn, each problem consists of a set of tasks. There are dependences (precedence constraints) between these tasks. A typical example is the repeated execution of the same algorithm on several distinct data samples. We use a non-oriented graph to model the grid platform, where resources have different speeds of computation and communication. We show how to determine the optimal steady-state scheduling strategy for each processor (the fraction of time spent computing and the fraction of time spent communicating with each neighbor) and how to build such as schedule. This result holds for a quite general framework, allowing for cycles and multiple paths in the platform graph.

**Keywords:**

- Scheduling, Steady State, Task Graphs, Heterogeneous Platforms.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+28 p**Format:**Compressed postscript- Get it

### On Colorations Induced by Discrete Rotations.

**By:**Bertrand Nouvel, Eric Remila**Number:**RR2003-30**Date:**June 2003**Abstract:**

- We consider a non numerable family of colorations induced by discrete rotations. The symbolical dynamical system associated with the coloration is first explained. We introduce then a group that supports the dynamics of the system. The periodical cases are precised, they are induced by Pythagorean triples. Finally, a proof of the quasi-periodicity of the colorations, and a description of asymmetrical colorations concludes this paper.

**Keywords:**

- Discrete Rotations, Discretization , Quasi-periodicity,Discrete Geometry.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+10p**Format:**Compressed postscript- Get it

### Multiplication-addition modulaire: algorithmes itératifs et implantations sur FPGA.

**By:**Jean-Luc Beuchat, Jean-Michel Muller**Number:**RR2003-31**Date:**Juin 2003**Abstract:**

- Cet article présente tout d'abord diverses améliorations d'un algorithme itératif de multiplication modulaire proposé en 1997 par Jeong et Burleson. Une simple modification de la récurrence permet l'implantation d'une multiplication-addition modulaire. Nous montrons ensuite comment réduire d'un facteur deux la taille du circuit lorsque l'opérateur offre le choix du modulo parmi un ensemble $m_1, m_2,\ldots, m_q$. Nous proposons finalement un nouvel algorithme facilitant la réalisation de l'exponentiation modulaire et présentons quelques résultats de synthèse pour des circuits FPGA de la famille Virtex de Xilinx. Pour des nombres de $16$ bits, les opérateurs développés permettent par exemple d'effectuer 6 millions d'opérations à la seconde en utilisant uniquement 17 tranches.

**Keywords:**

- Arithmétique des Ordinateurs, Multiplication Modulaire, FPGA.

**Availability:**Electronic copy only.**Citation:**Soumis à SympAAA 2003.**Size:**2+7p**Format:**Compressed postscript- Get it

### Multiple Precision Interval Packages: Comparing Different Approaches.

**By:**Markus Grimmer, Knut Petras, Nathalie Revol.**Number:**RR2003-32**Date:**June 2003**Abstract:**

- We give a survey on packages for multiple precision interval arithmetic, with the main focus on three specific packages. One is within a Maple environment, intpakX, and two are C/C++ libraries, GMP-XSC and MPFI. We discuss their different features, present timing results and show several applications from various fields, where high precision intervals are fundamental.

**Keywords:**

- Multiple Precision Interval Arithmetic, Packages, Ease of use, Efficiency, Reliability.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+24p**Format:**Compressed postscript- Get it

### Optimizing the steady-state throughput of scatter and reduce operations on heterogeneous platforms.

**By:**Arnaud Legrand, Loris Marchal, Yves Robert**Number:**RR2003-33**Date:**June 2003**Abstract:**

- In this paper, we consider the communications involved by the execution of a complex application, deployed on a heterogeneous ``grid'' platform. Such applications intensively use collective macro-communication schemes, such as scatters, personalized all-to-alls or gather/reduce operations. Rather than aiming at minimizing the execution time of a single macro-communication, we focus on the steady-state operation. We assume that there is a large number of macro-communication to perform in pipeline fashion, and we aim at maximizing the throughput, i.e. the (rational) number of macro-communications which can be initiated every time-step. We target heterogeneous platforms, modeled by a graph where resources have different communication and computation speeds. The situation is simpler for series of scatters or personalized all-to-alls than for series of reduces operations, because of the possibility of combining various partial reductions of the local values, and of interleaving computations with communications. In all cases, we show how to determine the optimal throughput, and how to exhibit a concrete periodic schedule that achieves this throughput.

**Keywords:**

- Scheduling, Steady-state, Collective communications, Heterogeneous platforms.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+26p**Format:**Compressed postscript- Get it

### Optimizing the steady-state throughput of Broadcasts on heterogeneous platforms.

**By:**Arnaud Legrand, Olivier Beaumont, Loris Marchal , Yves Robert**Number:**RR2003-34**Date:**June 2003**Abstract:**

- In this paper, we consider the communications involved by the execution of a complex application, deployed on a heterogeneous ``grid'' platform. Such applications extensively use macro-communication schemes, for example to broadcast data items. Rather than aiming at minimizing the execution time of a single broadcast, we focus on the steady-state operation. We assume that there is a large number of messages to be broadcast in pipeline fashion, and we aim at maximizing the throughput, i.e. the (rational) number of messages which can be broadcast every time-step. We target heterogeneous platforms, modeled by a graph where resources have different communication and computation speeds. Achieving the best throughput may well require that the target platform is used in totality: we show that neither spanning trees nor DAGs are as powerful as general graphs. We show how to compute the best throughput using linear programming, and how to exhibit a periodic schedule, first when restricting to a DAG, and then when using a general graph. The polynomial compactness of the description comes from the decomposition of the schedule into several broadcast trees that are used concurrently to reach the best throughput. It is important to point out that a concrete scheduling algorithm based upon the steady-state operation is asymptotically optimal, in the class of all possible schedules (not only periodic solutions).

**Keywords:**

- Scheduling, Steady-state, Broadcast, Heterogeneous Platforms.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+30p**Format:**Compressed postscript- Get it

### Optimizing the translation out-of-SSA with renaming constraints.

**By:**F. Rastello, F. de Ferrière, C. Guillon**Number:**RR2003-35**Date:**June 2003**Abstract:**

- Static Single Assignment form is an intermediate representation,
that uses phi-functions to merge values at each confluent points of the control
flow graph. phi-functions are not machine instructions and should be renamed
back to
`move`operations when translating out-of-SSA form. Without a coalescing algorithm, out-of-SSA translation generates many`move`instructions. In this paper we propose an extension of the algorithm of Leung and George to minimize the phi-related copies during the out-of-SSA translation. Leung et al. constructed SSA form for programs represented as native machine instructions, including the use of machine dedicated registers. For this purpose, the out-of-SSA translation contains renaming constraints that are represented using a pinning principle. Pinning the phi-function arguments and their corresponding definition to a common resource is a very attractive technique for coalescing variables, even if this is not a true minimization: this article presents a renaming-constraints aware and pinning-based coalescing algorithm. Even without renaming constraints, the`move`instructions minimization problem is still considered an open issue. This article provides also a discussion about the formulation of this problem, its complexity and its motivations. Finally, we implemented our algorithm in the STMicroelectronics Linear Assembly Optimizer. This provides many interesting results when comparing several possible approaches. We also explain, using hand crafted examples, the limitations of Leung's, Sreedhar's and classical register coalescing algorithms.

**Keywords:**

- Static Single Assignment, Coalescing, NP-complete, K-COLORABILITY, Machine code level, Register allocation.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+24p**Format:**Compressed postscript- Get it
**Revisions: New version**

### On the complexity of computing determinants.

**By:**Erich Kaltofen, Gilles Villard**Number:**RR2003-36**Date:**July 2003**Abstract:****Keywords:****Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+35p**Format:**Compressed postscript- Get it

### CR-LIBM: The evaluation of the exponential.

**By:**C. Daramy, D. Defour, F. de Dinechin, J.-M. Muller**Number:**RR2003-37**Date:**July 2003**Abstract:**

- We present a new elementary function library, called CR-LIBM. This library implements the various functions defined by the Ansi99 C standard. It provides correctly rounded functions. When writing this library, our primarily goal was to certify correct rounding, and make it reasonably fast, and with a low utilisation of memory. Hence, our library can be used without any problem on real-scale problems. We are also giving the proof and the elements to understand the implementation of the exponential function of CR-LIBM.

**Keywords:**

- Elementary Functions, Exponential, CRlibm, correct rounding.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+37p**Format:**Compressed postscript- Get it

### AMCA: A linear algorithm for real-time scheduling with optimal energy use.

**By:**Bruno Gaujal, Nicolas Navet, Cormac Walsh**Number:**RR2003-38**Date:**July 2003**Abstract:**

- We present an algorithm for scheduling a set of non-recurrent tasks (or jobs) with real-time constraints so as to minimize the total energy consumption on a dynamically variable voltage processor. Our algorithm runs in linear time and is thus an improvement over the classical algorithm of Yao et al. It was made possible by considering the problem as a shortest path problem. We also propose an algorithm for the case where the processor possesses only a limited number of clock frequencies. We extend this algorithm to provide the minimum number of speed changes, which is important when the speed switching overhead cannot be neglected. All our algorithms are linear in the number of tasks if the arrivals and deadlines are sorted and need $O(N\log N)$ time otherwise and these complexities are shown optimal. We extend the results to fluid tasks and non-convex cost functions. An interesting by-product of this study is that it furnishes a linear time feasibility test for a set of non-recurrent tasks scheduled under EDF. This is an improvement over the classical one.

**Keywords:**

- Real-Time Systems, Low-Power, Scheduling, Dynamic Voltage Scaling.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+20p**Format:**Compressed postscript- Get it

### Quantum automata and algebraic groups.

**By:**Harm Derksen, Emmanuel Jeandel, Pascal Koiran**Number:**RR2003-39**Date:**July 2003**Abstract:**

- We show that several problems which are known to be undecidable for probabilistic automata become decidable for quantum finite automata. Our main tool is an algebraic result of independent interest: we give an algorithm which, given a finite number of invertible matrices, computes the Zariski closure of the group generated by these matrices.

**Keywords:**

- Quantum Automata, Probabilistic Automata, Undecidability, Algebraic Groups, Algebraic Geometry.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+15p**Format:**Compressed postscript- Get it

### Opérateurs itératifs de multiplication-addition modulaire pour FPGA.

**By:**Jean-Luc Beuchat, Jean-Michel Muller**Number:**RR2003-40**Date:**August 2003**Abstract:**

- Cet article présente tout d'abord diverses améliorations d'un algorithme itératif de multiplication modulaire proposé en 1997 par Jeong et Burleson. Une simple modification de la récurrence permet l'implantation d'une multiplication-addition modulaire. Nous montrons ensuite comment réduire d'un facteur deux la taille du circuit lorsque l'opérateur offre le choix du modulo parmi un ensemble $m_1, m_2,\ldots, m_q$. Nous proposons un nouvel algorithme facilitant la réalisation de l'exponentiation modulaire et présentons quelques résultats de synthèse pour des circuits FPGA de la famille Virtex de Xilinx. Pour des nombres de $16$ bits, les opérateurs développés permettent par exemple d'effectuer 6 millions d'opérations à la seconde en utilisant uniquement 17 tranches. Ces différents opérateurs nécessitent toutefois de petites tables dépendant du modulo $m$ choisi au moment de l'écriture du code VHDL. Afin de remédier à cet inconvénient, nous proposons une méthode récursive du calcul des tables s'effectuant parallèlement aux premières itérations de la multiplication modulaire.

**Keywords:**

- Arithmétique des ordinateurs, multiplication modulaire, FPGA.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+13p**Format:**Compressed postscript- Get it

### Scheduling Divisible Loads on Star and Tree Networks: Results and Open Problems.

**By:**Olivier Beaumount, Henri Casanova , Arnaud Legrand, Yves Robert, Yang Yang**Number:**RR2003-41**Date:**September 2003**Abstract:**

- Applications in many scientific and engineering domains are structured in large numbers of independent tasks with low granularity. These applications can thus be naturally parallelized, typically in master-worker fashion, provided that efficient scheduling strategies are available. Such applications have been called divisible loads because a scheduler may divide the computation among worker processes arbitrarily, both in terms of number of tasks and of task sizes. Divisible load scheduling has been an active area of research for the last twenty years. A vast literature offers results and scheduling algorithms for various models for the underlying distributed computing platform. Broad surveys are available that report on accomplishments in the field. By contrast, in this paper we propose a unified theoretical perspective that synthesizes previously published results, several novel results, and open questions, in a view to foster novel divisible load scheduling research. Specifically, we discuss both one-round and multi-round algorithms, and we restrict our scope to the popular star and tree network topologies, which we study with both linear and affine cost models for communication and computation.

**Keywords:**

- Parallel Computing, Scheduling, Divisible Load.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+19p**Format:**Compressed postscript- Get it

### Automatic deployment of the Network Weather Service using the Effective Network View.

**By:**Arnaud Legrand, Martin Quinson**Number:**RR2003-42**Date:**September 2003**Abstract:**

- The monitoring infrastructure constitutes a key component of any Grid middleware. The Network Weather Service (NWS) is the most commonly used tool to fulfill this need. Unfortunately, users have to deploy the NWS manually, which can be very tedious and error-prone.% This paper introduces a method based on the Effective Network View (ENV) network mapper to automatically deploy of NWS using the deployment on our lab's LAN as lead.

**Keywords:**

- Network Weather Service, Effective Network View, Application-level topology.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+18p**Format:**Compressed postscript- Get it

### Graph encoding of 2D-gon tilings.

**By:**Frederic Chavanon, Matthieu Latapy, Michel Morvan and Laurent Vuillon**Number:**RR2003-43**Date:**September 2003**Abstract:**

- 2D-gons tilings with parallelograms are the main model used in physics to study quasicrystals, and they are also important in combinatorics for the study of aperiodic structures. In this paper, we study the graph induced by the adjacency relation between tiles. This relation can been used to encode simply and efficiently 2D-gon tilings for algorithmic manipulation. We show for example how it can be used to sample random 2D-gon tilings.

**Keywords:**

- Tilings, Quasicrystals, Data structures.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+17p**Format:**Compressed postscript- Get it

### Contractions of octagonal tilings with rhombic tiles.

**By:**Frederic Chavanon, Eric Remila**Number:**RR2003-44**Date:**September 2003**Abstract:**

- We prove that the space of rhombic tilings of a fixed octagon can be given a canonical order structure. We make a first study of this order, proving that it is a graded poset. As a consequence, we obtain the diameter of the space and a lower bound for the distance between tilings.

**Keywords:**

- Tilings, Order theory.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+12p**Format:**Compressed postscript- Get it

### Nombre de solutions dans une binade de l'equation A^2+B^2=C^2+C.

**By:**Jean-Michel Muller, Jean-Louis Nicolas, Xavier-François Roblot**Number:**RR2003-45**Date:**October 2003**Abstract:**

- Let us denote by Q(N,\la) the number of solutions of the diophantine equation A^2+B^2=C^2+C satisfying N\le A\le B\le C \le \la N-\frac 12. We prove that, for \la fixed and N\to \iy, there exists a constant \al(\la) such that Q(N,\la)=\al(\la)N+O_\la\left(N^{7/8}\log N\right). When \la=2, Q(2^{n-1},2) counts the number of solutions of A^2+B^2=C^2+C with the same number, n, of binary digits; these solutions are interesting in the problem of computing the function (a,b) \to \sqrt{a^2+b^2} in radix-2 floating-point arithmetic. By elementary arguments, (N,\la) can be expressed in terms of four sums of the type S(u,v;f)=\sum_{\substack{u\le d\le v\\ d \text{ odd}}} \left(\sum_{\substack{1\le A\le f(d) \\ 4A^2\equiv -1 \hspace{-1mm}\pmod{d}}} 1\right) where u and v are real numbers and f: [u,v]\longrightarrow \R is a function. These sums are estimated by a classical, but deep, method of number theory, using Fourier analysis and Kloosterman sums. This method is effective, and, in the case \la=2, a precise upper bound for |Q(N,\la)-\al(\la)N| is given.

**Keywords:**

- Correct Rounding, Diophantine Equations.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+34p**Format:**Compressed postscript- Get it

### Rigid mixin modules.

**By:**Tom Hirschowitz**Number:**RR2003-46**Date:**October 2003**Abstract:**

- Mixin modules are a notion of modules that allows cross-module recursion and late binding, two features missing in ML-style modules. Unfortunately, they are easier to define in a call-by-name setting. In a call-by-value setting, mixin modules tend to conflict with the usual static restrictions on recursive definitions. Moreover, the semantics of instantiation has to specify an order of evaluation, which involves a difficult design choice. Previous proposals rely on the dependencies between components to compute a valid order of evaluation. In such systems, mixin module types must carry some information about the dependencies between their components, which makes them rather impractical. In this paper, we propose a new design for mixin modules in a call-by-value setting, which avoids this problem. The formalism we obtain is much simpler than previous notions of mixin modules, although slightly less powerful.

**Keywords:**

- Programming Languages, Semantics, Typing, Modularity, Mixin Modules.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+26p**Format:**Compressed postscript- Get it

### A new bound on the 2-dimension of partially ordered sets.

**By:**Eric Thierry**Number:**RR2003-47**Date:**October 2003**Abstract:**

- This paper provides a new upper bound on the 2-dimension
of partially ordered sets. The 2-dimension of an ordered set P is the smallest

cardinality of a set S such that there exists an order-embedding of P into the boolean lattice 2^S (all the subsets of S ordered by inclusion). The proof is non-contructive and uses a probabilistic argument. We link the result and the proof with two known theorems of the theory of ordered sets.

**Keywords:**

- Partially Ordered Set, 2-dimension, Order-embedding, Boolean Lattice.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+5p**Format:**Compressed postscript- Get it

### Study of a non intrusive and accurate method for measuring the end-to-end useful bandwidth.

**By:**Mathieu Goutelle, Pascale Vicat-Blanc/Primet**Number:**RR2003-48**Date:**October 2003**Abstract:**

- Studies and tools development for applications sensitive to data rates is a very active research field for distributed application performance optimization. The research community works to propose tools for measuring the end-to-end performance of a link between two hosts. Delay measurements provide a first approximation but aren't sufficient enough because the delay isn't a relevant metric. A bandwidth evaluation method would give a more realistic view of the raw capacity but also of the dynamic behaviour of the interconnection, when we want to evaluate the transfer time of an amount of data. Among all the existing methods, there are some differences according to the measurements strategies and the evaluated metric. We first describe the available bandwidth measurements and then the total capacity measurements approaches. Among all the presented methods, none of them can evaluate both metrics, while giving an overview of the link topology. By using a hop-by-hop packet pair method, we show that we can provide such informations with a fine analysis of the measurements. In this report, we detail our proposition of a solution for an hop-by-hop measurement of the capacity and available bandwidth. This method has been validated in simulation, then implemented in Linux and validated experimentally. We compare this method with others to define its limits and the future utilisations on the newly developed tool.

**Keywords:**

- IP Network Measurements, Capacity Evaluation, Available Bandwidth, Packet Pair Method, Tracerate.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+24p**Format:**Compressed postscript- Get it

### Scheduling tasks sharing files from distributed repositories.

**By:**Arnaud Giersch, Yves Robert, Frédéric Vivien**Number:**RR2003-49**Date:**October 2003**Abstract:**

- This paper is devoted to scheduling a large collection of independent tasks onto a large distributed heterogeneous platform, which is composed of a set of servers. Each server is a processor cluster equipped with a file repository. The tasks to be scheduled depend upon (input) files which initially reside on the server repositories. A given file may well be shared by several tasks. For each task, the problem is to decide on which server to execute it, and to transfer the required files (those which the task depends upon) to that server repository. The objective is to find a task allocation, and to schedule the induced communications, so as to minimize the total execution time. The contribution of this paper is twofold. On the theoretical side, we establish complexity results that assess the difficulty of the problem. On the practical side, we design several new heuristics, including an extension of the \texttt{min-min} heuristic to the decentralized framework, and several lower cost heuristics, which we compare through extensive simulations.

**Keywords:**

- Scheduling, heterogeneous clusters, grid, independent tasks, file-sharing, heuristics.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+29p**Format:**Compressed postscript- Get it

### Effective lower and upper bounds for the Fourier coefficients of powers of the modular invariant j.

**By:**Nicolas Brisebarre, Georges Philibert**Number:**RR2003-50**Date:**November 2003**Abstract:**

- Using an elementary approach based on
careful handlings of Cauchy integrals, we give
precise effective lower and upper bounds for the Fourier
coefficients of powers of the modular invariant
*j*. Moreover, we adapt an old result of Rademacher to get a convergent series expansion of these Fourier coefficients and we show that this expansion allows to find again these estimates. Our results improve previous ones by K. Mahler and O. Herrmann. In particular, we show that the Fourier coefficients of*j*are smaller than their asymptotically equivalent given by Petersson and Rademacher.

**Keywords:**

- Modular Function, Modular Invariant, Fourier Coefficients, Circle Method, Hankel Function.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+28p**Format:**Compressed postscript- Get it

### Automatic Deployment for Hierarchical Network Enabled Server.

**By:**Eddy Caron, Pushpinder Kaur Chouhan, Arnaud Legrand**Number:**RR2003-51**Date:**November 2003**Abstract:**

- This paper focus on the deployment of grid infrastructures, more specifically Problem Solving Environments (PSE) for numerical applications on the grid. Even if the deployment of such an architecture is forced by physical constraints (firewall, access permission, security, ...) its efficiency heavily depends on the quality of the mapping between its different components and the grid resources. This paper proposes a new model based on linear programming to estimate the performance of a deployment of a hierarchical PSE. The advantages of the modeling approach in this case are multiple: evaluate a virtual deployment before an actual deployment, provide a decision builder tool (i.e., designed to compare different architectures or buy new resource), take into account the platform scalability. Using this modeling, it is possible to determine the bottleneck of the platform and thus to know whether a given deployment can be improved or not. We illustrate this modeling by applying this results to an existing hierarchical PSE called DIET.

**Keywords:**

- Deployment, Grid computing, Network Enabled Server, Steady-state scheduling, Resource localization and selection.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+10p**Format:**Compressed postscript- Get it

### From Heterogeneous Task Scheduling to Heterogeneous Mixed Data and Task Parallel Scheduling.

**By:**Frédéric Suter, Henri Casanova, Frédéric Desprez, Vincent Boudet**Number:**RR2003-52**Date:**November 2003**Abstract:**

- Mixed-parallelism, the combination of data- and task-parallelism, is a powerful way of increasing the scalability of entire classes of parallel applications. Exploiting both types of parallelism simultaneously makes it possible to deploy these applications on platforms comprising multiple compute clusters, which have become increasingly popular in the last decade. However, high performance application executions are only possible if effective scheduling strategies are available. While multi-cluster platforms are predominantly heterogeneous, previous work on mixed-parallel application scheduling targets only homogeneous platforms. In this paper we develop a method for extending existing scheduling algorithms for task-parallel applications on heterogeneous platforms to the mixed-parallel case. After detailing the foundations of our method and our assumptions, we present a case study in which we generate a mixed-parallel version of the popular HEFT scheduling algorithm, which we evaluate with an extensive set of simulation experiments.

**Keywords:**

- Mixed Parallelism, heterogeneous task scheduling.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+15p**Format:**Compressed postscript- Get it

### An algorithm for finding entire solutions of systems of difference equations.

**By:**Nicolas Brisebarre**Number:**RR2003-53**Date:**November 2003**Abstract:**

- We present an algorithm that computes the entire solutions of systems
of two difference equations and of systems of one differential
equation and one difference equation, all with complex polynomials
coefficients. The problem of the determination of such solutions arose in
the field of diophantine approximation. Our algorithm, which uses
previous works by Abramov and Petkovsek, allows also to determine, for
each of the systems considered, all the solutions of the form
*R*, with_{1}(z) e^{d1z}+ ... + R_{s}(z) e^{dSz}*d*in_{1}, ..., d_{s}**C**and*R*in_{1}(z), ..., R_{s}(z).**C**(z)

**Keywords:**

- Difference Equations, Exponential
Polynomials, Exponential Fractions, Linear Differential
Equations,
*P*-recursive Functions, Quasirational Functions.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+28p**Format:**Compressed postscript- Get it

### A correctly rounded implementation of the exponential function on the Intel Itanium architecture.

**By:**Christoph Quirin Lauter**Number:**RR2003-54**Date:**November 2003**Abstract:**

- This article presents an efficient implementation of a correctly rounded exponential function in double precision on the Intel Itanium processor family. This work combines advanced processor features (like the double-extended precision fused multiply-and-add units of the Itanium processors) with recent research results giving the worst-case precision needed for correctly rounding the exponential function. We give and prove an algorithm which returns a correctly rounded result (in any of the four IEEE-754 rounding modes) within 172 machine cycles on the Intel Itanium 2 processor. This is about four times slower than the less accurate function present in the standard Intel mathematical library. The evaluation is performed in one phase only and is therefore fast even in the worst case, contrary to other implementations which use a multilevel strategy used by Ziv and Defour: We show that the worst-case required precision of 157 bits can always be stored in the sum of two double-extended floating-point numbers. Another algorithm is given with a 92 cycles execution time, but its proof has to be formally completed.

**Keywords:**

- Correct rounding, IEEE-754, exponential, Itanium.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+54p**Format:**Compressed postscript- Get it

### Searching for optimal paths in long-range contact networks.

**By:**Emmanuelle Lebhar, Nicolas Schabanel**Number:**RR2003-55**Date:**November 2003**Abstract:**

- Since Milgram experiment in 1967, that demonstrated the ability of people to find short paths efficiently in networks, based only on their own local view of the network, different models have been proposed to study the ``small world phenomenon''. In 2000, Kleinberg shows that most of the previously known models for small world fail to capture an important feature of the phenomenon: no local information based (i.e., decentralized) algorithm can find short paths in these graphs, when they exists. He then introduces a model composed of a lattice (representing the local acquaintances) augmented with directed links (symbolizing long-range contacts) distributed harmonically. He proposes to model people behavior by the greedy algorithm that forwards the message to the contact (local or long-range) of the current holder, which is the closest to its destination. He shows that this greedy algorithm computes paths of expected length $\Theta(\log^2n)$ between any pair of nodes. The present paper questions the choice of the greedy strategy to model social behavior. We proposes a new strategy in which nodes consult their acquaintances near by, before deciding where to forward the message. Our algorithm presents the same computational characteristics as Kleinberg's original algorithm: uses $\Theta(\log n)$ bits of memory and visits $O(\log^2n)$ nodes. However, it computes much shorter paths, of expected length $O(\log n(\log \log n)^2)$, between any pair of nodes. This algorithm demonstrates that consulting their acquaintances near by, leads to considerable improvements in performances. It also shows that this consultation is less and less useful as one gets closer to the target. As far as we know, this is the first algorithm to ``break the $\Theta(\log^2 n)$ barrier'' for the paths length in Kleinberg's network.

**Keywords:**

- Algorithms on Random Structures, Routing, Small World and Social Behavior Models.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+8p**Format:**Compressed postscript- Get it

### Transparent Remote File Access through a Shared Library Client.

**By:**Brice Goglin, Loïc Prylli**Number:**RR2003-56**Date:**December 2003**Abstract:**

- This paper presents the implementation of the ORFA client. ORFA aims at providing an efficient access to remote file systems through high-speed local networks such as Myrinet. The ORFA client is a lightweight shared library that may be pre-loaded to override standard file access routines to allow remote file access for any legacy application. In ORFA, virtual file descriptors have been designed to support POSIX behavior such as file sharing semantics so that remote files may be accessed and manipulated as local files. Local file access routines may still be used without any incompatibility with other libraries that modify their standard behavior. Finally, a network abstraction layer has been implemented in ORFA to efficiently use asynchronous interfaces such as GM without suffering from memory registration requirements.

**Keywords:**

- Remote File Access, Shared Library, Dynamic Linking, Transparency, Memory Registration, Myrinet, Linux.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+14p**Format:**Compressed postscript- Get it