### Computer validated proofs of a toolset for adaptable arithmetic.

**By:**Marc Daumas, Claire Moreau-Finot, Laurent Théry**Number:**RR2001-01**Date:**January 2001**Abstract:**

- Most existing implementations of multiple precision arithmetic demand that the user sets the precision a priori. A solution is to use the largest precision necessary to reach target accuracy. Some libraries are said adaptable in the sense that they dynamically change the precision of each intermediate operation individually to deliver the target accuracy according to the actual inputs. We present in this text a new adaptable numeric core inspired both from floating point expansions and from on-line arithmetic. The numeric core is cut down to five tools. The first tool that contains many arithmetic operations is proved to be correct. The proofs have been formally checked by the Coq assistant. Developing the proofs, we have formally proved many result published in the literature and we have extended a few of them. This work may let users (i) develop application specific adaptable libraries based on the toolset and / or (ii) write new formal proofs based on the set of validated facts.

**Keywords:**

- Multiple Precision, Expansion, On-line, Formal Proof, Coq.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+28p**Format:**Compressed postscript- Get it

### Parallelization of automatic speech recognition.

**By:**Yahya Ould Mohamed El Hadj, Nathalie Revol**Number:**RR2001-02**Date:**January 2001**Abstract:**

- The automatic recognition of spoken
words is increasingly common, for dictaphone applications, telephone
services or the command of various devices by disabled persons.
In the latter case, a high recognition rate is expected on a
vocabulary of small to medium size. To achieve this goal, the model
must be refined. Thus, both the training stage and the recognition
stage for such
applications can be very time consuming and occasional re-training
may happen. Its parallelization is thus worth considering. In
this paper we present firstly the models we use: the classical
hidden Markov model and another model that takes into account
the prosody of speech, namely the centisecond two-level hidden
Markov model introduced by Meziane.
Then two parallelization strategies are detailed: the first one
simply shares the vocabulary among the processors, the second
one also distributes the model.
%which has to be slightly modified in order to allow this distribution.
Experimental results highlight
the need for a finer load-balancing: an
*a priori*load estimation is presented and is used to statically balance the computational load between the processors. Further experiments have been conducted and exhibit efficiencies higher than 65\% on an architecture composed of 12 Pentium Pro interconnected via Myrinet. Directions for improving further the parallelization are given.

**Keywords:**

- Automatic speech recognition, Hidden Markov model, Centisecond two-level hidden Markov model, Parallelization.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+17p**Format:**Compressed postscript- Get it

### Remote object detection in cluster-based Java.

**By:**Gabriel Antoniu, Philip Hatcher**Number:**RR2001-03**Date:**January 2001**Abstract:**

- Our work combines Java compilation to native code with a run-time library that executes Java threads in a distributed-memory environment with true parallelism. To provide the illusion of a shared memory to Java threads, our Hyperion compiling system has been built on top of DSM-PM2, a portable implementation platform for multithreaded distributed-shared-memory protocols. We have designed, implemented and experimented with two alternative consistency protocols compliant with the Java Memory Model. The protocols have different mechanisms for access detection.We illustrate the effects of the two access-detection techniques with five applications run on two clusters with different interconnection networks: BIP/Myrinet and SISCI/SCI.

**Keywords:**

- Java, Distributed Systems, Java Consistency, JVM, Consistency Protocols, Distributed Shared Memory, DSM, DSM-PM2.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+9p**Format:**Compressed postscript- Get it

### DSM-PM2: A portable implementation platform for multithreaded DSM consistency protocols (extended version).

**By:**Gabriel Antoniu, Luc Bougé**Number:**RR2001-04**Date:**January 2001**Abstract:**

- DSM-PM2 is a platform for designing, implementing and experimenting multithreaded DSM consistency protocols. It provides a generic toolbox which facilitates protocol design and allows for easy experimentation with alternative protocols for a given consistency model. DSM-PM2 is portable across a wide range of clusters. We illustrate its power with figures obtained for different protocols implementing sequential consistency, release consistency and Java consistency, on top of Myrinet, Fast-Ethernet and SCI clusters.

**Keywords:**

- DSM, Multithreading, Consistency Protocols, DSM-PM2, PM2.

**Availability:**Electronic copy only.**Citation:**Gabriel Antoniu and Luc Bougé. DSM-PM2: A portable implementation platform for multithreaded DSM consistency protocols. In Proc. 6th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS '01), San Francisco, April 2001. Held in conjunction with IPDPS 2001. IEEE TCPP.To appear.**Size:**2+19p**Format:**Compressed postscript- Get it

### Efficient Matrix Preconditioners for Black Box Linear Algebra.

**By:**L. Chen, W. Eberly, E. Kaltofen, B.D. Saunders, W.J. Turner, G. Villard**Number:**RR2001-05**Date:**January 2001**Abstract:**

- The main idea of the ``black box'' approach in exact linear algebra is to reduce matrix problems to the computation of minimum polynomials. In most cases preconditioning is necessary to obtain the desired result. Here, good preconditioners will be used to ensure geometrical / algebraic properties on matrices, rather than numerical ones, so we do not address a condition number. We offer a review of problems for which (algebraic) preconditioning is used, provide a bestiary of preconditioning problems, and discuss several preconditioner types to solve these problems. We include new conditioners, new analyses of preconditioner performance, and results on the relations among preconditioning problems and with linear algebra problems. Thus improvements are offered for the efficiency and applicability of preconditioners. The focus is on linear algebra problems over finite fields, but most results are valid for entries from arbitrary fields.

**Keywords:**

- Linear Algebra, Randomized Algorithms, Black Box Matrix, Sparse Matrix, Exact Arithmetic, Finite Fields, Linear Systems, Rank, Preconditioner.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+18p**Format:**Compressed postscript- Get it

### Additive symmetric: the non-negative case.

**By:**Marc Daumas, Philippe Langlois**Number:**RR2001-06**Date:**February 2001**Abstract:**

- An additive symmetric b of a with respect to c satisfies c = (a+b)/2. Existence and uniqueness of such b are basic properties in exact arithmetic that fail when a and b are floating point numbers and the computation of c performed in IEEE-754 like arithmetic. We exhibit and prove conditions on the existence, the uniqueness and the exact correspondence of an additive symmetric when b and c have the same sign.

**Keywords:**

- Floating Point Arithmetic, Additive Symmetric, Correction, IEEE-754 Standard.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+11p**Format:**Compressed postscript- Get it

### Heterogeneous task scheduling : a survey.

**By:**Vincent Boudet**Number:**RR2001-07**Date:**February 2001**Abstract:**

- Scheduling computation tasks on processors is a key issue for high-performance computing. Although a large number of scheduling heuristics have been presented in the literature, most of them target only homogeneous resources. We survey here five low-complexity heuristics for heterogeneous platforms, the Best Imaginary Level (BIL), the Generalized Dynamic Level (GDL), the Critical-Path-on-a-Processor (CPOP), the Heterogeneous Earliest Finish Time (HEFT) and the Partial Completion Time (PCT) algorithms. These five heuristics aim at scheduling directed acyclic weighted task graphs on a bounded number of heterogeneous processors. We compare the performances of the heuristics using four classical testbeds.

**Keywords:**

- Scheduling, Task Graphs, High-Performance, Heterogeneous Platforms, Different-Speed Processors, Mapping.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+28p**Format:**Compressed postscript- Get it

### Leader Election without Compass in Some Hyperbolic and Euclidean Cellular Automata.

**By:**Codrin Nichitiu, Christophe Papazian, Eric Remila**Number:**RR2001-08**Date:**February 2001**Abstract:**

- We present a linear time algorithm for the networking and distributed computing problem of leader election (LE). Given a graph, its vertices represent processors (here finite state machines), and its edges communication lines (here synchronous). The LE problem consists in finding a protocol for a family of graphs such that after iterating it, a vertex, edge or cycle be distinguished by a special state called leader. Here the graphs are only required to be connected, and without holes. We describe the algorithm in full detail on a special class of planar graphs, prove its correctness and show how it extends to other classes.

**Keywords:**

- Distributed Algorithms, Graph Automata, Leader Election, Euclidean Cellular Automata, Hyperbolic Cellular Automata.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+12p**Format:**Compressed postscript- Get it

### Quantifier rank for parity of embedded finite models.

**By:**Hervé Fournier**Number:**RR2001-09**Date:**February 2001**Abstract:**

- We prove some lower bounds for quantifier rank of formulas expressing parity of a finite set I of bounded cardinal embedded in an algebraically closed field or an ordered Q-vector space. We show that these bounds are tight when elements of I are known to be linearly independent. In the second part, we prove that strongly minimal structures with quantifier elimination and zero characteristic differentially closed fields admit the active-natural collapse.

**Keywords:**

- Constraint Databases, Active-Natural Collapse.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+15p**Format:**Compressed postscript- Get it

### Toward an Algorithmic Classification of Cellular Automata Dynamics.

**By:**Nicolas Ollinger**Number:**RR2001-10**Date:**February 2001**Abstract:**

- This paper introduces a new approach to classifying cellular automata dynamics. In the spirit of Mazoyer and Rapaport [7], cellular automata are compared thanks to the worth studying notion of space-time diagrams rescaling. Mazoyer and Rapaport made the choice to consider square shaped rescaling. Here, an abstract logical framework, called bulking, is introduced to explore various possibilities of rescaling shapes. Finally, a particular case of bulking based on geometrical considerations is provided. It induces a classification with a maximal class. In particular, it allows to give a formalization of the notion of intrinsic universality implicitly introduced by Albert and Culik II [1] in 1987.

**Keywords:**

- Cellular Automata, Classification, Simulation, Intrinsic Universality, Games.

**Availability:**Electronic copy only.**Citation:**Submitted to ICALP 2001.**Size:**2+12p**Format:**Compressed postscript- Get it

### Two-States Bilinear Intrinsically Universal Cellular Automata.

**By:**Nicolas Ollinger**Number:**RR2001-11**Date:**March 2001**Abstract:**

- Additive cellular automata, which correspond to linear cellular automata, have been studied in details using algebraic techniques [11,6]. The generalization to families of polynomial cellular automata seems natural. The following step of complexity consists of bilinear cellular automata which study has begun with the work of Bartlett and Garzon [3]. Thanks to bulking techniques [15], the intuitive notions of simulation used in the paper from Bartlett and Garzon [3] are formalized. Moreover, two-states bilinear intrinsically universal cellular automata are constructed. In the case of cellular automata, as in the case of classical dynamical systems, a frontier between decidability and undecidability stands between linear and non-linear systems. This result also answers a question from Bartlett and Garzon [3] of 1995.

**Keywords:**

- Cellular Automata, Bilinearity, Simulation, Intrinsic Universality.

**Availability:**Electronic copy only.**Citation:**Submitted to FCT 2001.**Size:**2+11p**Format:**Compressed postscript- Get it

### Epistemic Logic in Higher Order Logic An experiment with COQ.

**By:**Pierre Lescanne**Number:**RR2001-12**Date:**March 2001**Abstract:**

- We present an experiment on epistemic logic, also called knowledge logic, we have done using COQ. This work involves a formalization in COQ of the epistemic logic which has been checked for adequacy on two puzzles well known in the community. COQ is a proof assistant which implements a higher logic known as the calculus of inductive construction and which provides a convenient framework to embed logics like epistemic logic. We try to draw from this exercise lessons for future works. protocols.

**Keywords:**

- Modal Logic, Logic of Knowledge, Epistemic Logic, Formal Methods, Type Theory, COQ.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+9p**Format:**Compressed postscript- Get it

### The Master-Slave Paradigm with Heterogeneous Processors.

**By:**Olivier Beaumont, Arnaud Legrand, Yves Robert**Number:**RR2001-13**Date:**March 2001**Abstract:**

- In this paper, we revisit the master-slave tasking paradigm in the context of heterogeneous processors. We assume that communications take place in exclusive mode. We present a polynomial algorithm that gives the optimal solution when a single communication is needed before the execution of the tasks on the slave processors. When communications are required both before and after the task processing, we show that the problem is at least as difficult as a problem whose complexity is open. In this case, we present a guaranteed approximation algorithm. Finally, we present asymptotically optimal algorithms when communications are required before the processing of each task, or both before and after the processing of each task. protocols.

**Keywords:**

- Heterogeneous Processors, Master-Slave Tasking, Communication, Matching, Complexity.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+22p**Format:**Compressed postscript- Get it

### Data Allocation Strategies for Dense Linear Algebra on two-dimensional Grids with Heterogeneous Communication Links.

**By:**Olivier Beaumont, Arnaud Legrand, Yves Robert**Number:**RR2001-14**Date:**April 2001**Abstract:**

- In this paper, we study the implementation of dense linear algebra kernels, such as matrix multiplication on 2D grids with homogeneous processors when the communication links between the processors are heterogeneous (i.e. the time to transfer a block of the matrix between two processors depends on these processors). We prove that finding the best allocation of the processors into a grid, with respect to the minimization of the communication overhead, is a NP-complete problem.

**Keywords:**

- Heterogeneous Communications, 2D Grids, Data Distribution, Linear Algebra Kernel.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+21p**Format:**Compressed postscript- Get it

### A Coq toolkit for graph theory.

**By:**Jean Duprat**Number:**RR2001-15**Date:**April 2001**Abstract:**

- As an illustration of the Curry-Howard's isomorphism, it is possible to express both mathematical and computational aspects of the graph theory using the same formalism. In this paper, we give what can be the very beginning of such a presentation. The proof assistant Coq has been used as a guideline all along this work.

**Keywords:**

- Graph Theory, Proof Assistant, Inductive Construction.

**Availability:**Electronic copy only.**Citation:**Submitted to TYPES 2001 Post-Workshop Proceedings.**Size:**2+13p**Format:**Compressed postscript- Get it

### How powerful are infinite time machines?

**By:**Grégory Lafitte**Number:**RR2001-16**Date:**April 2001**Abstract:**

- We study the power of finite machines with
*infinite time*to complete their task. To do this, we define a variant to Wojciechowski automata, investigate their recognition power, and compare them to infinite time Turing machines. Furthermore, using this infinite time, we analyse the ordinals*comprehensible*by such machines and discover that one can in fact go beyond the recursive realm. We conjecture that this is somehow already the case with Wojciechowski automata.

**Keywords:**

- Infinite Time, Finite Automata on Infinite Words, Machine Comprehensible Ordinals.

**Availability:**Electronic copy only.**Citation:**Accepted at FCT 2001.**Size:**2+12p**Format:**Compressed postscript- Get it

### Impact of Interferences on Bandwidth Reservation for Ad Hoc Networks: a First Theoretical Study.

**By:**Karell Bertet, Isabelle Guérin Lassous, Laurent Viennot**Number:**RR2001-17**Date:**April 2001**Abstract:**

- This paper presents a theoretical study on the bandwidth reservation problem for ad hoc networks. The proposed model is based on the spatial reuse and the existence of interferences. We show that in that case, the bandwidth reservation problem is NP-complete and we provide some bounds that compare solutions of the problems derived with greedy heuristics with an optimal one. We conclude with a discussion on the practical aspect of this model and its potential use in a practical protocol.

**Keywords:**

- Bandwidth Reservation, Ad Hoc Networks, Interferences.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+6p**Format:**Compressed postscript- Get it

### Validating polynomial numerical computations with complementary automatic methods.

**By:**Philippe Langlois, Nathalie Revol**Number:**RR2001-18**Date:**May 2001**Abstract:**

- Finite precision computations affect the accuracy of computed
solutions

and sometimes the stability of iterative algorithms. Automatic approaches

exist to control and to reduce these effects. Examples are the CESTAC and

the CENA methods and the more general interval approaches. We focus here

on a complementary use of these methods to localize unstable behavior of

the algorithm, to improve the accuracy of the solutions, to identify and

explain finite precision effects. We present computational experiments on

ill-conditioned polynomial roots approximated with Newton's iteration

that illustrate the well-known influence of coefficient perturbations.

**Keywords:**

- Finite Precision, Automatic Rounding Error Analysis, Newton's Iteration, Polynomial Multiple Roots.

**Availability:**Electronic copy only.**Citation:**Submitted.**Size:**2+12p**Format:**Compressed postscript- Get it

### Une méthode de correction automatique.

**By:**Philippe Langlois**Number:**RR2001-19**Date:**May 2001**Abstract:**

- Correcting methods intend to improve the accuracy of results computed in finite precision. The CENA method processes an automatic correction of the first-order effect of the rounding errors the computation generates. The method provides a corrected result and a bound of the residual error for a class of algorithms we identify. We present the main features of the CENA method and illustrate its interests and limitations with examples.

**Keywords:**

- Automatic Correction, CENA Method, Accuracy, Finite Precision, Floating Point Arithmetic.

**Availability:**Electronic copy only.**Citation:**To appear in Calculateurs Parallèles, Réseaux, et Systèmes Répartis.**Size:**2+14p**Format:**Compressed postscript- Get it

### Reconnaissance parallèle des langages rationnels sur automates cellulaires plans.

**By:**Marianne Delorme, Jacques Mazoyer**Number:**RR2001-20**Date:**May 2001**Abstract:**

- Few results are known on the computational power of 2D cellular automata. In the framework of languages recognition, the main difficulty comes from the multiplicity of possible embeddings of words into the discrete plane. Even a very simple problem as the real-time recognition of rational languages can become tricky. We solve this problem for two special embeddings. This first study brings to the fore that the true question is more the plane nature than the algorithmics .

**Keywords:**

- Cellular Automata, Language Recognition, Real-Time, Space Filling Curves.

**Availability:**Electronic copy only.**Citation:**Submitted to Theoretical Computer Science.**Size:**2+52p**Format:**Compressed postscript- Get it

### Scalability Issues in a Reliable Distributed Video Storage System.

**By:**Alice Bonhomme**Number:**RR2001-21**Date:**May 2001**Abstract:**

- This article studies the design and the performance of a fault-tolerant distributed storage system dedicated to video. We use a completely distributed approach that permit to reduce the memory cost of the reliability scheme. We experimentally point out the scalability properties of this system.

**Keywords:**

- Video Server, Scalability, Striping, Fault Tolerance, Cluster.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+6p**Format:**Compressed postscript- Get it

### The Iso-Level Scheduling Heuristic for Heterogeneous Processors.

**By:**Olivier Beaumont, Vincent Boudet, Yves Robert**Number:**RR2001-22**Date:**May 2001**Abstract:**

- Scheduling computational tasks on processors is a key issue for high-performance computing. Although a large number of scheduling heuristics have been presented in the literature, most of them target only homogeneous resources. We present a new scheduling heuristic for heterogeneous processors, which improves the load-balancing achieved at each decision step while keeping a low complexity. Experimental comparisons with five heuristics taken from the literature (BIL, GDL, CPOP, HEFT and PCT) and using six classical testbeds, show very favorable results.

**Keywords:**

- Heterogeneous Processors, Unrelated Parallel Machines, Scheduling Heuristics, Allocation, Complexity.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+24p**Format:**Compressed postscript- Get it

### Cost Analysis of a Distributed Video Storage System.

**By:**Alice Bonhomme**Number:**RR2001-23**Date:**June 2001**Abstract:**

- We describe the design and the implementation of the CFS (Cluster File System) storage system which is dedicated to video streams. Our goal is to provide a system with the following features: 1) High number of supported streams at a low cost. 2) Transparent management with respect to the clients. 3) Reliability regarding data storage and service continuity. The CFS is implemented on a cluster of PCs connected with a high speed internal network. Its management is fully distributed among the cluster nodes so that there is no central component. Data are stored and retrieved among the cluster nodes using a Streaming RAID strategy, to enhance reliability. We evaluate the overhead introduced by the CFS management with respect to disk performance, by varying the number of nodes involved in file storage, the block size and the reliability level. The impact of the global server load on this overhead is moreover discussed to demonstrate scalability. As matter of fact, the overhead is between 3-15% depending on the configuration.

**Keywords:**

- Video Server, Data Striping, Reliability, Performance Analysis.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+14p**Format:**Compressed postscript- Get it

### Scilab to Scilab// - The OURAGAN Project.

**By:**E. Caron, S. Chaumette, S. Contassot-Vivier, F. Desprez, E. Fleury, C. Gomez, M. Goursat, E. Jeannot, D. Lazure,F. Lombard, J.-M. Nicod, L. Philippe, M. Quinson, P. Ramet,J. Roman, F. Rubi, S. Steer, F. Suter, G. Utard**Number:**RR2001-24**Date:**June 2001**Abstract:**

- In this paper, we present the developments realized in the OURAGAN project around the parallelization of a Matlab-like tool called Scilab. These developments use high performance numerical libraries and different approaches based either on the duplication of Scilab processes or on computational servers. This tool, Scilab//, allows users to perform high level operations on distributed matrices in a metacomputing environment. We also present performance results on different architectures.

**Keywords:**

- Scilab, Parallel Libraries, Computational Servers, CORBA, Data Redistribution.

**Availability:**Electronic copy only.**Citation:**To appear in Parallel Computing.**Size:**2+19p**Format:**Compressed postscript- Get it

### Bandwidth-centric allocation of independent tasks on heterogeneous platforms.

**By:**Olivier Beaumont, Larry Carter, Jeanne Ferrante, Arnaud Legrand, Yves Robert**Number:**RR2001-25**Date:**June 2001**Abstract:**

- In this paper, we consider the problem of allocating a large number of independent, equal-sized tasks to a heterogenerous "grid" computing platform. Such problems arise in collaborative computing efforts like SETI@home. We use a tree to model a grid, where resources can have different speeds of computation and communication, as well as different overlap capabilities. We define a base model, and show how to determine the maximum steady-state throughput of a node in the base model, assuming we already know the throughput of the subtrees rooted at the node's children. Thus, a bottom-up traversal of the tree determines the rate at which tasks can be processed in the full tree. The best allocation is {\em bandwidth-centric}: if enough bandwidth is available, then all nodes are kept busy; if bandwidth is limited, then tasks should be allocated only to the children which have sufficiently small communication times, regardless of their computation power. We then show how nodes with other capabilities --- ones that allow more or less overlapping of computation and communication than the base model --- can be transformed to equivalent nodes in the base model. We also show how to handle a more general communication model. Finally, we present simulation results of several demand-driven task allocation policies that show that our bandwidth-centric method obtains better results than allocating tasks to all processors on a first-come, first serve basis.

**Keywords:**

- Heterogeneous Computer, Allocation, Scheduling, Grid, Metacomputing.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**3+44p**Format:**Compressed postscript- Get it

### Correctly Rounded Exponential Function in Double Precision Arithmetic.

**By:**David Defour, Florent de Dinechin, Jean-Michel Muller**Number:**RR2001-26**Date:**July 2001**Abstract:**

- We present an algorithm for implementing correctly rounded exponentials in double-precision floating point arithmetic. This algorithm is based on floating-point operations in the widespread IEEE-754 standard, and is therefore more efficient than those using multiprecision arithmetic, while being fully portable. It requires a table of reasonable size and IEEE-754 double precision multiplications and additions. In a preliminary implementation, the overhead due to correct rounding is a 2.3 times slowdown when compared to the standard library function.

**Keywords:**

- Exponential, Correct Rounding, Double-Precision.

**Availability:**Electronic copy only.**Citation:**Published in Advanced Signal Processing Algorithms, Architecture and Implementations XI (SPIE), San Diego, USA. August 2001.**Size:**2+21p**Format:**Compressed postscript- Get it

### On digit-recurrence division algorithms for self-timed circuits.

**By:**Nicolas Boullis, Arnaud Tisserand**Number:**RR2001-27**Date:**July 2001**Abstract:**

- The optimization of algorithms for self-timed or asynchronous circuits requires specific solutions. Due to the variable-time capabilities of asynchronous circuits, the average computation time should be optimized and not only the worst case of the signal propagation. If efficient algorithms and implementations are known for asynchronous addition and multiplication, only straightforward algorithms have been studied for division. This paper compares several digit-recurrence division algorithms (speed, area and circuit activity for estimating the power consumption). The comparison is based on simulations of the different operators described at the gate level. This work shows that the best solutions for asynchronous circuits are quite different from those used in synchronous circuits.

**Keywords:**

- Computer Arithmetic, Division Algorithms, SRT Tables, Asynchronous Circuits, Self-Timed Circuits.

**Availability:**Electronic copy only.**Citation:**Published in Advanced Signal Processing Algorithms, Architecture and Implementations XI (SPIE), San Diego, USA. August 2001.**Size:**2+12p**Format:**Compressed postscript- Get it

### Parallel Execution of the Saturated Reductions.

**By:**Benoît Dupont de Dinechin, Christophe Monat, Fabrice Rastello**Number:**RR2001-28**Date:**July 2001**Abstract:**

- This report addresses the problem of improving the execution performance of saturated reduction loops on fixed-point instruction-level parallel Digital Signal Processors (DSPs). We first introduce ``bit-exact'' transformations, that are suitable for use in the ETSI and the ITU speech coding applications. We then present ``approximate'' transformations, the relative precision of which we are able to compare. Our main results rely on the properties of the saturated arithmetic.

**Keywords:**

- Saturated Reductions, Fixed-Point Arithmetic, Fractional Arithmetic, ETSI / ITU Speech Coding, Parallel Reductions, Instruction Level Parallelism.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+15p**Format:**Compressed postscript- Get it

### BRuIT: Bandwidth Reservation under InTerferences influence.

**By:**Claude Chaudet, Isabelle Guerin-Lassous**Number:**RR2001-29**Date:**July 2001**Abstract:**

- This paper deals with the bandwidth reservation problem in ad hoc networks and with the influence that interferences between signals have on it. We show that interferences could decrease the applications rates. This can be a real problem for applications that need guarantees. We propose a distributed protocol (called "BRuIT") for bandwidth reservation in ad hoc networks that takes into account the existence of interferences from far transmissions. The protocol is analyzed through simulations carried out under NS.

**Keywords:**

- Wireless Networks, Ad Hoc Networks, Multihop, Bandwidth Reservation, Quality of Service, Interferences.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+8p**Format:**Compressed postscript- Get it

### Matrix Rank Certification.

**By:**David Saunders, Arne Storjohann, Gilles Villard**Number:**RR2001-30**Date:**August 2001**Abstract:**

- Randomized algorithms are given for computing the rank of a matrix over a field of characteristic zero. The matrix is treated as a black box. Only the capability to compute matrix x column-vector and row-vector x matrix products is used. The methods are exact, sometimes called seminumeric. They are appropriate for example for matrices with integer or rational entries. The rank algorithms are probabilistic of the Las Vegas type; the correctness of the result is guaranteed.

**Keywords:**

- Linear Algebra, Randomized Algorithms, Black Box Matrix, Matrix Rank, Seminumeric Computation.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+6p**Format:**Compressed postscript- Get it

### Formally validated specification of a micro-payment protocol.

**By:**P. Dargenton, D.Hirschkoff, P.Lescanne, E. Pommateau**Number:**RR2001-31**Date:**August 2001**Abstract:**

- In this paper, we develop a formal specification for a micro-payment
protocol, first on paper, then within the Coq proof assistant. Our
approach in defining a notion of
*execution traces*for protocol runs is inspired by previous works, notably by L. Paulson (in the Isabelle/HOL system). However, we show that the protocol we study entails some modifications to Paulson's framework, related to the modeling of the agents' internal state. We moreover take profit from Coq's expressive meta-language to mechanically derive proofs*about*the formalisation itself, by introducing a notion of well-formedness for protocol rules.

**Keywords:**

- Electronic Commerce, Micro-Payment Protocols, Specification, Formal Proof.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+15p**Format:**Compressed postscript- Get it

### Tiling groups for Wang tiles.

**By:**P. Cristopher Moore, Ivan Rapaport, Eric Rémila**Number:**RR2001-32**Date:**September 2001**Abstract:**

- We apply tiling groups and height functions to tilings of regions in the plane by Wang tiles(squares with colored boundaries) where the colors of shared edges must match. We define a set of tiles as unambiguous if it contains all tiles equivalent to the identity in its tiling group. For all but one set of unambiguous tiles with two colors, we give efficient algorithms that tell whether a given region with colored boundary is tileable, show how to sample random tilings, and how to calculate the number of local moves or ``flips'' required to transform one tiling into another. We also analyze the lattice structure of the set of tilings, and study several examples with three and four colors as well.

**Keywords:**

- Tiling, Height Function, Wang Tiles.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+14p**Format:**Compressed postscript- Get it

### A new range reduction algorithm.

**By:**P. David Defour, Peter Kornerup, Jean-Michel Muller, Nathalie Revol**Number:**RR2001-33**Date:**September 2001**Abstract:**

- Range reduction is a key point for getting accurate elementary function routines. We introduce a new algorithm that is fast for input arguments belonging to the most common domains, yet accurate over the full double precision range.

**Keywords:**

- Elementary Functions, Computer Arithmetic, Range Reduction, Floating-Point Arithmetic.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+10p**Format:**Compressed postscript- Get it

### A rank theorem for Vandermonde matrices.

**By:**Pascal Koiran, Natacha Portier, Gilles Villard**Number:**RR2001-34**Date:**September 2001**Abstract:**

- We show that certain matrices built from Vandermonde matrices are of full rank. This result plays a key role in the construction of the ``limit theory of generic polynomials''.

**Keywords:**

- Linear Algebra, Linear Recurrent Sequences, Transcendant Numbers, Model Theory.

**Availability:**Electronic copy only.**Citation:**You can also download an updated version of this report.**Size:**2+7p**Format:**Compressed postscript- Get it

### The limit theory of generic polynomials.

**By:**Pascal Koiran**Number:**RR2001-35**Date:**September 2001**Abstract:**

- We show that the set T of first-order sentences satisfied by all generic polynomials of sufficiently high degree forms a complete theory. As a consequence, complex polynomials of even degree cannot be distinguished from complex polynomials of odd degree by a first-order formula. We ask whether T has an analytic model.

**Keywords:**

- Model Theory, Definability, Intersection.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+14p**Format:**Compressed postscript- Get it

### On randomness and infinity.

**By:**Grégory Lafitte**Number:**RR2001-36**Date:**September 2001**Abstract:**

- In this paper, we investigate
refined definitions of random sequences. Classical definitions have always
the shortcome of making use of the notion of algorithm.
We discuss the nature of randomness and different ways of obtaining
satisfactory definitions of randomness after reviewing previous attempts
at producing a non-algorithmical definition. We present alternative
definitions based on infinite time machines and set theory and explain
how and why randomness is strongly linked to
*strong axioms of infinity*.

**Keywords:**

- Random, Infinite Time Machines, Large Cardinals.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+8p**Format:**Compressed postscript- Get it

### A Realistic Model and an Efficient Heuristic for Scheduling with Heterogeneous Processors.

**By:**Olivier Beaumont, Vincent Boudet, Yves Robert**Number:**RR2001-37**Date:**September 2001**Abstract:**

- Scheduling computational tasks on processors is a key issue for high-performance computing. Although a large number of scheduling heuristics have been presented in the literature, most of them target only homogeneous resources. Moreover, these heuristics often rely on a model where the number of processors is bounded but where the communication capabilities of the target architecture are not restricted. In this paper, we deal with a more realistic model for heterogeneous networks of workstations, where each processor can send and/or receive at most one message at any given time-step. First, we state a complexity result that shows that the model is at least as difficult as the standard one. Then, we show how to modify classical list scheduling techniques to cope with the new model. Next we introduce a new scheduling heuristic which incorporates load-balancing criteria into the decision process of scheduling and mapping ready tasks. Experimental results conducted using six classical testbeds: (LAPLACE, LU, STENCIL, FORK-JOIN, DOOLITTLE, and LDMt) show very promising results.

**Keywords:**

- Heterogeneous Processors, Scheduling, Mapping, Scheduling Heuristics, Complexity.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+18p**Format:**Compressed postscript- Get it

### The Quest for Small Universal Cellular Automata.

**By:**Nicolas Ollinger**Number:**RR2001-38**Date:**October 2001**Abstract:**

- We formalize the idea of intrinsically universal cellular automata, which is strictly stronger than classical computational universality. Thanks to this uniform notion, we construct a new one-dimensional universal automaton with von Neumann neighborhood and only 6 states.

**Keywords:**

- Cellular Automata, Simulation, Intrinsic Universality.

**Availability:**Electronic copy only.**Citation:**Submitted to STACS 2002.**Size:**2+11p**Format:**Compressed postscript- Get it

### Adding Data Persistence and Redistribution to NetSolve.

**By:**F. Desprez, E. Jeannot**Number:**RR2001-39**Date:**December 2001**Abstract:**

- The implementation of Network Enabled Servers (NES) on grid environments requires to lower the cost of communications. NetSolve, a NES environment developed at University of Tennessee Knoxville, sends data back to the client at the end of every computation. This implies unneeded communications when computed data are needed by an other server in further computations. In this paper, we present the modifications we made to the NetSolve protocol in order to overcome this drawback. We have developed a set of new functions and data structures that allow users to order servers to keep data in place and to redistribute them directly to an other server when needed.

**Keywords:**

- Client-Server Distributed Computing, Network Enabled Servers, NetSolve, Data Persistence, Data Redistribution.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+9p**Format:**Compressed postscript- Get it

### Dynamic Performance Forecasting for Network-Enabled Servers in a Heterogeneous Environment.

**By:**Frédéric Desprez, Martin Quinson, Frédéric Suter**Number:**RR2001-40**Date:**November 2001**Abstract:**

- This paper presents a tool for dynamic forecasting of Network-Enabled Servers performance. FAST (Fast Agent's System Timer) is a software package allowing client applications to get an accurate forecast of communication and computation times and memory use in a heterogeneous environment. It relies on low level software packages, i.e., network and host monitoring tools, and some of our developments in computation routines modeling. The FAST internals and user interface are presented and a comparison between the execution time predicted by FAST and the measured time of complex matrix multiplication executed on an heterogeneous platform is given.

**Keywords:**

- Performance Forecasting, Computation Modeling, Resources Monitoring, Heterogeneous Environments, Network-Enabled Servers.

**Availability:**Electronic copy only.**Citation:**Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2001), volume 3, pages 1421-1427. CSREA Press, 25-28 June 2001.**Size:**2+9p**Format:**Compressed postscript- Get it

### Introduction à l'arithmétique par intervalles.

**By:**Nathalie Revol**Number:**RR2001-41**Date:**September 2001**Abstract:**

- This report (written in french) constitutes an introduction to interval arithmetic. This arithmetic allows on the one hand to take into account the measurement uncertainties on data and on the other hand to determine an enclosure of the computed result that is guaranteed to contain it: indeed, the main advantage of interval arithmetic is its reliability. The goal of this introduction is to emphasize the strong points of interval arithmetic and to explain how to alleviate its problems. The main advantage is to provide global information, such as for instance the range of a function over a whole set. This global information can serve to prove that an iteration is contractant and thus that it has a fixed point. It can also be used to detemine the global optimum of a function without being trapped by a local one.

**Keywords:**

- Interval Arithmetic, Reliable Computation, Wrapping Effect, Data Dependency, Rump's Algorithm For Linear Systems, Interval Newton Method for Nonlinear Systems, Hansen's Algorithm for Global Optimization.

**Availability:**Electronic copy only.**Citation:**Version étendue de l'article paru dans Calculateurs Parallèles, Réseaux, et Systèmes Répartis, vol. 13, 2001.**Size:**2+41p**Format:**Compressed postscript- Get it

### Parallelization of the Numerical Lyapunov Calculation for the Fermi-Pasta-Ulam Chain.

**By:**Fabrice Rastello, Thierry Dauxois**Number:**RR2001-42**Date:**November 2001**Abstract:**

- In this paper, we present an efficient and simple solution to the parallelization of discrete integration programs of ordinary differential equations (ODE). The main technique used is known as loop tiling. To avoid the overhead due to code complexity and border effects, we introduce redundant tasks and we use non parallelepiped tiles. Thanks both to cache reuse (x4.3) and coarse granularity (x24.5) , the speedup using 25 processors over the non-tiled sequential implementation is larger than 106. We also present the draft of a fuzzy methodology to optimize the tile size and we illustrate it using real measurements for the communication cost and the execution time. In particular, we observe that the model of communication latencies over a Myrinet network is not as simple as is usually reported. We apply this solution to study the Lyapunov exponents of the Fermi-Pasta-Ulam (FPU) chain and in particular the dependence of the maximum Lyapunov exponents as a function of the length of the chain.

**Keywords:**

- Hierarchical Tiling, Redundant Tasks, Parallelism, Scalability, Cache Optimization, Locality, Tiles Shape, Communication Overhead, Heterogeneous Ressource, Dynamical System Theory, Fermi-Pasta-Ulam Chain, Lyapunov Instability Analysis, Phase Space Properties.

**Availability:**Electronic copy only.**Citation:**Not published yet..**Size:**2+24p**Format:**Compressed postscript- Get it

### The Data Broadcast Problem with Non-Uniform Transmission Times (Revised version).

**By:**Claire Kenyon, Nicolas Schabanel**Number:**RR2001-43**Date:**November 2001**Abstract:**

- The data broadcast problem consists in finding an infinite schedule to broadcast a given set of messages so as to minimize the average service time to clients requesting messages, and the cost of the broadcast. This problem also models the maintenance scheduling problem and the multi-item replenishment problem. Previous work concentrated on a discrete-time restriction where all messages have transmission time equal to 1. Here, we study a generalization of the model to the more realistic setting of continuous time and messages of non-uniform transmission times. We prove that the data broadcast problem is NP-hard, even if the broadcast costs are all zero, and give 3-approximation algorithms for broadcasting messages on a single channel.

**Keywords:**

- Approximation Algorithms, Scheduling, Randomized Algorithms, NP-completeness, Data Broadcast.

**Availability:**Electronic copy only.**Citation:**This article revises the previous LIP research report RR2000-32.**Size:**2+26p**Format:**Compressed postscript- Get it

### Multipartite Tables in JBits for the Evaluation of Functions on FPGA.

**By:**Jérémie Detrey, Florent de Dinechin**Number:**RR2001-44**Date:**November 2001**Abstract:**

- This paper presents a core generator for arbitrary numeric functions on Xilinx Virtex FPGAs. The cores use the state-of-the-art multipartite table method, which allows input and output precisions in the range of 8 to 24 bits on current Virtex chips. The implementation uses the JBits API to embed elaborate optimisation techniques in the description of the hardware.

**Keywords:**

- Function Generator, FPGA Core, Multipartite Method, JBits.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+9p**Format:**Compressed postscript- Get it

### Efficient Parallelization of Line-Sweep Computations.

**By:**Daniel Chavarría-Miranda, Alain Darte, Robert Fowler, John Mellor-Crummey**Number:**RR2001-45**Date:**November 2001**Abstract:**

- Multipartitioning is a strategy for partitioning multi-dimensional arrays on a collection of processors. With multipartitioning, computations that require solving 1D recurrences along each dimension of a multi-dimensional array can be parallelized effectively. Previous techniques for multipartitioning yield efficient parallelizations over 3D domains only when the number of processors is a perfect square. This paper considers the general problem of computing optimal multipartitionings for d-dimensional data volumes on an arbitrary number of processors. We describe an algorithm that computes an optimal multipartitioning for this general case, which enables efficient parallelizations of line-sweep computations under arbitrary conditions. Finally, we describe a prototype implementation of generalized multipartitioning in the Rice dHPF compiler and performance results obtained when using it to parallelize a line sweep computation for different numbers of processors.

**Keywords:**

- Loop Parallelization, Array Mapping, Generalized Latin Squares, High Performance Fortran.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+34p**Format:**Compressed postscript- Get it

### Qulaity of service in mobile ad-hoc networks -- state of the art (Qualité de service et réseaux ad-hoc - un état de l'art).

**By:**Claude Chaudet**Number:**RR2001-46**Date:**November 2001**Abstract:**

- Local wireless networks have known a great success since commercial solutions based on the IEEE 802.11 standard were released. With the important throughput obtained today, these networks now allow the use of complex applications that require guarantees regarding communications bandwidth, delay or jitter. All the work around Mobile IP and UMTS should lead to QoS protocols for wireless networks with access points in which the network is divided into geographical zones managed by base stations. Most of these works cannot be directly transposed to mobile ad-hoc networks in which no structure is available. These networks do not have standard QoS protocols that are suited to their specificities yet. This article presents a state of the art of this subject.

**Keywords:**

- Wireless Networks, Ad-hoc Networks, Quality of Service, QoS.

**Availability:**Electronic copy only.**Citation:**Claude Chaudet. Qualité de service et réseaux ad-hoc - un état de l'art. Colloque sur 'Mobiles-services et réseaux mobiles de 3ème Génération' - Lyon, décembre 2001.**Size:**2+6p**Format:**Compressed postscript- Get it

### Chordal embeddings of planar graph.

**By:**Vincent Bouchitté, Frédéric Mazoit, Ioan Todinca**Number:**RR2001-47**Date:**November 2001**Abstract:**

- Robertson and Seymour conjectured that the treewidth of a planar graph and the treewidth of its geometric dual differ by at most one. Lapoire solved the conjecture in the affirmative, using algebraic techniques. We give here a much shorter proof of this result.

**Keywords:**

- Planar Graphs, Treewidth, Duality.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+14p**Format:**Compressed postscript- Get it

### Lattices of tilings : an extension to figures with holes.

**By:**Eric Rémila**Number:**RR2001-48**Date:**December 2001**Abstract:**

- We first prove that the set of domino tilings of a fixed finite figure is a distributive lattice, even in the case when the figure has holes. Afterwards, we give a geometrical interpretation of the order given by this lattice. We extend these results to other types of tilings (calisson tilings, tilings with bicolored Wang tiles).

**Keywords:**

- Tiling, Height Function, Lattice.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+15p**Format:**Compressed postscript- Get it

### Interval Newton iteration in multiple precision for the univariate case.

**By:**Nathalie Revol**Number:**RR2001-49**Date:**December 2001**Abstract:**

- In this paper, interval arithmetic using an underlying multiple precision arithmetic is briefly presented. Then interval Newton iteration for solving nonlinear equations is introduced. A new Newton's algorithm based on multiple precision interval arithmetic is given, along with its properties: termination, arbitrary accuracy on the computed zeros, automatic and dynamic adaptation of the precision. Finally some experiments illustrate the behaviour of this method.

**Keywords:**

- Multiple Precision Interval Arithmetic, Interval Newton Algorithm, Arbitrary Accuracy, Automatic and Dynamic Adaptation of the Precision.

**Availability:**Electronic copy only.**Citation:**Submitted.**Size:**2+12p**Format:**Compressed postscript- Get it

### Impact of heavy traffic beyond communication range in multi-hops ad-hoc networks.

**By:**Dominique Dhoutaut**Number:**RR2001-50**Date:**December 2001**Abstract:**

- For commercial availability reasons, most actual multi-hops ad-hoc simulations and test-beds are based on IEEE 802.11 standard \cite{80211} and its medium access method CSMA/CA. But this standard has not been designed for that kind of network and presents serious flaws in this context. In this paper, we shall try to highlight problems that can appear in the \textbf{non-direct} neighborhood of important data flows in ad-hoc networks. In some situations, the fairness of the medium access can indeed be very poor. Various techniques could be used to solve this problem. In this paper we present one solution, which only requires minor modifications to the standard.

**Keywords:**

- Ad-hoc Networks, Radio Interferences, Medium Access, 802.11.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+9p**Format:**Compressed postscript- Get it

### Pavages et Bases de Grobner.

**By:**Olivier Bodini**Number:**RR2001-51**Date:**December 2001**Abstract:**

- In this paper, we answer to a question of Grunbaum by proving that, for all set F of polyominoes (union of unit squares of a square lattice), we can find a Z-tiling (signed tile) of polyominoes by copies of elements of F in polynomial time. We use for this the theory of generalised Grobner bases. For instance, we can algorithmicaly find again and extend results of Lagarias and Romero on the topic.

**Keywords:**

- Polyomino, Tiling, Standard Basis.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+31p**Format:**Compressed postscript- Get it

### Generalized Tilings with Height Functions.

**By:**Olivier Bodini, Matthieu Latap**Number:**RR2001-52**Date:**December 2001**Abstract:**

- In this paper, we introduce a generalization of a class of tilings which appear in the literature: the tilings over which a height function can be defined (for example, the famous tilings of polyominoes with dominoes). We show that many properties of these tilings can be seen as the consequences of properties of the generalized tilings we introduce. In particular, we show that any tiling problem which can be modelized in our generalized framework has the following properties: the tilability of a region can be constructively decided in polynomial time, the number of connected components in the undirected flip-accessibility graph can be determined, and the directed flip-accessibility graph induces a distributive lattice structure. Finally, we give a few examples of known tiling problems which can be viewed as particular cases of the new notions we introduce.

**Keywords:**

- Tilings, Height Functions, Tilability, Distributive Lattices, Random Sampling, Potentials, Flows.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+17p**Format:**Compressed postscript- Get it