2000 research reports

### Matrix-Matrix Multiplication on Heterogeneous Platforms.

• By: Olivier Beaumont, Vincent Boudet, Fabrice Rastello, Yves Robert
• Number: RR2000-02
• Date: January 2000
• Abstract:
In this paper, we address the issue of implementing matrix-matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations, and collections of heterogeneous clusters. Intuitively, the problem is to load balance the work with different-speed resources while minimizing the communication volume. We formally state this problem and prove its NP-completeness. Next we introduce a (polynomial) column-based heuristic, which turns out to be very satisfactory: we derive a theoretical performance guarantee for the heuristic, and we assess its practical usefulness through MPI experiments.
• Abstract (in french):
Dans ce rapport, nous nous intéressons au problè me de l'implémentation du produit matrice-matrice sur des plateformes hété rogènes. Nous considérons deux sortes de ressources de calculs hété rogènes: des réseaux de stations hétérogènes et des collections de clusters hétérogènes. Intuitivement, le problème est d'équilibrer la charge sur ces ressources de vitesses différentes tout en minimisant le volume des communications. Après avoir correctement formulé le problème, nous établissons sa NP-complétude. Ensuite nous présentons une heuristique (polynomiale) qui donne en pratique des résultats très satisfaisant : nous garantissons une performance théorique pour l'heuristique et nous prouvons son utilité pratique grace à des expèriences MPI.
• Keywords:
Heterogeneous Resources, Cluster, Different-Speed Processors, Load Balancing, Data Distribution, Data Allocation.
• Keywords (in french):
Ressources Hétérogènes, Cluster, Processeurs de Vitesses Différentes, Distribution des Données, Equilibrage de Chargess.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+28p
• Format: Compressed postscript
• Get it

### Circuits versus trees in algebraic complexity.

• By: Pascal Koiran
• Number: RR2000-03
• Date: January 2000
• Abstract:
This survey is devoted to some aspects of the P = NP?'' problem over the real numbers and more general algebraic structures. We argue that given a structure M, it is important to find out whether NP_M problems can be solved by polynomial depth computation trees, and if so whether these trees can be efficiently simulated by circuits. Point location, a problem of computational geometry, comes into play in the study of these questions for several structures of interest.
• Abstract (in french):
Cet article de synthèse est consacré à certains aspects du problème P=NP ?'' sur les nombres réels ou plus généralement sur des structures algébriques arbitraires. Nous expliquons pourquoi, étant donnée une structure M, il est important de savoir si les problèmes NP_M peuvent être résolus par des arbres de calcul de profondeur polynomiale; et si c'est le cas de savoir si ces arbres peuvent être simulés efficacement par des circuits. Le problème de la localisation de point, issu de la géométrie algorithmique, intervient dans l'étude de ces questions pour plusieurs structures intéressantes.
• Keywords:
Circuits, Trees, Algebraic Complexity, Point Location.
• Keywords (in french):
Circuits, Arbres, Complexité Algébrique, Localisation de Point.
• Availability: Electronic copy only.
• Citation: STACS 2000, invited paper.
• Size: 2+22p
• Format: Compressed postscript
• Get it

### Damage Spreading and $\mu$-sensitivity on CA.

• By: Bruno Martin
• Number: RR2000-04
• Date: February 2000
• Abstract:
We show relations between new notions on cellular automata based on topological and measure-theoretical concepts: almost everywhere sensitivity to initial condition for Besicovitch pseudo-distance, damage spreading (which measure the information (or damage) propagation) and the destruction of the initial configuration information. Through natural examples, we illustrate the links between these formal definitions and Wolfram's empirical classification.
• Abstract (in french):
Nous étudions les relations qui existent entre des notions nouvelles sur les automates cellulaires basées sur des concepts topologiques et de theorie de la mesure : la sensibilité aux conditions initiales presque partout pour la pseudo-distance de Besicovich, la propagation d'information et la destructions de l'information contenue dans la configuration initiale. Nous illustrons à l'aide d'exemples naturels les liens qui existent entre ces definitions formelles et la classification empirique de Wolfram.
• Keywords:
Cellular Automaton, Besicovitch Topology, Damage Spreading, Entropy.
• Keywords (in french):
Automate Cellulaire, Topologie de Besicovitch, Propagation d'Information, Entropie.
• Availability: Electronic copy only.
• Citation: Submitted to TCS2000
• Size: 2+17p
• Format: Compressed postscript
• Get it

### Direct proofs of strong normalisation in calculi of explicit substitutions.

• By: Daniel Dougherty , Pierre Lescanne
• Number: RR2000-05
• Date: February 2000
• Abstract:
This paper is part of a general programme of treating explicit substitutions as the primary $\lambda$-calculi from the point of view of foundations as well as applications. Here we investigate the property of strong normalization. To date all the proofs of strong normalization of typed calculi of explicit substitutions use a reduction to the strong normalization of classical $\lambda$-calculus via the so-called preservation of strong normalization'' property. This paper develops a new approach, namely a direct proof that the strongly normalizing terms are precisely those typable under the intersection-types discipline. We also define an effective perpetual strategy for the general calculus, give an inductive definition of the strongly normalizing terms, and furthermore show that normalization properties are essentially unaffected by the inclusion of a rule for garbage collection. A key role is played by a certain general combinatorial lemma relating the reduction properties of two interacting abstract reductions, which we feel is of interest in its own right.
• Abstract (in french):
Cet article fait partie d'un programme plus général visant à traiter les substitutions explicites comme le sont en général les lambda-calculs aussi bien dans la théorie fondamentale que dans les applications. Ici nous analysons, de ce point de vue, la propriété de normalisation forte. Jusqu'à présent, toutes les preuves de normalisation forte des calculs typés de substitutions explicites s'appuyaient sur une réduction à la normalisation forte du lambda-calcul à travers la propriété dite de préservation de la normalisation forte''. Cet article, quant à lui, développe une approche nouvelle, à savoir une preuve directe que les termes fortement normalisables sont précisément les termes qui sont typables par un système de types avec intersection. Mais dans cet article, nous définissons aussi une stratégie perpétuelle effective pour un calcul général de substitutions explicites, nous donnons une définition inductive des termes fortement normalisables et enfin nous montrons que les propriétés de normalisation ne sont pas essentiellement affectées par l'adjonction d'une règle de glanage de cellules. Dans ces démonstrations, un lemme combinatoire général relie deux réductions abstraites qui interagissent et joue un rôle clé dans la preuve de normalisation forte ; nous pensons que ce lemme a un intérêt par lui-même.
• Keywords:
Explicit Substitutions, Lambda-Calculus, Strong Normalisation, Type Systems, Intersection Types Termination, Perpetual Strategy.
• Keywords (in french):
Substitutions Explicites, Lambda Calcul, Normalisation Forte, Terminaison, Systèmes de Types, Types avec Intersection, Stratégie Perpétuelle.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+20p
• Format: Compressed postscript
• Get it

### The Infinite Versions of LogSpace!=P Are Consistent with the Axioms of Set Theory.

• By: Gregory Lafitte , Jacques Mazoyer
• Number: RR2000-06
• Date: February 2000
• Abstract:
We consider the infinite versions of the usual computational complexity questions LogSpace?=P, NLogSpace?=P by studying the comparison of their descriptive logics on infinite partially ordered structures rather than restricting ourselves to finite structures. We show that the infinite versions of those famous class separation questions are consistent with the axioms of set theory and we give a sufficient condition on the complexity classes in order to get other such relative consistency results.
• Abstract (in french):
Nous considérons les versions infinies des questions usuelles de complexité LogSpace?=P, NLogSpace?=P en étudiant, sur les structures infinies partiellement ordonnées, la comparaison de leurs logiques, les décrivants, au lieu de se limiter aux structures finies. Nous montrons que les versions infinies de ces fameuses questions de séparation sont consistantes avec la théorie des ensemble et nous donnons une condition suffisante sur les classes de complexité comparées pour obtenir les mêmes résultats avec d'autres classes de complexité.
• Keywords:
Computational Complexity, Large Cardinals, Relative Consistency Results.
• Keywords (in french):
Complexité de Calcul, Grands Cardinaux, Résultats de Consistance Relative.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+9p
• Format: Compressed postscript
• Get it

### Implementing Java consistency using a generic, multithreaded DSM runtime system.

• By: Gabriel Antoniu , Luc Bouge , Philip Hatcher , Mark MacBeth , Keith McGuigan , Raymond Namyst
• Number: RR2000-07
• Date: February 2000
• Abstract:
This paper describes Hyperion, an environment for executing Java programs on clusters of computers. To provide high performance, the environment compiles Java bytecode to native code and supports the concurrent execution of Java threads on multiple nodes of a cluster. The implementation uses the PM2 distributed, multithreaded runtime system. PM2 provides lightweight threads and efficient inter-node communication. It also includes a generic, distributed shared memory layer (DSM-PM2) which allows the efficient and flexible implementation of the Java memory consistency model. This paper includes preliminary performance figures for our implementation of Hyperion/PM2 on clusters of Linux machines connected by SCI and Myrinet.
• Abstract (in french):
Nous présentons Hyperion, un environnement permettant l'exécution de programmes Java sur une grappes de PCs. pour permettre une exécution efficcace, cet environnement compile le bytecode Java en code natif et supporte l'exécution concurrente des threads Java sur les différents noeuds de la grappe. Hyperion utilise l'environnement multithread distribué PM2. PM2 fournit des mécanismes efficaces permettant l'utilisation des threads et dispose d'une librarie de communication portable et peformante. PM2 fournit aussi une couche de mémoire virtuellement partagée (DSM-PM2) qui peut être configurée pour permettre l'implantation du modèle de cohérence Java. Ce papier présente quelques mesures préliminaires de performance de notre implantation Hyperion/PM2 durs des grappes de PCs sous Linux interconnectés par SCI et Myrinet.
• Keywords:
Parallel Java Compiling, DSM, Java Consistency, Multithreading, PM2.
• Keywords (in french):
Compilation Java Parallèle, DSM, Cohérence Java, Multithreading, PM2.
• Availability: Electronic copy only.
• Citation: Gabriel Antoniu, Luc Bougé, Philip Hatcher, Mark MacBeth, Keith McGuigan, and Raymond Namyst. Implementing Java consistency using a generic, multithreaded DSM runtime system. In Parallel and Distributed Processing. Proc. Intl Workshop on Java for Parallel and Distributed Computing., Lect. Notes in Comp. Science, Cancun, Mexico, May 2000. Held in conjunction with IPDPS 2000. IEEE TCPP and ACM, Springer-Verlag. To appear.
• Size: 2+10p
• Format: Compressed postscript
• Get it

### Compiling multithreaded Java bytecode for distributed execution.

• By: Gabriel Antoniu , Luc Bouge , Philip Hatcher , Mark MacBeth , Keith McGuigan , Raymond Namyst
• Number: RR2000-08
• Date: February 2000
• Abstract:
Our work combines Java compilation to native code with a run-time library that executes Java threads in a distributed-memory environment. This allows a Java programmer to view a cluster of processors as executing a single Java virtual machine. The system we present is available on top of several UNIX systems and can use a large variety of network protocols thanks to the high portability of the run-time system. To evaluate our approach, we compare serial C, serial Java, and multithreaded Java implementations of a branch-and-bound solution to the minimal-cost map-coloring problem. All measurements have been carried out on three platforms using different network protocols: SISCI/SCI, BIP/Myrinet and TCP/Myrinet.
• Abstract (in french):
Notre travail se situe dans le cadre la compilation des programmes Java en code natif, en utilisant un exécutif multithreadé permettant une exécution en environnement distribué. Cette approche permet au programmeur de voir une grappe de PCs comme une seule JVM. Les mémoires distribuées des processeurs sont cachés sous l'apparence d'un espace partagé unique. Le système que nous présentons est disponible au-dessus de plusieurs systèmes UNIX et utilise un grand nombre de protocoles réseau grâce à la portabilité du support d'exécution. Pour évaluer notre approche, nous comparons différentes implémentations (séquentielle, parallèlle multithreadée) d'un algorithme branch-and-bound calculant le coloriage d'une graphe avec un nombre minimal de couleurs. Les mesures ont été réalisées sur 3 platformes: SISCI/SCI, BIP/Myrinet et TCP/Myrinet.
• Keywords:
Parallel Java Compiling, DSM, Java Consistency, Multithreading, PM2.
• Keywords (in french):
Compilation Java Parallèle, DSM, Cohérence Java, Multithreading, PM2.
• Availability: Electronic copy only.
• Citation: Submitted for publication.
• Size: 2+12p
• Format: Compressed postscript
• Get it

### Further reducing the redundancy of a notation over a minimally redundant digit set.

• By: Marc Daumas , David W. Matula
• Number: RR2000-09
• Date: March 2000
• Abstract:
Redundant notations are used implicitly or explicitly in many digital designs. They have been studied in details and a general framework is known to reduce the redundancy of a notation down to the minimally redundant digit set. We present here an operator to further reduce the redundancy of such a representation. It does not reduce the number of allowed digits since removing one digit to a minimally redundant digit set is a conversion to a non redundant digit set and this is an expensive operation. Our operator introduces some correlation between the digits to reduce the number of possible redundant notations for any represented number. This reduction is visible in small useful operators like the elimination of leading zeros. We also present a key application with a CMOS Booth recoded multiplier. Our multiplier is able to accept both a redundant or a non redundant input with very little modifications and almost no penalty in time or space compared to state-of-the-art non redundant multipliers.
• Abstract (in french):
Les notations redondantes sont utilisées de façon implicite ou explicite dans de nombreux circuits numériques. Elles ont été étudiées en détail et des méthodes générales sont connues pour réduire la redondance jusqu'à atteindre un ensemble de chiffres redondant minimal. Nous présentons ici un opérateur pour réduire encore la redondance d'une telle notation. Il ne réduit pas le nombre de chiffres autorisés car supprimer un chiffre à une notation sur un ensemble de chiffres déjà redondant minimum est une conversion vers une notation non redondante et une opération longue. Notre opérateur introduit des correlations entre les chiffres pour réduire le nombre des notations redondantes possibles pour chaque nombre représenté. Cette réduction a un effet visible pour de petits opérateurs tels que l'élimination de zéros non significatifs en tête d'un mot. Nous présentons aussi une application importante avec un multiplieur CMOS basé sur le codage de Booth. Notre multiplieur est capable de traiter une entrée redondante ou non redondante avec très peu de modifications et sensiblement aucune pénalité en vitesse et en taille comparé à un multiplieur capable uniquement de traiter des nombres non redondants.
• Keywords:
Redundant Notation, Language, Multiplication, Computer Arithmetic, Computer Architecture, CMOS Technology.
• Keywords (in french):
Notation Redondante, Langage, Multiplication, Arithmétique des Ordinateurs, Architecture des Ordinateurs, Technologie CMOS.
• Availability: Electronic copy only. Paper copy available from the INRIA RR 3886.
• Citation: Not published yet.
• Size: 2+14p
• Format: Compressed PostScript
• Get it

### Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms.

• By: Olivier Beaumont , Vincent Boudet , Fabrice Rastello , Yves Robert
• Number: RR2000-10
• Date: March 2000
• Abstract:
In this paper, we deal with two geometric problems arising from heterogeneous parallel computing: how to partition the unit square into p rectangles of given area s_1, s_2,..., s_p (such that the sum of the s_i is equal to 1), so as to minimize (i) either the sum of the p perimeters of the rectangles (ii) or the largest perimeter of the p rectangles. For both problems, we prove NP-completeness and we introduce approximation algorithms.
• Abstract (in french):
Dans ce rapport, nous nous intéressons à deux problèmes géométriques issus de calculs parallèles hétérogèns : comment découper le carré unité en p rectangles d'aires donnés s_1, s_2,...,s_p (tel que la somme des s_i soit égale à 1), de manière à minimiser (i) soit la somme des périmètres des p rectangles (ii) soit le plus grand périmètre de ces p rectangles. Pour les deux problèmes, nous établissons leur NP-complétude et nous introduisons des algorithmes d'approximation.
• Keywords:
Heterogeneous Resources, Load-Balancing, Communication Cost, Parallel Computing, Partitioning, NP-Completeness, Geometric Problems.
• Keywords (in french):
Ressources Hétérogènes, Equilibrage de Charge, Cout de Communication, Calcul Parallèle, Découpage, NP-Complétude, Problèmes Géométriques.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+25 p
• Format: Compressed postscript
• Get it

### Variable Reliability Protocol: A protocol with a tunable loss tolerance for high performance over a WAN.

• By: Alexandre Denis
• Number: RR2000-11
• Date: March 2000
• Abstract:
Applications often have to chose between the slow TCP and the unreliable UDP. Robin Kravets from the GaTech has proposed an alternative: the Variable Reliability Protocol (VRP). The applications can specify what the allowable data losses are, and the protocol guarantees that the given loss tolerance parameters are respected. It should result in a faster throughput over WAN which have typically a higher loss rate. In this paper we describe how VRP has been improved, implemented, and integrated into Nexus, the communication library of the meta-computing environment Globus.
• Abstract (in french):
Les applications doivent jusqu'à présent choisir entre TCP, fiable mais lent, et UDP, rapide mais non-fiable. Robin Kravets du GaTech a proposé une alternative~: le protocole à fiabilité variable (Variable Reliability Protocol -- VRP). Les applications peuvent spécifier quelles sont les pertes tolérables dans les données, et le protocole garantit que les paramètres de tolérance de pertes seront respectés. Il en résulte un débit plus élevé sur les WAN qui ont typiquement un taux de pertes élevé. Nous décrivons dans cet article comment nous l'avons amélioré, implémenté et intégré à Nexus, la couche de communication de l'environnement de meta-computing Globus.
• Keywords:
TCP, UDP, Loss Tolerance, Network Protocol, Globus, Nexus, WAN, Internet.
• Keywords (in french):
TCP, UDP, Tolérance de Pertes, Protocole Réseau, Globus, Nexus, WAN, Internet.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+29p
• Format: Compressed postscript
• Get it

### Sparse NP-complete problems of the reals with addition.

• By: Herve Fournier
• Number: RR2000-12
• Date: March 2000
• Abstract:
An analog of Mahaney's Theorem was shown, stating that there is no sparse NP-complete problem over the reals with addition and equality - here sparse is defined in terms of dimension. We extend this to the case of the Turing reduction, then turn our attention toward the ordered case where it is shown that such sparse Turing-complete languages do exist. At last, we formulate a conjecture concerning the case of the many-one reduction.
• Abstract (in french):
Il existe un analogue du theorème de Mahaney, qui montre qu'il n'existe pas de problème NP-complet creux sur les réels avec addition et égalité - creux est ici défini en termes de dimension. Nous étendons ici ce résultat au cas de la réduction Turing, puis nous nous intéressons aux réels avec addition et ordre. Dans le cas de la réduction Turing, nous sommes en mesure d'exhiber un problème complet creux alors que cette même question reste ouverte pour la réduction many-one.
• Keywords:
Sparse Sets, Real Complexity, Knapsack.
• Keywords (in french):
Problèmes Creux, Complexité Réelle, Sac à Dos.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+5p
• Format: Compressed postscript
• Get it

### Transfer theorems via sign conditions.

• By: Pascal Koiran
• Number: RR2000-13
• Date: March 2000
• Abstract:
We show that P = PSPACE implies the collapse of the boolean polynomial hierarchy over any structure which admits efficient enumeration of sign conditions''. This fairly rich class of structures contains in particular R and C.
• Abstract (in french):
Nous montrons que si P = PSPACE la hiérarchie polynomiale booléenne s'effondre pour toute structure vérifiant la propriétée d'énumération efficace des conditions de signes. Cette classe de structures assez riche contient en particulier R et C.
• Keywords:
Algebraic Complexity, Polynomial Hierarchy, Transfer Theorems, Sign Conditions.
• Keywords (in french):
Complexité Algébrique, Hiérarchie Polynomiale, Théorèmes de Transfert, Conditions de Signe.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+7p
• Format: Compressed postscript
• Get it

### Linux Activations : un support système performant pour les applications de calcul multithreads.

• By: Vincent Danjean
• Number: RR2000-14
• Date: March 2000
• Abstract:
In this paper, we present LinuxActivation, an efficient system support for user level thread scheduling implemented within Linux. This work is an extension to the Scheduler Activations'' model (proposed by Anderson and al.) that better meets the needs of high performance applications. The aim is to allow a user level thread scheduler to handle correctly blocking system calls. After dealing with the limitations of original model, we show how we can efficiently handle all system calls with good performance if we suppose a dedicated machine. We describe the implementation of the mechanisms involved in this new approach within the Linux operating system and the modifications made to a user level thread scheduler to take profits of these mechanisms. The performance numbers we obtain validate our approach.
• Abstract (in french):
Cet article présente LinuxActivations, un support système efficace pour l'ordonnancement des processus légers (threads) de niveau utilisateur implanté dans Linux. Ces travaux sont une variante des "Scheduler Activations" (introduites par Anderson et al.) dans le contexte du calcul parallèle. L'objectif est de permettre à un ordonnanceur de threads de niveau utilisateur de fonctionner correctement en présence d'appels systèmes bloquants. Après avoir examiné les limitations du modèle originel, nous montrons comment il est possible de couvrir l'intégralité des appels systèmes tout en conservant de bonnes performances si l'on suppose l'exécution sur une machine dédiée''. Nous décrivons l'implantation des mécanismes découlant de cette nouvelle proposition dans le système d'exploitation Linux ainsi que les modifications apportées à un ordonnanceur de threads existant pour tirer parti de ces mécanismes. Les performances obtenues valident notre approche.
• Keywords:
Multithreading, Hybrid Scheduling, Operating System, Linux.
• Keywords (in french):
Multithreading, Ordonnancement Mixte, Système d'Exploitation, Linux.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+9p
• Format: Compressed postscript
• Get it

### On the structure of some spaces of tilings.

• By: Eric Rémila
• Number: RR2000-15
• Date: April 2000
• Abstract:
We study the structure of the set of tilings of a polygon $P$ with bars of fixed length. We obtain a undirected graph connecting two tilings if one can pass from one tile to the other one by a flip (i. e a local replacement  of tiles). Using algebraic tools (as tiling groups and their quotients and subgroups), we give a formula to compute the distance in this graph (i. e. the minimal number of necessary flips) between two tilings. Moreover, we prove that, for each pair (T, T') of tilings, the set \Upsilon_{T, T'} formed from tilings which are in a  path of minimal length from  T to T' canonically has a structure of distributive lattice.
• Keywords:
Tiling, Group, Lattice.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+23p
• Format: Compressed postscript
• Get it

### A distributed storage system for a video-on-demand server.

• By: Alice Bonhomme, Loic Prylli
• Number: RR2000-16
• Date: April 2000
• Abstract:
The aim of this paper is to present the design of a distributed storage system for a video server. The main goal is to support good fault-tolerance capabilities (no single point of failure, and no perturbation of the clients at the time the failure occur) while keeping the resource overhead low (storage space, buffer sizes). To reach this aim, all nodes of the cluster are symmetrical, there is no master node, each node can at the same time serve clients, and handle part of the storage.
• Keywords:
Video Server, Distributed Filesystem, Fault Tolerance, Cluster of PCs.
• Availability: Electronic copy only.
• Citation: Alice Bonhomme and Loic Prylli. A distributed storage system for a video-on-demand server. In Euro-Par 2000: Parallel Processing, volume ?? of Lect. Notes in Comp. Science, page ??, Munchen, Germany, August 2000. Springer-Verlag. To appear.
• Size: 2+10p
• Format: Compressed postscript
• Get it

### A Portable and Adaptive Multi-Protocol Communication Library for Multithreaded Runtime Systems.

• By: Olivier Aumage, Luc Bouge, Raymond Namyst
• Number: RR2000-17
• Date: April 2000
• Abstract:
This paper introduces Madeleine II, an adaptive multi-protocol extension of the Madeleine portable communication interface. Madeleine II provides facilities to use multiple network protocols (VIA, SCI, TCP, MPI) and multiple network adapters (Ethernet, Myrinet, SCI) within the same application. Moreover, it can dynamically select the most appropriate transfer method for a given network protocol according to various parameters such as data size or responsiveness user requirements. We report performance results obtained using Fast-Ethernet and SCI.
• Keywords:
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+7p
• Format: Compressed postscript
• Get it

### Constant Multipliers for FPGAs.

• By: Florent de Dinechin, Vincent Lefèvre
• Number: RR2000-18
• Date: May 2000
• Abstract:
This paper presents a survey of techniques to implement multiplications by constants on FPGAs. It shows in particular that a simple and well-known technique, canonical signed recoding, can help design smaller constant multiplier cores than those present in current libraries. An implementation of this idea in Xilinx JBits is detailed and discussed. The use of the latest algorithms for discovering efficient chains of adders, subtractors and shifters for a given constant multiplication is also discussed. Exploring such solutions is made possible by the new FPGA programming frameworks based on generic programming languages, such as JBits, which allow an arbitrary amount of irregularity to be implemented even within an arithmetic core.
• Keywords:
Constant Multipliers, FPGAs, KCM.
• Availability: Electronic copy only.
• Citation: To appear at ENREGLE 2000, Las Vegas.
• Size: 2+9p
• Format: Compressed postscript
• Get it

### Generic Distributed Shared Memory: the DSM-PM2 Approach.

• By: Gabriel Antoniu, Luc Bougé, Raymond Namyst
• Number: RR2000-19
• Date: May 2000
• Abstract:
This paper describes DSM-PM2, a generic, multi-protocol distributed shared memory library built for PM2, a multithreaded runtime system with preemptive thread migration. DSM-PM2 allows threads running on different nodes to communicate via a virtually shared address space and supports multiple consistency models. For a given model, the user can select among several alternative implementations, based for instance on page migration, thread migration, or an adaptative combination of them. Moreover, new consistency models or new implementations of existing models can be easily added using the available library routines. DSM-PM2 is available on top of several UNIX systems and can use a large variety of network protocols (BIP, SCI, VIA, MPI, TCP, etc.). We report performance figures for platforms using different network protocols: SISCI/SCI, BIP/Myrinet and TCP/Myrinet.
• Keywords:
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+13p
• Format: Compressed postscript
• Get it

### Stochastic Localization of Instability and Deterministic Enhancement of Accuracy for Iterative Algorithms.

• By: Philippe Langlois
• Number: RR2000-20
• Date: May 2000
• Abstract:
Finite precision computations may affect the stability of iterative algorithms and the accuracy of computed solutions.Automatic approaches are proposed to control these effects as for example, the CESTAC and the CENA methods.
We focus here on a complementary use of these two methods to localize unstable behavior of the algorithm, improve its stability and the accuracy of the solutions.
We present computational experiments on ill-conditioned polynomial roots approximated with Newton's iteration.
• Keywords:
Finite Precision, Stability, Accuracy, Newton's Iteration, Polynomial Multiple Roots, CESTAC Method, CADNA Library, Automatic Correction, CENA Method.
• Availability: Electronic copy only.
• Citation: Presented at 16th IMACS World Congress, Lausanne, August 2000.
• Size: 2+8p
• Format: Compressed postscript
• Get it

### From Rounding Error Estimation to Automatic Correction with Automatic Differentiation.

• By: Philippe Langlois
• Number: RR2000-21
• Date: May 2000
• Abstract:
Using automatic differentiation (AD) to estimate the propagation of rounding errors in numerical algorithms is classic. We propose a new application of AD to roundoff analysis providing an automatic correction of the first order effect of the elementary rounding errors. We present the main characteristics of this method and significant examples of its application to improve the accuracy of computed results and/or the stability of the algorithm.sion computations may affect the stability of iterative algorithms and the accuracy of computed solutions.
• Keywords:
Automatic Error Analysis, Rounding Error, Floating Point Arithmetic, Automatic Differentiation.
• Availability: Electronic copy only.
• Citation: Presented at AD 2000, the 3rd International Conference on Automatic Differentiation:  From Simulation to Optimization, Nice, June 2000.
• Size: 2+7p
• Format: Compressed postscript
• Get it

### Loop Shifting for Loop Parallelization.

• By: Alain Darte, Guillaume Huard
• Number: RR2000-22
• Date: May 2000
• Abstract:
The automatic detection of parallel loops is a well-known problem. Sophisticated polynomial algorithms have been proposed to produce code transformations that reveal parallel loops, in an optimal way as long as the only objective is to reveal as much parallelism as possible. However, a weakness of these techniques is that there is very few control on the solutions they built. For example, it may happen that the produced solution needs a very complex unimodular transformation plus a shift of the different statements even though a much simpler solution exists. It would be more useful to first select its own unimodular transformation and to be able to check that it can be completed by a shift while revealing the same parallelism. Unfortunately, the main result of this paper is that this is a hard problem: finding a (multi-dimensional) shift that makes an innermost loop parallel is strongly NP-complete. Nevertheless, we show that several subcases of this problem are easily solvable and that the general problem can be solved thanks to integer linear programming.
• Keywords:
Loop Parallelization, Code Transformation, Retiming, Loop Shifting.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+40p
• Format: Compressed postscript
• Get it

### Towards a Formalization of pi-calculus Processes in Higher Order Abstract Syntax.

• By: Christine Roeckl, Daniel Hirschkoff, Stefan Berghofer
• Number: RR2000-23
• Date: June 2000
• Abstract:
Higher order abstract syntax is a natural way to formalize programming languages with binders, like the pi-calculus, because alpha-conversion and beta-reduction are delegated to the meta level of the provers, making tedious substitutions superfluous. However, such formalizations usually lack induction principles, and often give rise to exotic terms. Induction is necessary in syntax analysis, and certain important syntactic properties might be invalid in the presence of exotic terms. The paper introduces well formedness predicates for the pi-calculus with which exotic terms are excluded and, simultaneously, induction principles are obtained. The principles are then used in formal proofs of vital syntactic properties, mechanized in Isabelle/HOL.
• Keywords:
Mobility, General-Purpose Theorem Proving, Higher Order Abstract Syntax, Induction, Pi-Calculus.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+15p
• Format: Compressed postscript
• Get it

### Heterogeneity Considered Harmful to Algorithm Designers.

• By: Olivier Beaumont, Vincent Boudet, Arnaud Legrand, Fabrice Rastello, Yves Robert
• Number: RR2000-24
• Date: June 2000
• Abstract:
In this paper, we deal with algorithmic issues on heterogeneous platforms. We show that static scheduling and load-balancing strategies are absolutely needed to achieve good performances, in contrast to situation for homogeneous parallel machines where dynamic schemes often turn out to be very satisfactory. However, we also show that static strategies targeted to heterogeneous platforms are difficult to design and implement: intuitively, data distribution must obey a much more refined model than standard block-cyclic distributions to equally balance the load between processors of different speeds. Technically, we state several NP-completeness results that demonstrate the intrinsic difficulty of static load-balancing on heterogeneous platforms.
• Keywords:
Heterogeneous Platform, Cluster, Different-Speed Processors, Scheduling, Load Balancing, Data Distribution, Data Allocation.
• Availability: Electronic copy only.
• Citation: Proceedings of the first IEEE Internationnal Conference on Cluster Computing (Cluster 2000), pp403-404.
• Size: 2+28p
• Format: Compressed postscript
• Get it

### Open problems on cellular automata.

• By: Marianne Delorme, Enrico Formenti, Jacques Mazoyer
• Number: RR2000-25
• Date: June 2000
• Abstract:
We present a non exhaustive collection of open problems on Cellular Automata''. They have arisen from the open problems sessions of AUTOMATA'99'' (ENS Lyon, October 1999) as well as from our own works.
• Keywords:
Cellular Automata, Open Problems.
• Availability: Electronic copy only.
• Citation: Not published yet. This text is an extension of workshop AUTOMATA'99 held at ENS Lyon (october 1999).
• Size: 2+42p
• Format: Compressed postscript
• Get it

### A Portable and Efficient Communication Library for High-Performance Cluster Computing.

• By: Olivier Aumage, Luc Bougé, Alexandre Denis, Jean-Francois Méhaut, Guillaume Mercier, Raymond Namyst, Loic Pryllï
• Number: RR2000-26
• Date: July 2000
• Abstract:
This paper introduces Madeleine II, a new adaptive and portable multi-protocol communication library. Madeleine II has the ability to control multiple network protocols (BIP, SISCI, VIA) and multiple network adapters (Ethernet, Myrinet, SCI) within the same application session. Moreover, it includes advanced mechanisms to dynamically select the most appropriate transfer method for a given network protocol according to various parameters such as data size or responsiveness user requirements. We report on performance measurements obtained using BIP and SCI and we present preliminary results about our Nexus/Madeleine II and MPICH/Madeleine II ports.
• Keywords:
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+16p
• Format: Compressed postscript
• Get it

### Un algorithme de décision pour la limite des théories de courbes génériques.

• By: Natacha Portier
• Number: RR2000-27
• Date: August 2000
• Abstract:
O. Chapuis, E. Hrushovski, P. Koiran et B. Poizat ont introduit une nouvelle théorie complète : la théorie limite des courbes génériques. Nous montrons ici que des courbes génériques de degré supérieur à exp(2r) ne peuvent pas être distinguées à l'aide d'une formule du premier ordre de rang de quantification inférieur à r. Nous en déduisons facilement un algorithme de décision pour la théorie limite. Ce travail a été fait en collaboration avec Pascal Koiran. Il a été présenté le 27 juin 2000 à Qaragandy, au cinquième colloque franco-qasaq de Théorie des modèles. Un article en anglais, plus détaillé et contenant de nouveaux résultats sera publié ultérieurement.
• Keywords:
Courbes Génériques, Va-et-vient.
• Availability: Electronic copy only.
• Citation: Actes du cinquième colloque franco-kasak de Théorie des modèles, Qaragandy, juin 2000.
• Size: 2+7p
• Format: Compressed postscript
• Get it

### Some Properties of Hyperbolic Networks.

• By: Christophe Papazian, Eric Rémila
• Number: RR2000-28
• Date: September 2000
• Abstract:
Many mathematical results exist about continuous topological surfaces of negative curvature. We give here some properties of discrete regular tessellations on such objects and explain a characterization of discrete geodesics and areas that shows how such hyperbolic networks can be seen as intermediary structures between Euclidean infinite tessellations (like square grid) and regular infinite trees. We do not use some possible group structures of this networks (Cayley graphs) but only geometrical arguments in our constructive proofs. Hence we can see that there are few geodesics in hyperbolic networks and that large areas have very unsmooth border.
• Keywords:
Hyperbolic Geometrie, Geodesics, Hyperbolic Networks.
• Availability: Electronic copy only.
• Citation: Will be published soon (DGCI2000).
• Size: 2+10p
• Format: Compressed postscript
• Get it

### Back-and-forth systems for generic curves and a decision algorithm for the limit theory.

• By: Pascal Koiran, Natacha Portier
• Number: RR2000-29
• Date: September 2000
• Abstract:
It was recently shown that the theories of generic algebraic curves converge to a limit theory as their degrees go to infinity. In this paper we give quantitative versions of this result and other similar results. In particular, we show that generic curves of degree higher than 2 to the power r cannot be distinguished by a first-order formula of quantifier rank r. A decision algorithm for the limit theory then follows easily. We also show that in this theory all formulas are equivalent to boolean combinations of existential formulas, and give a quantitative version of this result.
• Keywords:
Algebraic Curves, Algebraic Algorithms, Back-and-Forth Systems, Model Theory.
• Availability: Electronic copy only.
• Citation: This paper was accepted in "Annals of Pure and Applied Logic". To appear.
• Size: 2+19p
• Format: Compressed postscript
• Get it

### MPICH/Madeleine: a True Multi-Protocol MPI for High Performance Network.

• By: Olivier Aumage, Guillaume Mercier, Raymond Namyst
• Number: RR2000-30
• Date: October 2000
• Abstract:
This report introduces a version of MPICH handling efficiently different networks simultaneously. The core of the implementation relies on a device called ch_mad which is based on a generic multiprotocol communication library called Madeleine. The performance achieved with tested networks such as Fast-Ethernet, Scalable Coherent Interface or Myrinet is very good. Indeed, this multi-protocol version of MPICH generally outperforms other free or commercial implementations of MPI.
• Keywords:
MPI, High-Performance Network, Heterogeneity, Multi-Protocol, Cluster Computing.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+16p
• Format: Compressed postscript
• Get it

### Reductions, intersection types, and explicit substitutions.

• By: Daniel Dougherty, Pierre Lescanne
• Number: RR2000-31
• Date: October 2000
• Abstract:
This paper is part of a general programme of treating explicit substitutions as the primary lambda-calculi from the point of view of foundations as well as applications. We work in the composition-free calculus lambda-x of explicit substitutions and the augmented calculus obtained by adding explicit garbage-collection, and explore the relationship between intersection-types and reduction. We show that the strongly normalizable terms, the terms which normalize by leftmost reduction, and the terms which normalize by head reduction can each be characterized as the terms typable in a certain system. As a consequence we characterize a class of deterministic reduction strategies which are normalizing strategies and so can serve as models for evaluation in functional programming. The type systems we study are the natural generalizations of the classical systems, but the proofs require some new techniques in the presence of explicit reductions involving substitutions. Indeed, in our proofs we do not rely on results from classical lambda-calculus, which in our view comes after the calculus of explicit substitution. The proof technique we employ yields readily the result that a term is strongly normalizing iff it is strongly normalizing in the calculus extended by garbage-collection.
• Keywords:
Explicit Substitutions, Lambda-Calculus, Types, Intersection Types.
• Availability: Electronic copy only.
• Citation: Accepted in "Typed Lambda calculus and Applications" 2001.
• Size: 2+29p
• Format: Compressed postscript
• Get it

### The Data Broadcast Problem with Non-Uniform Transmission Times.

• By: Claire Kenyon, Nicolas Schabanel
• Number: RR2000-32
• Date: October 2000
• Abstract:
The data broadcast problem consists in finding an infinite schedule to broadcast a given set of messages so as to minimize the expected service time to clients requesting messages, and the cost of the broadcast. This problem also models the maintenance scheduling problem and the multi-item replenishment problem. Previous work concentrated on a discrete-time restriction where all messages have transmission time equal to 1. Here, we study a generalization of the model to the more realistic setting of continuous time and messages of non-uniform transmission times. The structural properties of the optimal solutions appear to be very different from the uniform case. We prove that the data broadcast problem is NP-hard, even if the broadcast costs are all zero, and give a randomized and a derandomized 3-approximation algorithm for broadcasting messages on a single channel.
• Keywords:
Data Broadcast, Push Technology, Approximation Algorithm, Derandomization, Lower Bounds.
• Availability: Electronic copy only.
• Citation: Submitted.
• Size: 2+28p
• Format: Compressed postscript
• Get it

### Bound on Run of Zeros and Ones for Images of Floating-Point Numbers by Algebraic Functions.

• By: Tomas Lang, Jean-Michel Muller
• Number: RR2000-33
• Date: November 2000
• Abstract:
This paper presents upper bounds on the number of zeros and ones after the rounding bit for algebraic functions. These functions include reciprocal, division, square root, and inverse square root, which have been considered in previous work. We here propose simpler proofs for the previously given bounds given and generalize to all algebraic functions. We also determine cases for which the bound is achieved for square root. As is mentioned in the previous work, these bounds are useful for determining the precision required in the computation of approximations in order to be able to perform correct rounding.
• Keywords:
Algebraic Functions, Computer Arithmetic, Table Maker's Dilemma, Correct Rounding, Floating-Point Arithmetic.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+14p
• Format: Compressed postscript
• Get it

### On-the-fly Range Reduction.

• By: Vincent Lefevre, Jean-Michel Muller
• Number: RR2000-34
• Date: November 2000
• Abstract:
In several cases, the input argument of an elementary function evaluation is given bit-serially, most significant bit first. We suggest a solution for performing the first step of the evaluation (namely, the range reduction) \emph{on the fly}: the computation is overlapped with the reception of the input bits. This algorithm can be used for the trigonometric functions $\sin$, $\cos$, $\tan$ as well as for the exponential function.
• Keywords:
Range Reduction, Elementary Functions, Computer Arithmetic.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+6p
• Format: Compressed postscript
• Get it

### Worst Cases for Correct Rounding of the Elementary Functions in Double Precision.

• By: Vincent Lefevre, Jean-Michel Muller
• Number: RR2000-35
• Date: November 2000
• Abstract:
We give here the results of a four-year search for the worst cases for correct rounding of the major elementary functions in double precision. These results allow the design of reasonably fast routines that will compute these functions with correct rounding, at least in some interval, for any of the four rounding modes specified by the IEEE-754 standard. They will also allow to easily test libraries that are claimed to provide correctly rounded functions.
• Keywords:
Elementary Functions, Computer Arithmetic, Table Maker's Dilemma, Correct Rounding, Floating-Point Arithmetic.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+14p
• Format: Compressed postscript
• Get it

### The topological entropy of iterated piecewise affine maps is uncomputable.

• By: Pascal Koiran
• Number: RR2000-36
• Date: November 2000
• Abstract:
We show that it is impossible to compute (or even to approximate) the topological entropy of a continuous piecewise affine function in dimension 4. The same result holds for saturated linear functions in unbounded dimension. We ask whether the topological entropy of a piecewise affine function is always a computable real number, and conversely whether every non-negative computable real number can be obtained as the topological entropy of a piecewise affine function. It seems that these two questions are also open for cellular automata.
• Keywords:
Topological Entropy, Piecewise Affine Functions, Saturated Linear Functions, Cellular Automata, Undecidability.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+7p
• Format: Compressed postscript
• Get it

### Stylesheet Validation.

• By: Philippe Audebaud, Kristoffer H. Rose
• Number: RR2000-37
• Date: November 2000
• Abstract:
We propose a method for validating that a standard XSLT/XPath stylesheet is document type safe'', i.e., produces a valid XML output document when processing a valid XML input document. To preserve the possibility for writing reuseable stylesheets we propose a notion of DTD pattern'' that captures the minimal requirements imposed by a stylesheet to its input and output DTD, and we explain how to validate a stylesheet for an input/output pair of DTD patterns.
• Keywords:
XML, XSLT, Validation, Typechecking, Pattern.
• Availability: Electronic copy only.
• Citation: Submitted to XML 2000.
• Size: 2+10p
• Format: Compressed postscript
• Get it

### Some Improvements on Multipartite Table Methods.

• By: Florent de Dinechin, Arnaud Tisserand
• Number: RR2000-38
• Date: November 2000
• Abstract:
This paper presents an unified view of most previous table-lookup-and-addition methods: bipartite tables, SBTM, STAM and multipartite method. This new definition allows a more accurate computation of the error entailed by these methods. Being more general, it also allows an exhaustive design space exploration which has been implemented, and leads to tables smaller than previously published ones by up to 50%. These tables have also been synthesised for Virtex FPGAs, and the paper discusses some results of this synthesis.
• Keywords:
Bipartite Tables, Table-lookup-and-addition Methods, SBTM, STAM, Multipartite, FPGA.
• Availability: Electronic copy only.
• Citation: Proceedings of Advanced Signal Processing Algorithms, Architectures and Implementations X. SPIE Volume 4116, pages 226-234. San Diego, USA. 2-4 August 2000.
• Size: 2+12p
• Format: Compressed postscript
• Get it

### DSM-PM2: A portable implementation platform for multithreaded DSM consistency protocols.

• By: Gabriel Antoniu, Luc Bougé
• Number: RR2000-39
• Date: November 2000
• Abstract:
DSM-PM2 is a platform for designing, implementing and experimenting multithreaded DSM consistency protocols. It provides a generic toolbox which facilitates protocol design and allows for easy experimentation with alternative protocols for a given consistency model. DSM-PM2 is portable across a wide range of clusters. We illustrate its power with figures obtained for different protocols implementing sequential consistency, release consistency and Java consistency, on top of Myrinet, Fast-Ethernet and SCI clusters.
• Keywords:
DSM, Multithreading, Consistency Protocols, DSM-PM2, PM2.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+12p
• Format: Compressed postscript
• Get it

### Distribution de contenus multimédia par flot continu : un état de l'art.

• By: Jean-Christophe Mignot
• Number: RR2000-40
• Date: November 2000
• Abstract:
This paper presents a survey of the state-of-the-art techniques and prototypes for streaming over the Internet. The basic principles are presented, the most important approaches (RTP, RTSP) are described.
• Keywords:
Real-Time, Streaming, Internet Protocols.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+13p
• Format: Compressed postscript
• Get it

### Les nouveaux codecs : un état de l'art.

• By: Jean-Christophe Mignot
• Number: RR2000-41
• Date: December 2000
• Abstract:
This paper presents a survey of the state-of-the-art techniques and prototypes for compressing videos. A particular interest is devoted to compression for streaming over the Internet. The basic principles are presented, the most important approaches are described.
• Keywords:
Image Compression, Video Compression, MPEG.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+24p
• Format: Compressed postscript
• Get it

### A revised presentation of the CENA method.

• By: Philippe Langlois
• Number: RR2000-42
• Date: December 2000
• Abstract:
The CENA method is a new automatic method to correct the first-order effect
of floating point  rounding errors on the result  of a numerical algorithm.
A first  presentation  of this method   is proposed in   the INRIA Research
Report RR-3828 ( december 1999).  The current report completes and replaces
the previous  one.  We hark back  to  the presentation  of  the CENA method
exhibiting new important  aspects. We illustrate  the pros-and-cons of  the
method solving triangular  linear  systems. These  systems are designed  to
generate  a significantly inaccurate computed   solution. We propose a  new
analysis of  the analogy and discrepancy between  the linear correction and
computing with extented precision.
• Keywords:
Automatic Error Analysis, Rounding Error, Floating Point Arithmetic.
• Availability: Electronic copy only.
• Citation: To appear in BIT-Numerical Mathematics.
• Size: 2+25p
• Format: Compressed postscript
• Get it

### Parallel Computation on Interval Graphs: Algorithms and Experiments.

• By: Afonso Ferreira, Isabelle Guérin-Lassous, Karina Marcus, Andrew Rau-Chaplin
• Number: RR2000-43
• Date: December 2000
• Abstract:
This paper describes efficient coarse-grained parallel algorithms and implementations for a suite of interval graph problems. Included are algorithms requiring only a constant number of communication rounds for connected components, maximum weighted clique, and breadth-first-search and depth-first-search trees, as well as O(log p) communication rounds algorithms for optimization problems such as minimum interval covering, maximum independent set and minimum dominating set, where p is the number of processors in the parallel system. Implementations of these algorithms are evaluated on parallel clusters, using both Fast Ethernet and Myrinet interconnection networks, and on a CRAY T3E parallel multicomputer, with extensive experimental results being presented and analyzed.
• Keywords:
Parallel Algorithms, Interval Graphs, Coarse Grain, Implementations, Clusters, CRAY T3E.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+25p
• Format: Compressed postscript
• Get it

### Static LU Decomposition on Heterogeneous Platforms.

• By: Olivier Beaumont, Arnaud Legrand, Fabrice Rastello, Yves Robert
• Number: RR2000-44
• Date: December 2000
• Abstract:
In this paper, we deal with algorithmic issues on heterogeneous platforms. We concentrate on dense linear algebra kernels, such as matrix multiplication or LU decomposition. Block cyclic distribution techniques used in ScaLAPACK are no longer sufficient to balance the load among processors running at different speeds. The main result of this paper is to provide a static data distribution scheme that leads to an asymptotically perfect load balancing for LU decomposition, thereby providing solid foundations toward the design of a cluster-oriented version of ScaLAPACK.
• Keywords:
Heterogeneous Platforms, Different-Speed Processors, Load-Balancing, LU Decomposition.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+13p
• Format: Compressed postscript
• Get it

### Dense Linear Algebra Kernels on Heterogeneous Platforms: Redistribution Issue.

• By: Olivier Beaumont, Arnaud Legrand, Fabrice Rastello, Yves Robert
• Number: RR2000-45
• Date: December 2000
• Abstract:
In this paper, we deal with redistribution issues for dense linear algebra kernels on heterogeneous platforms. In this context, processors speeds may well vary during the execution of a large kernel, which requires efficient strategies for redistributing the data along the computations. The strategy that we propose is to redistribute data after some well identified static phases and therefore, it is neither fully static nor fully dynamic. We present an optimal algorithm (under some assumptions) for redistributing data when performing matrix matrix multiplication.
• Keywords:
Heterogeneous Platforms, Different-Speed Processors, Load-Balancing, Data Redistribution, Matrix Product.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+15p
• Format: Compressed postscript
• Get it

### DAGSim: A Simulator for DAG Scheduling Algorithms.

• By: Aubin Jarry, Henri Casanova, Francine Berman
• Number: RR2000-46
• Date: December 2000
• Abstract:
Scheduling the tasks of a distributed application has been an active field of research for several decades. The classic scheduling problem is to find a assignment of application tasks onto a set of distributed resources in a way that minimizes the overall application execution time. This problem can be formulated for different classes of applications and computing environments. It has been shown that most non-trivial instances of the scheduling problem are NP-complete. As a result, many research works propose different solutions including heuristic-based algorithms, guided random searches, and genetic algorithms. However, it is difficult to determine which solutions are practical for real scenarios. Indeed, most available results ignore many characteristic of large computing environments such as dynamically changing resource performance and availability, complex network topologies and network contention, data storage issues, or inaccuracy of resource performance estimates. In this paper, we present DAGSim, a simulator specifically designed to evaluate scheduling algorithms for application structured as Directed Acyclic dependency Graphs (DAGs). This simulator is built on top of Simgrid~\cite{simgrid}, a toolkit for the fast and accurate simulation of distributed applications with realistic assumptions concerning the computing environment. We describe the implementation of DAGSim and how it was used to implement two well-known scheduling algorithms (DCP~\cite{DCP} and DLS~\cite{DLS}). After giving preliminary results with these two algorithms, we give implementation recommendations for further improvement of Simgrid and DAGSim.
• Keywords:
Heterogeneous Processors, Load Balancing, Grid Computing, NWS, Scheduling, Simulator.
• Availability: Electronic copy only.
• Citation: Not published yet.
• Size: 2+8p
• Format: Compressed postscript
• Get it