### 3D thinning algorithms on distributed memory machines.

**By:**Virginie Marion-Poty**Number:**RR1994-01**Date:**January 1994**Abstract:**

- I present two skeletonization algorithms using thinning operators. These operators are local masks based on 26-neighborhood or 3*3*3 neighborhood. My goal is to study the implementation of these algorithms on a MIMD machine. I describe two methods of parallelization, one based on the decomposition of the thinning operator into sub-operators, and the other based on the decomposition of the study domain (image) into sub-domains.

**Keywords:**

- Thinning Operators, Sub-operator, Sub-domain, Parallel Algorithms.

**Availability:**Electronic copy only.**Citation:**Published in Proceedings 3rd International Workshop on Parallel Image Analysis: University of Maryland at College Park, 1994.**Size:**2+18p**Format:**Compressed PostScript- Get it

### User Transparent Interval Arithmetic.

**By:**Marc Daumas , Christophe Mazenc , Jean-Michel Muller**Number:**RR1994-02**Date:**January 12, 1994**Abstract:**

- Interval arithmetic provides a tool powerful enough to validate computations. Yet no implementation standard is available to the programmers. Testing a code involves getting acquainted with an evolving non standard interval arithmetic package, modifying the code and running the new source with the correct extensions. We propose a self contained easy to use interval arithmetic package that can be litteraly put in place of standard floating point arithmetic. Users would get interval guaranteed results without even recompiling the source code.

**Keywords:**

- Computer Arithmetic, Floating Point, Interval Arithmetic.

**Availability:**Electronic copy only.**Citation:**Published in proceedings IMACS/GAMM International Symposium SCAN-93, Vienna, Austria, 1993. Published in Interval Computations, no. 4, pp. 46-63, 1994.**Size:**2+14p**Format:**Compressed PostScript- Get it

### Some issues on CARESSE, a new heterogeneous fine grain parallel-pipelined architecture.

**By:**Mario Fiallos-Aguilar , Jean Duprat**Number:**RR1994-03**Date:**January 13, 1994**Abstract:**

- Here, we deal with a new fine grain parallel-pipelined architecture made up of heterogeneous N arithmetic units (AUs). We present some main issues of such an architecture, including the model of computation, its digit-serial AUs, new scheduling heuristics and examples of linear algebra computations. Using parallel discrete-event simulations and computation visualization on a massively parallel computer, we present some measures of its performance.

**Keywords:**

- Fine Grain Parallelism, Heterogeneous Processing, Digit On-line Computation.

**Availability:**Paper copy available.**Size:**2+26p**Format:**Compressed PostScript- Get it

### Interval Routing Schemes.

**By:**Pierre Fraigniaud , Cyril Gavoille**Number:**RR1994-04**Date:**January 19, 1994**Abstract:**

- Interval routing was introduced to reduce the size of the routing tables: a router finds the direction to forward a message by determining the interval that contains the destination address of the message, each interval being associated to one particular direction. This way of implementing routing function is quite attractive but very few was known about the topological properties that should satisfy a network to admit an interval routing function satisfying particular constraints (shortest paths, limited number of intervals associated to each direction, etc). In this paper, we deeply investigate the study of the interval routing functions. In particular we characterize the networks that admit a linear or a linear strict interval routing function with only one interval per direction. We also derive practical tools to measure the efficiency of an interval routing function (number of intervals, length of the paths, etc), and we describe large classes of networks that admit an optimal (linear) interval routing function. Finally, we derive the main properties that satisfy the popular networks used to interconnect the processors of a distributed memory parallel computer.

**Keywords:**

- Routing Function, Intervals, Interconnected Networks, Distributed Memory Parallel Computer.

**Availability:**Electronic copy only.**Citation:**Published in proceedings CONPAR'94-VAPP VI, LNCS 854, pp 785-796, 1994 & proceedings 13rd ACM Symp on PODC, ACM Press, pp 216-224, 1994 : short version.**Size:**2+29p**Format:**Compressed PostScript- Get it

### Parallel Merge Sort for Distributed Memory Architectures.

**By:**Jean-Marc Adamo , Luis Trejo**Number:**RR1994-05**Date:**January 17, 1994**Abstract:**

- Cole presented a parallel merge sort for the PRAM model that performs in log n parallel steps using n processors. He gave an algorithm for the CREW PRAM model for which the constant in the running time is small. He also gave a more complex version of the algorithm for the EREW PRAM; the constant factor in the running time is still moderate but not as small. In this paper we give an approach to implement Cole's parallel merge sort on a distributed memory architecture. Both, the CREW and the EREW algorithms have been considered. A data placement algorithm is presented as well as the associated data movements. Our proposition sorts n items using exactly n processors in log n parallel time. The constant in the running time is only one greater than the one obtained for the PRAM model.

**Keywords:**

- Parallel Merge Sort, Parallel Architecture, Distributed Memory, Parallel Algorithm, PRAM, Pipe-Line.

**Availability:**Electronic copy only.**Size:**2+35p**Format:**Compressed PostScript- Get it

### The data-parallel programming model: a semantic perspective (revised version).

**By:**Luc Bouge**Number:**RR1994-06**Date:**January 1994**Abstract:**

- We propose a short introduction to the Data-Parallel programming model. We show that parallel computing often makes little distinction between the execution model and the programming model. This results in poor programming and low portability. Using the ``GOTO considered harmful'' analogy, we show that data-parallelism can be seen as a way out of this collapsing. We show that this model was already present in several works on parallel programming methodology, and that it can be characterized by a small number of concepts with simple semantics.

**Keywords:**

- Programming Languages, Data-Parallelism, Operational Semantics, Programming Models.

**Availability:**Electronic copy only.**Citation:**Published in French as Le modele de programmation a parallelisme de donnees: une perspective semantique. Technique et science informatiques, Special issue on data-parallel languages, Vol. 12, No. 5/6, 1993, pp. 541--562.**Size:**2+15p**Format:**Compressed PostScript- Get it

### Strategies for multicasting in meshes.

**By:**Eric Fleury , Pierre Fraigniaud**Number:**RR1994-07**Date:**March 17, 1994**Abstract:**

- Multicasting is an information dissemination problem that consists for a processor of a distributed memory parallel computer to send the same message to a part of the other processors. In this paper, we propose new multicasting algorithms for the mesh under the wormhole routing mode and using the read-and-route facility. These new algorithms perform generally faster than the algorithms previously described in the literature under the same model. We take, as criteria to compare the algorithms, the off-line computation time necessary to prepare the multicast, and the communication time to complete the multicast in absence of traffic. Moreover, since it is often not possible to derive useful expression for the communication time, we also simulate the algorithms on a 32x32 mesh.

**Keywords:**

- Multicasting, Routing, Wormhole, Multiprocessor, Parallel Computers.

**Availability:**Electronic copy only.**Citation:**Published in proceedings ICPP'94, 23rd International Conference on Parallel Processing, 1994.**Size:**2+22p**Format:**Compressed PostScript- Get it

### Kautz Necklaces

**By:**Pavel Tvrdik**Number:**RR1994-08**Date:**March 27, 1994**Abstract:**

- Vertices of de Bruijn digraph can be partitioned in a natural way into mutually disjoint cycles, called de Bruijn necklaces. Necklaces are in fact equivalence classes of an equivalence induced on the vertex set of de Bruijn digraph by cyclic left shifting. The number of necklaces of d-regular de Bruijn digraph with diameter D is well known and there exists an efficient algorithm for generating de Bruijn necklaces in lexicographic order. In this report, we show that similar results exist for Kautz digraphs. We introduce the notion of Kautz necklaces, present a linear-time and constant-space algorithm to generate them, and we give a formula to enumerate them. We also define a straightforward mapping between vertices of K(d,D) and B(d,D) U B(d,D-1) preserving the partition of vertices into necklaces and study the structural similarities between de Bruijn and Kautz digraphs based on this mapping.

**Keywords:**

- Interconnection Networks, Kautz Digraphs, De Bruijn Digraphs, Cycles, Necklaces, Enumerating, Embedding.

**Availability:**Paper copy available.**Citation:**Published in proceedings Sixth IEEE Symposium on Parallel and Distributed Processing 1994, IEEE CS Press, pp 409-415, 1994.**Size:**2+21p**Format:**Compressed PostScript- Get it

### Deadlocks in adaptive wormhole routing.

**By:**Eric Fleury , Pierre Fraigniaud**Number:**RR1994-09**Date:**March 22, 1994**Abstract:**

- In this paper, we give a necessary and sufficient condition for any adaptive wormhole routing function to be deadlock free. This result generalizes the result of Dally and Seitz that applies to deterministic routing functions, and the result of Duato that applies only for particular kinds of adaptive routing functions. Moreover, our proof is constructive and allows to derive a systematic way to check if an adaptive wormhole routing function is deadlock free or not.

**Keywords:**

- Wormhole, Routing, Deadlock.

**Availability:**Electronic copy only.**Citation:**Submitted for publication.**Size:**2+18p**Format:**Compressed PostScript- Get it

### Evaluating Array Expressions on Massively Parallel Machines with Communication/Computation Overlap.

**By:**Vincent Bouchitte , Pierre Boulet , Alain Darte , Yves Robert**Number:**RR1994-10**Date:**March 24, 1994**Abstract:**

- This paper deals with the problem of evaluating HPF style array expressions on massively parallel distributed-memory computers (DMPCs). This problem has been addressed by Chatterjee et al. under the strict hypothesis that computations and communications cannot overlap. As such a model appears to be unnecessarily restrictive for modeling state-of-the-art DMPCs, we relax the restriction and we allow for simultaneous computations and communications. This simple modification has a tremendous effect on the complexity of the optimal evaluation of array expressions. We first show that even a simple version of the problem is NP-complete. Then, we present some heuristics, which we are able to guarantee in some very important cases in practice, namely for coarse-grain or fine grain computations.

**Keywords:**

- Parallelism, Array Expressions, Distributed Memory, Communications Overlap.

**Availability:**Electronic copy only.**Citation:**Published in proceedings CONPAR'94-VAPP VI, LNCS 854, Springer Verlag, pp 713-724, 1994. To appear in International Journal Supercomputing Appl & Hight Performance.**Size:**2+23p**Format:**Compressed PostScript- Get it

### Parallelizing compilers: what can be achieved?

**By:**Michele Dion , Jean-Laurent Philippe , Yves Robert**Number:**RR1994-11**Date:**March 25, 1994**Abstract:**

- We discuss automatic parallelization techniques that take a sequential program as input and produce executable code for massively parallel distributed memory machines. Target programming style is SPMD data-parallel. Target language is High Performance Fortran. We review the whole design trajectory: data dependence extraction, scheduling and mapping onto virtual processors, code generation for execution on physical processors. With concrete examples we explain all the difficulties we have to face. We show that fully automatic parallelization can be achieved for simple computational kernels arising from scientific applications, namely loop nests programs with uniform or affine dependences.

**Keywords:**

- Parallelism, HPF, SPMD, Scheduling, Mapping, Dependences, Code Genaration, Loop Nest.

**Availability:**Electronic copy only.**Citation:**Published in High-Performance Computing and Networking, April 94 Proceedings, Volume II: Networking and Tools.**Size:**2+13p**Format:**Compressed PostScript- Get it

### On the expressivity of a weakest precondition calculus for a simple data-parallel programming language.

**By:**Luc Bouge , Yann Le Guyadec , Bernard Virot , Gil Utard**Number:**RR1994-12**Date:**April 1, 1994**Abstract:**

- We present a weakest preconditions calculus a la Dijkstra for a small common kernel of existing data-parallel languages. We use two-part assertions, where the current extent of parallelism is specified by a separate boolean vector expression. Our main contribution is concerned with the conditioning construct ``where'' which modifies the current extent of parallelism. We prove that the weakest (strict) preconditions of a ``where'' block is definable by an assertion as soon as its body is. We show that this is not the case with its weakest liberal preconditions. This sheds a new light on the deep semantic nature of the data-parallel conditioning construct.

**Keywords:**

- Concurrent Programming, Specifying and Verifying and Reasoning about Programs, Semantics of Programming Languages, Data-Parallel Languages, Proof System, Hoare Logic, Weakest Preconditions.

**Availability:**Paper copy available.**Citation:**Published in Proc. ConPar'94--VAPP VI Conference, Linz, Austria, 1994.**Size:**2+15p**Format:**Compressed PostScript- Get it

### Validation of the compilation of Data-Parallel C ``while'' loops for shared memory architectures.

**By:**Gil Utard**Number:**RR1994-13**Date:**April 1, 1994**Abstract:**

- This report focuses on the compilation of the ``while'' loops in data-parallel languages for MIMD Shared Memory architectures. An efficient compilation must decrease the number of ``global synchronizations'' due to dependencies. We validate an optimization suggested by Hatcher and Quinn for the DPC language. It consists in splitting the original loop into two loops : one ``computation loop'' without any additional control dependencies, and one ``waiting loop'' to assure termination. Computation loop's body presents a minimal number of global synchronizations. We study informaly its correction proof, and give the methodology leading of its conception. The formal proof is based on the axiomatic semantics of Owiki and Gries. We give an axiomatization of the global synchronization statement, and specify which are the sufficient conditions for a non-deadlocking execution. In Hatcher and Quinn's solution, we observe that the waiting loop is independant of the computation one. The former loop absorbs residual synchronizations of any parallel program. We conclude by presenting a modular method to elaborate parallel programs.

**Keywords:**

- Concurrent Programming, Data-Parallel Languages, Compilation, Validating Compilation Scheme, Axiomatic Semantics.

**Availability:**Paper copy available.**Size:**2+21p**Format:**Compressed PostScript- Get it

### Massivelly parallel machine based on T9000 and C104.

**By:**Eric Fleury , Marc Picquendar**Number:**RR1994-14**Date:**May 2, 1994**Abstract:**

- The T9000 transputer and its companion routing chip, the C104, allow the construction of very large networks (several thousand processors). The designer of such a network must select a topology taking into account both performance (i.e. small communication delays) and engineering considerations (cost and wireability). This paper presents preliminary studies of various candidate topologies for large networks of transputers. These topologies are all based on small building modules which are easy to package; various ways of interconnecting these modules are studied. For each candidate topology, we describe the connection patterns and routing functions; we also discuss the relative merits of the various topologies both from a performance viewpoint (through simulations) and an engineering viewpoint (cost and wireability).

**Keywords:**

- T9000, C104, Multiprocessor, Parallel Architecture, Interconnection Networks.

**Availability:**Electronic copy only.**Citation:**Published in proceedings Transputers'94, IOS Press, pp 120-134, 1994.**Size:**2+13p**Format:**Compressed PostScript- Get it

### Factoring and scaling Kautz digraphs.

**By:**Pavel Tvrdik**Number:**RR1994-15**Date:**May 2, 1994**Abstract:**

- It has been recently shown that vertices of Kautz digraphs, similarly to de Bruijn digraphs, can be partitioned into in a natural way into mutually disjoint cycles, called necklaces. There is a linear-time algorithm generating the Kautz necklaces in lexicographic order and the number of the necklaces is known. In this report, we present a necklace-based algorithm to partition the arcs of d-regular Kautz Bruijn digraphs of diameter D into d arc-disjoint 1-factors. Using this factorization, we show how to construct Kautz partial line digraphs with connectivity close to the connectivity of the full Kautz digraphs. We describe a construction of d-regular partial line digraphs of d-regular Kautz digraphs. This result implies that the space between d-regular Kautz digraphs of diameter D and D+1 is evenly populated by d-2 digraphs with the same routing and connectivity properties. Finally, a generalization of this method enables to construct partial line digraphs of any size with the best possible connectivity close to d. These results make the Kautz digraphs unique in that sense that they are not only very dense digraphs with fixed degree, but also very well incrementally scalable.

**Keywords:**

- Interconnection Networks, Kautz Digraphs, de Bruijn Digraphs, Cycles, Necklaces, Factoring, Partial Line Digraphs, Connectivity, Incremental Scalability.

**Availability:**Paper copy available.**Citation:**Published in Proceedings Sixth IEEE Symposium on Parallel and Distributed Processing 1994, IEEE CS Press, pp 409-415.**Size:**2+16p**Format:**Compressed PostScript- Get it

### A Parallel Performance Study of Jacobi-like Eigenvalue Solution.

**By:**Makan Pourzandi , Bernard Tourancheau**Number:**RR1994-16**Date:**March 1994**Abstract:**

- In this report we focus on Jacobi like resolution of the eigen-problem for a real symmetric matrix from a parallel performance point of view: we try to optimize the algorithm working on the communication intensive part of the code. We discuss several parallel implementations and propose an implementation which overlaps the communications by the computations to reach a better efficiency. We show that the overlapping implementation can lead to significant improvements. We conclude by presenting our future work.

**Keywords:**

- Intel iPSC/860, Eigenvalue Problem, Jacobi Method.

**Availability:**Paper copy available.**Citation:**Published in proceedings of First International Meeting on Vector and Parallel Processing, 1993.**Size:**2+16p**Format:**Compressed PostScript- Get it

### Knowledge Extraction From Neural Networks : A Survey.

**By:**Richard Baron**Number:**RR1994-17**Date:**May 27, 1994**Abstract:**

- Artificial neural networks may learn to solve arbitrary complex problems. But knowledge acquired is hard to exhibit. Thus neural networks appear as ``black boxes'', the decisions of which can't be explained. In this survey, different techniques for knowledge extraction from neural networks are presented. Early works have shown the interest of the study of internal representations, but these studies were domain specific. Thus, authors tried to extract a more general form of knowledge, like rules of an expert system. In a more restricted field, it is also possible to extract automata from neural networks, likely to recognize a formal language. Finally, numerical information may be obtained in process modelling, and this may be of interest in industrial applications.

**Keywords:**

- Artificial Neural Networks, Knowledge Extraction, Expert Systems.

**Availability:**Electronic copy only.**Size:**2+13p**Format:**Compressed PostScript- Get it

### Escape constructs in data-parallel languages: semantics and proof system.

**By:**Luc Bouge , Gil Utard**Number:**RR1994-18**Date:**June 2, 1994**Abstract:**

- We describe a simple data-parallel kernel language which encapsulates the main data-parallel control structures found in high-level languages such as MasPar's MPL or the recent HyperC. In particular, it includes the concept of data-parallel escape, which extends the break and continue constructs of the scalar C language. We give this language a natural semantics, we define a notion of assertion and describe an assertional proof system. We demonstrate its use by proving a small data-parallel Mandelbrot-like program.

**Keywords:**

- Concurrent Programming, Specifying and Verifying and Reasoning about Programs, Semantics of Programming Languages, Data-Parallel Languages, Proof System, Hoare Logic.

**Availability:**Paper copy available.**Size:**2+19p**Format:**Compressed PostScript- Get it

### Resource-constrained Scheduling of Partitioned Algorithms on Processor Array.

**By:**Michele Dion , Tanguy Risset , Yves Robert**Number:**RR1994-19**Date:**June 27, 1994**Abstract:**

- We deal with the problem of partitioning and mapping uniform loop nests onto physical processor arrays. Resource constraints are taken into account: not only we assume a limited number of available processors, but we also assume that the communication capabilities of the physical processors are restricted (in particular, the number of communication links in each direction is bounded). This paper is motivated by the recent work of Chou and Kung and of Thiele. Our main contributions are a new formulation of the complex optimization problem to be solved in terms of a single integer linear programming problem, as well as optimal scheduling algorithms and complexity results in the case of linear processor arrays.

**Keywords:**

- Parallelism, SPMD, Processor Arrays, Scheduling, Tiles, Resource Constraints.

**Availability:**Electronic copy only.**Size:**2+29p**Format:**Compressed PostScript- Get it

### A new 2-D and 3-D thinning algorithm based on successive border generations.

**By:**Serge Miguet , Virginie Marion-Poty**Number:**RR1994-20**Date:**July 22, 1994**Abstract:**

- In this paper we present a skeletonization algorithm based on contour peeling. This algorithm can be applied on 2-D as well as on 3-D images. The 2-D algorithm is a new formulation of Pavlidis' algorithm, avoiding the explicit contour tracking, and is based on local computations only. The new thinning operator can be applied on all the image points in parallel.

**Keywords:**

- Thinning Operators, Contour, Contour Tracking, Coloration, Parallel Algorithms.

**Availability:**Paper copy available.**Citation:**Published in Proceedings 4th Conference on Discrete Geometry in Computer Imagery, Grenoble, France, 1994.**Size:**2+17p**Format:**Compressed PostScript- Get it

### Fuzzy Array Dataflow Analysis.

**By:**Paul Feautrier , Jean-Francois Collard**Number:**RR1994-21**Date:**July 29, 1994**Abstract:**

- Exact array dataflow analysis can be achieved in the general case if the only control structures are the sequence, the DO-loop and restricted IFs, and if loop counter bounds and array subscripts are affine expressions of surrounding loop counters and possibly some integer constants. In this paper, we begin the study of dataflow analysis of dynamic control programs, where arbitrary IFs and WHILEs are allowed. In the general case, this dataflow analysis can only be approximate, hence the name Fuzzy Array Dataflow Analysis.

**Keywords:**

- Array Dataflow Analysis, Dynamic Control Programs, Automatic Parallelization.

**Availability:**Paper copy available.**Citation:**Submitted for publication.**Size:**2+31p**Format:**Compressed PostScript- Get it

### Parallel functional programming: a pragmatic approach.

**By:**Gaetan Hains**Number:**RR1994-22**Date:**September 15, 1994**Abstract:**

- We introduce DPML, an intermediate-level portable language for massively parallel programming designed as an extension of Mini-ML. Its parallel execution mode generalises data-parallelism and features explicit localisations and communications. Unlike imperative parallel languages with explicit communications, DPML is deterministic. A DPML program is seen as a static vector of ML programs communicating through remote evaluation and a global protocol. The language's implementation reuses that of Caml and does not require a distributed GC.

**Keywords:**

- Parallel Programming, Strict Functional Language, Portable Implementation, Deterministic Semantics, Massively Parallel Architectures, Distributed Memory.

**Availability:**Electronic copy only.**Citation:**Submitted for publication.**Size:**2+24p**Format:**Compressed PostScript- Get it

### Length and Number of Buses for Gossiping in Networks.

**By:**Pierre Fraigniaud , Christian Laforest**Number:**RR1994-23**Date:**September 21, 1994**Abstract:**

- Gossiping is an information dissemination problem in which each node of a communication network has a unique piece of information that must be transmitted to all the other nodes. A bus network is a network of processing elements that communicate by sending messages along buses in a sequence of calls. We assume that (i) each node can participate to at most one call at a time, (ii) a node can either read or write on a bus, (iii) no more than one node can write on a given bus at a given time, and (iv) communicating a message on a bus takes a unit of time. This model extends the telegraph model in allowing the number of nodes connected to each bus to be as large as needed, instead on being bounded by 2. In this paper, we are interested in minimizing the ``hardware'' of a bus network in keeping optimal the communication performances for solving the gossiping problem. More precisely, we compute the minimum number of buses required for a gossiping to be optimal. Similarly, we give upper bounds on the minimum length of buses required for a gossiping to be optimal. Finally, we combine the two approaches in trying to minimize both parameters: length and number of buses.

**Keywords:**

- Hypergraph, Bus, Network, Gossiping, Communication.

**Availability:**Paper copy available.**Size:**2+16p**Format:**Compressed PostScript- Get it

### Automatic parallelization based on multi-dimensional scheduling.

**By:**Alain Darte , Frederic Vivien**Number:**RR1994-24**Date:**September 21, 1994**Abstract:**

- In the scope of uniform recurrence equations, we study an algorithm first proposed by Karp, Miller and Winograd for detecting cycles of null weight in cyclic graphs. We show how this algorithm can be used for generating multi-dimensional schedules that express all the potential parallelism contained in a computable system of uniform recurrence equations. We then apply this technique to imperative programs based on loop nests and whose dependences are described in the framework (Z, +, -, *), the most popular way of describing dependences. We are able to show that our technique is an optimal parallelization technique in the sense that no more parallelism can be detected than provided by our new algorithm.

**Keywords:**

- Automatic Parallelization, Multi-dimensional Scheduling, Loop Nest, Systems Of Recurrence Equations, Dependence Analysis.

**Availability:**Electronic copy only.**Citation:**Submitted for publication.**Size:**2+34p**Format:**Compressed PostScript- Get it

### Implementing On-Line Arithmetic on PAM.

**By:**Marc Daumas , Jean-Michel Muller , Jean Vuillemin**Number:**RR1994-25**Date:**September 26, 1994**Abstract:**

- On-line arithmetic is a computation tool able to adapt to the precision expected by the user. Developing a library of on line operators for FPGAs will lead in a near future to the spread of brick-assembled application-dedicated operators. In the implementation of the basic arithmetic operations (addition, multiplication, division and square root), we have met some new problems : our work has involved changes in the VLSI design methodology in order to achieve some effective performances. We shall present the modified on-line algorithms and their adaptation to the cell oriented FPGA architecture. The correct integration of some retiming barriers has proved to be critical as far as speed is concerned.

**Keywords:**

- Computer Arithmetic, On-Line Arithmetic, Field Programmable Logic.

**Availability:**Paper copy available.**Citation:**Published in proceedings 4th Field Programmable Logic Workshop, Prage, Czech Republic, 1994.**Size:**2+13p**Format:**Compressed PostScript- Get it

### Explicit Substitutions for the Lambda-Mu-Calculus.

**By:**Philippe Audebaud**Number:**RR1994-26**Date:**September 08, 1994**Abstract:**

- We present a confluent rewriting system wich extends a previous calculus for the Lambda-Calculus (Levy, Hardin) to Parigot's untyped Lambda-Mu-Calculus. This extension embeds the Lambda- Mu-Calculus as a sub-theory, and provides the basis for a theoretical framework to study the abstract properties of implementations of functional programming languages enriched with control structures. This study gives also some interesting feedback on Lambda-Mu-Calculus on both the syntactical and semantics levels.

**Keywords:**

- Rewriting Systems, Lambda-Mu Calculus.

**Availability:**Electronic copy only.**Size:**2+14p**Format:**Compressed PostScript- Get it

### An evolutionary algorithm for the learning of neural networks with real parameters.

**By:**Cedric Gegout**Number:**RR1994-27**Date:**October 10, 1994**Abstract:**

- This paper presents an evolutionary method that provides the best neural network with real parameters approximating a given real continuous function (the mean square error of the network ouput averaged over all training patterns is minimised). Our evolutionary algorithm adjusts the weights of a fixed architecture network so that the solution obtained is a good initialization net for another training algorithm. The evolutionary algorithm has been proved to be efficient for reducing the convergence time of a conjugate gradient descent algorithm and for increasing the global convergence rate. Simulation runs show that our learning method is efficient and robust. The parallel implementation of the method divides the population into subpopulations distributed on the multiple processor architecture. By distributing the individuals we increase the population diversity and improve the performances of the evolution.

**Keywords:**

- Genetic Algorithms, Evolutionary Algorithms, Feedforward Neural Networks, Multilayer Perceptron Training, Geometrical Parametrization.

**Availability:**Electronic copy only.**Citation:**Published in Proc. Evolution Artificielle, Toulouse, 1994.**Size:**2+9p**Format:**Compressed PostScript- Get it

### Automatic Definition of Sub-Neural Networks.

**By:**Frederic Gruau**Number:**RR1994-28**Date:**October 14, 1994**Abstract:**

- This report illustrates an artificial developmental system that is a computationally efficient technique for the automatic generation of complex Artificial Neural Networks (ANN). Artificial developmental system can develop a graph grammar into a modular ANN made of a combination of more simple subnetworks. A genetic algorithm is used to evolve coded grammar that generates ANNs for controlling a six-legged robot locomotion. We support our argumentation with pictures describing the steps of development and illustrate how ANN structures are evolved.

**Keywords:**

- Animats, Cellular Encoding, Modularity, Locomotion, Automatic Defintion Of Sub-neural Networks.

**Availability:**Electronic copy only.**Citation:**Submitted to Artificial Behavior,Toulouse, 1994.**Size:**2+33p**Format:**Compressed PostScript- Get it

### Precise Tiling for Uniform Loop Nests.

**By:**Pierre-Yves Calland , Tanguy Risset**Number:**RR1994-29**Date:**September, 1994**Abstract:**

- The subject of this report is a hyperplan partitionning problem applied to perfect loop nests. This work aims to increase of computation granularity to reduce the overhead due to communication time. This study is different from previous works as it takes redundant communications into account. We propose an algorithm giving the optimal solution and various examples to show the soundness of this report.

**Keywords:**

- Loop Nest, Uniform Dependence Algorithm, Partitionning, Tiling, Convex Cone.

**Availability:**Electronic copy only.**Citation:**To appear in proceeding ASAP'95, Juillet 1995, Strasbourg.**Size:**2+18p**Format:**Compressed PostScript- Get it

### Mapping Affine Loop Nests: New Results.

**By:**Michele Dion , Yves Robert**Number:**RR1994-30**Date:**November 1994**Abstract:**

- This paper deals with the problem of aligning data and computations when mapping affine loop nests onto Distributed Memory Parallel Computers (DMPCs). We formulate the problem by introducing a new graph, the access graph, to model affine communications (with rectangular access matrices) more adequately than with the previously introduced tool, the communication graph. We show that maximizing the number of local communications in the access graph is a NP-complete problem in the strong sense and we present several heuristics based upon the access graph for mapping affine loop nests onto DMPCs.

**Keywords:**

- Distributed Memory Parallel Computers, Affine Loop Nests, Communications, Access Graph, Alignment, Mapping, Processor Grid.

**Availability:**Electronic copy only.**Citation:**To appear in the proceedings of HPCN EUROPE 95, 1995.**Size:**2+24p**Format:**Compressed PostScript- Get it

### Compilation of data-parallel program for a network of workstations.

**By:**Gil Utard , Guy Cuvillier**Number:**RR1994-31**Date:**November 15, 1994**Abstract:**

- We describe the compilation and execution of data-parallel languages for networks of workstations. Executions of distributed programs on cluster of workstations must take into account the loads changing dynamically due to other users. We show that data-parallel programming model is a good candidate to include dynamics load balancing features. We present an implementation for a network of workstations with the data-parallel language C* and PVM.

**Keywords:**

- Data-Parallelism, Compilation, Network of Workstations, Dynamic Load Balancing.

**Availability:**Paper copy available.**Size:**2+24p**Format:**Compressed PostScript- Get it

### How to make good use of multilayer neural networks.

**By:**Helene Paugam-Moisy**Number:**RR1994-32**Date:**November 1994**Abstract:**

- This report is both a kind of survey of recent advances about multilayer neural networks and a presentation of several results of the "Connexionisme" research group, at LIP. Several theorems are cited, which present one-hidden-layer neural networks as universal approximators. Next section points out that two hidden layers are often required for exact realizing d-dimensional dichotomies. Defining the frontier between one-hidden-layer and two-hidden-layer neural networks is still an open problem, in spite of steps forward in that way. After presenting several bounds concerning the size of a multilayer network which learns from examples, we enhanced the fact that, even if all can be done with only one hidden layer, more often, things can be done better with two or more hidden layers. Finally, this assertion is supported by the behavior of multilayer neural networks in two radically different applications.

**Keywords:**

- Multilayer Neural Networks, Hidden Layers, Complexity.

**Availability:**Paper copy available.**Citation:**Published in Journal of Biological Systems, vol.3, n.4, (1995), p.1177-1191**Size:**2+12p

### A scalable design for VLSI dictionary machines.

**By:**Thibault Duboux , Afonso Ferreira , Michel Gastaldo**Number:**RR1994-33**Date:**December 02, 1994**Abstract:**

- Most of the proposed VLSI dictionary machines appearing in the literature were designed to fit in one chip only. If the number of acquired elements is larger than that of VLSI cells, another chip has to be designed and manufactured to take a larger dictionary into account. In this paper, we propose a new design for dictionary machines that assembles blocks of standard existing dictionary machines. Our machine is as efficient as the best machines described in the literature, with the enormous advantage of scaling up quite easily, with no degradation of its performance, by simply adding more and more standard blocks.

**Keywords:**

- Dictionary Machine, VLSI, Load Balancing, Parallel Data Structures.

**Availability:**Electronic copy only.**Citation:**Published in proceeding 3rd International Workshop on Algorithms and Parallel VLSI Architectures, Leuven, Belgium, 1994, Elsevier Science.**Size:**2+18p**Format:**Compressed PostScript- Get it

### A method for static scheduling of dynamic control programs (preliminary version).

**By:**Jean-Francois Collard , Paul Feautrier**Number:**RR1994-34**Date:**December 21, 1994**Abstract:**

- Static scheduling consists in compile-time mapping of operations to logical execution dates. However, scheduling so far only applies to static control programs, i.e. roughly to nests of do (or for) loops. To extend scheduling to dynamic control programs, one needs a method that 1) is consistent with unpredictable control flows (and thus unpredictable iteration domains) 2) is consistent with unpredictable data flows, and 3) permits speculative execution. This report describes a means to achieve these goals.

**Keywords:**

- Automatic Parallelization, Dynamic Control Program, While Loop, Scheduling, Speculative Execution.

**Availability:**Electronic copy only.**Citation:**Not published yet.**Size:**2+28p**Format:**Compressed PostScript- Get it

### Neural Network Parallelization on a Ring of Processors: Training Set Partition and Load Sharing.

**By:**Bernard Girau**Number:**RR1994-35**Date:**December 05, 1994**Abstract:**

- A parallel back-propagation algorithm that partitions the training set has been introduced in the thesis of H. Paugam-Moisy. Its performance on MIMD machines is studied, and a new version is developped. It is based on a heterogeneous load sharing method. Algebraic models allow precise comparisons and show great improvements if learning steps are processed.

**Keywords:**

- Neural Networks, Parallel Algorithms, Load Sharing, MIMD Machines, Performance Models.

**Availability:**Electronic copy only.**Citation:**Published in proceeding International parallel processing symposium Santa-Barbara, 1995.**Size:**2+35p**Format:**Compressed PostScript- Get it

### Synthesizing proofs of programs in the Calculus of Inductive Constructions.

**By:**Catherine Parent**Number:**RR1994-36**Date:**November 30, 1994**Abstract:**

- In type theory, a proof can be represented as a typed lambda-term. There exist methods to mark logical parts in proofs and extract their algorithmic contents. The result is a correct program with respect to a specification. This paper focuses on the inverse problem : how to generate a proof from its specification. The framework is the Calculus of Inductive Constructions. A notion of coherence is introduced between a specification and a program containing types but no logical proofs. This notion is based on the definition of an extraction function called the weak extraction. Such a program can give a method to reconstruct a set of logical properties needed to have a proof of the initial specification. This can be seen either as a method of proving programs or as a method of synthetically describing proofs.

**Keywords:**

- Program Proving, Extraction, Calculus of Constructions, Lambda-Calculus.

**Availability:**Electronic copy only.**Citation:**Published in proceeding MPC'95 (Mathematics for programs constructions), 1995, Kloster-Irsee Allemagne in LNCS.**Size:**2+19p**Format:**Compressed PostScript- Get it

### Parallelisation d'une nouvelle methode de recherche de valeurs propres pour des matrices reelles symetriques.

**By:**Makan Pourzandi , Francoise Tisseur**Number:**RR1994-37**Date:**September 1994**Abstract:**

- In this report, we propose research results on the parallel implementation of a new algorithm for finding all eigenvalues and eigenvectors of e real symmetric matrix. We first present our target parallel computer. We review in details the mathematical therory of this new eigensolver and explain the sequential algorithm. Afterwards, the parallel implementation of this algorithm is described. Finally, we discuss our experimental results and conclude by presenting our future works on this new parallel eigensolver.

**Keywords:**

- Eigenvalue Problem, Symmetric Matrices, FFT, Chebyshev Polynomial, Matrix Multiplications, Intel Paragon.

**Availability:**Paper copy available.**Size:**2+54p

### Perfect matchings in the triangular lattice.

**By:**Claire Kenyon , Eric Remila**Number:**RR1994-38**Date:**December 1994**Abstract:**

- Let V be a finite subset of vertices of the infinite planar triangular lattice T, and G denote the subgraph induced by V. We assume that G is simply connected (i. e. G and the subgraph induced by T-V are both connected). Firstly we prove that if the vertex-connectivity of G is at least 2, and V is even, then G has a perfect matching. Secondly, we devise a linear-time algorithm for finding a perfect matching of G when it exists. Thirdly, we prove that any perfect matching of G can be transformed into any other perfect matching of G by a linear number of local transformations involving at most 4 edges.

**Keywords:**

- Tiling, Matching.

**Availability:**Electronic copy only.**Citation:**To appear in Discrete Mathematics.**Size:**2+23p

### Recognitions of figures with two-dimensional cellular automata.

**By:**Laure Tougne**Number:**RR1994-39**Date:**January 31, 1995**Abstract:**

- We consider a family of figures of the discrete plane (rectangles, squares, ellipses,...); how shall we decide with a 2D cellular automaton whether a given figure belongs to the family or not? We essentially give three kinds of results. First, we look for a parallel way of defining the family. For example, a rectangle is a connected figure without hole such that all cells that are in the border have exactly two neighbors in the border. We show that this definition is equivalent to the classical one and give a cellular automaton which recognizes the rectangles' family in an optimal time. Secondly, the figures are defined with the help of an algorithm which can be easily parallelized. A natural and meaningful example is the ellipses' recognition. An ellipse is a figure with two distinguished cells (the focuses); it is composed of all the cells the sum of distances to the two focus of which is less then a constant k. In this case, the algorithm of recognition is the following one: a signal, generated by one of the focuses, spreads with an optimal speed in all the possible directions, and it is reflected back by the border of the pattern. The figure is an ellipse if and only if all these signals are resorbed on the other focus. Finally, we present an other algorithm inspired by a synchronization algorithm due to F. Grasselli. in order to recognize the squares' family.

**Keywords:**

- 2D Cellular Automata, Recognition, Family of Figures, Synchronization.

**Availability:**Electronic copy only.**Size:**2+85pp**Format:**Compressed PostScript- Get it

### Simulations Between Cellular Automata on Cayley Graphs.

**By:**Zsuzsanna Roka**Number:**RR1994-40**Date:**December 1994**Abstract:**

- We consider cellular automata on Cayley graphs and compare their computational powers according to the architecture on which they work. We show that, if there exists a homomorphism with a finite kernel from a group into another one such that the image of the first group has a finite index in the second one, then every cellular automaton on the Cayley graph of one of these groups can be uniformally simulated by a cellular automaton on the Cayley graph of the other one. This simulation can be constructed in a linear time. With the help of this result we also show that cellular automata working on any Archimedean tiling can be simulated by a cellular automaton on the grid of Z^2 and conversely.

**Keywords:**

- Cellular Automata, Cayley Graphs, Simulations.

**Availability:**Electronic copy only.**Size:**2+29p**Format:**Compressed PostScript- Get it

### Using PVM for the Selection of Wavelet Neural Network Parameters.

**By:**Richard Baron , Helene Paugam-Moisy**Number:**RR1994-41**Date:**December 1994**Abstract:**

- Neural networks may be good approximators. But their performances depend on several parameters. Testing these parameters is a time-consuming task. In order to deal with the choice of a good set of parameters, the use of parallel computation is experimented. An architecture is proposed, based on PVM system. The load balancing problem is addressed. The method is used to determine efficient parameters for a wavelet neural network. This neural network is experimented as a function approximator. Results are given, in term of approximation, and in terms of speed-up.

**Keywords:**

- Artificial Neural Networks, Wavelet Neural Networks, PVM.

**Availability:**Paper copy available.**Citation:**Presented at the First European PVM Users Group Meeting, October 1994.**Size:**2+7p**Format:**Compressed PostScript- Get it

### On the completeness of a proof system for a simple data-parallel programming language.

**By:**Luc Bouge , David Cachera**Number:**RR1994-42**Date:**December 19, 1994**Abstract:**

- We proved the completeness of an assertional proof system for a simple loop-free data-parallel language. This proof system is based on two-part assertions, where the predicate on the current value of variables is separated from the specification of the current extent of parallelism (activity context). The proof is based on a Weakest Precondition (WP) calculus. In contrast with the case of usual scalar, Pascal-like languages, not all WP can be defined by an assertion. Yet, partial definability suffices to prove the completeness thanks to the introduction of hidden variables in assertions. The case of data-parallel programs with loops seems to be much more difficult, as briefly discussed in the conclusion.

**Keywords:**

- Concurrent Programming, Specifying and Verifying and Reasoning about Programs, Semantics of Programming Languages, Data-Parallel Languages, Proof System, Hoare Logic, Weakest Preconditions.

**Availability:**Electronic copy only.**Citation:**Published in proceeding EuroPar'95 Stockholm Aout 95.**Size:**2+17p**Format:**Compressed PostScript- Get it

### A Categorical Model of Array Domains.

**By:**Gaetan Hains , John Mullins**Number:**RR1994-43**Date:**December 27, 1994**Abstract:**

- We apply the theory of generalised concrete data structures (or gCDSs) to construct a cartesian closed category of concrete array structures with explicit data layout. The technical novelty is the array gCDS preserved by exponentiation whose isomorphisms relate higher-order objects to their local parts. This work is part of our search of semantic foundations for data-parallel functional programming.

**Keywords:**

- Parallel Programming, Functional Languages, Distributed Arrays, Denotational Semantics, Concrete Data Structures.

**Availability:**Electronic copy only.**Size:**2+9p**Format:**Compressed PostScript- Get it

### A Random NP-complete problem for inversion of 2D Cellular Automata.

**By:**Bruno Durand**Number:**RR1994-44**Date:**December, 1994**Abstract:**

- In this paper, we prove the co-RNP-completeness (RNP = Random NP) of the following decision problem: ``Given a 2-dimensional cellular automaton A, is A reversible when restricted to finite configurations extending a given row?" In order to prove this result, we introduce a polynomial reduction from problems concerning finite tilings into problems concerning cellular automata. Then we add to tile sets and cellular automata two probability functions and we prove that these problems are not only co-NP-complete, but co-RNP-complete too.

**Keywords:**

- Computational Complexity, Cellular Automata, Cryptography.

**Availability:**Electronic copy only.**Citation:**Published in proceedings STACS'95, LNCS 900, Springer-Verlag. Complete version to appear in TCS.**Size:**2+13p**Format:**Compressed PostScript- Get it

### Growing Patterns in 1D Cellular Automata.

**By:**Bruno Durand , Jacques Mazoyer**Number:**RR1994-45**Date:**January 1995**Abstract:**

- We study limit evolutions of cellular automata (CA) both theoretically and experimentally. We show that either all orbits enter the limit set after less than a finitely bounded number of states, or almost all orbits never enter the limit set. We link this result with a classification of CAs according to their limit behavior due to Culik et al: in the first case, the considered CA belongs to class1 while in the second case, it belongs to class2. By experiments, we try to measure the convergence speed of orbits to their accumulation points. We compute the maximum number of nested growing segments in a space-time diagram representing a finite portion of an orbit. We observe that, in the average case, this criterion depends only on the CA and not on the configuration. We also observe two kinds of CA wrt our criterion which correspond to the intuitive notions of chaos and regularity. We do further experiments to explain the fact that the proportion of chaotic diagrams grows with the number of states of the considered CA.

**Keywords:**

- Cellular Automata, Complex Systems, Dynamics.

**Availability:**Electronic copy only.**Citation:**Proceeding FCT'95 Dresde Allemagne Aout 95 dans LNCS Springer-Verlag and in Complexe Systems.**Size:**2+17p**Format:**Compressed PostScript- Get it

### Reconnaissance de langages sur automates cellulaires.

**By:**Marianne Delorme , Jacques Mazoyer**Number:**RR1994-46**Date:**December 1994**Abstract:**

- We are interested in the power of parallel devices cellular automata are, in complexity classes they determine. With the intention of starting research on parallel recognition of languages on cellular automata of dimension highter than or equal to 2, we sum up the situation for dimension 1, in parallel and sequential mode, for bounded and unbounded computations in case of two- and one-way one dimensional automata. We set out some typical examples, and present the different methods used in the field through proofs of some main results.

**Keywords:**

- Cellular Automata, Language Recognition, Complexity Classes.

**Availability:**Paper copy available.**Citation:**Published in proceedings IV Conference on Semigroups and Automata, Porto, 1995.**Size:**2+59p**Format:**Compressed PostScript- Get it

### Towards a dynamic parallel database machine: data balancing techniques and pipeline.

**By:**Thibault Duboux , Afonso Ferreira**Number:**RR1994-47**Date:**December 21, 1994**Abstract:**

- The fast development over the last years of high performance multicomputers makes them attractive candidates as the base technology for scalable and performance oriented database applications. In this paper, we address the problem of how to process utility commands while the system remains operational and the data remain available for concurrent access. In particular, we focus on the on-line reorganization of a dictionary, a database reduced to its simplest instance, showing its implementation on a multicomputer. As is the case with implementations of dynamic structures on distributed memory architectures, a crucial load balancing problem has to be solved. We propose an elegant solution and prove that it solves this problem. Experimental results are shown and analyzed.

**Keywords:**

- Dictionary Machine, Parallel Data Structures, Parallel Databases, Load Balancing.

**Availability:**Paper copy available.**Size:**2+18p**Format:**Compressed PostScript- Get it

### On the minimal delay of on-line computations.

**By:**Alexandre Scherbyna**Number:**RR1994-48**Date:**December 21, 1994**Abstract:**

- This paper deals with the minimal on-line delay that can be attained when evaluating a function in on-line arithmetic. First, a generalization of the concept of on-line delay is proposed. It is shown that decomposing a function into sequence of simpler calculations always gives an on-line delay that is greater than the minimal one.

**Keywords:**

- On-Line Arithmetic, On-Line Delay, Signed-Digit Number Systems.

**Availability:**Paper copy available.**Size:**2+12p**Format:**Compressed PostScript- Get it

### Synchronization of a line of finite automata with nonuniform delays.

**By:**Jacques Mazoyer**Number:**RR1994-49**Date:**January 1995**Abstract:**

- We study the Firing Squad Synchronization Problem with non uniform delays in the case of a line of cells. The problem was solved in the general case by T.~Jiang in time Delta^3. In the case of the line, we improve his result, obtaining the Delta^2. We observe that there does not exist an optimal solution. We also note that the strategy, used here, is the general strategy (Waksman's one) and thus, even in this case, we can break the line in its middle.

**Keywords:**

- Cellular Automata, Synchronization, Optimality.

**Availability:**Paper copy available.**Size:**2+24p**Format:**Compressed PostScript- Get it

### Signals in one dimensional cellular automata.

**By:**Jacques Mazoyer**Number:**RR1994-50**Date:**January 1995**Abstract:**

- In this paper, we are interested in signals, form whereby the data can be transmitted in a cellular automaton. We study generation of some signals. In this aim, we investigate a notion of constructibility of increasing functions related to the production of words on the initial cell (in the sense of Fischer for the prime numbers). We establish some closure properties on this class of functions. We also exhibit some impossible moves of data.

**Keywords:**

- Cellular Automata, Computability, Moves of Information.

**Availability:**Paper copy available.**Size:**2+33p**Format:**Compressed PostScript- Get it

### Synchronization of two interacting finite automata.

**By:**Jacques Mazoyer**Number:**RR1994-51**Date:**January 1995**Abstract:**

- The Firing Squad Synchronization Problem with non uniform delays was recently solved by T. Jiang. In this paper, we study a very particular case: the synchronization of two finite automata. Two finite automata communicates, but the delay of communication is great in comparison with the duration of a state transition. We show that there exist solutions in quasi-quadratic time. This implies that there does not exist an optimal solution (in the sense of K. Kobayashi) to the Firing Squad with non uniform delays. We show also that if the communication between the two automata are not symmetric the synchronization cannot be obtained.

**Keywords:**

- Cellular Automata, Synchronization, Optimality.

**Availability:**Electronic copy only.**Citation:**To appear in Proceedings IJAC : International journal of algebra and computation.**Size:**2+25p**Format:**Compressed PostScript- Get it

### Computations on one dimensional cellular automata.

**By:**Jacques Mazoyer**Number:**RR1994-52**Date:**January 1995**Abstract:**

- We study a model of massively parallel computations: a line of automata. In our definition, computations are done on grids. Then twisting and moving these grids allow us to achieve complexe computaions. We show that any recursive function can be computed by this way without any synchronization.

**Keywords:**

- Cellular Automata, Computability.

**Availability:**Electronic copy only.**Citation:**To appear in Mathematic and Computer Science some interactions, Balser publication.**Size:**2+28p**Format:**Compressed PostScript- Get it