2005 research reports
Simultaneous Scheduling of Replication and Computation for Data-Intensive Applications on the Grid.
- By: Frédéric Desprez, Antoine Vernois
- Number: RR2005-01
- Date: January 2005
- Abstract:
- One of the first motivations of using grids comes from
applications managing large data sets like for example in High Energy Physic or Life Sciences.
To improve the global throughput of software environments, replicas are usually put at wisely selected sites.
Moreover, computation requests have to be scheduled among the available resources. To get the best performance,
scheduling and data replication have to be tightly coupled which is not always the case in existing approaches.
This paper presents an algorithm that combines data management and scheduling at the same time using a steady-state approach.
Our theoretical results are validated using simulation and logs from a large life science application (ACI GRID GriPPS).
- Keywords:
- scheduling, replication, data management, grid computing.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+17p
- Format: pdf
- Get it
- Revisions: none
A study of various load information exchange mechanisms for a distributed application using dynamic scheduling.
- By: Abdou Guermouche, Jean-Yves L'Excellent
- Number: RR2005-02
- Date: January 2005
- Abstract:
- We consider a distributed asynchronous system where processes
can only communicate by message passing and need a coherent view of the load (e.g., workload, memory)
of others to take dynamic decisions (scheduling). We present several mechanisms to obtain a distributed
view of such information, based either on maintaining that view or demand-driven with a snapshot algorithm.
We perform an experimental study in the context of a real application, an asynchronous parallel solver
for large sparse systems of linear equations.
- Keywords:
- Snapshot, distributed system, dynamic scheduling, load balancing, message passing.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+14p
- Format: pdf
- Get it
- Revisions: none
Computing the rank and a small nullspace basis of a
polynomial matrix.
- By: A. Storjohann (U. Waterloo), Gilles Villard
- Number: RR2005-03
- Date: January 2005
- Abstract:
- We reduce the problem of computing the rank and a nullspace basis of
a univariate polynomial matrix to polynomial matrix multiplication.
For an input n x n matrix of degree d over a field K we give a
rank and nullspace algorithm using about the same number of operations
as for multiplying two matrices of dimension n and degree d.
If the latter multiplication is done in MM(n,d)=softO((n^omega)d)
operations, with omega the exponent of matrix multiplication over K,
then the algorithm uses softO(MM(n,d)) operations in K.
For m x n matrices of rank r and degree d, the cost expression is
softO(nmr^(omega -2)d). The softO notation indicates some missing
logarithmic factors. The method is randomized with Las Vegas
certification. We achieve our results in part through a combination
of matrix Hensel high-order lifting and matrix minimal fraction
reconstruction, and through the computation of minimal or small degree
vectors in the nullspace seen as a K[x]-module.
- Keywords:
- Linear algebra, polynomial matrix, matrix multiplication, nullspace
basis, minimal polynomial basis, randomized algorithm.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+24p
- Format: pdf
- Get it
- Revisions: none
Fully asynchronous behavior of double-quiescent elementary cellular automata.
- By: Nazim Fatès , Michel Morvan , Nicolas Schabanel , Éric Thierry
- Number: RR2005-04
- Date: February 2005
- Abstract:
- In this paper we propose a probabilistic analysis of the fully
asynchronous behavior (i.e., two cells are never simultaneously updated, as in a continuous time
process) of elementary finite cellular automata (i.e.,\0,1\ states, radius 1 and unidimensional)
for which both states are quiescent (i.e., (0,0,0) \mapsto 0 and (1,1,1) \mapsto 1). It has been
experimentally shown in previous works that introducing asynchronism in the global function of a cellular
automata was perturbing its behavior, but as far as we know, only few theoretical work exists on the subject.
The cellular automata we consider live on a ring of size n and asynchronism is introduced as follows: at
each time step one cell is selected uniformly at random and the transition is made on this cell while the
others stay in the same state. Among the sixty-four cellular automata belonging to the class we consider,
we show that nine of them diverge almost surely on all non-trivial configurations while the fifty-five other
converge almost surely to a random fixed point. We show that the exact convergence time of these fifty-five
automata can only take the following values: either 0, \Theta(n \ln n), \Theta(n^2), \Theta(n^3),
or \Theta(n2^n). Furthermore, the global behavior of each of these cellular automata is fully determined
by reading its code.
- Keywords:
- cellular automata, discrete dynamical systems, convergence, stochastic process.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+20p
- Format: pdf
- Get it
- Revisions: none
Multiplication Algorithms for Radix-$2$ RN-Codings and Two's Complement Numbers.
- By: Jean-Luc Beuchat, Jean-Michel Muller
- Number: RR2005-05
- Date: February 2005
- Abstract:
- The RN-codings are particular cases of signed-digit representations,
for which rounding to the nearest is always identical to truncation. In radix $2$, Booth recoding is
an RN-coding. In this paper, we suggest several multiplication algorithms able to handle RN-codings,
and we analyze their properties.
- Keywords:
- Computer arithmetic, RN-codings, (modified) Booth recoding, rounding to the nearest.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+11p
- Format: pdf
- Get it
- Revisions: none
GoDIET: A tool for managing distributed hierarchies of DIET agents and servers.
- By: Eddy Caron, Holly Dail
- Number: RR2005-06
- Date: February 2005
- Abstract:
- The Distributed Interactive Engineering Toolbox (DIET) is an
Application Service Provider (ASP) platform providing remote
execution of computational problems on distributed, heterogeneous
resources. Traditional ASP toolkits are based on a single,
centralized scheduling agent coordinating computation requests
from clients with service offerings from servers. DIET is based
on a hierarchy of agents that collaborate to perform scheduling
decisions; we are exploring the benefits of hierarchical agent
architectures for scalability and adaptation to heterogeneous
network performance. This paper describes GoDIET, a new tool for
the configuration, launch, and management of DIET on computational
grids. Users of GoDIET write an XML file describing their
available compute and storage resources and the desired overlay of
DIET agents and servers onto those resources. GoDIET automatically
generates and stages all necessary configuration files, launches
agents and servers in appropriate hierarchical order, reports
feedback on the status of running components, and allows shutdown
of all launched software. We present a series of experiments that
permit the evaluation of the performance of GoDIET for several
launch and management approaches. We also evaluate the robustness
of the DIET platform for a large number of servers.
- Keywords:
- Deployment, PSE, Network enabled server, Grid computing.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+13p
- Format: pdf
- Get it
- Revisions: none
Experiences with hierarchical request flow management
for Network-Enabled Server Environments.
- By: Holly Dail, Frédéric Desprez
- Number: RR2005-07
- Date: February 2005
- Abstract:
- DIET (Distributed Interactive Engineering Toolbox) is a toolbox
for the construction of Network Enabled Server systems. Client
requests for computational services are automatically matched
with available resources by a distributed hierarchy of
scheduling agents. Traditionally NES systems have used a
centralized agent for performance prediction and resource
information management; in DIET, these services are distributed
by providing them at the DIET server level.
DIET has traditionally offered an online scheduling model
whereby all requests are scheduled immediately or refused. This
approach can overload interactive servers in high-load
conditions and does not allow adaptation of the schedule to task
or data dependencies.
In this article we consider an alternative
model based on active management of the flow of requests
throughout the system. We have added support for (1) limiting
the number of concurrent requests on interactive servers, (2)
server and agent-level queues, and (3) window-based scheduling
algorithms whereby the request release rate to servers can be
controlled and some re-arrangement of request to host mappings
is possible. We present experiments demonstrating that these
approaches can improve performance and that the overheads
introduced are not significantly different from those of the
standard DIET approach.
- Keywords:
- Scheduling, Grid, Window, Distributed, Hierarchical.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+18p
- Format: pdf
- Get it
- Revisions: none
Small FPGA polynomial approximations with 3-bit coefficients
and low-precision estimations of the powers of x.
- By: Romain Michard, Arnaud Tisserand, Nicolas Veyrat-Charvillon
- Number: RR2005-08
- Date: February 2005
- Abstract:
- This paper presents small FPGA implementations of low
precision polynomial approximations of functions without multipliers.
Our method uses degree-2 or degree-3 polynomial approximations
with at most 3-bit coefficients and low precision estimations of the
powers of x. Here, we denote by 3-bit coefficients values with at
most 3 non-zero and possibly non-contiguous signed bits
(e.g. 1.001000-1). This leads to very small operators by
replacing the costly multipliers by a small number of additions. Our
method provides approximations with very low average error and is
suitable for signal processing applications.
- Keywords:
- computer arithmetic, hardware arithmetic operator,
polynomial evaluation, digital signal application.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+9p
- Format: Compressed postscript
- Get it
- Revisions: none
On the definition of ulp(x).
- By: Jean-Michel Muller
- Number: RR2005-09
- Date: February 2005
- Abstract:
- Function ulp (acronym for unit in the last place) is
frequently used for expressing errors in floating-point computations. We present
several previously suggested definitions of that function, and analyse some of their properties.
- Keywords:
- Computer arithmetic, floating-point arithmetic, unit in the last place, ULP.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+11p
- Format: pdf
- Get it
- Revisions: none
Optimizing Network Resource Sharing in Grids.
- By: Loris Marchal, Yves Robert, Pascale Vicat-Blanc Primet, Jingdi Zeng
- Number: RR2005-10
- Date: March 2005
- Abstract:
- While grid computing reaches further to geographically
separated clusters, data warehouses, and disks, it poses demanding requirements on end-to-end
performance guarantee. Its pre-defined destinations and service criteria ease the performance
control; however, expensive resources and equipments used by grid applications determine that
optimal resource sharing, especially at network access points, is critical. From the resource
reservation perspective, this article looks at communication resources shared by grid sites.
Two resource request scenarios have been identified, aiming at optimizing the request accept
rate and resource utilization. The optimization problems, proven NP-complete, are then solved
by heuristic algorithms. Simulation results, aside from showing satisfying results, illustrate
the pros and cons of each algorithm.
- Keywords:
- grid computing, communication resource, resource sharing, optimization.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+14p
- Format: pdf
- Get it
- Revisions: none
The language X: circuits, computations and Classical Logic.
- By: Steffen van Bakel, Stéphane Lengrand, Pierre Lescanne
- Number: RR2005-11
- Date: March 2005
- Abstract:
- X is an untyped language for describing
circuits by composition of basic components. This language is well
suited to describe structures which we call "circuits" and which
are made of parts that are connected by wires. Moreover X gives
an expressive platform on which algebraic objects and many different
(applicative) programming paradigms can be mapped. In this paper we
will present the syntax and reduction rules for X and some its
potential uses.
To demonstrate the expressive power of X, we will show how, even
in an untyped setting, elaborate calculi can be embedded, like the
naturals, the $`l$-calculus, Bloe and Rose's calculus of explicit
substitutions lambda-x, Parigot's lambda-mu and Curien and Herbelin's
lambda-mu-mu~.
- Keywords:
- Language design, mobility, circuits, classical logic,
Curry-Howard correspondance.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+25p
- Format: pdf
- Get it
- Revisions: none
Perfect Sampling for Fork-Join networks.
- By: Anne Bouillard, Bruno Gaujal
- Number: RR2005-12
- Date: March 2005
- Abstract:
- In this paper, we show how to design a perfect simulation for
Markovian fork-join networks, or equivalently, free-choice
Petri nets.
For pure fork-join networks and for event graphs, the simulation
time can be greatly reduced by using extremal initial states, namely
blocking states, although such nets do not exhibit any natural
monotonicity property.
Another approach for perfect simulation of pure fork-join networks
is based on
a (max,plus) representation of the system.
For that, we show how the theory of (max,plus) stochastic systems
can be used to provide perfect samplings.
Finally, experimental runs show that the (max,plus) approach
couples within fewer steps but needs a larger simulation time
than the Markovian approach.
- Keywords:
- Perfect simulation, Petri nets, fork and join.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+14p
- Format: pdf
- Get it
- Revisions: none
A study of meta-scheduling architectures for high throughput computing.
- By: Eddy Caron , Vincent Garonne , Andreï Tsaregorodtsev
- Number: RR2005-13
- Date: March 2005
- Abstract:
- In this paper we present a model and a simulator for
large-scale system. Such platforms are composed of heterogeneous clusters of PCs belonging
to a local network. These clusters are then connected to each other through a global network.
Moreover each cluster is managed via a local scheduler and is shared by many users.
We validate our simulator by comparing the experimental results and the analytical results
of a M/M/4 queuing system. These studies indicate that the simulator is consistent.
After that we do the comparison with a real batch system and we obtain a mean error of 10.5 \%
for the response time and 12 \% for the makespan. We conclude that our simulator is realistic
and describes well the behavior of a large-scale system. Thus we can study the scheduling of
our system called \dirac in a high throughput context. We justify our decentralized, adaptive
and opportunistic approach in comparison to a centralized approach in such a context.
- Keywords:
- Metascheduler, Grid computing.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+13p
- Format: Compressed postscript
- Get it
- Revisions: none
Extremal throughputs in free-choice nets.
- By: Anne Bouillard, Bruno Gaujal, Jean Mairesse
- Number: RR2005-14
- Date: April 2005
- Abstract:
- We give a method to compute the throughput
in a timed live and bounded free-choice Petri net under a total
allocation (i.e. a 0-1
routing). We also characterize and compute the conflict-solving policies that
achieve the smallest throughput in the special case of a 1-bounded
net. They do not correspond to total allocations, but still have a small period.
- Keywords:
- Free-choice Petri nets, timed and routed Petri nets, throughput.
- Availability: Electronic copy only.
- Citation: Short version without proofs: In
26th International Conference On Application and Theory of Petri Nets, LNCS, Springer-Verlag, 2005.
- Size: 2+20p
- Format: Compressed postscript
- Get it
- Revisions: none
Hardware/Software Interface for Multi-Dimensional Processor Arrays..
- By: Alain Darte, Steven Derrien, Tanguy Risset
- Number: RR2005-15
- Date: April 2005
- Abstract:
- On most recent systems on chip, the performance
bottleneck is the on-chip communication medium, bus or network. Multimedia applications
require a large communication bandwidth between the processor and graphic hardware accelerators,
hence an efficient communication scheme using burst mode is mandatory. In the context of data-flow
hardware accelerators, we approach this problem as a classical resource-constrained problem.
We explain how to use recent optimization techniques so as to define a conflict-free schedule
of input/output for multi-dimensional processor arrays (e.g., 2D grids). This schedule is
static and allows us to perform further optimizations such as grouping successive data in
packets to operate in burst mode. We also present an effective VHDL implementation on FPGA
and compare our approach to a run-time congestion resolution showing important gains in hardware area.
- Keywords:
- Hardware accelerators, interface, commmunications, processor arrays,
bus arbiter, resource constrained scheduling.
- Availability: Electronic copy only.
- Citation: See ASAP'05.
- Size: 2+16p
- Format: pdf
- Get it
- Revisions: none
Up-to Techniques for Weak Bisimulation.
- By: Damien Pous
- Number: RR2005-16
- Date: April 2005
- Abstract:
- Up-to techniques have been introduced to enhance the bisimulation
proof method for establishing bisimilarity results. While up-to
techniques for strong bisimilarity are well understood, in the weak
case they come as a collection of unrelated results, and lack a
unified presentation.
We propose a uniform and modular theory of up-to techniques for weak
bisimulation that captures existing proof technology and introduces
new techniques. Some proofs rely on non trivial -- and new --
commutation results based on termination guarantees.
The flexibility and usefulness of our framework is illustrated on two
examples, one of them coming from a recent study of a distributed
abstract machine.
- Keywords:
- behavioural equivalence,
bisimulation, termination, commutation diagrams.
- Availability: Electronic copy only.
- Citation: To appear in the proceedings of ICALP 2005.
- Size: 2+20p
- Format: pdf
- Get it
- Revisions: none
The Quantum Query Complexity of the Abelian Hidden Subgroup Problem.
- By: Pascal Koiran, Vincent Nesme, Natacha Portier
- Number: RR2005-17
- Date: April 2005
- Abstract:
- Simon in his FOCS'94 paper was the first to
show an exponential gap between classical and quantum computation.
The problem he dealt with is now part of a well-studied class of problems,
the hidden subgroup problems. We study Simon's problem from the point of view
of quantum query complexity and give here a first nontrivial lower bound on the
query complexity of a hidden subgroup problem, namely Simon's problem. Our bound
is optimal up to a constant factor. We also show how, as a consequence, this gives
us the query complexity of the Abelian hidden subgroup problem, up to a constant factor.
At last we expose some elementary facts about complexity in weaker query models.
- Keywords:
- quantum computation, query complexity,hidden subgroup, Simon's problem, lower bound.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+12p
- Format: Compressed postscript
- Get it
- Revisions: This article is a new version of this one
An Efficient Network API for in-Kernel Applications in Clusters.
- By: Brice Goglin, Olivier Glück, Pascale Vicat-Blanc Primet
- Number: RR2005-18
- Date: April 2005
- Abstract:
- Running parallel applications on clusters with high-speed local
networks requires fast communication between computing nodes but
also low latency and high bandwidth file access.
However, the application programming interfaces of high-speed local
networks were designed for MPI communication and do not always
meet the requirements of other applications like distributed file
systems.
In this paper, we explore several solutions to improve the use of
high-speed network for in-kernel applications.
Distributed file systems implemented on top of the GM interface of
Myrinet are first examined to demonstrate how hard it is to get an
efficient interaction between such applications and the network.
Then, we propose solutions to simplify and improve this interaction
and integrate them into the kernel interface of the new Myrinet
driver, MX.
Performance comparisons between MX and GM, and their usage in
both a distributed file system and a zero-copy protocol show nice
improvements.
Moreover, we are able to improve the performance of the flexible
kernel API we designed in MX that allows to remove some
intermediate copy.
- Keywords:
- High-speed local networks, in-kernel network API,
distributed file systems, zero-copy socket protocols.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 3+16p
- Format: pdf
- Get it
- Revisions: none
Structure of spaces of rhombus tilings in the lexicograhic case.
- By: Eric Rémila
- Number: RR2005-19
- Date: May 2005
- Abstract:
- We study a class of lexicographic rhombus tilings of zonotopes,
which are deduced from higher Bruhat orders relaxing the unitarity condition.
We prove that a space of such tilings is a graded poset with minimal and maximal element.
- Keywords:
- rhombus tiling, flip, connectivity.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+26p
- Format: Compressed postscript
- Get it
- Revisions: none
Component-oriented programming with sharing: containment is not ownership.
- By: Daniel Hirschkoff, Tom Hirschowitz, Damien Pous, Alan Schmitt, and Jean-Bernard Stefani
- Number: RR2005-20
- Date: May 2005
- Abstract:
- Component-oriented programming yields a tension between higher-order features (deployment, reconfiguration, passivation), encapsulation, and component sharing.
We propose a programming discipline for component-oriented programming to address this issue, and we define a process calculus whose operational semantics embodies this programming discipline.
We present several examples that illustrate how the calculus supports component sharing, while allowing strong encapsulation and higher-order primitives.
- Keywords:
- Component-oriented programming, higher-order process algebras, component sharing.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+12p
- Format: Compressed postscript
- Get it
- Revisions: none
Scheduling divisible loads with return messages on heterogeneous master-worker platforms.
- By: Olivier Beaumont, Loris Marchal, Yves Robert
- Number: RR2005-21
- Date: May 2005
- Abstract:
- In this paper, we consider the problem of scheduling independent tasks, or divisible loads,
onto an heterogeneous star platform, with both heterogeneous computing and communication resources.
We consider the case where the workers, after processing the tasks, send back some results to the master processor.
This corresponds to a more general framework than the one used in many divisible load papers, where only forward communications
are taken into account. To the best of our knowledge, this paper constitutes the first attempt to derive optimality results
under this general framework (forward and backward communications, heterogeneous processing and communication resources).
We prove that it is possible to derive the optimal solution both for LIFO and FIFO distribution schemes. Nevertheless,
the complexity of the general problem remains open: we also prove in the paper that the optimal distribution scheme may
be neither LIFO nor FIFO.
- Keywords:
- Scheduling, divisible load, master-worker platform, heterogeneous cluster..
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+19p
- Format: pdf
- Get it
- Revisions: none
A practical type system for generalized recursion.
- By: Tom Hirschowitz, Serguei Lenglet
- Number: RR2005-22
- Date: May 2005
- Abstract:
- The ML language is equipped with a sophisticated module system, especially thanks
to its notions of functor (higher-order functions on modules) and of controlled type abstraction (opaque or transparent types).
Nevertheless, an important weakness of this system hinders modularization: the impossibility to define mutually
recursive modules. In particular, mutually recursive functions must all reside in the same module.
Recently, Leroy extended the OCaml language, a dialect of ML, with an unsafe notion of recursive modules.
In this extension, one can define recursive modules, but the system does not check that they are well-founded.
If not, the system throws an exception at runtime, which is annoying given the strong typing of ML.
Powerful type systems have been proposed to tackle this issue, but they require rather deep modifications
to the ML type system. This report presents a system requiring only local modifications to the ML type system.
We prove its soundness by injection into one of the evoked more general formalisms.
- Keywords:
- Programming languages, semantics, typing, modularity, recursion.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+50p
- Format: Compressed postscript
- Get it
- Revisions: none
DIET: A Scalable Toolbox to Build Network Enabled Servers on the Grid.
- By: Eddy Caron , Frédéric Desprez
- Number: RR2005-23
- Date: June 2005
- Abstract:
- Among existing grid middleware
approaches, one simple, powerful, and flexible approach consists
of using servers available in different administrative domains
through the classical client-server or Remote Procedure Call
(RPC) paradigm. Network Enabled Servers implement this model
also called GridRPC. Clients submit computation requests to a
scheduler whose goal is to find a server available on the
grid. The aim of this paper is to give an overview of a
middleware developed in the GRAAL team called DIET
(http://graal.ens-lyon.fr/DIET.
DIET (Distributed Interactive Engineering Toolbox) is a hierarchical set of components used
for the development of applications based on computational servers on the grid.
- Keywords:
- Grid Computing, Network Enabled Servers, Client-Server Computing.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+20p
- Format: Portable Document Format
- Get it
- Revisions: none
Markov chain analysis of an agent-based growth model.
- By: Bruno Gaujal, Eric Thierry
- Number: RR2005-24
- Date: May 2005
- Abstract:
- In this paper we investigate the asymptotic behavior of a discrete
and probabilistic dynamical system which can be described as a growth model where autonomous agents aggregates.
The aim of this paper is to give a mathematical analysis of the dynamics. The analysis uses face homogeneous
Markov chains and thanks to this study we validate a conjecture set by Laszlo Gulyas concerning a growth model
for cities where simulations had shown that the sizes of the cities asymptotically distribute as a Zipf's law.
In light of our analysis, we discuss how the emergence of such a Zipf's law could be expected in Gulyas' model and in its variants.
- Keywords:
- Discrete probabilistic dynamics, face homogeneous Markov chains, Zipf's law.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+13p
- Format: Compressed postscript
- Get it
- Revisions: none
Statistical study of the activity due to the selection function in the E-method algorithm.
- By: Romain Michard, Arnaud Tisserand, Nicolas Veyrat-Charvillon
- Number: RR2005-25
- Date: May 2005
- Abstract:
- This work is a statistical study of the activity due to the selection
function in the polynomial approximation algorithm called E-method and proposed by M. Ercegovac in [Erc75,Erc77].
The latitude in the choice of the result digits in the selection function, when using a redundant representation,
allows us to consider a reduced electrical activity in some cases. This article presents the begining of a study
on the profits in such a situation.
- Keywords:
- computer arithmetic, low-power consumption, polynomial evaluation, E-method.
- Availability: Electronic copy only (in French).
- Citation: Published in FTFC 2005.
- Size: 2+6p
- Format: Portable Document Format
- Get it
- Revisions: none
Automatic Middleware Deployment Planning on Clusters.
- By: Eddy Caron, Pushpinder Kaur Chouhan, Holly Dail
- Number: RR2005-26
- Date: May 2005
- Abstract:
- The use of many distributed, heterogeneous resources as a large
collective resource offers great potential and has become an increasingly popular idea.
A key issue for these Grid platforms is middleware scalability and how middleware services
can best be mapped to the resource platform structure. Optimizing deployment is a difficult
problem with no existing, general solutions. In this paper we address a simpler sub-problem:
how to carry out an adapted deployment on a cluster with hundreds of nodes? Efficient use of
clusters alone or as part of the Grid is an important issue. We present a deployment model that
predicts the maximum throughput of each element of a deployment. Our deployment construction
algorithm uses this model to automatically create a mapping of middleware elements onto resources
with the goal of maximizing throughput. We apply our approach to automatically deploy a distributed
Problem Solving Environment (PSE) on a homogeneous cluster environment. We present experiments
comparing the automatically-generated deployment against a number of other reasonable deployments.
- Keywords:
- Deployment, Cluster, Middleware, Modeling.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+13p
- Format: Portable Document Format
- Get it
- Revisions: none
Evaluation of Meta-scheduler Architectures and Task Assignment Policies for High Throughput Computing.
- By: Eddy Caron, Vincent Garonne , Andreï Tsaregorodtsev
- Number: RR2005-27
- Date: May 2005
- Abstract:
- In this paper we present a model and simulator for many
clusters of heterogeneous PCs belonging to a local network. These clusters are assumed to be
connected to each other through a global network and each cluster is managed via a local scheduler
which is shared by many users. We validate our simulator by comparing the experimental and analytical
results of a M/M/4 queuing system. These studies indicate that the simulator is consistent.
Next, we do the comparison with a real batch system and we obtain an average error of 10.5\%
for the response time and 12\% for the makespan. We conclude that the simulator is realistic
and well describes the behaviour of a large-scale system. Thus we can study the scheduling of
our system called \dirac in a high throughput context. We justify our decentralized, adaptive
and opportunistic approach in comparison to a centralized approach in such a context.
- Keywords:
- Simulation, Model, Multi-clusters platform, Meta-scheduling, Grid Computing.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+11p
- Format: Portable Document Format
- Get it
- Revisions: none
GDS: an Architecture Proposal for a Grid Data-Sharing Service.
- By: Gabriel Antoniu, Marin Bertier, Luc Bougé, Eddy Caron, Frédéric Desprez, Mathieu Jan, Sébastien Monnet, Pierre Sens
- Number: RR2005-28
- Date: June 2005
- Abstract:
- Grid computing has recently emerged as a response to the growing demand for resources
(processing power, storage, etc.) exhibited by scientific applications. We address the challenge of sharing large
amounts of data on such infrastructures, typically consisting of a federation of node clusters. We claim that storing,
accessing, updating and sharing such data should be considered by applications as an external service. We propose an
architecture for such a service, whose goal is to provide transparent access to mutable data, while enhancing data
persistence and consistency despite node disconnections or failures. Our approach leverages on weaving together previous
results in the areas of distributed shared memory systems, peer-to-peer systems, and fault-tolerant systems.
- Keywords:
- Data sharing, grid computing, transparent access, mutable data, peer-to-peer systems, fault tolerance, consistency protocols.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+13p
- Format: Portable Document Format
- Get it
- Revisions: none
Hiérarchies algébriques de classes d'automates cellulaires.
- By: Marianne Delorme, Jacques Mazoyer, Guillaume Theyssier
- Number: RR2005-29
- Date: June 2005
- Abstract:
- Cellular automata are a formal model of locally interacting systems
which is very simple but suitable to study complex systems in general.
Many classifications have been proposed in the literature, often relying
on the observation of dynamics. In a first part, we present more recent
approaches of algebraic nature based on notions of sub-systems or
embeddings. A second part, which is more technical, is dedicated to new
results concerning these algebraic tools. This framework allows us to
give formal definitions to several intuitive global notions (e.g.,
universality, particles) and to derive formal proofs of positive results
and, more interestingly, of negative results. More precisely, we show
that modifying local rules may be more powerful in some sense than
increasing the number of states ; then, we illustrate by the
construction of an infinite lattice that dynamical universality is
incredibly more powerful than usual computation universality.
- Keywords:
- Cellular Automata, Complexity, Computation, Universality, Algebraic Classifications, Methods.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+51p
- Format: Portable Document Format
- Get it
- Revisions: none
Use of A Network Enabled Server System for a Sparse Linear Algebra Grid Application.
- By: Eddy Caron, Frédéric Desprez, Christophe Hamerling, Jean-Yves L'Excellent, Marc Pantel, Chiara Puglisi-Amestoy
- Number: RR2005-30
- Date: June 2005
- Abstract:
- Solving systems of linear equations is one of the key operations in linear algebra.
Many different algorithms are available in that purpose.
These algorithms require a very accurate tuning to minimise runtime and memory consumption. The TLSE project
provides, on one hand, a scenario-driven expert site to help users choose the right algorithm according to
their problem and tune accurately this algorithm, and, on the other hand, a test-bed for experts in order
to compare algorithms and define scenarios for the expert site. Both features require to run the available
solvers a large number of times with many different values for the control parameters (and maybe with many
different architectures). Currently, only the grid can provide enough computing power for this kind of application.
The DIET middleware is the GRID backbone for TLSE. It manages the solver services and their scheduling in a scalable way.
- Keywords:
- Grid Computing, Sparse Linear System Solvers, Expert Site Framework.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+16p
- Format: Portable Document Format
- Get it
- Revisions: none
Off-line and on-line scheduling on heterogeneous master-slave platforms.
- By: Jean-François Pineau, Yves Robert, Frédéric Vivien
- Number: RR2005-31
- Date: July 2005
- Abstract:
- In this work, we deal with the problem of scheduling independent tasks
on heterogeneous master-slave platforms. We target both off-line and on-line problems, with several
objective functions (makespan, maximum response time, total completion time). On the theoretical side,
our results are two-fold: (i) For off-line scheduling, we prove several optimality results for problems
with release dates; (ii) For on-line scheduling, we establish lower bounds on the competitive ratio of any
deterministic algorithm. On the practical side, we have implemented several heuristics, some classical and
some new ones derived in this paper. We studied experimentally these heuristics on a small but fully
heterogeneous MPI platform. Our results show the superiority of those heuristics which fully take into
account the relative capacity of the communication links.
- Keywords:
- Scheduling, Master-slave platforms, Heterogeneous computing, On-line, Release dates.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+16p
- Format: Portable Document Format
- Get it
- Revisions: none
Scheduling network requests with transmission window.
- By: Loris Marchal, Yves Robert, Pascale Vicat-Blanc Primet, Jingdi Zeng
- Number: RR2005-32
- Date: July 2005
- Abstract:
- We consider the problem of bulk data transfers and bandwidth
sharing in the context of grid infrastructures. Grid computing empowers high-performance computing
in a large-scale distributed environment. Network bandwidth, which makes the expensive computational
and storage resources work in concert, plays an active role on performance. Due to specific traffic
patterns, network topology and application scenarios, bandwidth sharing encounters new challenges.
From this perspective, this research report looks at bulk transfers among computing and storage elements.
Referred to as short-lived, transfer requests with transmission window and volume are scheduled in the network.
By manipulating the transmission window, the request accept rate and network resource utilization are to be optimized.
The formulated optimization problem is proven NP-complete. Associated with proposed heuristics, simulations are carried
out to study each bandwidth sharing strategy and its application scenarios. A tuning factor, that allows adaptation of
performance objective, is introduced to adjust network infrastructure and workload.
- Keywords:
- grid computing, network bandwidth sharing, online scheduling, optimization, transmission window, scheduling window.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+13p
- Format: Portable Document Format
- Get it
- Revisions: none
Register allocation and spill complexity under SSA.
- By: Florent Bouchez, Alain Darte, Christophe Guillon, Fabrice Rastello
- Number: RR2005-33
- Date: August 2005
- Abstract:
- This report deals with the problem of choosing which variables
to spill during the register allocation phase. Spilling is used when the number of variables is higher
than the number of registers, and consists of storing the value of a variable in memory and loading it
when necessary. The problem is that instructions dealing with memory are time-consumming. Hence the goal
is to minimize the amount of spilled variables, which is a highly studied problem for compiler design, but
nevertheless NP-complete. Meanwhile, a program under SSA form has the interesting property that on cases
where spill is unnecessary, the problem of register allocation is not anymore NP-complete but polynomial.
The interesting question is: can we solve the spilling problem under SSA, come back from SSA by splitting
live-ranges as SSA does and finaly use classical register allocator? We show in this report that unfortunately
many formulations of the spilling problem are also NP-complete under SSA form. In particular, the node-deletion
approach used in most compilers remains NP-complete on most cases even on basic-blocks. The only polynomial
solution has a too high complexity in practice. But this first advanced study on the complexity of the spill
problem under SSA greatly helps to the understanding and gives directions for polynomial approximations.
In this report, we also talk about the problem of splitting variables aggressively as in SSA. This shows
the weakness of the "iterated register coalescing" regarding false interferences created by the adding of
move instructions. We then implemented in a production compiler for st200 an interference graph based on
liveness analysis and value numbering: two variables interfere if at one's definition the other is live and
carries another value. The experiments we made and present in this report show that with such an interference
graph aggressive splitting is not a problem.
- Keywords:
- Compilation, register allocation, spill, coalescing, SSA, NP-completeness, perfect graph.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+9p
- Format: Portable Document Format
- Get it
- Revisions: none
Optimizing the translation out-of-SSA with renaming constraints.
- By: Fabrice Rastello, Francois de Ferriere, Christophe Guillon
- Number: RR2005-34
- Date: August 2005
- Abstract:
- Static Single Assignment form is an intermediate representation
that uses phi instructions to merge values at each confluent point of the control flow graph. phi
instructions are not machine instructions and must be renamed back to move instructions when translating
out of SSA form. Without a coalescing algorithm, the out of SSA translation generates many move instructions.
Leung and George use a SSA form for programs represented as native machine instructions, including the use of
machine dedicated registers. For this purpose, they handle renaming constraints thanks to a pinning mechanism.
Pinning phi arguments and their corresponding definition to a common resource is also a very attractive technique
for coalescing variables. In this paper, extending this idea, we propose a method to reduce the phi-related copies
during the out of SSA translation, thanks to a pinning-based coalescing algorithm that is aware of renaming constraints.
This report provides also a discussion about the formulation of this problem, its complexity and its motivations.
We implemented our algorithm in the STMicroelectronics Linear Assembly Optimizer. Our experiments show interesting
results when comparing to the existing approaches of Leung and George, Sreedhar et al., and Appel and George for register coalescing.
- Keywords:
- Static Single Assignment, Coalescing, NP-complete, K-COLORABILITY, Machine code level, register allocation.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+26p
- Format: Portable Document Format
- Get it
- Revisions: This article is a new version of this one.
Non Gaussian and Long Memory Statistical Characterisations for Internet Traffic with Anomalies.
- By: Antoine Scherrer, Nicolas Larrieu, Philippe Owezarski, Pierre Borgnat, Patrice Abryn
- Number: RR2005-35
- Date: September 2005
- Abstract:
- The Internet aims at providing a wide range of services for a large variety of applications.
Hence, it is highly sensitive to traffic anomalies (e.g., failures, flash crowds,...) as well as to DoS attacks, which are
likely to significantly reduce the Quality of Service level. Current intrusion detection systems, specially those based on
anomaly detection, are not providing efficient nor satisfactory solutions for DoS attack tracking. This is mainly due to
difficulties in distinguishing between strong but legitimate traffic variations and DoS attack induced changes. The goal
of this work is to compare relevant statistical characteristics of regular traffic to those of traffic presenting anomalies.
To do so, we introduce a non Gaussian long memory model and develop estimators for the corresponding parameters.
First, we show that this model relevantly describes Internet traffic for a wide range of aggregation levels, using
both a large set of data taken from public reference repositories (Bellcore, LBL, Auckland, UNC, CAIDA) and data collected
by ourselves. Second, we show that the proposed model also describes meaningfully traffic with anomalies such as flash
crowd and DoS attacks which we generated and collected. We show that the behaviors of the parameters of the model enables
us to discriminate between regular and anomalous traffic, and between flash crowds and DoS attacks. We also derive analytically
procedures to numerically synthesize realizations of stochastic processes with prescribed non Gaussian marginals and long range
dependent covariances. This enables us to validate the relevance and accuracy of our characterization procedures. Finally,
we describe various applications based on the proposed model.
- Keywords:
- Intrusion detection, DoS attack, Flash Crowd, Non Gaussian Long Range Dependent Process.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+22p
- Format: Portable Document Format
- Get it
- Revisions: none
Asymptotically Fast Polynomial Matrix Algorithms for Mutivariable Systems.
- By: Claude-Pierre Jeannerod, Gilles Villard
- Number: RR2005-36
- Date: September 2005
- Abstract:
- We present the asymptotically fastest known algorithms for some basic problems on univariate
polynomial matrices: rank, nullspace, determinant, generic inverse, reduced form. We show that they
essentially can be reduced to two computer algebra techniques, minimal basis computations and matrix
fraction expansion/reconstruction, and to polynomial matrix multiplication. Such reductions eventually
imply that all these problems can be solved in about the same amount of time as polynomial matrix
multiplication (up to logarithmic factors and the size of the output).
- Keywords:
- Linear algebra, polynomial matrix, matrix rank, matrix determinant, nullspace basis, matrix inversion, matrix reduced form.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+12p
- Format: Portable Document Format
- Get it
- Revisions: none
Fast and correctly rounded logarithms in double-precision.
- By: Florent de Dinechin, Christoph Lauter, Jean-Michel Muller
- Number: RR2005-37
- Date: September 2005
- Abstract:
- This article is a case study in the implementation of a portable,
proven and efficient correctly rounded elementary function in double-precision. We describe the methodology
used to achieve these goals in the \textttcrlibm library. There are two novel aspects to this approach.
The first is the proof framework, and in general the techniques used to balance performance and provability.
The second is the introduction of processor-specific optimizations to get performance equivalent to the best
current mathematical libraries, while trying to minimize the proof work. The implementation of the natural
logarithm is detailed to illustrate these questions.
- Keywords:
- floating-point, elementary functions, logarithm, correct rounding.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+14p
- Format: Portable Document Format
- Get it
- Revisions: none
Basic building blocks for a triple-double intermediate format.
- By: Christoph Quirin Lauter
- Number: RR2005-38
- Date: September 2005
- Abstract:
- The implementation of correctly rounded elementary functions
needs high intermediate accuracy before final rounding. This accuracy can be provided by (pseudo-)
expansions of size three, i.e. a triple-double format. The report presents all basic operators for
such a format. Triple-double numbers can be redundant. A renormalization procedure is presented and proven.
Elementary functions' implementations need addition and multiplication sequences. These operators must
take operands in double, double-double and triple-double format. The results must be accordingly in one of
the formats. Several procedures are presented. Proofs are given for their accuracy bounds. Intermediate
triple-double results must finally be correctly rounded to double precision. Two effective rounding sequences
are presented, one for round-to-nearest mode, one for the directed rounding modes. Their complete proofs
constitute half of the report.
- Keywords:
- elementary functions, multiple precision, expansions, correct rounding.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+65p
- Format: Portable Document Format
- Get it
- Revisions: none
High-Radix Floating-Point Division Algorithms for Embedded VLIW Integer Processors.
- By: Claude-Pierre Jeannerod, Saurabh-Kumar Raina and Arnaud Tisserand
- Number: RR2005-39
- Date: September 2005
- Abstract:
- This paper presents floating-point division algorithms and
implementations for embedded VLIW integer processors. On those processors, there is no hardware
floating-point unit, for cost reasons. But, for portability and/or accuracy reasons, a software
FP emulation layer is sometime useful. Here, we focus on high-radix digit-recurrence algorithms
for FP division on integer VLIW processors such as the ST200 from STMicroelectronics.
- Keywords:
- Computer arithmetic, floating-point arithmetic, high-radix division algorithm, integer processor, VLIW processor.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+11p
- Format: Compressed postscript
- Get it
- Revisions: none
Scheduling with Resource Constraints using Dis-Equations.
- By: Hadda Cherroun, Alain Darte, and Paul Feautrier
- Number: RR2005-40
- Date: September 2005
- Abstract:
- Scheduling is an important step in high-level synthesis (HLS).
We perform scheduling in two steps: (1) coarse-grain scheduling, in which we take into account the
whole control structure of the program including imperfect loop nests, and (2) fine-grain scheduling,
where we refine each logical step using a detailed description of the available resources. This paper
focuses on the second step. We present an efficient formalism to express resource constraints, using
dis-equations (i.e., negations of equations). We give an exact branch-and-bound algorithm, based on
variants of Floyd's and Dijkstra's algorithms. We compare this approach with a greedy-scheduling
heuristic on pieces of scientific applications from the PerfectClub and the HLSynth95 benchmarks.
The results show that both methods are suitable for HLS tools.
- Keywords:
- Scheduling, resource constraints, dis-equations, branch-and-bound, Dijkstra.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+20p
- Format: Gzipped postscript
- Get it
- Revisions: none
An algorithmic toolbox for Network Calculus.
- By: Anne Bouillard, Eric Thierry
- Number: RR2005-41
- Date: September 2005
- Abstract:
- Network Calculus offers powerful tools to analyze the
performances in communication networks, in particular to obtain determinismic bounds.
This theory is based on a strong mathematical ground, notably by the use of (min,+) algebra.
However the algorithmic aspects of this theory have not been much addressed yet.
This paper is an attempt to provide some efficient algorithms implementing Network
Calculus operations for some classical functions.
A class of functions which is often used are the piecewise linear functions which have a constant growth from
a certain point. As a first step towards algorithmic design, we present a class containing these functions and
closed under the Network Calculus operations: it is the piecewise linear functions which are periodic from a certain point.
They can be finitely described which enables us to propose some algorithms for each of the Network Calculus operations.
- Keywords:
- (min,+)-algebra, Network Calculus, algorithms.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+26p
- Format: Compressed postscript
- Get it
- Revisions: none
Service Discovery in a peer-to-peer environment for Computational Grids.
- By: Eddy Caron, Frédéric Desprez and Cédric Tedeschi
- Number: RR2005-42
- Date: September 2005
- Abstract:
- Grid computing is a form of distributed computing aiming
at aggregating available geographically distributed heterogeneous resources for high performace computing.
The services of a grid is the set of software components made available by servers to clients.
Service discovery is today broadly considered by grid' users as inefficient on large scale dynamic platforms.
Today's tools of service discovery does not reach several requirements: flexibility of the discovery,
efficiency on wide-area dynamic platforms. Therefore, it becomes crucial to propose new tools coping with such platforms.
Emerging peer-to-peer technologies provide algorithms allowing the distribution and the search of files taking into
account the dynamicity of the underlying network. This report present a new tool merging grid computing and peer-to-peer
technologies for an efficient and flexible service discovery in a dynamic and large scale environment, taking
into account the topology of the underlying network.
- Keywords:
- grid computing, peer-to-peer technology, service discovery.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+21p
- Format: Compressed postscript
- Get it
- Revisions: none
Assisted verification of elementary functions.
- By: Florent de Dinechin, Christoph Lauter, Guillaume Melquiond
- Number: RR2005-43
- Date: September 2005
- Abstract:
- The implementation of a correctly rounded or interval elementary
function needs to be proven carefully in the very last details. The proof requires a tight bound on
the overall error of the implementation with respect to the mathematical function. Such work is function
specific, concerns tens of lines of code for each function, and will usually be broken by the smallest
change to the code (e.g. for maintenance or optimization purpose). Therefore, it is very tedious and
error-prone if done by hand. This article discusses the use of the Gappa proof assistant in this context.
Gappa has two main advantages over previous approaches: Its input format is very close to the actual C
code to validate, and it automates error evaluation and propagation using interval arithmetic.
Besides, it can be used to incrementally prove complex mathematical properties pertaining to the C code.
Yet it does not require any specific knowledge about automatic theorem proving, and thus is accessible
to a wider community. Moreover, Gappa may generate a formal proof of the results that can be checked
independently by a lower-level proof assistant like Coq, hence providing an even higher confidence
in the certification of the numerical code.
- Keywords:
- proof assistant, floating-point, elementary functions, numerical code.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+15p
- Format: Portable Document Format
- Get it
- Revisions: none
Characterizing Valiant's algebraic complexity classes.
- By: Guillaume Malod and Natacha Portier
- Number: RR2005-44
- Date: September 2005
- Abstract:
- Valiant introduced 20 years ago $VP$ and $VNP$: an algebraic
analogue to the classes $P$ and $NP$. They are defined via non-uniform sequences of arithmetic
circuits and are especially useful to study the complexity of polynomials families. In this paper
we gather known results and new techniques under a unifying theme, namely the restrictions imposed
upon the gates of the circuit, building a hierarchy from formulas to circuits. As a consequence we
get a simpler proofs for known results such as the equality of the classes $VNP$ and $VNP_e$,
the completeness of the determinant for $VQP$ or the equivalence of skew and weakly skew arithmetic
circuits, and new results such as a characterisation of the classes $VP$ and $VQP$ or a full answer
to a conjecture of Bürgisser. We also show that for circuits of polynomial depth and unbounded size
these models all have the same expressive power and can be used to characterize a uniform version of $VNP$.
- Keywords:
- Valiant's theory, algebraic complexity, arithmetic circuits.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+19p
- Format: Portable Document Format
- Get it
- Revisions: none
Scheduling multiple bags of tasks on heterogeneous master-worker platforms: centralized versus distributed solutions.
- By: Olivier Beaumont, Larry Carter, Jeanne Ferrante, Arnaud Legrand, Loris Marchal, Yves Robert
- Number: RR2005-45
- Date: September 2005
- Abstract:
- Multiple applications that execute concurrently on heterogeneous platforms
compete for CPU and network resources. In this paper we consider the problem of scheduling applications
to ensure fair and efficient execution on master-worker platforms where the communication is restricted to
a tree embedded in the network. The goal of the scheduling is to obtain the best throughput while enforcing
some fairness between applications. We show how to derive an asymptotically optimal periodic schedule by
solving a linear program expressing all problem constraints. For single-level trees, the optimal solution
can be analytically computed. For large-scale platforms, gathering the global knowledge needed by the linear
programming approach might be unrealistic. One solution is to adapt the multi-commodity flow algorithm of
Awerbuch and Leighton, but it still requires some global knowledge. Thus, we also investigates heuristic
solutions using only local information, and test them via simulations. The best of our heuristics achieves
the optimal performance on about two-thirds of our test cases, but is far worse in a few cases.
- Keywords:
- Parallel computing, scheduling, divisible load, multiple applications, resource sharing.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+33p
- Format: Portable Document Format
- Get it
- Revisions: none
Threshold effect in algorithmic complexity: NP-hard and linear variants of hypergraph partitioning.
- By: Laurent Lyaudet
- Number: RR2005-46
- Date: September 2005
- Abstract:
- This report present an infinite family of combinatorial
problems that shows extremely abrupt changes of complexity between neighbours problems.
I will show that the problem $P_l^k$ that is a purely constraint-driven variant of hypergraph
partitioning is for fixed parameters $l$ and $k$ NP-complete when $k=1$ and is linear when $k\neq1$
on the class of hypergraphs with disjoint hyperedges.
- Keywords:
- Hypergraph partitioning, NP-Completeness, Threshold effect.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+14p
- Format: Compressed postscript
- Get it
- Revisions: none
Managing Data Persistence in Network Enabled Servers.
- By: Eddy Caron , Bruno DelFabbro , Frédéric Desprez , Emmanuel Jeannot , Jean-Marc Nicod
- Number: RR2005-47
- Date: October 2005
- Abstract:
- The GridRPC model \citegridrpc is an emerging standard promoted by
the Global Grid Forum (GGF)\footnote\urlhttp://www.ggf.org that defines how to perform remote client-server
computations on a distributed architecture. In this model data are sent back to the client at the end of
every computation. This implies unnecessary communications when computed data are needed by an other server
in further computations. Since, communication time is sometimes the dominant cost of remote computations,
this cost has to be lowered. Several tools instantiate the GridRPC model such as NetSolve developed at the
University of Tennessee, Knoxville, USA, and DIET developed at LIP laboratory, ENS Lyon, France. They are
usually called Network Enabled Servers (NES). In this paper, we present a discussion of the data management
solutions chosen for these two NES (NetSolve and DIET) as well as experimental results.
- Keywords:
- Data peristence, Distributed computing, Network Enabled Servers.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+25p
- Format: Portable Document Format
- Get it
- Revisions: none
Minimizing the stretch when scheduling flows of biological requests.
- By: Arnaud Legrand, Alan Su, Frédéric Vivien
- Number: RR2005-48
- Date: October 2005
- Abstract:
- In this paper, we consider the problem of scheduling
comparisons of motifs against biological databanks. This problem lies in the divisible
load framework with negligible communication costs. Thus far, very few results have been
proposed in this model. We first explain the relationship between this model and the preemptive
uni-processor one. After having selected a few relevant metrics (max-stretch and sum-stretch),
we show how to extend algorithms that have been proposed in the literature for the uni-processor
model to our setting. Then we extensively study the performance of these algorithms in realistic
scenarios. Our study clearly suggest an efficient heuristic for each of the two metrics, though
a combined optimization is in theory not possible in the general case.
- Keywords:
- Bioinformatics, heterogeneous computing, scheduling, divisible load, linear programming, stretch.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+20p
- Format: Portable Document Format
- Get it
- Revisions: none
Simple Seed Architectures for Reciprocal and Square Root Reciprocal.
- By: M. Ercegovac, J.-M. Muller and A. Tisserand
- Number: RR2005-49
- Date: October 2005
- Abstract:
- This report presents a simple hardware architecture for
computing the seed values for reciprocal and square root reciprocal. These seeds are used
in the initialization of floating-point division and square root software iterations.
The proposed solution is based on polynomial approximation with specific coefficients
and a table lookup. The obtained architectures lead to small and fast circuits.
- Keywords:
- computer arithmetic, reciprocal, square root reciprocal,
polynomial approximation, table based method, hardware arithmetic operator, Newton-Raphson iteration.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+19p
- Format: Compressed postscript
- Get it
- Revisions: none
Automatic Middleware Deployment Planning on Clusters.
- By: Pushpinder Kaur Chouhan, Holly Dail, Eddy Caron, Frédéric Vivien
- Number: RR2005-50
- Date: October 2005
- Abstract:
- The use of many distributed, heterogeneous resources as
a large collective resource offers great potential and has become an increasingly popular idea.
A key issue for these Grid platforms is middleware scalability and
how middleware services can best be mapped to the resource platform structure. Optimizing deployment
is a difficult problem with no existing general solutions. In this paper we address a simpler sub-problem:
how to carry out an adapted deployment on a cluster with hundreds of nodes? Efficient use of clusters
alone or as part of the Grid is an important issue. In this paper we present an approach for automatically determining
an optimal deployment for hierarchically distributed middleware services on clusters where the goal
is to optimize steady-state request throughput. We prove that a complete spanning d-ary tree provides
an optimal deployment and we present an algorithm to construct this optimal tree. We use a distributed
Problem Solving Environment called DIET to test our approach. We define a performance model for each
component of DIET and validate these models in a real-world cluster environment. Additional experiments
demonstrate that our approach selects a deployment that performs better than other reasonable deployments.
- Keywords:
- Deployment, Cluster, Cluster computing, Network Enabled Server, Steady-state scheduling.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+23p
- Format: Portable Document Format
- Get it
- Revisions: This article is a new version of this one.
The impact of heterogeneity on master-slave on-line scheduling.
- By: Jean-François Pineau, Yves Robert, Frédéric Vivien
- Number: RR2005-51
- Date: October 2005
- Abstract:
- In this paper, we assess the impact of heterogeneity for scheduling independent
tasks on master-slave platforms. We assume a realistic one-port model where the master can communicate
with a single slave at any time-step. We target on-line scheduling problems, and we focus on simpler
instances where all tasks have the same size. While such problems can be solved in polynomial time on homogeneous
platforms, we show that there does not exist any optimal deterministic algorithm for heterogeneous platforms.
Whether the source of heterogeneity comes from computation speeds, or from communication bandwidths, or from both,
we establish lower bounds on the competitive ratio of any deterministic algorithm. We provide such bounds for the
most important objective functions: the minimization of the makespan (or total execution time), the minimization
of the maximum response time (difference between completion time and release time), and the minimization of the
sum of all response times. Altogether, we obtain nine theorems which nicely assess the impact of heterogeneity
on on-line scheduling. These theoretical contributions are complemented on the practical side by the implementation
of several heuristics on a small but fully heterogeneous MPI platform. Our (preliminary) results show the superiority
of those heuristics which fully take into account the relative capacity of the communication links.
- Keywords:
- Scheduling, Master-slave platforms, Heterogeneous computing, On-line.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+20p
- Format: Portable Document Format
- Get it
- Revisions: none
FIFO scheduling of divisible loads with return messages under the one-port model.
- By: Olivier Beaumont, Loris Marchal, Veronika Rehn, Yves Robert
- Number: RR2005-52
- Date: October 2005
- Abstract:
- This paper deals with scheduling divisible
load applications on star networks, in presence of return messages.
This work is a follow-on of RR2005-21, where the same problem was considered
under the 2-port model, where a given processor can simultaneously send and
receive messages. Here, we concentrate on the one-port model, where a processor
can either send or receive a message at a given time step. The problem of
scheduling divisible load on star platforms turns out to be very difficult
as soon as return messages are involved. Unfortunately, we have not been able
to assess its complexity, but we provide an optimal solution in the special
(but important) case of FIFO communication schemes. We also provide an explicit
formula for the optimal number of load units that can be processed by a FIFO
ordering on a bus network. Finally, we provide a set of MPI experiments to
assess the accuracy and usefulness of our results in a real framework.
- Keywords:
- Scheduling, divisible load, master-worker
platform, heterogeneous cluster, return messages, one-port model.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+23p
- Format: Portable Document Format
- Get it
- Revisions: none
Analysis and Synthesis of Cycle-Accurate On-Chip Traffic with Long-Range-Dependence.
- By: Antoine Scherrer, Antoine Fraboulet, Tanguy Risset
- Number: RR2005-53
- Date: December 2005
- Abstract:
- In this report we propose a stochastic traffic
analysis and synthesis methodology adapted to on-chip network traffic.
This work confirms and extends recent works (Marculescu et al.~\cite{marculescu-tvlsi})
by providing more precise (cycle-accurate) simulations. Our framework based on System-C
simulations is able to model precisely the latency of each request-acknowledge transaction
or aggregated throughput taking into account its bursty behavior. Our experiments show that
on-chip traffic is non-stationary, with long-range-dependence property and that simulation
platforms of systems-on-chip will now need to use advanced statistical models for traffic simulation.
- Keywords:
- on-chip traffic, stochastic analysis, long-range-dependance, process synthesis.
- Availability: Electronic copy only.
- Citation: Not published yet.
- Size: 2+9p
- Format: Portable Document Format
- Get it
- Revisions: none