FLIP, a Floating-point Library for Integer Processors

Arénaire team

LIP ( CNRS - ENSL - INRIA - UCBL )

FLIP is a C library that provides software support for single precision floating-point (FP) arithmetic on processors without FP hardware units, such as VLIW (Very Long Instruction Word) or DSP (Digital Signal Processor) cores for embedded applications. The target architecture is the ST200 family of high-performance, low-power VLIW processor cores, which are used in STMicroelectronics system-on-chip (SoC) solutions as host, audio or video processors for computationally intensive applications. Such applications include embedded systems for the consumer, digital TV and telecommunication markets.

FLIP is distributed under the LGPL. This research project is funded by the French Région Rhône-Alpes.

FLIP news

Initial Version 0.1 2004-09-13

supported operations: addition/subtraction, multiplication, division and square root
the four IEEE 754 rounding modes
optional support for subnormal numbers

Version 0.2 2005-10-17

new supported operations: square, reciprocal, inverse square root and multiply-and-add/subtract (FMA/FMS)
new division and square root code (faster execution)
support for trivial computations

Version 0.3 in preparation

Sine, cosine, exponential and logarithm.

Performance

Performance was measured in perfect cycles, that is, counting bundles and branch penalties but ignoring any cache effects. Each variant of FLIP (Tables 3-6) was measured on both the ST220 and the ST231 and compared with the current SoftFloat-based STMicroelectronics library operators (Table 2). Table 1 summarizes the resulting speedups.

Table 1: Speedup of the faster variant of FLIP over the STMicroelectronics library on the ST231 processor core. For the STMicroelectronics library, square, FMA/FMS, reciprocal and inverse sqrt were implemented using mul, mul followed by add/sub, div, and sqrt followed by div, respectively.
	add/sub	mul	div	sqrt	square	FMA/FMS	reciprocal	inverse sqrt
Speedup	1.38/1.37	1.25	3.47	2.49	1.66	1.79/1.78	3.76	4.53

Table 2: Original optimized STMicroelectronics library (no subnormals, round-to-nearest-even), in cycles
Proc.	add/sub	mul	div	sqrt
ST231	61/62	45	177	127
ST220	61/62	45	179	127

Table 3: FLIP with no subnormals, round-to-nearest-even, fast versions of division and square root, and additional operations, in cycles
Proc.	add/sub	mul	div	sqrt	square	FMA/FMS	reciprocal	inverse sqrt
ST231	44/45	36	51	51	27	59/60	47	67
ST220	44/45	38	62	76	28	60/61	59	96

Table 4: FLIP with subnormals, round-to-nearest-even, in cycles
Proc.	add/sub	mul	div	sqrt
ST231	44/45	45	132	123
ST220	44/45	47	132	123

Table 5: FLIP with no subnormals, all rounding modes, in cycles
Proc.	add/sub	mul	div	sqrt
ST231	67/68	58	149	125
ST220	67/68	60	149	125

Table 6: FLIP with subnormals, all rounding modes (IEEE 754 compatibility), in cycles
Proc.	add/sub	mul	div	sqrt
ST231	74/75	60	142	129
ST220	74/75	62	142	129

Publications concerning FLIP

C. Bertin, N. Brisebarre, B. Dupont de Dinechin, C.-P. Jeannerod, C. Monat, J.-M. Muller, S.K. Raina and A. Tisserand. A Floating-Point Library for Integer Processors. SPIE 49th Annual Meeting, proceedings of SPIE vol. 5559 (Advanced Signal Processing Algorithms, Architectures, and Implementations XIV), August 2-6, 2004, Denver, USA. (Rapport de recherche LIP, July 2004 n°RR2004-37, Rapport de recherche INRIA, July 2004 n°5268).
C.-P. Jeannerod, S.K. Raina, and A. Tisserand. High-Radix Floating-Point Division Algorithms for Embedded VLIW Integer Processors. Proceedings of 17th IMACS World Congress (Scientific Computation, Applied Mathematics and Simulation), July 11-15, 2005, Paris, France. (Rapport de recherche LIP n°RR2004-37

Miscellaneous

Genealogy of FLIP
- 16 July 2004 - FLIP provides the basic operations with support for subnormal numbers, in round-to-nearest-even mode only.
- 13 Sep 2004 - FLIP v0.1 (same as above) with validation tests and documentation.
- 21 Sep 2005 - FLIP v0.11, which fixes a few bugs in FLIP v0.1.
- 17 Oct 2005 - FLIP v0.2, with the fast and additional operations as well as all four variants of the basic operations.

Do not hesitate to contact us (address at ens-lyon.fr) for more details.

Claude-Pierre Jeannerod, Saurabh Kumar Raina, Arnaud Tisserand

Last modified: Mon Oct 17 18:00:53 CEST 2005