BIP-IP driver for Linux performance

Our benchmark with TCP

Below is a comparison of the performance of BIP-IP against the Myricom v3.09 driver. Bandwidth was measured by running 100 ping-pong tests between two machines and keeping the median value (generally, the 100 results for each size are within a few percent of each other).
Each driver was tested twice with two different compiled-in MTU values.
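
For reference, here is one way the timings for a given size can be reduced to a median bandwidth figure. This is a minimal sketch, not the original measurement code: the helper names are ours, and it assumes each round trip moves 2*length bytes.

    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b)
    {
      double d = *(const double *)a - *(const double *)b;
      return (d > 0) - (d < 0);
    }

    /* time[] holds nbtest round-trip times (in seconds) for one size */
    double median_bandwidth(double *time, int nbtest, int length)
    {
      qsort(time, nbtest, sizeof(double), cmp_double);
      return (2.0 * length) / time[nbtest / 2];  /* bytes per second */
    }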

[Figure: TCP performance curves (the original page links to a larger graph)]

Test method details

The test code on the master side looks like:

    length = ...;                      /* message size under test */
    buf = malloc(length);
    for (i = 0; i < nbtest; i++) {
      starttimer();
      send(tcp_fd, buf, length, 0);
      /* TCP is a byte stream: loop until the whole reply has arrived,
         without clobbering buf and length for the next iteration */
      remaining = length;
      ptr = buf;
      do {
        n = recv(tcp_fd, ptr, remaining, 0);
        ptr += n;
        remaining -= n;
      } while (remaining > 0);
      end_timer();
      time[i] = timer_value();
    }
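
The timer helpers are not shown above; a minimal implementation in the same spirit, based on gettimeofday(2) (our assumption, the original code may use another clock), could be:

    #include <sys/time.h>

    static struct timeval t0, t1;

    void starttimer(void)  { gettimeofday(&t0, NULL); }
    void end_timer(void)   { gettimeofday(&t1, NULL); }

    /* elapsed time between starttimer() and end_timer(), in seconds */
    double timer_value(void)
    {
      return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) * 1e-6;
    }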

The code of the receiver is the same except that the send and the recv are done in the opposite order. The TCP connection has been set up with the following options (on each side):

    ndelay = 1;
    snd_buf = 65500;
    rcv_buf = 65500;
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, (char*)&snd_buf, sizeof(int));
    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, (char*)&rcv_buf, sizeof(int));
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char*)&ndelay, sizeof(int));

Note that in the Linux TCP implementation, the flow-control window that is actually used is about half the SO_RCVBUF option size, so about 32000 bytes here.
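
Since the kernel may round the requested sizes, the values actually in effect can be read back with getsockopt(); a small fragment, slotting in after the setsockopt() calls above:

    int val, len;

    len = sizeof(int);
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (char*)&val, &len);
    printf("effective SO_RCVBUF: %d bytes\n", val);

    len = sizeof(int);
    getsockopt(fd, SOL_SOCKET, SO_SNDBUF, (char*)&val, &len);
    printf("effective SO_SNDBUF: %d bytes\n", val);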

Overview of some implementation details

The Linux BIP-IP driver internally uses the BIP message-passing API and implementation, in the same way that the Myricom Linux driver is built on the Myrinet API.

There is a kind of bug in the Linux 2.0 TCP implementation that makes performance drop (to about 2 Mbytes/s) for certain message sizes. This behaviour can be observed with both the Myricom driver and our BIP-IP driver; here is a patch that corrects it. (Note that all tests were done with this patch applied.)

Note that this patch only eliminates the performance drop for TCP connections that have the Nagle algorithm disabled with the TCP_NODELAY option. Fortunately, this is the case for PVM, LAM-MPI and MPICH over IP.

Another problem is that Linux kernel allocation is not really designed for network buffers larger than a page, so we have also written a small patch to make larger MTUs usable efficiently. Basically, this patch lets the Linux kernel keep a pool of "large" buffers that are recycled.
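
The idea can be illustrated in a few lines. This is a user-space sketch of the recycling scheme, not the actual kernel patch; the sizes and names are ours:

    #include <stdlib.h>

    #define POOL_SIZE  16
    #define LARGE_BUF  8192              /* a "large", MTU-sized buffer */

    static void *pool[POOL_SIZE];        /* free list of recycled buffers */
    static int   pool_count;

    void *large_alloc(void)
    {
      if (pool_count > 0)
        return pool[--pool_count];       /* reuse a recycled buffer */
      return malloc(LARGE_BUF);          /* pool empty: plain allocation */
    }

    void large_free(void *buf)
    {
      if (pool_count < POOL_SIZE)
        pool[pool_count++] = buf;        /* keep it for the next packet */
      else
        free(buf);                       /* pool full: really release it */
    }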

Netperf benchmarks

The Netperf benchmarks measure TCP and UDP performance. The results below show the TCP and UDP performance of the Debian Linux TCP/IP stack over BIP. All results and information about the Netperf benchmarks are available at the Netperf Home Page. The best other results are also given here for comparison purposes.
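
For reference, stream and request-response tests like the ones below are typically launched as follows (the host name is illustrative, and exact flags may differ between netperf versions):

    # throughput test, 16384-byte sends, against the netserver on 'remote'
    netperf -H remote -t UDP_STREAM -- -m 16384
    # latency test with 1-byte requests and 1-byte responses
    netperf -H remote -t TCP_RR -- -r 1,1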

O   Throughput benchmarks

    BENCH TYPE   PACKET SIZE   THROUGHPUT (Mb/s)   SYSTEM                NETWORK
    UDP_STREAM   16384         356.11              DEC500/PPro           Myrinet
    UDP_STREAM   7872          352.42              PPro                  Myrinet with BIP
    UDP_STREAM   8192          287.79              DEC500/DEC500         Myrinet
    TCP_STREAM   1048576       750.16              SGI Power Challenge   HiPPI
    TCP_STREAM   16384         338.05              PPro                  Myrinet with BIP
    TCP_STREAM   7872          315.89              PPro                  Myrinet with BIP
    TCP_STREAM   65536         271.35              DEC500/DEC500         Myrinet

O   Latency benchmarks (by request-response speed)

    BENCH TYPE   REQUEST SIZE   TRANSACTIONS/s   SYSTEM      NETWORK
    UDP_RR       1              8826.16          PPro        Myrinet with BIP
    UDP_RR       1              4403.99          HP K460     FC-266
    TCP_RR       1              7506.20          PPro        Myrinet with BIP
    TCP_RR       1              4184.95          Dual PPro   FastEthernet
(All message sizes are expressed in bytes.)

Last modified: Thu Aug 21 14:06:34 CEST 1997  
© BIP team