The BIP API consists of several variants of send and receive primitives, in both blocking and non-blocking versions, that should meet any need.
BIP ensures reliable and ordered transmission of messages in the absence of network faults (cf. 2.2).
It is an error to send a message to yourself.
All messages consist of a contiguous array of words (a multiple of 4 bytes, properly aligned in memory). Buffers can be located anywhere in the process address space: in ``global'' data, in memory allocated with malloc, or on the stack.
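The word-based layout described above can be checked before a buffer is handed to a send or receive call. The following is a minimal sketch, not part of BIP: the helper names is_word_aligned and words_for_bytes and the macro WORD_BYTES are hypothetical and introduced only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* The text states that messages are a multiple of 4 bytes. */
#define WORD_BYTES 4u

/* Hypothetical helper: returns 1 if buf starts on a word boundary. */
int is_word_aligned(const void *buf)
{
    return ((uintptr_t)buf % WORD_BYTES) == 0;
}

/* Hypothetical helper: number of words needed to carry nbytes of
   payload, rounding up to the next whole word. */
size_t words_for_bytes(size_t nbytes)
{
    return (nbytes + WORD_BYTES - 1) / WORD_BYTES;
}
```

For instance, a 5-byte payload would have to be padded to 2 words, and a pointer one byte past the start of an int array would be rejected by the alignment check.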
In general, a Myrinet network is reliable enough that error recovery is not strictly necessary. Network errors can still occasionally happen, and when one does it is detected. An application therefore either terminates correctly or is aborted because of a network error. In either case, BIP ensures that no message is silently lost and that no corrupt message reaches the application.
Implementing error recovery is ongoing work. If the gain is worth it, we will make it a compile-time option, because in our experience we have never been able to observe a network error on our platform.
Here is a small example before going into the details of BIP:
#include <stdio.h>
#include "bip.h"

int main(int argc, char *argv[])
{
    int token;

    bip_init();
    if (bip_mynode == 0) {
        /* Node 0 starts the token and waits for it to come back. */
        token = 333;
        printf("token start on 0\n");
        bip_send(1, &token, 1);
        bip_recv(&token, 1);
        printf("token arrived\n");
    } else {
        /* Every other node receives the token and forwards it to the next node. */
        bip_recv(&token, 1);
        printf("token %d received on %d\n", token, bip_mynode);
        bip_send((bip_mynode + 1) % bip_numnodes, &token, 1);
    }
    return 0;
}
This example can be run on 3 machines (here lhpca, lhpcb, lhpcc) by doing:
% bipcc token.c -o token
% bipconf lhpca lhpcb lhpcc
% bipload token
token start on 0
token 333 received on 1
token 333 received on 2
token arrived
%
BIP calls have some semantic differences between short and long messages. The same send and receive calls are used for both, so in simple cases the user can ignore the distinction.
Sends and receives of ``long'' messages have rendez-vous semantics: a receive needs to be posted before, or at least ``not too long'' after, the matching send has begun. This requirement is similar to the ``ready send'' mode of MPI, but a bit more permissive. The precise restriction is that the receive should be posted no later than about 50 ms after the send; after that, a message blocked on the network could be dropped in some cases. Note, however, that the receive should preferably be posted before or ``very soon'' after the send, to avoid blocking communication paths in the Myrinet network, which could severely affect the performance of other communications.
In contrast, ``short'' messages are stored in a circular queue, so send calls do not block even if no matching receive has been posted (unless the destination receive buffers are full; the amount of buffering can be controlled with bip_taginit).
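To illustrate why a short send only has to wait when the destination buffers are full, here is a minimal sketch of a circular queue of fixed-size slots. It is not BIP's implementation: the type short_queue, the functions sq_push and sq_pop, and the sizes QUEUE_SLOTS and SLOT_WORDS are all assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_SLOTS 4   /* assumed number of buffered short messages */
#define SLOT_WORDS  25  /* assumed maximum short-message length, in words */

struct short_queue {
    int32_t data[QUEUE_SLOTS][SLOT_WORDS];
    size_t  len[QUEUE_SLOTS];  /* length of each stored message, in words */
    size_t  head;              /* index of the oldest stored message */
    size_t  count;             /* number of messages currently stored */
};

/* Store a message at the tail. Returns 0 on success, or -1 when the
   queue is full or the message does not fit in a slot (the case in
   which a real sender would have to wait). */
int sq_push(struct short_queue *q, const int32_t *msg, size_t words)
{
    if (q->count == QUEUE_SLOTS || words > SLOT_WORDS)
        return -1;
    size_t tail = (q->head + q->count) % QUEUE_SLOTS;
    memcpy(q->data[tail], msg, words * sizeof(int32_t));
    q->len[tail] = words;
    q->count++;
    return 0;
}

/* Copy the oldest message into out. Returns its length in words, or
   -1 if the queue is empty or out is too small. */
long sq_pop(struct short_queue *q, int32_t *out, size_t max_words)
{
    if (q->count == 0 || q->len[q->head] > max_words)
        return -1;
    size_t words = q->len[q->head];
    memcpy(out, q->data[q->head], words * sizeof(int32_t));
    q->head = (q->head + 1) % QUEUE_SLOTS;
    q->count--;
    return (long)words;
}
```

With four slots, four pushes succeed without any matching pop, the fifth is refused, and one pop frees a slot again; messages come out in the order in which they were stored.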
The limit between ``short'' and ``long'' messages is specified by BIPSMALLSIZE; depending on the release, it is between 100 and 400 bytes.
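The two regimes can be summarized in a small sketch. Everything named here is an assumption made for illustration: ASSUMED_SMALL_BYTES stands in for the release-dependent BIPSMALLSIZE (200 is an arbitrary value within the stated range), the comparison is assumed to be inclusive, and RECV_GRACE_MS encodes the approximate 50 ms limit quoted above.

```c
#include <stddef.h>

#define WORD_BYTES          4u
#define ASSUMED_SMALL_BYTES 200u  /* assumption: stand-in for BIPSMALLSIZE */
#define RECV_GRACE_MS       50    /* "about 50 ms" according to the text */

enum msg_kind { MSG_SHORT, MSG_LONG };

/* Classify a message by its length in words. Whether the real limit
   is inclusive is not stated in the text; inclusive is assumed here. */
enum msg_kind classify(size_t words)
{
    return (words * WORD_BYTES <= ASSUMED_SMALL_BYTES) ? MSG_SHORT : MSG_LONG;
}

/* For a long message: returns 1 if the receive was posted before the
   send started, or no more than RECV_GRACE_MS after it. Times are in
   milliseconds on a common clock. */
int long_receive_in_time(long send_start_ms, long recv_post_ms)
{
    return (recv_post_ms - send_start_ms) <= RECV_GRACE_MS;
}
```

Under these assumptions a 50-word message is short and a 51-word message is long, and a receive posted 51 ms after a long send has started is already outside the tolerated window.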
Each message is sent to a particular queue of a particular node. Each queue on a node is identified by a tag between 0 and NTAGS-1 (NTAGS is 200 in the current release). The ``token'' example shown above uses the default queue by not specifying any tag.
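The tag range can be pictured as an index into a per-node table of receive queues. The sketch below is only a model: node_queues, tag_is_valid and deliver_to_tag are hypothetical names, and a simple counter stands in for each real queue. The value of NTAGS is the one given above for the current release.

```c
#include <stddef.h>

#define NTAGS 200  /* value quoted in the text for the current release */

/* One counter per receive queue of a node (a stand-in for real queues). */
struct node_queues {
    size_t pending[NTAGS];
};

/* A tag is valid if it lies between 0 and NTAGS-1 inclusive. */
int tag_is_valid(int tag)
{
    return tag >= 0 && tag < NTAGS;
}

/* Record a message arriving for the given tag. Returns 0 on success,
   or -1 if the tag is out of range. */
int deliver_to_tag(struct node_queues *n, int tag)
{
    if (!tag_is_valid(tag))
        return -1;
    n->pending[tag]++;
    return 0;
}
```

Tags 0 and 199 are accepted, while -1 and 200 are rejected; messages delivered with different tags accumulate in separate queues.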
Communication buffers for small messages are allocated at initialization for each receive queue. You can override the default number of buffers independently for each queue by calling the primitive bip_taginit before bip_init.
Because small messages are buffered on reception, the user does not have to receive them in the order in which they were sent. A long message, by contrast, requires the application to be ready to receive it when it is sent, and messages for other queues cannot be received until the user has provided a buffer for that long message.
Within a process, send and receive functions are completely independent. At any one time, you can have at most one send call in progress and one receive call posted per tag. Receives on different tags and a send can run at the same time in different threads and are thread-safe (but note that blocking calls will not automatically cause a thread switch). Two receives with the same tag, or two sends, cannot be performed concurrently by two threads.
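A multithreaded application may want to enforce these rules itself before entering a BIP call. The following is a minimal sketch of such a guard using C11 atomics. It is not part of BIP: call_guard and the begin_/end_ functions are hypothetical, and the actual BIP call would be placed between a successful begin_ and the matching end_.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NTAGS 200  /* value quoted in the text for the current release */

struct call_guard {
    atomic_bool send_busy;         /* true while a send is in progress */
    atomic_bool recv_busy[NTAGS];  /* true while a receive is posted on that tag */
};

void guard_init(struct call_guard *g)
{
    atomic_init(&g->send_busy, false);
    for (int t = 0; t < NTAGS; t++)
        atomic_init(&g->recv_busy[t], false);
}

/* Returns true if the caller may start a send (no other send is in progress). */
bool begin_send(struct call_guard *g)
{
    return !atomic_exchange(&g->send_busy, true);
}

void end_send(struct call_guard *g)
{
    atomic_store(&g->send_busy, false);
}

/* Returns true if the caller may post a receive on this tag. */
bool begin_recv(struct call_guard *g, int tag)
{
    if (tag < 0 || tag >= NTAGS)
        return false;
    return !atomic_exchange(&g->recv_busy[tag], true);
}

void end_recv(struct call_guard *g, int tag)
{
    if (tag >= 0 && tag < NTAGS)
        atomic_store(&g->recv_busy[tag], false);
}
```

A second begin_send fails until end_send is called, and a second begin_recv on the same tag fails, while receives on two different tags and one send can all be in progress together. The guard only reports a conflict; whether to retry, wait or signal an error is left to the application.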