Myrinet on Linux/Intel

This page presents our first experiences with our Myrinet/Linux/PentiumPro platform. We try to include information that could benefit others. There is no guarantee on the information given here: it is our own understanding of the subject, so please inform us if you disagree or find any bugs.

The patches are against Myrinet driver 3.08d or 3.09.

Software installation

Thanks to the quality of the Myricom software, everything works well without any special effort: just install the Myrinet binaries and follow the documentation.

I would still recommend that people with a configuration similar to ours (Linux/x86) recompile the Myrinet driver for their own version of the Linux kernel. This has two advantages. First, there is no need to use the -f option of insmod to force loading of the module despite a mismatch in kernel versions. Second, in the (normally improbable) case that something important has changed in the Linux kernel macros, a module compiled with another version of the kernel may not work; recompiling avoids that risk.

My second personal recommendation is to change the CFLAGS macro in the Makefile: if you use only Linux machines, add the optional -DSMALL_BUFFER define provided with the Myrinet driver to CFLAGS. From my understanding of the Linux IP stack, if the MTU is greater than the page size, you may well drop a lot of packets once memory becomes fragmented. See also Myricom's explanation.
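To make the page-size argument concrete, here is a small sketch in C. It assumes 4 KB pages (the x86 page size) and that the kernel hands out physically contiguous memory in power-of-two blocks of pages. The function names and the sample buffer sizes are made up for illustration and do not come from the Myrinet driver.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KB pages, as on Linux/x86. */
#define PAGE_SIZE_BYTES 4096u

/* Number of whole pages needed to hold `bytes` bytes in one contiguous buffer. */
unsigned pages_needed(size_t bytes)
{
    assert(bytes > 0);
    return (unsigned)((bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES);
}

/*
 * Smallest order such that a block of 2^order contiguous pages
 * can hold `bytes` bytes. Order 0 means a single page, which can
 * always be satisfied as long as any page is free.
 */
unsigned alloc_order(size_t bytes)
{
    unsigned pages = pages_needed(bytes);
    unsigned order = 0;

    while ((1u << order) < pages)
        order++;
    return order;
}
```

Under these assumptions a 1500-byte buffer fits in a single page (order 0), while a hypothetical 9000-byte buffer needs three pages, rounded up to a block of four contiguous pages (order 2). Such blocks become hard to find once memory is fragmented, which is where the dropped packets would come from.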

The last suggestion I would make, for people who prefer reliability over a gain of a few nanoseconds, is to change the optimisation flag from -O4 to -O (or -O2) when compiling the Myrinet driver. First, there are some known bugs (although very few) in gcc when compiling with all optimisations. Moreover, the Linux header files contain a lot of assembly macros, and it is not known with very high confidence whether all the semantics of the C code are preserved at the higher optimisation levels when assembly code is mixed in. Finally, the gain from -O to -O4 is not that big.

Some source modifications that may be useful

Being able to reload a new myrinet driver without rebooting

While testing the Myrinet driver, it is quite convenient to be able to remove the current driver and load a new one into the kernel. Unfortunately, the driver allocates a big area of physically contiguous memory at initialisation. At boot time there is no problem, but if the machine has been up for some time, this allocation will fail, and you will have to reboot the machine to install a new driver. We were able to solve this problem with two modifications:

First, I think it is safe to replace the requests for DMA-able pages in the Myrinet code with requests for normal pages. From what I understand, the DMA page allocation of Linux/x86 is only necessary for ISA cards, which are restricted to the bottom 16 Mbytes of memory. Here is the corresponding patch.
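The reasoning behind this first modification can be sketched as a simple address check. This is only an illustration, assuming that an ISA device can drive 24 address bits (hence the 16 Mbytes limit) and that the PCI card addresses memory with 32 bits; the names below are hypothetical and are not taken from the driver or from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: ISA DMA uses 24 address bits, so it reaches only the first 16 Mbytes. */
#define ISA_DMA_LIMIT (UINT64_C(1) << 24)
/* Assumption: the PCI card uses 32-bit addresses. */
#define PCI_DMA_LIMIT (UINT64_C(1) << 32)

/*
 * Returns true when the physical range [start, start + len) lies
 * entirely below `limit`, i.e. when a device with that addressing
 * limit can reach the whole buffer.
 */
bool dma_reachable(uint64_t start, uint64_t len, uint64_t limit)
{
    return len > 0 && start < limit && len <= limit - start;
}
```

A page located at 32 Mbytes fails the check against ISA_DMA_LIMIT but passes it against PCI_DMA_LIMIT, which is why restricting the allocation to the DMA zone should not be needed for a PCI card.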

Second, the problem of contiguous memory is not specific to the Myrinet driver, and there is a very small patch (2 lines) available against the standard Linux kernel that makes this kind of allocation possible (basically, the patch allows pages to be swapped out when somebody needs a contiguous memory area). It works great on our machines.

Making the exchange memory area cacheable

Lastly, we modified the Myrinet driver so that the main memory area reserved for communication between the LANai and the host processor is cacheable. The PC hardware ensures coherency between main memory, the cache and the PCI bus (PCI accesses to main memory are snooped in case the data is in a cache), so this change should hopefully not do any harm. You can find the patch here. We have made some tests with BIP, and it makes the performance go from ~90 Mbytes/s to 126 Mbytes/s, a gain of about 40%. Note that it has no influence on IP use; it only matters to people running their own MCP.
Last modified: Thu Aug 21 13:39:26 CEST 1997  
© BIP team