NMmpa

From the file NMmp2g.out we now create one file per genome sequence (or chromosome) and per match strand (+ or -). This is done by running the program NMmpa, a C implementation of the R script makeProbeAnno included in the R package distributed by David et al. (2).

So, let's do it:

$NMmpa -i NMmp2g.out -o NMmpa -a 2560

where the option -a 2560 indicates that the array is a square matrix of $2560 \times 2560$ probes. Once the program ends, a new folder called NMmpa has been created. This folder contains files with the extension .prb, whose names start with the corresponding chromosome name (chrI, chrII, etc.) followed by the strand of the match. So if a probe has a match on the positive strand (+) of chromosome chrI, the match is listed in the file chrI+.prb.
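The naming convention can also be read in the other direction, to recover the chromosome and the strand from a file name. The following Python sketch is only an illustration: the helper split_prb_name is hypothetical and not part of NMmpa, and it assumes that every file name has exactly the form <chromosome><strand>.prb.

```python
import os


def split_prb_name(path):
    """Split a name such as 'NMmpa/chrI+.prb' into ('chrI', '+').

    Hypothetical helper, not part of NMmpa. It assumes the
    <chromosome><strand>.prb naming convention described in the text.
    """
    name = os.path.basename(path)          # drop the folder part
    stem, ext = os.path.splitext(name)     # 'chrI+', '.prb'
    if ext != ".prb" or len(stem) < 2 or stem[-1] not in "+-":
        raise ValueError("not a <chromosome><strand>.prb name: %r" % path)
    # The last character of the stem is the strand, the rest the chromosome.
    return stem[:-1], stem[-1]
```

For instance, split_prb_name("NMmpa/chrI+.prb") returns ("chrI", "+"), while a name without a strand suffix raises a ValueError.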

Let's now have a look at these files:

$head -10 NMmpa/chrI+.prb
86_3 7767 114699 25 0
93_3 7774 27052 25 3
93_3 7774 25702 25 3
208_3 7889 15351 25 3
510_3 8191 32509 25 0
860_3 8541 192923 25 0
894_3 8575 193187 25 0
1061_3 8742 79283 25 0
1171_3 8852 144051 25 0
1275_3 8956 541 25 3

As you can see, all these files are formatted as tables of 5 columns separated by whitespace. The meaning of the columns, from left to right, is:

1. the probe coordinates on the array,
2. the probe id,
3. the starting position (bp) of the match on the sequence,
4. the length of the match,
5. the uniqueness status of the match, defined as follows:
   - 0 unique perfect match
   - 1 multiple perfect matches
   - 2 unique near match
   - 3 multiple near matches
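The column layout above can be sketched as a small Python parser. This is a minimal illustration, not code shipped with NMmpa: the names PrbRecord, parse_prb_line and expected_probe_id are hypothetical. The relation used in expected_probe_id (probe id = second coordinate x 2560 + first coordinate + 1) is only an observation that holds for the ten sample lines shown earlier; it is an assumption here, not a documented rule of the program.

```python
from collections import namedtuple

# Uniqueness status codes, as listed in the text.
STATUS_LABELS = {
    0: "unique perfect match",
    1: "multiple perfect matches",
    2: "unique near match",
    3: "multiple near matches",
}

# One row of a .prb file (hypothetical record type, field names are ours).
PrbRecord = namedtuple("PrbRecord", "coords probe_id start length status")


def parse_prb_line(line):
    """Parse one whitespace-separated line of a .prb file."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError("expected 5 columns, got %d" % len(fields))
    coords, probe_id, start, length, status = fields
    return PrbRecord(coords, int(probe_id), int(start), int(length), int(status))


def expected_probe_id(coords, array_side=2560):
    """Probe id implied by the coordinates, ASSUMING the relation seen in
    the sample output: id = second * array_side + first + 1."""
    first, second = (int(v) for v in coords.split("_"))
    return second * array_side + first + 1


if __name__ == "__main__":
    record = parse_prb_line("93_3 7774 27052 25 3")
    print(record.probe_id, STATUS_LABELS[record.status])
    # Check the assumed id/coordinate relation on this sample line.
    print(expected_probe_id(record.coords) == record.probe_id)
```

Keeping the status as an integer and translating it through STATUS_LABELS only for display makes it easy to filter matches, for example to keep only the records whose status is 0.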

Jean-Baptiste Veyrieras 2010-05-28