NMmp2g

Once MUMmer has been installed, the first step is to map the probe sequences onto the genome sequence. Go into the folder S1/BY_S288 and run the script NMmp2g as follows:

$NMmp2g ../../Data/BY_S288c/Sequence/Genome.fasta \
       ../../Data/Affymetrix/S.cerevisiae_tiling.fasta \
       NMmp2g.out 2>NMmp2g.log
where the first argument (here ../../Data/BY_S288c/Sequence/Genome.fasta) is the path to the entire genome sequence in FASTA format, the second (../../Data/Affymetrix/S.cerevisiae_tiling.fasta) is the path to the probe sequences in FASTA format, as previously generated, and the last one, NMmp2g.out, is the output file in which the results from MUMmer are stored. The final part, 2>NMmp2g.log, is a shell redirection of the standard error stream (the messages printed by MUMmer) into the file NMmp2g.log.

If everything worked, you should have a file NMmp2g.out that looks like this:

$head -10 NMmp2g.out
> 0_0  Len = 25
> 0_0 Reverse  Len = 25
> 1_0  Len = 25
> 1_0 Reverse  Len = 25
> 2_0  Len = 25
> 2_0 Reverse  Len = 25
> 3_0  Len = 25
> 3_0 Reverse  Len = 25
> 4_0  Len = 25
> 4_0 Reverse  Len = 25

As you can see, the output of MUMmer needs to be reformatted in order to create the probe annotation files. In the extract above, none of the probes matches the genome sequence. When a probe has a match on the genome, the output looks like this:

> 31_3  Len = 25
  chrVII     707119         1        25
  chrII      643009         1        25
  chrXVI     775770         1        25
  chrXVI     435897         1        25

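where each line after the header > 31_3 Len = 25 (which gives the probe id, as previously defined) describes the location and the nature of one match. In MUMmer's standard output, the columns are the name of the reference sequence (here the chromosome), the start position of the match in the reference, the start position of the match in the query (here the probe), and the length of the match.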
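To illustrate the kind of reformatting this output needs, here is a minimal Python sketch that groups the match lines by probe and strand. It is not part of the original pipeline: the function name parse_mummer_output is hypothetical, and the sketch assumes that the four columns of a match line are chromosome, genome position, probe position and match length, and that a header containing the word Reverse refers to the reverse strand.

```python
def parse_mummer_output(lines):
    """Group MUMmer match lines by (probe id, strand).

    Returns a dict mapping (probe_id, strand) to a list of
    (chromosome, genome_pos, probe_pos, match_len) tuples.
    Probes without any match map to an empty list.
    Assumption: the column meanings above are not confirmed by the tutorial.
    """
    matches = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header, e.g. "> 31_3  Len = 25" or "> 31_3 Reverse  Len = 25"
            fields = line[1:].split()
            probe_id = fields[0]
            is_reverse = len(fields) > 1 and fields[1] == "Reverse"
            current = (probe_id, "-" if is_reverse else "+")
            matches.setdefault(current, [])
        elif current is not None:
            # Match line, e.g. "chrVII     707119         1        25"
            chrom, genome_pos, probe_pos, match_len = line.split()
            matches[current].append(
                (chrom, int(genome_pos), int(probe_pos), int(match_len))
            )
    return matches


# Small example built from the extracts shown above.
example = """\
> 0_0  Len = 25
> 0_0 Reverse  Len = 25
> 31_3  Len = 25
  chrVII     707119         1        25
  chrII      643009         1        25
""".splitlines()

hits = parse_mummer_output(example)
# Probes (per strand) for which MUMmer reported no match at all.
unmatched = [key for key, found in hits.items() if not found]
```

On a real run, the same function could be applied to the lines of NMmp2g.out, for instance with open("NMmp2g.out") as handle: hits = parse_mummer_output(handle).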

Jean-Baptiste Veyrieras 2010-05-28