MUMmer
have been installed, the first step is to map the probe sequences onto the genome sequence. Go into the folder S1/BY_S288
and run the script NMmp2g
as follows:
$ NMmp2g ../../Data/BY_S288c/Sequence/Genome.fasta \
    ../../Data/Affymetrix/S.cerevisiae_tiling.fasta \
    NMmp2g.out 2>NMmp2g.log

where the first argument (here ../../Data/BY_S288c/Sequence/Genome.fasta) is the path to the entire genome sequence in fasta format, the second (../../Data/Affymetrix/S.cerevisiae_tiling.fasta) is the path to the probe sequences in fasta format, as previously generated, and the last one, NMmp2g.out, is the output file in which the results from MUMmer will be stored. The last part, 2>NMmp2g.log, simply redirects the standard error stream (of MUMmer) into the file NMmp2g.log.
If everything worked, you should have a file NMmp2g.out that looks like this:

$ head -10 NMmp2g.out
> 0_0 Len = 25
> 0_0 Reverse Len = 25
> 1_0 Len = 25
> 1_0 Reverse Len = 25
> 2_0 Len = 25
> 2_0 Reverse Len = 25
> 3_0 Len = 25
> 3_0 Reverse Len = 25
> 4_0 Len = 25
> 4_0 Reverse Len = 25
As you can see, the output of MUMmer
needs to be reformatted in order to create the probe annotation files. In the above extract of the output, none of the probes matches the genome sequence: each header is immediately followed by the next header. When a probe has a match on the genome, the output looks like this:
> 31_3 Len = 25
  chrVII 707119 1 25
  chrII 643009 1 25
  chrXVI 775770 1 25
  chrXVI 435897 1 25
where each line after the header > 31_3 Len = 25
(which gives the probe id, as previously defined) describes one match: the chromosome, the position of the match on that chromosome, the position of the match within the probe, and the length of the match.
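
To make the structure of this output concrete, here is a minimal Python sketch that groups the match lines by probe id and strand. It is not the script used in this guide to build the annotation files. The function name parse_mummer_matches is hypothetical, and the sketch assumes that every match line has exactly four whitespace-separated fields (chromosome, position on the chromosome, position in the probe, match length) and that a header containing the word Reverse refers to the reverse strand of the probe.

```python
from collections import defaultdict


def parse_mummer_matches(lines):
    """Group MUMmer match lines by (probe id, strand).

    Assumed layout (hypothetical helper, not part of NMmp2g):
      header: "> <probe_id> [Reverse] Len = <n>"
      match:  "<chromosome> <genome_pos> <probe_pos> <match_len>"
    """
    matches = defaultdict(list)
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            strand = "-" if len(fields) > 1 and fields[1] == "Reverse" else "+"
            key = (fields[0], strand)
            # Touch the key so that probes without any match are still listed.
            matches[key]
            continue
        if key is None:
            raise ValueError("match line found before any probe header")
        chrom, genome_pos, probe_pos, match_len = line.split()
        matches[key].append((chrom, int(genome_pos), int(probe_pos), int(match_len)))
    return dict(matches)


# Example built from the two extracts shown above.
example = """> 0_0 Len = 25
> 0_0 Reverse Len = 25
> 31_3 Len = 25
  chrVII 707119 1 25
  chrII 643009 1 25
  chrXVI 775770 1 25
  chrXVI 435897 1 25
""".splitlines()

hits = parse_mummer_matches(example)
for (probe_id, strand), found in hits.items():
    print(probe_id, strand, len(found))
```

On a real run, the same function could be applied to the opened NMmp2g.out file, since iterating over a file object yields its lines. Probes with an empty list are those without any match on the genome, and probes with several entries, such as 31_3 here, match at more than one location.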
Jean-Baptiste Veyrieras 2010-05-28