The .c2c file

The file S3/BY_RM_gxcomp.c2c is a simple table that describes how the genome sequences can be aligned. This file will be very useful in the next step of this tutorial. Let's have a look at its content:

$head -10 S3/BY_RM_gxcomp.c2c
chrVIII 12680 14779 c + - supercontig_1.12 1 2835 4934
chrVIII 14780 14780 c + i supercontig_1.12 1 4934 4934
chrVIII 14781 15406 c + - supercontig_1.12 1 4935 5560
chrVIII 15407 15407 c + i supercontig_1.12 1 5560 5560
chrVIII 15408 17132 c + - supercontig_1.12 1 5561 7285
chrVIII 17132 17132 c + d supercontig_1.12 1 7286 7286
chrVIII 17133 18415 c + - supercontig_1.12 1 7287 8569
chrVIII 18416 18416 c + i supercontig_1.12 1 8569 8569
chrVIII 18417 18566 c + - supercontig_1.12 1 8570 8719
chrVIII 18566 18566 c + d supercontig_1.12 1 8720 8720

which is a plain text file organized as a table of 10 columns (separated by whitespace):

  1. the name of the reference sequence,
  2. when column 6 is:
  3. when column 6 is:
  4. the nature of the alignment (c for cis and t for trans, see above),
  5. the strand of the alignment (+ for normal, - for reverse),
  6. the nature of the event:
  7. the name of the query sequence,
  8. the strand of the alignment (1 for normal, -1 for reverse),
  9. when column 6 is:
  10. when column 6 is:
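
As a quick sanity check on this layout, the following shell sketch tallies the rows by event code (column 6). It is only an illustration: it writes the first six rows of the listing above to a temporary file, and the variable names c2c_sample and event_counts are made up for this example. In practice you would point awk directly at S3/BY_RM_gxcomp.c2c.

```shell
# Sample rows copied from the head output above (temporary file, illustrative only).
c2c_sample=$(mktemp)
cat > "$c2c_sample" <<'EOF'
chrVIII 12680 14779 c + - supercontig_1.12 1 2835 4934
chrVIII 14780 14780 c + i supercontig_1.12 1 4934 4934
chrVIII 14781 15406 c + - supercontig_1.12 1 4935 5560
chrVIII 15407 15407 c + i supercontig_1.12 1 5560 5560
chrVIII 15408 17132 c + - supercontig_1.12 1 5561 7285
chrVIII 17132 17132 c + d supercontig_1.12 1 7286 7286
EOF

# Count the rows for each event code found in column 6.
event_counts=$(awk '{ n[$6]++ } END { for (e in n) print e, n[e] }' "$c2c_sample" | LC_ALL=C sort)
echo "$event_counts"

rm -f "$c2c_sample"
```

Each output line gives an event code followed by the number of rows carrying it.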

Finally, you can use the R function NMgcxplot from the nucleominer R package to visually inspect the alignments (e.g. see Figure 5).

Figure 5: output of NMgcxplot (figures/gcxplot.eps)

You can also plot the SNP density along the entire genome with another R function from the nucleominer R package, namely NMsnpxplot. This function needs a file that gives the size (in bp) of each chromosome we want to plot. This file can be created with a Perl one-liner as follows:

$cat Data/BY_S288c/Sequence/Genome.fasta | perl -lne \
          'if (/^>(.+)/){print "$seqid $size" if ($size);
           $seqid=$1;$size=0;}else{$size+=length;}
           END{print "$seqid $size";}' \
           > Data/BY_S288c/Sequence/Genome.size
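
To see what the size file should look like, here is a minimal sketch of the same computation written in awk instead of Perl and run on a tiny made-up FASTA file. The sequence names chrA and chrB, the temporary file and the variable names are hypothetical and serve only to illustrate the expected "name size" layout.

```shell
# Tiny made-up FASTA file (hypothetical sequences, for illustration only).
fasta_sample=$(mktemp)
cat > "$fasta_sample" <<'EOF'
>chrA
ACGTAC
GT
>chrB
ACG
EOF

# For each header line, print the previous sequence name and its length,
# then reset the counter; otherwise add the line length to the running size.
genome_size=$(awk '/^>/ { if (id != "") print id, size; id = substr($0, 2); size = 0; next }
                   { size += length($0) }
                   END { if (id != "") print id, size }' "$fasta_sample")
echo "$genome_size"

rm -f "$fasta_sample"
```

Here chrA spans two lines (6 + 2 bases) and is reported with size 8, while chrB is reported with size 3.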

We also need to slightly reformat the .map file as follows:

$grep -v '^>' S3/BY_RM_gxcomp.map | grep -v '\*' > S3/BY_RM_gxcomp.txt
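
The effect of this filter can be checked on a toy input. The sketch below uses hypothetical placeholder rows that do not reproduce the real .map layout; only the leading ">" and the "*" character matter for the two grep calls. The file and variable names are likewise invented for the example.

```shell
# Hypothetical placeholder rows: NOT the real .map format.
map_sample=$(mktemp)
cat > "$map_sample" <<'EOF'
>header line
row_kept_1 A C
row_dropped * C
row_kept_2 G T
EOF

# Drop lines starting with ">" and then any line containing a literal "*".
filtered=$(grep -v '^>' "$map_sample" | grep -v '\*')
echo "$filtered"

rm -f "$map_sample"
```

Only the two rows without a leading ">" and without a "*" remain in the output.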

We now have all the files required to run NMsnpxplot. The result is shown in Figure 6.

Figure 6: output of NMsnpxplot (figures/snpxplot.eps)

Jean-Baptiste Veyrieras 2010-05-28