Aligning genome sequences

Before aligning the nucleosome occupancy profiles computed in the previous section, we first need to align the DNA sequences of the two strains. The sequence alignments will then be used to initiate the nucleosome alignment. Note that little can be said about nucleosomes in regions where the sequences match poorly.

In this section, we assume that the sequences of both strains are of relatively good quality and that the divergence between the two strains is modest, with at most a few small genomic rearrangements (translocations, inversions, etc.). We use MUMmer to perform the sequence alignments, with one strain as the reference and the other as the query. The output of MUMmer, however, needs to be reprocessed and reformatted to give a clean picture of the whole alignment. The entire process is carried out by a single NucleoMiner program, NMgxcomp, as follows:

$NMgxcomp Data/BY_S288c/Sequence/Genome.fasta \
          Data/RM_11-1a/Sequence/Genome.fasta \
          S3/BY_RM 2>S3/BY_RM.log

where the first argument, Data/BY_S288c/Sequence/Genome.fasta, is the genome sequence in FASTA format of the strain used as the reference (here BY); the second argument, Data/RM_11-1a/Sequence/Genome.fasta, is the genome sequence in FASTA format of the strain used as the query (here RM); and the last argument, S3/BY_RM, is the stem name/path of the output files. The trailing 2>S3/BY_RM.log redirects the program's standard error to a log file.
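
Because the shell opens the log file before the program starts, the directory part of the output stem (here S3) must already exist, otherwise the redirection fails. If it has not been created yet, a minimal sketch such as the following can be used. The helper name prepare_stem is hypothetical and not part of NucleoMiner, and the commented usage assumes that the leading $ in the command above is the shell prompt and that NMgxcomp is on the PATH.

```shell
# Hypothetical helper (not part of NucleoMiner): create the directory that
# will hold the output files and the log for a given output stem.
prepare_stem() {
    stem=$1
    # dirname strips the last path component, e.g. S3/BY_RM -> S3
    mkdir -p "$(dirname "$stem")"
}

# Example usage (assumption: NMgxcomp is installed and on the PATH):
#   prepare_stem S3/BY_RM
#   NMgxcomp Data/BY_S288c/Sequence/Genome.fasta \
#            Data/RM_11-1a/Sequence/Genome.fasta \
#            S3/BY_RM 2>S3/BY_RM.log
```

Only the directory is created; the stem itself (S3/BY_RM) is left untouched, since it is a file-name prefix rather than a directory.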

As you can see, the folder S3 now contains a number of files prefixed by BY_RM. Some are direct outputs of the MUMmer programs, while those containing the suffix _gxcomp have been created by the NucleoMiner script NMmum2gxc. Only the latter files are described here.
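
To pick out the _gxcomp files from the MUMmer outputs, a small shell function like the one below may help. It is only a sketch: the name list_gxcomp is hypothetical, and the only assumption it makes about the file names is the one stated above, namely that they start with the output stem and contain _gxcomp.

```shell
# Hypothetical helper (not part of NucleoMiner): print every existing file
# whose name starts with the given output stem and contains "_gxcomp".
list_gxcomp() {
    stem=$1
    for f in "$stem"*_gxcomp*; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Example usage:
#   list_gxcomp S3/BY_RM
```

The function prints nothing when no matching file exists, for instance if NMgxcomp stopped early; in that case the log file S3/BY_RM.log is the place to look.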

Jean-Baptiste Veyrieras 2010-05-28