Before aligning the nucleosome occupancy profiles computed in the previous section, we first need to align the DNA sequences of the two strains. We will then use the sequence alignments to initiate the nucleosome alignment. Note that little can be said about nucleosomes in regions where the sequences match poorly.
In this section, we assume that the sequences of our strains are of relatively good quality and that the divergence between the two strains is modest, with possibly just a few small genomic rearrangements (translocations, inversions, etc.). We will use MUMmer to perform the sequence alignments, using one strain as the reference and the other as the query. However, the output of MUMmer needs to be reprocessed and reformatted in order to get a clean picture of the whole alignment. This entire process can be carried out by a single NucleoMiner program, namely NMgxcomp, as follows:
    $NMgxcomp Data/BY_S288c/Sequence/Genome.fasta \
        Data/RM_11-1a/Sequence/Genome.fasta \
        S3/BY_RM 2>S3/BY_RM.log

where the first argument of NMgxcomp (Data/BY_S288c/Sequence/Genome.fasta) gives the genome sequence, in FASTA format, of the strain used as the reference (here BY), the second argument (Data/RM_11-1a/Sequence/Genome.fasta) gives the genome sequence, in FASTA format, of the strain used as the query (here RM), and the last argument (S3/BY_RM) gives the stem name/path of the output files. Messages written to standard error are redirected to S3/BY_RM.log.
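Since the command takes two FASTA files and an output stem, it can be convenient to check the inputs before launching it. The following is only a minimal sketch: the helpers check_fasta and run_gxcomp are hypothetical (they are not part of NucleoMiner), and the sketch assumes that NMgxcomp is on the PATH and that creating the output folder beforehand is harmless.

```shell
#!/bin/sh
# Hypothetical helper (not part of NucleoMiner): succeeds only if the file
# is readable and its first line is a FASTA header starting with '>'.
check_fasta() {
    [ -r "$1" ] && head -n 1 "$1" | grep -q '^>'
}

# Hypothetical wrapper: validates both genomes, creates the output folder,
# then runs NMgxcomp with the three arguments described above.
# Assumption: NMgxcomp is available on the PATH.
run_gxcomp() {
    ref=$1; qry=$2; stem=$3
    if ! check_fasta "$ref" || ! check_fasta "$qry"; then
        echo "missing or malformed FASTA input" >&2
        return 1
    fi
    # Assumption: the folder holding the output stem may not exist yet.
    mkdir -p "$(dirname "$stem")" || return 1
    # Standard error goes to <stem>.log, as in the command above.
    NMgxcomp "$ref" "$qry" "$stem" 2>"$stem.log"
}
```

With these definitions loaded, the call would read: run_gxcomp Data/BY_S288c/Sequence/Genome.fasta Data/RM_11-1a/Sequence/Genome.fasta S3/BY_RM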
As you can see, the folder S3 now contains a number of files prefixed by BY_RM. Some are direct outputs of the MUMmer programs, while those containing the suffix _gxcomp have been created by the NucleoMiner script NMmum2gxc. Here, we will describe only the latter files.
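To separate the NucleoMiner files from the MUMmer ones, one can filter the folder on the suffix. This is a minimal sketch under stated assumptions: list_gxcomp is a hypothetical helper, and the pattern assumes only that _gxcomp appears somewhere in the file name after the stem.

```shell
#!/bin/sh
# Hypothetical helper (not part of NucleoMiner): prints, one per line, the
# existing files whose name starts with the given stem and contains _gxcomp.
list_gxcomp() {
    for f in "$1"*_gxcomp*; do
        # Skip the unexpanded pattern when nothing matches.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

For the run above, the call would be: list_gxcomp S3/BY_RM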