NMhmmvit

Once the .nhw files have been generated by NMhmmfit, you can use first NMhmmvit to have a look at the content of these binary files. To do so, just type the following line:

$NMhmmvit -i Data/BY_S288c/Array/NucOccupancy_norm.db \
         -r S2/BY_S288c/NMhmmfit/ -c chrI \
         --print 1>S2/BY_S288c/NMhmmfit/chrI.txt

where the output is redirected to the file S2/BY_S288c/NMhmmfit/chrI.txt. Let's look at this file, at least at its header:

$head -1 S2/BY_S288c/NMhmmfit/chrI.txt
chromosome probe position virtual ignore chunk mu(0) var(0) mu(1) var(1) pr(in,0) pr(in,0)
pr(in,0) pr(in,0) pr(in,0) pr(in,0) pr(in,0) pr(in,0)
pr(in,0) pr(out,0) pr(out,0) pr(out,0) pr(out,31) pr(out,32)
pr(out,33) pr(out,34) pr(out,35) pr(out,36) pr(out,37) pr(out,77)
pr(0,0) pr(0,1) pr(0,2)

As you can see, the file is organized as a table which columns are seperated by single whitespace. The three columns (chromosome, probe and position) give the location of the current probe, the fourth column, virtual, is either '0' (for no) or '1' (for yes) depending on the status of the current probe. A probe is said to be ``virtual'' if no probe has been reported at that position so that results for that position have been imputed by the HMM. The fith column, ignore is either 'y' (for yes) or 'n' (for no). If a probe is flagged as to be ignored, it generally means that NMhmmfit was unable to fit the HMM around this probe. The sixth column, chunk, gives the index of the chromosome chunk to which the probe belongs. In fact, it is common that the HMM cannot be really run on the entire chromosome (e.g. due to the presence of long repeat regions that introduce a long break within the tiling coverage) but rather on contiguous pieces of the chromosome, here denominated as chunks. Then, the following columns depends on the HMM parametrization. The four first columns, (mu(0), var(0), mu(1), var(1), give the mean and variance estimates for the two gaussian distributions used in the HMM, one for the linker state (mu(0), var(0)) and one for the nucleosomal state, either well-positioned or fuzzy (mu(1), var(1)). The following 23 columns correspond to the estimates - at the current probe - of the transition probabilities of the HMM. Since we used here the default parametrization of the HMM (i.e, at least 31 probes and at most 38 probes for a well-positionned nucleosome), these 23 columns are ordered as described in Figure 1.

**Figure:**
$\includegraphics[scale=0.75]{figures/hmm_scheme.eps}$

So, as you can see, we have all the information in hands to determine the most likely positions of the nucleosomes along the chromosome. This is done by applying the Viterbi algorithm as implemented in NMhmmvit. Let's do it:

$NMhmmvit -i Data/BY_S288c/Array/NucOccupancy_norm.db \
          -r S2/BY_S288c/NMhmmfit -c chrI \
          -o S2/BY_S288c/NMhmmvit 2>/dev/null
$head -10 S2/BY_S288c/NMhmmvit/chrI.nuc
chromosome probe position virtual ignore chunk calling s1 p1 p2 p3
chrI 1 55 0 n 1 0 10.0845 0.000 0.585 0.414
chrI 2 59 0 n 1 0 10.0594 0.000 0.553 0.447
chrI 3 63 0 n 1 0 10.0315 0.333 0.369 0.298
chrI 4 67 0 n 1 0 9.88305 0.500 0.277 0.223
chrI 5 71 0 n 1 0 9.72784 0.600 0.221 0.179
chrI 6 75 0 n 1 0 9.59281 0.667 0.184 0.149
chrI 7 79 0 n 1 0 9.47126 0.714 0.158 0.128
chrI 8 83 0 n 1 0 9.36506 0.750 0.138 0.112
chrI 9 87 0 n 1 0 9.27393 0.778 0.123 0.099

Now, you should have a new directory S2/BY_S288c/NMhmmvit/ which contains a new file chrI.nuc. As you can see, this file is again a simple table which columns are seperated by a single whitespace. As for the --print option, the first six columns give details about the current probe location and its status. Follow four columns that summarize the result of NMhmmvit:

calling is 0,1,2 depending on the most likely state at the current probe,
- 0 linker,
- 1 fuzzy nucleosome,
- 2 well-localized nucleosome,
s1 is the smoothed hybridization signal using the output of the HMM,
p1 is the posterior probability that the current probe belongs to a linker,
p2 is the posterior probability that the current probe belongs to a fuzzy nucleosome,
p3 is the posterior probability that the current probe belongs to a well-localized nucleosome.

$NMhmm2plot -i S2/BY_S288c/NMhmm2plot.config \
            -c chrI -s 100000 -e 105000 -o S2/BY_S288c/

This creates two text file (.txt) in the directory S2/BY_S288c/ with the same prefix obtained by concatenating the name of the chromosome and the starting and the ending positions, namely chrI_100000_105000. The file suffixed by _d contains a table of 5 columns separated by a single whitespace,

$ head -10 S2/BY_S288c/chrI_100000_105000_d.txt
100003 6.61 8.25 0.79 1.00
100007 NA 8.27 0.79 1.00
100011 NA 8.28 0.80 1.00
100015 8.73 8.27 0.79 1.00
100019 6.52 8.15 0.73 1.00
100023 7.39 8.11 0.72 1.00
100027 6.85 8.09 0.71 1.00
100031 7.00 8.08 0.70 1.00
100035 6.84 8.08 0.70 1.00
100039 6.42 8.08 0.71 1.00

where the first column is the position of the interrogated point, the second one the average hybrization signal for the corresponding probe (NA when missing), the third one the smoothed signal (by the HMM), the fourth one gives the probability to be within a nucleosome (either fuzzy or delocalized) and the fifth column the most likely state at this point (as computed by NMhmmvit).

As depicted in Figure 2, we can then use the NucleoMiner R package to plot these data by using the R function NMhmm2plot (see the documentation for more details).

**Figure:**
$\includegraphics[scale=0.35]{figures/hmm2plot_1.eps}$

Jean-Baptiste Veyrieras 2010-05-28