.nhw
files have been generated by NMhmmfit
, you can use first
NMhmmvit
to have a look at the content of these binary files. To do so, just type the following line:
$NMhmmvit -i Data/BY_S288c/Array/NucOccupancy_norm.db \ -r S2/BY_S288c/NMhmmfit/ -c chrI \ --print 1>S2/BY_S288c/NMhmmfit/chrI.txt
where the output is redirected to the file S2/BY_S288c/NMhmmfit/chrI.txt. Let's look at this file, at least at its header:
$head -1 S2/BY_S288c/NMhmmfit/chrI.txt chromosome probe position virtual ignore chunk mu(0) var(0) mu(1) var(1) pr(in,0) pr(in,0) pr(in,0) pr(in,0) pr(in,0) pr(in,0) pr(in,0) pr(in,0) pr(in,0) pr(out,0) pr(out,0) pr(out,0) pr(out,31) pr(out,32) pr(out,33) pr(out,34) pr(out,35) pr(out,36) pr(out,37) pr(out,77) pr(0,0) pr(0,1) pr(0,2)
As you can see, the file is organized as a table which columns are seperated by single whitespace. The three columns (chromosome, probe and position) give the location of the current probe, the fourth column, virtual, is either '0' (for no) or '1' (for yes) depending on the status of the current probe. A probe is said to be ``virtual'' if no probe has been reported at that position so that results for that position have been imputed by the HMM. The fith column, ignore is either 'y' (for yes) or 'n' (for no). If a probe is flagged as to be ignored, it generally means that NMhmmfit
was unable to fit the HMM around this probe. The sixth column, chunk, gives the index of the chromosome chunk to which the probe belongs. In fact, it is common that the HMM cannot be really run on the entire chromosome (e.g. due to the presence of long repeat regions that introduce a long break within the tiling coverage) but rather on contiguous pieces of the chromosome, here denominated as chunks. Then, the following columns depends on the HMM parametrization. The four first columns, (mu(0), var(0), mu(1), var(1), give the mean and variance estimates for the two gaussian distributions used in the HMM, one for the linker state (mu(0), var(0)) and one for the nucleosomal state, either well-positioned or fuzzy (mu(1), var(1)). The following 23 columns correspond to the estimates - at the current probe - of the transition probabilities of the HMM. Since we used here the default parametrization of the HMM (i.e, at least 31 probes and at most 38 probes for a well-positionned nucleosome), these 23 columns are ordered as described in Figure 1.
So, as you can see, we have all the information in hands to determine the most likely
positions of the nucleosomes along the chromosome. This is done by applying the Viterbi
algorithm as implemented in NMhmmvit
. Let's do it:
$NMhmmvit -i Data/BY_S288c/Array/NucOccupancy_norm.db \ -r S2/BY_S288c/NMhmmfit -c chrI \ -o S2/BY_S288c/NMhmmvit 2>/dev/null $head -10 S2/BY_S288c/NMhmmvit/chrI.nuc chromosome probe position virtual ignore chunk calling s1 p1 p2 p3 chrI 1 55 0 n 1 0 10.0845 0.000 0.585 0.414 chrI 2 59 0 n 1 0 10.0594 0.000 0.553 0.447 chrI 3 63 0 n 1 0 10.0315 0.333 0.369 0.298 chrI 4 67 0 n 1 0 9.88305 0.500 0.277 0.223 chrI 5 71 0 n 1 0 9.72784 0.600 0.221 0.179 chrI 6 75 0 n 1 0 9.59281 0.667 0.184 0.149 chrI 7 79 0 n 1 0 9.47126 0.714 0.158 0.128 chrI 8 83 0 n 1 0 9.36506 0.750 0.138 0.112 chrI 9 87 0 n 1 0 9.27393 0.778 0.123 0.099
Now, you should have a new directory S2/BY_S288c/NMhmmvit/ which contains a new file
chrI.nuc. As you can see, this file is again a simple table which columns are seperated
by a single whitespace. As for the --print
option, the first six columns give details
about the current probe location and its status. Follow four columns that summarize the result
of NMhmmvit
:
$NMhmm2plot -i S2/BY_S288c/NMhmm2plot.config \ -c chrI -s 100000 -e 105000 -o S2/BY_S288c/This creates two text file (
.txt
) in the directory S2/BY_S288c/ with the same prefix obtained
by concatenating the name of the chromosome and the starting and the ending positions, namely
chrI_100000_105000. The file suffixed by _d
contains a table of 5 columns separated by a single
whitespace,
$ head -10 S2/BY_S288c/chrI_100000_105000_d.txt 100003 6.61 8.25 0.79 1.00 100007 NA 8.27 0.79 1.00 100011 NA 8.28 0.80 1.00 100015 8.73 8.27 0.79 1.00 100019 6.52 8.15 0.73 1.00 100023 7.39 8.11 0.72 1.00 100027 6.85 8.09 0.71 1.00 100031 7.00 8.08 0.70 1.00 100035 6.84 8.08 0.70 1.00 100039 6.42 8.08 0.71 1.00where the first column is the position of the interrogated point, the second one the average hybrization signal for the corresponding probe (
NA
when missing), the third one the smoothed signal (by the HMM), the fourth one gives the probability to be within a nucleosome (either fuzzy or delocalized) and the fifth column the most likely state at this point (as computed by NMhmmvit
).
As depicted in Figure 2, we can then use the NucleoMiner R package to plot these data by using the R function NMhmm2plot
(see the documentation for more details).
Jean-Baptiste Veyrieras 2010-05-28