Most of the programs implemented in NucleoMiner use a binary version of the
tiling array datasets. To convert a tiling array dataset, as obtained previously,
we will use the program NMtdb. This program requires a small configuration
file that describes the organization of the dataset and whether some simple pre-processing
is needed to create the final dataset (which will then be used in subsequent analyses
with NucleoMiner).
Since the next step of the tutorial deals with the inference of nucleosome occupancy, we will now prepare the corresponding dataset. As we will see in the next section, the inference of nucleosome occupancy is performed by fitting a Hidden Markov Model (HMM) to the average hybridization signal among replicates (computed separately for each strain).
The program NMtdb can be used directly to compute the average hybridization intensities
and to store the result in an appropriate binary file that can be used in the next step.
To do so, we first need to create the configuration file. For simplicity, we have created this
file for you: Data/BY_S288c/Array/NucOccupancy.dbconf. Let's have a look at its content:
    $ cat Data/BY_S288c/Array/NucOccupancy.dbconf
    ############################
    # NMtdb configuration file #
    ############################
    ##
    # The name of the dataset
    ##
    name = BY_NucOccupancy
    ##
    # Number of arrays to consider
    ##
    narray = 3
    ##
    # Membership of the arrays
    ##
    groupId = 1 1 1
    ##
    # Pre-processing algorithm
    # full = do nothing
    # mean = compute the mean across replicates (per group)
    # median = compute the median accross replicates (per group)
    ##
    type = mean
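To make the layout of this file concrete, here is a minimal Python sketch that reads the same "key = value" format. It is not NMtdb's own parser: the function name parse_dbconf is hypothetical, and the rule that groupId must list one group label per array is an assumption drawn from the example above (narray = 3, groupId = 1 1 1).

```python
def parse_dbconf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


# A shortened copy of the configuration file shown above.
example = """\
##
# The name of the dataset
##
name = BY_NucOccupancy
narray = 3
groupId = 1 1 1
type = mean
"""

conf = parse_dbconf(example)
narray = int(conf["narray"])
group_ids = [int(g) for g in conf["groupId"].split()]

# Assumption: one group label per array, so both counts must agree.
if len(group_ids) != narray:
    raise ValueError("groupId must contain one entry per array")

print(conf["name"], narray, group_ids, conf["type"])
```

Here the three arrays all carry the group label 1, which is how the file declares them to be replicates of the same group.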
As you can see, by using the mean tag in the pre-processing section of
the configuration file, we ask NMtdb to compute, for each probe
of the input dataset, the mean hybridization value and to use it as the final
data point. Let's run NMtdb:
    $ NMtdb -i Data/BY_S288c/Array/NucOccupancy_norm.txt \
            -c Data/BY_S288c/Array/NucOccupancy.dbconf \
            -o Data/BY_S288c/Array/NucOccupancy_norm.db
    -- tilingdb
    -- Read configuration file: [ OK ]
    -- Read data file: [ OK ].
    -- Save data: [ OK ]
    --
Note that the Read data file: step can take some time, depending on the size of the
input dataset. The program has now created a new file, Data/BY_S288c/Array/NucOccupancy_norm.db, which
is in fact a binary file:
    $ file Data/BY_S288c/Array/NucOccupancy_norm.db
    Data/BY_S288c/Array/NucOccupancy_norm.db: data
So, don't try to open this file with a text editor or any other tool: it is very important not to
modify this file by hand, otherwise you will run into trouble in subsequent analyses that
require it. Besides, since this is a binary file, it is specific to your machine
architecture and operating system, so it is highly recommended not to send this file to other users
unless they have exactly the same machine architecture and operating system. Instead, send the original
input file together with the configuration file and the corresponding arguments of NMtdb.
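One general reason why raw binary files may not be portable is byte order. The Python sketch below illustrates this with a single double-precision number. It says nothing about the actual on-disk layout used by NMtdb, which this tutorial does not describe, and the value is chosen only for illustration.

```python
import struct
import sys

value = 13.32212  # an intensity-like number, for illustration only

little = struct.pack("<d", value)  # 8-byte double, little-endian
big = struct.pack(">d", value)     # the same double, big-endian

print("native byte order:", sys.byteorder)
print("little-endian bytes:", little.hex())
print("big-endian bytes:   ", big.hex())

# Reading the bytes with the matching byte order recovers the number.
roundtrip = struct.unpack("<d", little)[0]

# Reading them with the wrong byte order yields an unrelated number.
misread = struct.unpack(">d", little)[0]
print("round trip:", roundtrip)
print("misread:   ", misread)
```

The two byte strings hold the same eight bytes in reverse order, so a program that assumes the wrong order silently reads wrong values. Sharing the text input and the configuration file avoids this class of problem.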
Finally, we can have a look at the content of Data/BY_S288c/Array/NucOccupancy_norm.db by using the
--print option of NMtdb as follows:
    $ NMtdb -i Data/BY_S288c/Array/NucOccupancy_norm.db --print | head -10
    chromosome  position  A1
    chrIII      38        1.332212e+01
    chrIII      42        1.333482e+01
    chrIII      226       1.473298e+01
    chrIII      230       1.263931e+01
    chrIII      234       1.470365e+01
    chrIII      238       1.290735e+01
    chrIII      242       1.438298e+01
    chrIII      278       1.357645e+01
    chrIII      358       1.241799e+01
As you can see, there is only one column of hybridization intensity values, since we asked NMtdb to
compute the mean of the three replicates.
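The reduction from three replicate columns to a single column can be sketched in a few lines of Python. This is only an illustration of a per-probe, per-group mean, not NMtdb's implementation: the function name collapse_replicates and the intensity values are hypothetical, and the ordering of output columns by sorted group label is an assumption.

```python
from statistics import mean


def collapse_replicates(rows, group_ids):
    """Return one mean intensity per replicate group for every probe."""
    groups = sorted(set(group_ids))  # assumed output column order
    collapsed = []
    for chrom, pos, values in rows:
        means = [
            mean(v for v, g in zip(values, group_ids) if g == grp)
            for grp in groups
        ]
        collapsed.append((chrom, pos, means))
    return collapsed


# Hypothetical intensities for three replicate arrays (not real data).
rows = [
    ("chrIII", 38, [12.0, 13.0, 14.0]),
    ("chrIII", 42, [13.0, 13.5, 14.0]),
]

# groupId = 1 1 1: all three arrays belong to the same group.
collapsed = collapse_replicates(rows, group_ids=[1, 1, 1])
for chrom, pos, means in collapsed:
    print(chrom, pos, " ".join(f"{m:.6e}" for m in means))
```

With a single group there is one value per probe, which matches the single A1 column printed above. Under the same assumptions, a group list such as 1 1 2 would give two columns per probe.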