Averaging nucleosome occupancy profiles around genomic features

A common use of a nucleosome occupancy profile is to examine the pattern of nucleosome density around specific genomic features, for example transcription start sites (TSS) or transcription end sites (TES). We may also want to compute average nucleosome occupancy profiles over whole regions, such as the entire cis-region of each gene. To do so, we first need to reformat the output of NMhmmvit into a format similar to that of a tiling array dataset. This is done by the Perl script NMnuc2toc as follows:

$NMnuc2toc -i S2/BY_S288c/NMhmmvit 1>S2/BY_S288c/BY_S288c_nuc.txt

Then, as with the original dataset, we convert this file into a .db file using NMtdb:

$NMtdb -i S2/BY_S288c/BY_S288c_nuc.txt \
       -c S2/BY_S288c/BY_S288c_nuc.dbconf \
       -o S2/BY_S288c/BY_S288c_nuc.db

Now, we will create a new folder S2/BY_S288c/Features to store the results for any feature we want to investigate. Let's look first at the gene transcript boundaries (TSS and TES):

$mkdir -p S2/BY_S288c/Features/Transcript
$echo "transcript ." >S2/BY_S288c/Features/Transcript/feature_id.txt

The file S2/BY_S288c/Features/Transcript/feature_id.txt is required by NMt2feat. It tells the program that we are interested only in features of type transcript, whatever their ID (hence the wildcard '.' after the transcript tag).
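To make the role of the two fields concrete, here is a minimal shell sketch of the selection that such a line describes. It is not how NMt2feat is implemented: it assumes that the second field is treated as a regular expression matched against the feature ID, and the annotation table, its two-column layout, and the IDs in it are hypothetical and used only for illustration.

```shell
# Hypothetical annotation table (type <TAB> ID); not a real NucleoMiner file.
demo=$(mktemp)
printf 'transcript\tYAL001C\ngene\tYAL001C\ntranscript\tYBR020W\n' > "$demo"

# Same content as the line written to feature_id.txt.
line='transcript .'
type=${line%% *}    # text before the first space: the feature type
idpat=${line#* }    # text after the first space: the ID pattern

# Assumption: the ID pattern is a regular expression, so '.' matches any ID.
# Keep the IDs of rows whose type equals $type and whose ID matches $idpat.
selected=$(awk -F'\t' -v t="$type" -v p="$idpat" '$1 == t && $2 ~ p { print $2 }' "$demo")
echo "$selected"
rm -f "$demo"
```

Under these assumptions, only the two rows of type transcript are kept and the row of type gene is dropped, whatever its ID.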



Jean-Baptiste Veyrieras 2010-05-28