NMcl2tab

The C program NMcl2tab will help us extract all the probe hybridization measurements from the .CEL files and format them into a single, easy-to-use table. Before running the program, we need to write a short configuration file that tells NMcl2tab which files it has to consider. The configuration file is a table of two columns: the left column gives the identifier of the experiment, while the right column gives the name of the corresponding .CEL file.
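Because NMcl2tab reads each file named in the right column from the directory given with -d, a typo in a file name is easy to make. It can be convenient to check the configuration before running the program. The shell function below is a minimal sketch and is not part of NMcl2tab. The name check_conf is hypothetical, and the function assumes that the two columns are separated by spaces or tabs and that the file has no header or comment lines.

```shell
# check_conf (hypothetical helper, not part of NMcl2tab):
# verify that every .CEL file listed in a two-column configuration file
# ("<experiment id> <CEL file name>" on each line) exists in a directory.
# Usage: check_conf CONF_FILE CEL_DIR
# Returns 0 if all listed files are present, 1 otherwise.
check_conf() {
    conf="$1"
    dir="$2"
    missing=0
    # '|| [ -n "$id" ]' also handles a last line without a trailing newline
    while read -r id file || [ -n "$id" ]; do
        # skip blank lines
        [ -z "$id" ] && continue
        if [ ! -f "$dir/$file" ]; then
            echo "missing CEL file for $id: $dir/$file" >&2
            missing=$((missing + 1))
        fi
    done < "$conf"
    [ "$missing" -eq 0 ]
}
```

For the example used in this section, one would run:

$ check_conf Data/BY_S288c/Array/NucOccupancy.conf Data/BY_S288c/Array/ && echo "all CEL files found"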

If you have a look inside the directory Data/BY_S288c/Array, you should see a file called NucOccupancy.conf. Let's have a look at this file:

$ cat Data/BY_S288c/Array/NucOccupancy.conf
CBY01   CBY01-II.CEL
CBY02   CBY02-I.CEL
CBY08   CBY08-II.CEL

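So here we will extract the probe hybridization signal from 3 .CEL files, each dataset having its own unique name (CBY01, CBY02 and CBY08). The option -c gives the configuration file, -d the directory containing the .CEL files, -p the probe file and -o the output file. Let's do it: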

$ NMcl2tab -c Data/BY_S288c/Array/NucOccupancy.conf \
           -d Data/BY_S288c/Array/ -p S1/BY_S288c/BY_S288c.prb \
           -o Data/BY_S288c/Array/NucOccupancy_raw.txt
--
 extractAffy
--
 CEL description from file [ Data/BY_S288c/Array/NucOccupancy.conf ]
 CEL directory             [ Data/BY_S288c/Array/ ]
 Probe file                [ S1/BY_S288c/BY_S288c.prb ]
--
 Read probe set
--
Chromosome chrIII : 72257 probes [ OK ]
Chromosome chrII : 192607 probes [ OK ]
Chromosome chrI : 46546 probes [ OK ]
Chromosome chrIV : 355204 probes [ OK ]
Chromosome chrIX : 100626 probes [ OK ]
Chromosome chrVIII : 126301 probes [ OK ]
Chromosome chrVII : 255874 probes [ OK ]
Chromosome chrVI : 61945 probes [ OK ]
Chromosome chrV : 134902 probes [ OK ]
Chromosome chrXIII : 219184 probes [ OK ]
Chromosome chrXII : 242748 probes [ OK ]
Chromosome chrXI : 162453 probes [ OK ]
Chromosome chrXIV : 182756 probes [ OK ]
Chromosome chrX : 170169 probes [ OK ]
Chromosome chrXVI : 220699 probes [ OK ]
Chromosome chrXV : 257613 probes [ OK ]
--
 Import data from .CEL files
--
For experiment CBY01:
  Read data from CEL file CBY01-II.CEL: [ OK ]
For experiment CBY02:
  Read data from CEL file CBY02-I.CEL: [ OK ]
For experiment CBY08:
  Read data from CEL file CBY08-II.CEL: [ OK ]
--
 Export data
--
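The "Read probe set" part of the log reports how many probes were read for each chromosome. To get the total, these counts can be summed from a saved copy of the log. This is a minimal sketch: the function name total_probes and the log file name nmcl2tab.log are hypothetical, and it assumes the log lines keep the layout shown above. We have not checked whether NMcl2tab writes its log to standard output or standard error, hence the 2>&1 in the usage example.

```shell
# total_probes (hypothetical helper, not part of NMcl2tab):
# sum the per-chromosome probe counts found in a saved NMcl2tab log.
# Expected line layout: "Chromosome chrIII : 72257 probes [ OK ]",
# so the count is the 4th whitespace-separated field.
total_probes() {
    awk '$1 == "Chromosome" && $5 == "probes" { s += $4 } END { print s + 0 }' "$1"
}
```

For example:

$ NMcl2tab -c Data/BY_S288c/Array/NucOccupancy.conf \
           -d Data/BY_S288c/Array/ -p S1/BY_S288c/BY_S288c.prb \
           -o Data/BY_S288c/Array/NucOccupancy_raw.txt > nmcl2tab.log 2>&1
$ total_probes nmcl2tab.log

With the log shown above, the 16 chromosome counts add up to 2801884 probes.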

So now, you should have a new file in Data/BY_S288c/Array called NucOccupancy_raw.txt. Let's have a look at this file:

$ head -10 Data/BY_S288c/Array/NucOccupancy_raw.txt
chromosome      position        CBY01   CBY02   CBY08
chrIII  38      1.470300e+04    5.044000e+03    1.227300e+04
chrIII  42      7.693000e+03    1.226100e+04    1.193100e+04
chrIII  226     2.153200e+04    3.195700e+04    3.193200e+04
chrIII  230     7.490000e+03    2.793000e+03    1.007000e+04
chrIII  234     2.126100e+04    3.004300e+04    3.353300e+04
chrIII  238     9.659000e+03    4.564000e+03    8.916000e+03
chrIII  242     1.642300e+04    2.709500e+04    2.464800e+04
chrIII  278     1.581800e+04    6.362000e+03    1.652100e+04
chrIII  358     7.156000e+03    2.731000e+03    7.374000e+03

This is a simple table with 5 columns: the first two columns give the genomic coordinates (chromosome and position) of each probe, and the remaining columns (here 3) report the hybridization intensity as given in each original .CEL file. Note that the names of the last 3 columns are the experiment identifiers specified in the configuration file NucOccupancy.conf.
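Because the header line carries the experiment names, the table is easy to process with standard command-line tools. As an illustration, the shell function below reads the experiment names from the header and prints the mean raw intensity of each experiment column. It is a minimal sketch, not part of NMcl2tab: the name column_means is hypothetical, and it assumes whitespace-separated columns laid out as in the output above.

```shell
# column_means (hypothetical helper, not part of NMcl2tab):
# print the mean raw intensity of every experiment column of an
# NMcl2tab output table. Column 1 is the chromosome, column 2 the
# position, and columns 3..NF hold one experiment each, named in the header.
column_means() {
    awk '
        NR == 1 { ncol = NF; for (i = 3; i <= NF; i++) name[i] = $i; next }
        { for (i = 3; i <= ncol; i++) sum[i] += $i; n++ }
        END {
            if (n == 0) exit 1   # header only: nothing to average
            for (i = 3; i <= ncol; i++) printf "%s\t%.1f\n", name[i], sum[i] / n
        }
    ' "$1"
}
```

Running it on the file produced above prints one line per experiment (CBY01, CBY02 and CBY08) followed by its mean intensity:

$ column_means Data/BY_S288c/Array/NucOccupancy_raw.txt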

Jean-Baptiste Veyrieras 2010-05-28