NMcl2tab will help us extract all the probe hybridization measurements from the .CEL files and format them into a useful table. Before running the program, we need to write a short configuration file that tells NMcl2tab which files it has to consider. The configuration file is a two-column table: the left column holds the identifier of the experiment, while the right column gives the name of the corresponding .CEL file.
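You may need to prepare such a file for your own experiments. The shell sketch below writes one and checks that every line has exactly two columns. The output name my_experiments.conf is a hypothetical placeholder. The tab separator is also an assumption: this page does not say whether NMcl2tab expects tabs or spaces between the two columns.

```shell
#!/bin/sh
# Hypothetical example: write a two-column NMcl2tab configuration file,
# one experiment per line: "<experiment id><TAB><CEL file name>".
# The file name and the tab separator are assumptions for illustration.
conf=my_experiments.conf

# printf reuses its format for each pair of arguments: one line per experiment.
printf '%s\t%s\n' \
  CBY01 CBY01-II.CEL \
  CBY02 CBY02-I.CEL \
  CBY08 CBY08-II.CEL > "$conf"

# Sanity check: every non-empty line must have exactly two fields.
awk 'NF && NF != 2 { printf "line %d: expected 2 columns, got %d\n", NR, NF; bad = 1 }
     END { exit bad }' "$conf" && echo "$conf looks well-formed"
```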
If you look inside the directory Data/BY_S288c/Array, you should see a file called NucOccupancy.conf. Let's have a look at this file:
$ cat Data/BY_S288c/Array/NucOccupancy.conf
CBY01   CBY01-II.CEL
CBY02   CBY02-I.CEL
CBY08   CBY08-II.CEL
So here we will extract the probe hybridization signal from 3 .CEL files, and each dataset will have its own unique name (CBY01, CBY02 and CBY08). Let's do it:
$ NMcl2tab -c Data/BY_S288c/Array/NucOccupancy.conf \
           -d Data/BY_S288c/Array/ \
           -p S1/BY_S288c/BY_S288c.prb \
           -o Data/BY_S288c/Array/NucOccupancy_raw.txt
-- extractAffy --
CEL description from file [ Data/BY_S288c/Array/NucOccupancy.conf ]
CEL directory [ Data/BY_S288c/Array/ ]
Probe file [ S1/BY_S288c/BY_S288c.prb ]
-- Read probe set --
Chromosome chrIII : 72257 probes [ OK ]
Chromosome chrII : 192607 probes [ OK ]
Chromosome chrI : 46546 probes [ OK ]
Chromosome chrIV : 355204 probes [ OK ]
Chromosome chrIX : 100626 probes [ OK ]
Chromosome chrVIII : 126301 probes [ OK ]
Chromosome chrVII : 255874 probes [ OK ]
Chromosome chrVI : 61945 probes [ OK ]
Chromosome chrV : 134902 probes [ OK ]
Chromosome chrXIII : 219184 probes [ OK ]
Chromosome chrXII : 242748 probes [ OK ]
Chromosome chrXI : 162453 probes [ OK ]
Chromosome chrXIV : 182756 probes [ OK ]
Chromosome chrX : 170169 probes [ OK ]
Chromosome chrXVI : 220699 probes [ OK ]
Chromosome chrXV : 257613 probes [ OK ]
-- Import data from .CEL files --
For experiment CBY01: Read data from CEL file CBY01-II.CEL: [ OK ]
For experiment CBY02: Read data from CEL file CBY02-I.CEL: [ OK ]
For experiment CBY08: Read data from CEL file CBY08-II.CEL: [ OK ]
-- Export data --
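The log suggests that NMcl2tab looks for each file named in the configuration file inside the directory given with -d. If that holds, a small pre-flight check can catch a typo in the configuration file, or a wrong -d path, before the program runs. The check_cel_files function below is a hypothetical helper written for this tutorial, not part of NMcl2tab.

```shell
#!/bin/sh
# check_cel_files CONF DIR
# Hypothetical helper (not part of NMcl2tab): report every CEL file listed
# in the two-column configuration file CONF that is missing from DIR.
# Assumes NMcl2tab resolves each file name relative to the -d directory.
# Returns 0 when all files are present, 1 otherwise.
check_cel_files() {
  conf=$1
  dir=$2
  missing=0
  # '|| [ -n "$id" ]' also processes a last line without a trailing newline.
  while read -r id file || [ -n "$id" ]; do
    [ -z "$id" ] && continue            # skip blank lines
    if [ ! -f "$dir/$file" ]; then
      echo "experiment $id: file $dir/$file not found" >&2
      missing=1
    fi
  done < "$conf"
  return "$missing"
}
```

For the tutorial data, this would be run as `check_cel_files Data/BY_S288c/Array/NucOccupancy.conf Data/BY_S288c/Array/`. A doubled slash in the resulting paths is harmless on POSIX systems.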
So now, you should have a new file in Data/BY_S288c/Array called NucOccupancy_raw.txt. Let's have a look at this file:
$ head -10 Data/BY_S288c/Array/NucOccupancy_raw.txt
chromosome  position  CBY01         CBY02         CBY08
chrIII      38        1.470300e+04  5.044000e+03  1.227300e+04
chrIII      42        7.693000e+03  1.226100e+04  1.193100e+04
chrIII      226       2.153200e+04  3.195700e+04  3.193200e+04
chrIII      230       7.490000e+03  2.793000e+03  1.007000e+04
chrIII      234       2.126100e+04  3.004300e+04  3.353300e+04
chrIII      238       9.659000e+03  4.564000e+03  8.916000e+03
chrIII      242       1.642300e+04  2.709500e+04  2.464800e+04
chrIII      278       1.581800e+04  6.362000e+03  1.652100e+04
chrIII      358       7.156000e+03  2.731000e+03  7.374000e+03
This is a simple table with 5 columns. The first two columns give the genomic coordinates (chromosome and position) of each probe. The remaining columns (here 3) report the hybridization intensity as given in each original .CEL file. Note that the names of the last 3 columns match the experiment identifiers specified in the configuration file NucOccupancy.conf.
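To get a first feel for the table, the hypothetical summarise_table shell function below (not part of NMcl2tab) counts the probes on each chromosome and prints the mean raw intensity of every experiment column. It only assumes the layout shown above: a header line, then whitespace-separated rows holding a chromosome, a position, and one value per experiment. The per-chromosome counts can then be compared with the ones NMcl2tab printed while reading the probe set.

```shell
#!/bin/sh
# summarise_table FILE
# Hypothetical helper (not part of NMcl2tab): for a table whose first line is
# "chromosome position <exp1> <exp2> ..." and whose other lines hold one probe
# each, print the number of probes per chromosome and the mean intensity of
# each experiment column. Chromosome order in the output is unspecified.
summarise_table() {
  awk '
    NR == 1 {                          # header: keep the experiment names
      ncol = NF
      for (i = 3; i <= ncol; i++) name[i] = $i
      next
    }
    NF == 0 { next }                   # ignore blank lines
    {
      probes[$1]++                     # one row = one probe
      nrow++
      for (i = 3; i <= ncol; i++) sum[i] += $i   # e.g. 1.470300e+04 -> 14703
    }
    END {
      for (c in probes) printf "%s probes: %d\n", c, probes[c]
      for (i = 3; i <= ncol; i++)
        printf "%s mean: %.1f\n", name[i], (nrow ? sum[i] / nrow : 0)
    }' "$1"
}
```

For example, `summarise_table Data/BY_S288c/Array/NucOccupancy_raw.txt` would print one probe count per chromosome followed by one mean for each of CBY01, CBY02 and CBY08. awk converts the scientific notation used in the file (such as 1.470300e+04) to numbers on its own, so no preprocessing is needed.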
Jean-Baptiste Veyrieras 2010-05-28