#! /bin/bash ## # Arguments ## strain=$1 # The name of the strain genome_fasta=$2 # The genome sequence in fasta format probe_fasta=$3 # The probe sequences in fasta format array_size=$4 # The dimension of the array (yeast tiling array = 2560) output_dir=$5 # The output directory output_dir=${output_dir}/${strain} ## # Create the output directory ## mkdir -p $output_dir ## # Map the probes on the genome using MUMmer ## echo -n "Map probes on the genome: ..." NMmp2g $genome_fasta $probe_fasta $output_dir/NMmp2g.out 2>$output_dir/NMmp2g.log echo " [ OK ]" # Extract the probes that have matche(s) on the genome and output the result # by chromosome and by strand (+/-) into folder NMmpa echo -n "Extract and format probe matches: ..." NMmpa -i $output_dir/NMmp2g.out -o $output_dir/NMmpa -a $array_size echo " [ OK ]" # Now, extract only probes that have a unique perfect match # on the genome (both strands) and remobed probe which do not match at an # expected tiling step (4bp). echo -n "Extract and format PM probes: ..." grep '^>' $genome_fasta | sed 's/^>//' > $output_dir/seqid.txt NMepm -i $output_dir/NMmpa -c $output_dir/seqid.txt -o $output_dir/NMepm -t 4 2>$output_dir/NMepm.log awk '{print $4" "$9" "$1" "$2}' ${output_dir}/NMepm/*.prb > ${output_dir}/${strain}.prb echo " [ OK ]"
This bash script has been implemented in the program NMstrain2array
that we can use now to get the list of PM probes on RM_11-1a (RM) genome.
$NMstrain2array RM_11-1a ./Data/RM_11-1a/Sequence/Genome.fasta \ ./Data/Affymetrix/S.cerevisiae_tiling.fasta 2560 S1 Map probes on the genome: ... [ OK ] Extract and format probe matches: ... [ OK ] Extract and format PM probes: ... [ OK ]
Finally, we have now two files, BY_S288c.prb and RM_11-1a.prb that give us the probes that have a unique PM on each genome. We can then get an idea of the number of probes that are shared between the two strains as follows:
$cat S1/BY_S288c/BY_S288c.prb S1/RM_11-1a/RM_11-1a.prb \ | cut -d ' ' -f 3,4 | sort | uniq -c \ | awk '{print $1}' | sort | uniq -c 368639 1 2501942 2So,BY and RM share 2,501,942 probes and there are 368,639 probes that are specific to either BY and RM. We will now use these two files for the next section.
Jean-Baptiste Veyrieras 2010-05-28