Publications, 2007


A new advance in alternative splicing databases: from catalogue to detailed analysis of regulation of expression and function of human alternative splicing variants

Author(s) : de la Grange P, Dutertre M, Correa M, Auboeuf D,
Journal : BMC Bioinformatics

Adherens junction remodeling by the Notch pathway in Drosophila melanogaster oogenesis.

Author(s) : Grammont M,
Journal : J Cell Biol
Identifying genes involved in the control of adherens junction (AJ) remodeling is essential to understanding epithelial morphogenesis. During follicular epithelium development in Drosophila melanogaster, the main body follicular cells (MBFCs) are displaced toward the oocyte and become columnar. Concomitantly, the stretched cells (StCs) become squamous and flatten around the nurse cells. By monitoring the expression of epithelial cadherin and Armadillo, I have discovered that the rate of AJ disassembly between the StCs is affected in follicles with somatic clones mutant for fringe or Delta and Serrate. This results in abnormal StC flattening and delayed MBFC displacement. Additionally, accumulation of the myosin II heavy chain Zipper is delayed at the AJs that require disassembly. Together, my results demonstrate that the Notch pathway controls AJ remodeling between the StCs and that this role is crucial for the timing of MBFC displacement and StC flattening. This provides new evidence that Notch, besides playing a key role in cell differentiation, also controls cell morphogenesis.

Alternative RNA splicing complexes containing the scaffold attachment factor SAFB2

Author(s) : Sergeant K, Bourgeois C, Dalgliesh C, Venables J, Stevenin J, Elliott D,
Journal : J Cell Sci

Antagonistic functions of SET-2/SET1 and HPL/HP1 proteins in C. elegans development.

Author(s) : Simonet T, Dulermo R, Schott S, Palladino F,
Journal : Dev Biol
Cellular identity during metazoan development is maintained by epigenetic modifications of chromatin structure brought about by the activity of specific proteins that mediate histone variant incorporation, histone modifications, and nucleosome remodeling. HP1 proteins directly influence gene expression by modifying chromatin structure. We previously showed that the Caenorhabditis elegans HP1 proteins HPL-1 and HPL-2 are required for several aspects of post-embryonic development. To gain insight into how HPL proteins influence gene expression in a developmental context, we carried out a candidate RNAi screen to identify suppressors of hpl-1 and hpl-2 phenotypes. We identified SET-2, the homologue of yeast and mammalian SET1, as an antagonist of HPL-1 and HPL-2 activity in growth and somatic gonad development. Yeast Set1 and its mammalian counterparts SET1/MLL are H3 lysine 4 (H3K4) histone methyltransferases associated with gene activation as part of large multisubunit complexes. We show that the nematode counterparts of SET1/MLL complex subunits also antagonize HPL function in post-embryonic development. Genetic analysis is consistent with SET1/MLL complex subunits having both shared and unique functions in development. Furthermore, as observed in other species, we find that SET1/MLL complex homologues differentially affect global H3K4 methylation. Our results suggest that HP1 and a SET1/MLL-related complex may play antagonistic roles in the epigenetic regulation of specific developmental programs.

Back to basics: the untreated rabbit reticulocyte lysate as a competitive system to recapitulate cap/poly(A) synergy and the selective advantage of IRES-driven translation.

Author(s) : Soto Rifo R, Ricci E, Decimo D, Moncorge O, Ohlmann T,
Journal : Nucleic Acids Res
Translation of most eukaryotic mRNAs involves the synergistic action between the 5' cap structure and the 3' poly(A) tail at the initiation step. The poly(A) tail has also been shown to stimulate picornavirus internal ribosome entry site (IRES)-directed translation. These effects have been attributed principally to interactions between eIF4G and poly(A)-binding protein (PABP), but also to the participation of PABP in other steps of translation initiation. As the rabbit reticulocyte lysate (RRL) does not recapitulate this cap/poly(A) synergy, several systems based on cellular cell-free extracts have been developed to study the effects of the poly(A) tail in vitro, but they generally exhibit low translational efficiency. Here, we show that the non-nuclease-treated RRL (untreated RRL) is able to recapitulate the effects of the poly(A) tail on translation in vitro. In this system, translation of a capped/polyadenylated RNA was specifically inhibited by either Paip2 or poly(rA), whereas translation directed by the HCV IRES remained unaffected. Moreover, cleavage of eIF4G by FMDV L protease strongly stimulated translation directed by the EMCV IRES, thus recapitulating the competitive advantage that the proteolytic processing of eIF4G confers to IRES-driven RNAs.

CD44 and beta3 integrin organize two functionally distinct actin-based domains in osteoclasts.

Author(s) : Chabadel A, Banon-Rodriguez I, Cluet D, Rudkin B, Wehrle-Haller B, Genot E, Jurdic P, Anton I, Saltel F,
Journal : Mol Biol Cell
The actin cytoskeleton of mature osteoclasts (OCs) adhering to nonmineralized substrates is organized in a belt of podosomes reminiscent of the sealing zone (SZ) found in bone-resorbing OCs. In this study, we demonstrate that the belt is composed of two functionally different actin-based domains: podosome cores linked with CD44, which are involved in cell adhesion, and a diffuse cloud associated with beta3 integrin, which is involved in cell adhesion and contraction. Wiskott-Aldrich Syndrome Protein (WASp) Interacting Protein (WIP)-/- OCs were devoid of podosomes, but they still exhibited actin clouds. Indeed, WIP-/- OCs show diminished expression of WASp, which is required for podosome formation. CD44 is a novel marker of OC podosome cores and the first nonintegrin receptor detected in these structures. The importance of CD44 is revealed by showing that its clustering restores podosome cores and WASp expression in WIP-/- OCs. However, although CD44 signals are sufficient to form a SZ, the presence of WIP is indispensable for the formation of a fully functional SZ.

Cell dynamics and immune response to BLV infection: a unifying model

Author(s) : Florins A, Gillet N, Asquith B, Boxus M, Burteau C, Twizere J, Urbain P, Vandermeers F, Debacq C, Sanchez-Alcaraz M, Schwartz-Cornil I, Kerkhofs P, Jean G, Théwis A, Hay J, Mortreux F, Wattel E, Reichert M, Burny A, Kettmann R, Bangham C, Willems L,
Journal : Front Biosci

Clustering formal concepts to discover biologically relevant knowledge from gene expression data.

Author(s) : Blachon S, Pensa R, Besson J, Robardet C, Boulicaut J, Gandrillon O,
Journal : In Silico Biol
The production of high-throughput gene expression data has generated a crucial need for bioinformatics tools to generate biologically interesting hypotheses. Whereas many tools are available for extracting global patterns, less attention has been focused on local pattern discovery. We propose here an original way to discover knowledge from gene expression data by means of the so-called formal concepts which hold in derived Boolean gene expression datasets. We first encoded the over-expression properties of genes in human cells using human SAGE data. This gave rise to a Boolean matrix from which we extracted the complete collection of formal concepts, i.e., all the maximal sets of over-expressed genes associated with a maximal set of biological situations in which their over-expression is observed. Complete collections of such patterns tend to be huge. Since their interpretation is a time-consuming task, we propose a new method to rapidly visualize clusters of formal concepts. This designates a reasonable number of Quasi-Synexpression-Groups (QSGs) for further analysis. The interest of our approach is illustrated using human SAGE data and by interpreting one of the extracted QSGs. The assessment of its biological relevance leads to the formulation of both previously proposed and new biological hypotheses.

Consequences of genome duplication.

Author(s) : Semon M, Wolfe K,
Journal : Curr Opin Genet Dev
Polyploidy has been widely appreciated as an important force in the evolution of plant genomes, but it is now recognized as a common phenomenon throughout eukaryotic evolution. Insight into this process has been gained by analyzing the plant, animal, fungal, and, recently, protozoan genomes that show evidence of whole genome duplication (a transient doubling of the entire gene repertoire of an organism). Moreover, comparative analyses are revealing the evolutionary processes that occur as multiple related genomes diverge from a shared polyploid ancestor, and in individual genomes that underwent several successive rounds of duplication. Recent research, including laboratory studies on synthetic polyploids, indicates that genome content and gene expression can change quickly after whole genome duplication and that cross-genome regulatory interactions are important. We have a growing understanding of the relationship between whole genome duplication and speciation. Further, recent studies are providing insights into why some gene pairs survive in duplicate, whereas others do not.

Coregulators: transducing signal from transcription to alternative splicing

Author(s) : Auboeuf D, Batsché E, Dutertre M, Muchardt C, O'Malley B,
Journal : Trends Endocrinol Metab

Disruption of the palatal rugae pattern in Tabby (eda) mutant mice.

Author(s) : Charles C, Pantalacci S, Peterkova R, Peterka M, Laudet V, Viriot L,
Journal : Eur J Oral Sci
The eda mouse gene is linked with anomalies of ectodermal derivatives, such as hair, glands, and teeth. The palatal rugae (oral mucosa foldings on the hard palate) are also ectodermal derivatives. Therefore, we searched for palatal rugae anomalies in Tabby mice bearing a mutation in the eda gene and compared them with their wild-type counterparts. We compared the number and shape of palatal rugae in 179 mutant and 102 wild-type mice from four different stocks of Tabby mice. Palatal rugae anomalies were documented at a low frequency in wild-type mice of different backgrounds, which may reflect a lack of robustness of palatal rugae development. However, the proportion of anomalies observed in the C57BL/6J background leads us to recommend avoiding its use in further palate studies. We showed statistically that the phenotypic variability seen in wild-type animals is further increased in Tabby mutants. The anomalies mainly included various forms of reduction, with rugae IV-VI being more frequently affected. Those rugae were shortened, dotted or absent (mainly ruga V). By analogy to the role played by eda in other ectodermal derivatives, we propose that it might play a role in defining the pattern of the palatal rugae.

Efficient use of DNA molecular markers to construct industrial yeast strains.

Author(s) : Marullo P, Yvert G, Bely M, Aigle M, Dubourdieu D,
Journal : FEMS Yeast Res
Saccharomyces cerevisiae yeast strains exhibit a huge genotypic and phenotypic diversity. Breeding strategies taking advantage of these characteristics would contribute greatly to improving industrial yeasts. Here we mapped and introgressed chromosomal regions controlling industrial yeast properties, such as hydrogen sulphide production, phenolic off-flavor and a kinetic trait (lag phase duration). Two parent strains derived from industrial isolates used in winemaking, which exhibited significant quantitative differences in these traits, were crossed, and their progeny (50-170 clones) were analyzed for the segregation of these traits. Forty-eight segregants were genotyped at 2212 marker positions using DNA microarrays, and one significant locus was mapped for each trait. To exploit these loci, an introgression approach was guided by molecular marker monitoring using PCR/RFLP. Five successive backcrosses between an elite strain and appropriate segregants were sufficient to improve three trait values. Microarray-based genotyping confirmed that over 95% of the elite strain genome was recovered by this methodology. Moreover, karyotype patterns, mtDNA and tetrad analysis showed some genomic rearrangements during the introgression procedure.

Genetic complexity and quantitative trait loci mapping of yeast morphological traits.

Author(s) : Nogami S, Ohya Y, Yvert G,
Journal : PLoS Genet
Functional genomics relies on two essential parameters: the sensitivity of phenotypic measures and the power to detect genomic perturbations that cause phenotypic variations. In model organisms, two types of perturbations are widely used. Artificial mutations can be introduced in virtually any gene and allow the systematic analysis of gene function via mutant fitness. Alternatively, natural genetic variations can be associated with particular phenotypes via genetic mapping. However, the access to genome manipulation and breeding provided by model organisms is sometimes counterbalanced by phenotyping limitations. Here we investigated the natural genetic diversity of Saccharomyces cerevisiae cellular morphology using a very sensitive high-throughput imaging platform. We quantified 501 morphological parameters in over 50,000 yeast cells from a cross between two wild-type divergent backgrounds. Extensive morphological differences were found between these backgrounds. The genetic architecture of the traits was complex, with evidence of both epistasis and transgressive segregation. We mapped quantitative trait loci (QTL) for 67 traits and discovered 364 correlations between trait segregation and inheritance of gene expression levels. We validated one QTL by the replacement of a single base in the genome. This study illustrates the natural diversity and complexity of cellular traits among natural yeast strains and provides an ideal framework for a genetical genomics dissection of multiple traits. Our results did not overlap with results previously obtained from systematic deletion strains, showing that both approaches are necessary for the functional exploration of genomes.

HTLV-1 HBZ cooperates with JunD to enhance transcription of the human telomerase reverse transcriptase gene (hTERT).

Author(s) : Kuhlmann A, Villaudy J, Gazzolo L, Castellazzi M, Mesnard J, Duc Dodon M,
Journal : Retrovirology
BACKGROUND: Activation of telomerase is a critical and late event in tumor progression. Thus, in patients with adult T-cell leukaemia (ATL), an HTLV-1 (Human T-cell Leukaemia virus type 1)-associated disease, leukemic cells display a high telomerase activity, mainly through transcriptional up-regulation of the human telomerase catalytic subunit (hTERT). The HBZ (HTLV-1 bZIP) protein, encoded by the minus strand of the HTLV-1 genome and expressed in ATL cells, has been shown to increase the transcriptional activity of JunD, an AP-1 protein. The presence of several AP-1 binding sites in the hTERT promoter led us to investigate whether HBZ regulates hTERT gene transcription. RESULTS: Here, we demonstrate using co-transfection assays that HBZ in association with JunD activates the hTERT promoter. Interestingly, the -378/+1 proximal region, which does not contain any AP-1 site, was found to be responsible for this activation. Furthermore, an increase of hTERT transcripts was observed in cells co-expressing HBZ and JunD. Chromatin immunoprecipitation (ChIP) assays revealed that HBZ and JunD coexist in the same DNA-protein complex at the proximal region of the hTERT promoter. Finally, we provide evidence that HBZ/JunD heterodimers interact with Sp1 transcription factors and that activation of hTERT transcription by these heterodimers is mediated through GC-rich binding sites for Sp1 present in the proximal sequences of the hTERT promoter. CONCLUSION: These observations establish for the first time that HBZ, by intervening in the re-activation of telomerase, may contribute to the development and maintenance of the leukemic process.

Human INT6 interacts with MCM7 and regulates its stability during S phase of the cell cycle.

Author(s) : Buchsbaum S, Morris C, Bochard V, Jalinot P,
Journal : Oncogene
The mouse int6 gene is a frequent integration site of the mouse mammary tumor virus, and INT6 silencing by RNA interference in HeLa cells causes an increased number of cells in the G2/M phases of the cell cycle, along with mitotic defects. In this report, we investigated the functional significance of the interaction between INT6 and MCM7, which was observed in a two-hybrid screen performed with INT6 as bait. It was found that proteasome inhibition strengthens the interaction between the two proteins and that INT6 stabilizes MCM7. Removal of MCM7 from chromatin as replication proceeds was accelerated in INT6-silenced cells, and reduced amounts of protein were transiently observed, followed by a correction resulting from stimulation of mcm7 gene expression. Synchronized cells depleted of either INT6 or MCM7 display a reduction in thymidine incorporation and a reinforced association of RPA and claspin with chromatin. These data show that INT6 stabilizes chromatin-bound MCM7 and that alteration of this effect is associated with replication deficiency.

Human INT6/eIF3e is required for nonsense-mediated mRNA decay.

Author(s) : Morris C, Wittmann J, Jack H, Jalinot P,
Journal : EMBO Rep
The mammalian integration site 6 (INT6) protein has been implicated in breast carcinogenesis and characterized as the eIF3e non-core subunit of the translation initiation factor eIF3, but its role in this complex is not known. Here, we show that INT6 knockdown by RNA interference strongly inhibits nonsense-mediated messenger RNA decay (NMD), which triggers degradation of mRNAs with premature stop codons. In contrast to the eIF3b core subunit, which is required for both NMD and general translation, INT6 is only necessary for the former process. Consistent with such a role, immunoprecipitation experiments showed that INT6 co-purifies with CBP80 and the NMD factor UPF2. In addition, several transcripts known to be upregulated by UPF1 or UPF2 depletion were also found to be sensitive to INT6 suppression. From these observations, we propose that INT6, in association with eIF3, is involved in routing specific mRNAs for degradation.

Immunological changes and cytokine gene expression during primary infection with human T-cell leukaemia virus type 1 in squirrel monkeys (Saimiri sciureus)

Author(s) : Heraud J, Merien F, Mortreux F, Mahieux R, Kazanji M,
Journal : Virology

Influence of the transposable element neighborhood on human gene expression in normal and tumor tissues.

Author(s) : Lerat E, Semon M,
Journal : Gene
Transposable elements (TEs) are genomic sequences able to replicate themselves and to move from one chromosomal position to another within the genome. Many TEs contain their own regulatory regions, which means that they may influence the expression of neighboring genes. TEs may also be activated and transcribed in various cancers. We therefore tested whether gene expression in normal and tumor tissues is influenced by neighboring TEs. To do this, we associated all human genes with the nearest TEs. We analyzed the expression of these genes in normal and tumor tissues using SAGE and EST data, and related this to the presence and type of TEs in their vicinity. We confirmed that TEs tend to be located in antisense orientation relative to their hosting genes. We found that the average number of tissues where a gene is expressed varies depending on the type of TEs located near the gene, and that the difference in expression level between normal and tumor tissues is greatest for genes that host SINE elements. This deregulation increases with the number of SINE copies in the gene vicinity. This suggests that SINE elements might contribute to the cascade of gene deregulation in cancer cells.

Large-scale analysis by SAGE reveals new mechanisms of v-erbA oncogene action.

Author(s) : Bresson C, Keime C, Faure C, Letrillard Y, Barbado M, Sanfilippo S, Benhra N, Gandrillon O, Gonin-Giraud S,
Journal : BMC Genomics
BACKGROUND: The v-erbA oncogene, carried by the Avian Erythroblastosis Virus, derives from the c-erbAalpha proto-oncogene that encodes the nuclear receptor for triiodothyronine (T3R). v-ErbA transforms erythroid progenitors in vitro by blocking their differentiation, supposedly by interfering with T3R and RAR (Retinoic Acid Receptor). However, the v-ErbA target genes involved in its transforming activity still remain to be identified. RESULTS: By using Serial Analysis of Gene Expression (SAGE), we identified 110 genes deregulated by v-ErbA and potentially implicated in the transformation process. Bioinformatic analysis of promoter sequences and transcriptional assays point to a potential role of c-Myb in the v-ErbA effect. Furthermore, grouping of newly identified target genes by function revealed both expected (chromatin/transcription) and unexpected (protein metabolism) functions potentially deregulated by v-ErbA. We then focused our study on 15 of the new v-ErbA target genes and demonstrated by real-time PCR that, for most of them, expression was activated neither by T3, nor by RA, nor during differentiation. This was unexpected based upon the previously known role of v-ErbA. CONCLUSION: This paper suggests the involvement of a wealth of new, unanticipated mechanisms of v-ErbA action.

Parasitic inhibition of cell death facilitates symbiosis.

Author(s) : Pannebakker B, Loppin B, Elemans C, Humblot L, Vavre F,
Journal : Proc Natl Acad Sci U S A
Symbiotic microorganisms have had a large impact on eukaryotic evolution, with effects ranging from parasitic to mutualistic. Mitochondria and chloroplasts are prime examples of symbiotic microorganisms that have become obligate for their hosts, allowing for a dramatic extension of suitable habitats for life. Of the extraordinary diversity of bacterial endosymbionts in insects, most are facultative for their hosts, such as the ubiquitous Wolbachia, which manipulates host reproduction. Some endosymbionts, however, have become obligatory for host reproduction and/or survival. In the parasitoid wasp Asobara tabida, the presence of Wolbachia is necessary for host oogenesis, but the mechanism involved is still unknown. We show that Wolbachia influences programmed cell death processes (a host regulatory feature typically targeted by pathogens) in A. tabida, making its presence essential for the wasps' oocytes to mature. This suggests that parasite strategies, such as bacterial regulation of host apoptosis, can drive the evolution of host dependence, allowing for a swift transition from parasitism to mutualism.

Regulation of H-ras splice variant expression by cross talk between the p53 and nonsense-mediated mRNA decay pathways

Author(s) : Barbier J, Dutertre M, Bittencourt D, Sanchez G, Gratadou L, de la Grange P, Auboeuf D,
Journal : Mol Cell Biol

SAGE analysis of mosquito salivary gland transcriptomes during Plasmodium invasion.

Author(s) : Rosinski-Chupin I, Briolay J, Brouilly P, Perrot S, Gomez S, Chertemps T, Roth C, Keime C, Gandrillon O, Couble P, Brey P,
Journal : Cell Microbiol
Invasion of the vector salivary glands by Plasmodium is a critical step for malaria transmission. To describe salivary gland cellular responses to sporozoite invasion, we have undertaken the analysis of the Anopheles gambiae salivary gland transcriptome using Serial Analysis of Gene Expression (SAGE). Statistical analysis of the more than 160,000 sequenced tags generated from four libraries, two from glands infected by Plasmodium berghei and two from control glands, revealed that at least 57 Anopheles genes are differentially expressed in infected salivary glands. Among the 37 immune-related genes identified by SAGE tags, four (Defensin1, GNBP, Serpin6 and Cecropin2) were found to be upregulated during salivary gland invasion, while five genes encoding small secreted proteins display induction patterns strongly reminiscent of that of Cecropin2. Invasion by Plasmodium also has an impact on the expression of genes involved in transport, lipid and energy metabolism, suggesting that the sporozoite may exploit the metabolism of its host. In contrast, the protein composition of saliva is predicted to be only slightly modified after infection. This study, which is the first transcriptome analysis of the salivary gland response to Plasmodium infection, provides a basis for a better understanding of Plasmodium/Anopheles salivary gland interactions.

Single QTL mapping and nucleotide-level resolution of a physiologic trait in wine Saccharomyces cerevisiae strains.

Author(s) : Marullo P, Aigle M, Bely M, Masneuf-Pomarede I, Durrens P, Dubourdieu D, Yvert G,
Journal : FEMS Yeast Res
Natural Saccharomyces cerevisiae yeast strains exhibit very large genotypic and phenotypic diversity. However, the link between phenotype variation and genetic determinism is still difficult to identify, especially in wild populations. Using genome hybridization on DNA microarrays, it is now possible to identify single-feature polymorphisms among divergent yeast strains. This tool offers the possibility of applying quantitative genetics to wild yeast strains. In this instance, we studied the genetic basis for variations in acetic acid production using progeny derived from two grape must isolates. The trait was quantified during alcoholic fermentation of the two strains and of 108 segregants derived from their cross. A genetic map of 2212 markers was generated using oligonucleotide microarrays, and a major quantitative trait locus (QTL) was mapped with high significance. Further investigations showed that this QTL was due to a nonsynonymous single-nucleotide polymorphism that targeted the catalytic core of asparaginase type I (ASP1) and abolished its activity. This QTL was only effective when asparagine was used as a major nitrogen source. Our results link nitrogen assimilation and CO(2) production rate to acetic acid production and, on a broader scale, illustrate the specific problem of quantitative genetics when working with nonlaboratory microorganisms.

The essential role of Drosophila HIRA for de novo assembly of paternal chromatin at fertilization.

Author(s) : Bonnefoy E, Orsi G, Couble P, Loppin B,
Journal : PLoS Genet
In many animal species, the sperm DNA is packaged with male germ line-specific chromosomal proteins, including protamines. At fertilization, these non-histone proteins are removed from the decondensing sperm nucleus and replaced with maternally provided histones to form the DNA replication competent male pronucleus. By studying a point mutant allele of the Drosophila Hira gene, we previously showed that HIRA, a conserved replication-independent chromatin assembly factor, was essential for the assembly of paternal chromatin at fertilization. HIRA permits the specific assembly of nucleosomes containing the histone H3.3 variant on the decondensing male pronucleus. We report here the analysis of a new mutant allele of Drosophila Hira that was generated by homologous recombination. Surprisingly, phenotypic analysis of this loss of function allele revealed that the only essential function of HIRA is the assembly of paternal chromatin during male pronucleus formation. This HIRA-dependent assembly of H3.3 nucleosomes on paternal DNA does not require the histone chaperone ASF1. Moreover, analysis of this mutant established that protamines are correctly removed at fertilization in the absence of HIRA, thus demonstrating that protamine removal and histone deposition are two functionally distinct processes. Finally, we showed that H3.3 deposition is apparently not affected in Hira mutant embryos and adults, suggesting that different chromatin assembly machineries could deposit this histone variant.

The testis-specific human protein RBMY recognizes RNA through a novel mode of interaction

Author(s) : Skrisovska L, Bourgeois C, Stefl R, Grellscheid S, Kister L, Wenter P, Elliott D, Stevenin J, Allain F,
Journal : EMBO Rep

Unexpected observations after mapping LongSAGE tags to the human genome.

Author(s) : Keime C, Semon M, Mouchiroud D, Duret L, Gandrillon O,
Journal : BMC Bioinformatics
BACKGROUND: SAGE has been used widely to study the expression of known transcripts, but much less to annotate new transcribed regions. LongSAGE produces tags that are sufficiently long to be reliably mapped to a whole-genome sequence. Here we used this property to study the position of human LongSAGE tags obtained from all public libraries. We focused mainly on tags that do not map to known transcripts. RESULTS: Using a published error rate in SAGE libraries, we first removed the tags likely to result from sequencing errors. We then observed that an unexpectedly large number of the remaining tags still did not match the genome sequence. Some of these correspond to parts of human mRNAs, such as polyA tails, junctions between two exons and polymorphic regions of transcripts. Another non-negligible proportion can be attributed to contamination by murine transcripts and to residual sequencing errors. After filtering our data with these screens to ensure that our dataset is highly reliable, we studied the tags that map once to the genome. Of these tags, 31% correspond to unannotated transcripts. The others map to known transcribed regions, but many of them (nearly half) are either antisense to these known transcripts or located in new variants of them. CONCLUSION: We performed a comprehensive study of all publicly available human LongSAGE tags and carefully verified the reliability of these data. We found the potential origin of many tags that did not match the human genome sequence. The properties of the remaining tags imply that the level of sequencing error may have been underestimated. The frequency of tags matching the genome sequence once but not in an annotated exon suggests that the human transcriptome is much more complex than shown by the current human genome annotations, with many new splicing variants and antisense transcripts. SAGE data are appropriate for mapping new transcripts to the genome, as demonstrated by the high rate of cross-validation of the corresponding tags using other methods.