The identification of cellular proteins that interfere with the replication of pathogenic viruses is a major endeavor in virology. Among them, those engaged in long-term virus-host interactions and co-evolution are of particular interest. In host protein-coding genes, such interactions leave marks of diverse genetic innovations, such as site-specific positive selection, gene copy number variation, and recombination. Because marks of adaptation and genetic innovation are prevalent in virus-interacting proteins, we developed a pipeline, DGINN (Detection of Genetic INNovations): from a gene's coding sequence, it retrieves orthologous sequences, aligns them, and reconstructs their phylogeny, then detects genetic innovations. This streamlined procedure uniquely allows for the detection of paralogous genes, recombination breakpoints, and marks of positive selection with several gold-standard methods. We validated this pipeline on genes with diverse evolutionary profiles.
We then used DGINN to screen new candidate datasets, including one of 84 genes upregulated in macrophages resistant to HIV infection. We found numerous genes showing substantial marks of genetic conflict, which may therefore encode bona fide (lenti)virus-interacting proteins. Three of these candidates are undergoing detailed phylogenetic and functional characterization of their role in the HIV replication cycle, and others are pending further investigation.
Overall, this work led to the design of a public, complete, and highly flexible pipeline to screen large datasets for genetic innovations, which allowed us to identify new candidate antiviral genes against HIV for functional characterization.