Gene duplication generates new genetic material that can contribute to the evolution of gene regulatory networks and phenotypes. Duplicated genes can undergo subfunctionalization to partition ancestral functions and/or neofunctionalization to assume a new function. We previously found there had been a whole genome duplication (WGD) in an ancestor of arachnopulmonates, the lineage including spiders and scorpions but excluding other arachnids like mites, ticks, and harvestmen. This WGD was evidenced by many duplicated homeobox genes, including two Hox clusters, in spiders. However, it was unclear which homeobox paralogues originated by WGD versus smaller-scale events such as tandem duplications. Understanding this is a key to determining the contribution of the WGD to arachnopulmonate genome evolution. Here we characterized the distribution of duplicated homeobox genes across eight chromosome-level spider genomes. We found that most duplicated homeobox genes in spiders are consistent with an origin by WGD. We also found two copies of conserved homeobox gene clusters, including the Hox, NK, HRO, Irx, and SINE clusters, in all eight species. Consistently, we observed one copy of each cluster was degenerated in terms of gene content and organization while the other remained more intact. Focussing on the NK cluster, we found evidence for regulatory subfunctionalization between the duplicated NK genes in the spider Parasteatoda tepidariorum compared to their single-copy orthologues in the harvestman Phalangium opilio. Our study provides new insights into the relative contributions of multiple modes of duplication to the homeobox gene repertoire during the evolution of spiders and the function of NK genes.
When new genes evolve through modification of existing genes, there are often trade-offs between the new and original functions, making gene duplication and amplification necessary to buffer deleterious effects on the original function. We have used experimental evolution of a bacterial strain lacking peptide release factor 1 (RF1) in order to study how peptide release factor 2 (RF2) evolves to compensate for the loss of RF1. As expected, amplification of the RF2-encoding gene prfB to high copy number was a rapid initial response, followed by the appearance of mutations in RF2 and other components of the translation machinery. Characterization of the evolved RF2 variants by their effects on bacterial growth rate, reporter gene expression, and in vitro translation termination reveals a complex picture of reduced discrimination between the cognate and near-cognate stop codons and highlights a functional trade-off that we term “collateral toxicity”. We suggest that this type of trade-off may be a more serious obstacle in new gene evolution than the more commonly discussed evolutionary trade-offs between “old” and “new” functions of a gene, as it cannot be overcome by gene copy number changes. Further, we suggest a model for how RF2 autoregulation responds not only to alterations in the demand for RF2 activity, but also to alterations in the demand for RF1 activity.
An important mechanism for the generation of new genes is duplication-divergence of existing genes. Duplication-divergence includes several different sub-models, such as subfunctionalization, where after accumulation of neutral mutations the original function is distributed between two partially functional and complementary genes, and neofunctionalization, where a new function evolves in one of the duplicated copies while the old function is maintained in another copy. The likelihood of these mechanisms depends on the longevity of the duplicated state, which in turn depends on the fitness cost and genetic stability of the duplications. Here, we determined the fitness cost and stability of defined gene duplications/amplifications on a low copy number plasmid. Our experimental results show that the costs of carrying extra gene copies are substantial and that each additional kbp of DNA reduces fitness by approximately 0.15%. Furthermore, gene amplifications are highly unstable and rapidly segregate to lower copy numbers in the absence of selection. Mathematical modelling shows that the fitness costs and instability strongly reduce the likelihood of both sub- and neofunctionalization, but that these effects can be offset by positive selection for novel beneficial functions.
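The reported cost of roughly 0.15% per additional kbp can be turned into a quick back-of-the-envelope calculation. The sketch below assumes the per-kbp costs simply add up, which is an illustrative simplification and not necessarily the model fitted in the study; the function names and the example region size are hypothetical.

```python
# Assumption: costs are additive per kbp of extra DNA (illustrative only).
COST_PER_KBP = 0.0015  # ~0.15% fitness reduction per additional kbp, as reported


def relative_fitness(extra_kbp: float, cost_per_kbp: float = COST_PER_KBP) -> float:
    """Relative fitness (no extra DNA = 1.0) of a cell carrying extra_kbp of extra DNA."""
    if extra_kbp < 0:
        raise ValueError("extra_kbp must be non-negative")
    # Clamp at zero so very large amplifications do not give negative fitness.
    return max(0.0, 1.0 - cost_per_kbp * extra_kbp)


def amplification_fitness(unit_kbp: float, copies: int) -> float:
    """Relative fitness with `copies` total copies of a unit_kbp region.

    One copy is the baseline, so only (copies - 1) units count as extra DNA.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return relative_fitness(unit_kbp * (copies - 1))


if __name__ == "__main__":
    # Hypothetical 5-kbp region at increasing copy number.
    for n in (1, 2, 3, 10):
        print(n, round(amplification_fitness(5.0, n), 4))
```

Under this additive assumption, even modest amplifications carry a cost of a few percent, which is consistent with the abstract's point that duplicated states are short-lived unless selection maintains them.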
Sex chromosome evolution is usually seen as a process that, once initiated, will inevitably progress toward an advanced stage of degeneration of the nonrecombining chromosome. However, despite evidence that avian sex chromosome evolution was initiated > 100 Ma, ratite birds have been trapped in an arrested stage of sex chromosome divergence. We performed RNA sequencing of several tissues from male and female ostriches and assembled the transcriptome de novo. A total of 315 Z-linked genes fell into two categories: those that have equal expression level in the two sexes (for which Z-W recombination still occurs) and those that have a 2-fold excess of male expression (for which Z-W recombination has ceased). We suggest that failure to evolve dosage compensation has constrained sex chromosome divergence in this basal avian lineage. Our results indicate that dosage compensation is a prerequisite for, not only a consequence of, sex chromosome evolution.
Multiple sequence alignment (MSA) is ubiquitous in evolution and bioinformatics. MSAs are usually taken to be a known and fixed quantity on which to perform downstream analysis despite extensive evidence that MSA accuracy and uncertainty affect results. Alignment errors are known to cause a wide range of problems for downstream evolutionary inference, ranging from false inference of positive selection to long branch attraction artifacts. The most popular approach to dealing with this problem is to remove (filter) specific columns in the MSA that are thought to be prone to error. Although popular, this approach has had mixed success and several studies have even suggested that filtering might be detrimental to phylogenetic studies. We present a graph-based clustering method to address MSA uncertainty and error in the software Divvier (available at https://github.com/simonwhelan/Divvier), which uses a probabilistic model to identify clusters of characters that have strong statistical evidence of shared homology. These clusters can then be used to either filter characters from the MSA (partial filtering) or represent each of the clusters in a new column (divvying). We validate Divvier through its performance on real and simulated benchmarks, finding that Divvier substantially outperforms existing filtering software by retaining more true pairwise homology calls and removing more false positive pairwise homologies. We also find that Divvier, in contrast to other filtering tools, can alleviate long branch attraction artifacts induced by MSA and reduces the variation in tree estimates caused by MSA uncertainty.
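The idea of splitting one alignment column into clusters of well-supported homologous characters can be illustrated with a small sketch. This is not Divvier's algorithm: it assumes pairwise homology support values are already available, uses a fixed threshold, and groups characters by simple single-linkage (union-find) clustering. The function names, the threshold of 0.8, and the example support values are all hypothetical.

```python
from typing import Dict, List, Tuple

GAP = "-"


def cluster_column(n_seqs: int,
                   pair_support: Dict[Tuple[int, int], float],
                   threshold: float = 0.8) -> List[List[int]]:
    """Group sequence indices whose pairwise support meets the threshold."""
    parent = list(range(n_seqs))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), support in pair_support.items():
        if support >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)

    clusters: Dict[int, List[int]] = {}
    for i in range(n_seqs):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def divvy_column(column: str,
                 pair_support: Dict[Tuple[int, int], float],
                 threshold: float = 0.8) -> List[str]:
    """Split one column into one new column per cluster of non-gap characters."""
    # Ignore support values that involve a gap character.
    usable = {(a, b): s for (a, b), s in pair_support.items()
              if column[a] != GAP and column[b] != GAP}
    new_columns = []
    for members in cluster_column(len(column), usable, threshold):
        kept = [i for i in members if column[i] != GAP]
        if not kept:
            continue  # a cluster made only of gaps yields no column
        new_columns.append("".join(column[i] if i in kept else GAP
                                   for i in range(len(column))))
    return new_columns


if __name__ == "__main__":
    support = {(0, 1): 0.95, (2, 3): 0.90, (1, 2): 0.20}
    print(divvy_column("AAGG-", support))
```

In this toy case the weakly supported link between sequences 1 and 2 is dropped, so the original column is represented as two columns, each holding only characters with strong evidence of shared homology.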
Adaptation from standing genetic variation is an important process underlying evolution in natural populations, but we rarely get the opportunity to observe the dynamics of fitness and genomic changes in real time. Here, we used experimental evolution and Pool-Seq to track the phenotypic and genomic changes of genetically diverse asexual populations of the yeast Saccharomyces cerevisiae in four environments with different fitness costs. We found that populations rapidly and in parallel increased in fitness in stressful environments. In contrast, allele frequencies showed a range of trajectories, with some populations fixing all their ancestral variation in <30 generations and others maintaining diversity across hundreds of generations. We detected parallelism at the genomic level (involving genes, pathways, and aneuploidies) within and between environments, with idiosyncratic changes recurring in the environments with higher stress. In particular, we observed a tendency of becoming haploid-like in one environment, whereas the populations of another environment showed low overall parallelism driven by standing genetic variation despite high selective pressure. This work highlights the interplay between standing genetic variation and the influx of de novo mutations in populations adapting to a range of selective pressures with different underlying trait architectures, advancing our understanding of the constraints and drivers of adaptation.
It has been suggested that Rickettsia Palindromic Elements (RPEs) have evolved as selfish DNA that mediate protein sequence evolution by being targeted to genes that code for RNA and proteins. Here, we have examined the phylogenetic depth of two RPEs that are located close to the genes encoding elongation factors Tu (tuf) and G (fus) in Rickettsia. An exceptional organization of the elongation factor genes was found in all 11 species examined, with complete or partial RPEs identified downstream of the tuf gene (RPE-tuf) in six species and of the fus gene (RPE-fus) in 10 species. A phylogenetic reconstruction shows that both RPE-tuf and RPE-fus have evolved in a manner that is consistent with the expected species divergence. The analysis provides evidence for independent loss of RPE-tuf in several species, possibly mediated by short repetitive sequences flanking the site of excision. The remaining RPE-tuf sequences evolve as neutral sequences in different stages of deterioration. Likewise, highly fragmented remnants of the RPE-fus sequence were identified in two species. This suggests that genome-specific differences in the content of RPEs are the result of recent loss rather than recent proliferation.
To study reductive evolutionary processes in bacterial genomes, we examine sequences in the Rickettsia genomes which are unconstrained by selection and evolve as pseudogenes, one of which is the metK gene, which codes for AdoMet synthetase. Here, we sequenced the metK gene and three surrounding genes in eight different species of the genus Rickettsia. The metK gene was found to contain a high incidence of deletions in six lineages, while the three genes in its surroundings were functionally conserved in all eight lineages. A more drastic example of gene degradation was identified in the metK downstream region, which contained an open reading frame in Rickettsia felis. Remnants of this open reading frame could be reconstructed in five additional species by eliminating sites of frameshift mutations and termination codons. A detailed examination of the two reconstructed genes revealed that deletions strongly predominate over insertions and that there is a strong transition bias for point mutations which is coupled to an excess of GC-to-AT substitutions. Since the molecular evolution of these inactive genes should reflect the rates and patterns of neutral mutations, our results strongly suggest that there is a high spontaneous rate of deletions as well as a strong mutation bias toward AT pairs in the Rickettsia genomes. This may explain the low genomic G + C content (29%), the small genome size (1.1 Mb), and the high noncoding content (24%), as well as the presence of several pseudogenes in the Rickettsia prowazekii genome.
Studies of neutrally evolving sequences suggest that differences in eukaryotic genome sizes result from different rates of DNA loss. However, very few pseudogenes have been identified in microbial species, and the processes whereby genes and genomes deteriorate in bacteria remain largely unresolved. The typhus-causing agent, Rickettsia prowazekii, is exceptional in that as much as 24% of its 1.1-Mb genome consists of noncoding DNA and pseudogenes. To test the hypothesis that the noncoding DNA in the R. prowazekii genome represents degraded remnants of ancestral genes, we systematically examined all of the identified pseudogenes and their flanking sequences in three additional Rickettsia species. Consistent with the hypothesis, we observe sequence similarities between genes and pseudogenes in one species and intergenic DNA in another species. We show that the frequencies and average sizes of deletions are larger than insertions in neutrally evolving pseudogene sequences. Our results suggest that inactivated genetic material in the Rickettsia genomes deteriorates spontaneously due to a mutation bias for deletions and that the noncoding sequences represent DNA in the final stages of this degenerative process.
Key innovations enable access to new adaptive zones and are often linked to increased species diversification. As such, innovations have attracted much attention, yet their concrete consequences on the subsequent evolutionary trajectory and diversification of the bearing lineages remain unclear. Water striders and relatives (Hemiptera: Heteroptera: Gerromorpha) represent a monophyletic lineage of insects that transitioned to live on the water-air interface and that diversified to occupy ponds, puddles, streams, mangroves and even oceans. This lineage offers an excellent model to study the patterns and processes underlying species diversification following the conquest of new adaptive zones. However, such studies require a reliable and comprehensive phylogeny of the infraorder. Based on whole transcriptomic datasets of 97 species and fossil records, we reconstructed a new phylogeny of the Gerromorpha that resolved inconsistencies and uncovered strong support for previously unknown relationships between some important taxa. We then used this phylogeny to reconstruct the ancestral state of a set of adaptations associated with water surface invasion (fluid locomotion, dispersal and transition to saline waters) and sexual dimorphism. Our results uncovered important patterns and dynamics of phenotypic evolution, revealing how the initial event of water surface invasion enabled multiple subsequent transitions to new adaptive zones on the water surfaces. This phylogeny and the associated transcriptomic datasets constitute highly valuable resources, making Gerromorpha an attractive model lineage to study phenotypic evolution.
Human herpesvirus 6A and 6B (HHV-6) can integrate into the germline, and as a result about 70 million people harbour the genome of one of these viruses in every cell of their body. Until now, it has been largely unknown whether i) these integrations are ancient, ii) they still occur, and iii) circulating virus strains differ from integrated ones. Here we used next generation sequencing and mining of public human genome datasets to generate the largest and most diverse collection of circulating and integrated HHV-6 genomes studied to date. In genomes of geographically dispersed, only distantly related people, we identified clades of integrated viruses that originated from a single ancestral event, confirming this with fluorescent in situ hybridization to directly observe the integration locus. In contrast to HHV-6B, circulating and integrated HHV-6A sequences form distinct clades, arguing against ongoing integration of circulating HHV-6A or "reactivation" of integrated HHV-6A. Taken together, our study provides the first comprehensive picture of the evolution of HHV-6, and reveals that integration of heritable HHV-6 has occurred since the time of, if not before, human migrations out of Africa.
Protein synthesis elongation factor G (EF-G) is an essential protein with central roles in both the elongation and ribosome recycling phases of protein synthesis. Although EF-G evolution is predicted to be conservative, recent reports suggest otherwise. We have characterized EF-G in terms of its molecular phylogeny, genomic context and patterns of amino acid substitution. We find that most bacteria carry a single "canonical" EF-G, which is phylogenetically conservative and encoded in an str operon. However, we also find a number of EF-G paralogs. These include a pair of EF-Gs that are mostly found together and in an eclectic subset of bacteria, specifically delta-proteobacteria, spirochaetes and planctomycetes (the "spd" bacteria). These spdEFGs have also given rise to the mitochondrial factors mtEFG1 and mtEFG2, which probably arrived in eukaryotes before the eukaryotic last common ancestor. Meanwhile, chloroplasts apparently use an α-proteobacterial derived EF-G, rather than the expected cyanobacterial form. The long-term co-maintenance of the spd/mtEFGs may be related to their subfunctionalization for translocation and ribosome recycling. Consistent with this, patterns of sequence conservation and site-specific evolutionary rate shifts suggest that the faster evolving spd/mtEFG2 has lost translocation function, but, surprisingly, the protein also shows little conservation of sites related to recycling activity. On the other hand, spd/mtEFG1, although more slowly evolving, shows signs of substantial remodeling. This is particularly extensive in the GTPase domain, including a highly conserved three amino acid insertion in switch I. We suggest that subfunctionalization of the spd/mtEFGs is not a simple case of specialization for subsets of original activities. Rather the duplication allows the release of one paralog from the selective constraints imposed by dual functionality thus allowing it to become more highly specialized. Thus the potential for fine-tuning afforded by subfunctionalization may explain the maintenance of EF-G paralogs.
Whether protein evolution is mainly due to fixation of beneficial alleles by positive selection or to random genetic drift has remained a contentious issue over the years. Here, we use two genomewide polymorphism data sets collected from chicken populations, together with divergence data from >5,000 chicken-zebra finch gene orthologs expressed in brain, to assess the amount of adaptive evolution in protein-coding genes of birds. First, we show that estimates of the fixation index (FI, the ratio of fixed nonsynonymous-to-synonymous changes over the ratio of the corresponding polymorphisms) are highly dependent on the character of the underlying data sets. Second, by using polymorphism data from high-frequency alleles, to avoid the confounding effect of slightly deleterious mutations segregating at low frequency, we estimate that about 20% of amino acid changes have been brought to fixation through positive selection during avian evolution. This estimate is intermediate to that obtained in humans (lower) and flies as well as bacteria (higher), and is consistent with population genetics theory that stipulates a positive relationship between the efficiency of selection and the effective population size. Further, by comparing the FIs for common and all alleles, we estimate that approximately 20% of nonsynonymous variation segregating in chicken populations represents slightly deleterious mutations, which is less than in Drosophila. Overall, these results highlight the link between the effective population size and positive as well as negative selection.
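The fixation index defined above can be computed directly from four counts. The sketch below also derives the proportion of adaptive substitutions as 1 - 1/FI, which is the standard McDonald-Kreitman-style estimator; whether the study used exactly this estimator is an assumption here, and the counts in the example are invented purely for illustration.

```python
def fixation_index(dn: float, ds: float, pn: float, ps: float) -> float:
    """FI = (Dn / Ds) / (Pn / Ps).

    dn, ds: fixed nonsynonymous and synonymous differences (divergence).
    pn, ps: nonsynonymous and synonymous polymorphisms.
    """
    if min(dn, ds, pn, ps) <= 0:
        raise ValueError("all four counts must be positive")
    return (dn / ds) / (pn / ps)


def proportion_adaptive(dn: float, ds: float, pn: float, ps: float) -> float:
    """Estimated fraction of nonsynonymous substitutions fixed by positive selection.

    Assumption: the standard estimator alpha = 1 - (Ds * Pn) / (Dn * Ps) = 1 - 1 / FI.
    """
    return 1.0 - 1.0 / fixation_index(dn, ds, pn, ps)


if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    dn, ds, pn, ps = 250, 1000, 100, 500
    print("FI    =", round(fixation_index(dn, ds, pn, ps), 3))
    print("alpha =", round(proportion_adaptive(dn, ds, pn, ps), 3))
```

Restricting `pn` and `ps` to high-frequency alleles, as described in the abstract, changes only the inputs and not the formula; FI values above 1 indicate an excess of nonsynonymous fixations relative to polymorphism.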
The field of ancient DNA (aDNA) is casting new light on many evolutionary questions. However, problems associated with the postmortem instability of DNA may complicate the interpretation of aDNA data. For example, in population genetic studies, the inclusion of damaged DNA may inflate estimates of diversity. In this paper, we examine the effect of DNA damage on population genetic estimates of ancestral population size. We simulate data using standard coalescent simulations that include postmortem damage and show that estimates of effective population sizes are inflated around, or right after, the sampling time of the ancestral DNA sequences. This bias leads to estimates of increasing, and then decreasing, population sizes, as observed in several recently published studies. We reanalyze a recently published data set of DNA sequences from the Bison (Bison bison/Bison priscus) and show that the signal for a change in effective population size in this data set vanishes once the effects of putative damage are removed. Our results suggest that population genetic analyses of aDNA sequences, which do not accurately account for damage, should be interpreted with great caution.
Identifying genes influenced by natural selection can provide information about lineage-specific adaptations, and transcriptomes generated by next-generation sequencing are a useful resource for identifying such genes. Here, we utilize a spleen transcriptome for the house finch (Haemorhous mexicanus), an emerging model for sexual selection and disease ecology, together with previously sequenced avian genomes (chicken, turkey, and zebra finch), to investigate lineage-specific adaptations within birds. An analysis of 4,398 orthologous genes revealed a significantly higher ratio of nonsynonymous to synonymous substitutions and significantly higher GC content in passerines than in galliforms, an observation deviating from strictly neutral expectations but consistent with an effect of biased gene conversion on the evolutionary rate in passerines. These data also showed that genes exhibiting signs of positive selection and fast evolution in passerines have functional roles related to fat metabolism, neurodevelopment, and ion binding.
Integration of a conjugative plasmid into a bacterial chromosome can promote the transfer of chromosomal DNA to other bacteria. Intraspecies chromosomal conjugation is believed responsible for creating the global pathogens Klebsiella pneumoniae ST258 and Escherichia coli ST1193. Interspecies conjugation is also possible but little is known about the genetic architecture or fitness of such hybrids. To study this, we generated by conjugation 14 hybrids of E. coli and Salmonella enterica. These species belong to different genera, diverged from a common ancestor >100 Ma, and share a conserved order of orthologous genes with ~15% nucleotide divergence. Genomic analysis revealed that all but one hybrid had acquired a contiguous segment of donor E. coli DNA, replacing a homologous region of recipient Salmonella chromosome, and ranging in size from ~100 to >4,000 kb. Recombination joints occurred in sequences with higher-than-average nucleotide identity. Most hybrid strains suffered a large reduction in growth rate, but the magnitude of this cost did not correlate with the length of foreign DNA. Compensatory evolution to ameliorate the cost of low-fitness hybrids pointed towards disruption of complex genetic networks as a cause. Most interestingly, 4 of the 14 hybrids, in which from 45% to 90% of the Salmonella chromosome was replaced with E. coli DNA, showed no significant reduction in growth fitness. These data suggest that the barriers to creating high-fitness interspecies hybrids may be significantly lower than generally appreciated with implications for the creation of novel species.
Despite its recent invasion into the marine realm, the sea otter (Enhydra lutris) has evolved a suite of adaptations for life in cold coastal waters, including limb modifications and dense insulating fur. This uniquely dense coat led to the near-extinction of sea otters during the 18th-20th century fur trade and an extreme population bottleneck. We used the de novo genome of the southern sea otter (E. l. nereis) to reconstruct its evolutionary history, identify genes influencing aquatic adaptation, and detect signals of population bottlenecks. We compared the genome of the southern sea otter with the tropical freshwater-living giant otter (Pteronura brasiliensis) to assess common and divergent genomic trends between otter species, and with the closely related northern sea otter (E. l. kenyoni) to uncover population-level trends. We found signals of positive selection in genes related to aquatic adaptations, particularly limb development and polygenic selection on genes related to hair follicle development. We found extensive pseudogenization of olfactory receptor genes in both the sea otter and giant otter lineages, consistent with patterns of sensory gene loss in other aquatic mammals. At the population level, the southern sea otter and the northern sea otter showed extremely low genomic diversity, signals of recent inbreeding, and demographic histories marked by population declines. These declines may predate the fur trade and appear to have resulted in an increase in putatively deleterious variants that could impact the future recovery of the sea otter.
Evolutionary trajectories are deemed largely irreversible. In a newly diverged protein, reversion of mutations that led to the functional switch typically results in loss of both the new and the ancestral functions. Nonetheless, evolutionary transitions where reversions are viable have also been described. The structural and mechanistic causes of reversion compatibility versus incompatibility therefore remain unclear. We examined two laboratory evolution trajectories of mammalian paraoxonase-1, a lactonase with promiscuous organophosphate hydrolase (OPH) activity. Both trajectories began with the same active-site mutant, His115Trp, which lost the native lactonase activity and acquired higher OPH activity. A neo-functionalization trajectory amplified the promiscuous OPH activity, whereas the re-functionalization trajectory restored the native activity, thus generating a new lactonase that lacks His115. The His115 revertants of these trajectories indicated opposite trends. Revertants of the neo-functionalization trajectory lost both the evolved OPH and the original lactonase activity. Revertants of the trajectory that restored the original lactonase function were, however, fully active. Crystal structures and molecular simulations show that in the newly diverged OPH, the reverted His115 and other catalytic residues are displaced, thus causing loss of both the original and the new activity. In contrast, in the re-functionalization trajectory, reversion compatibility of the original lactonase activity derives from mechanistic versatility whereby multiple residues can fulfill the same task. This versatility enables unique sequence-reversible compositions that are inaccessible when the active site was repurposed toward a new function.
Two competing hypotheses are at the forefront of the debate on modern human origins. In the first scenario, known as the recent Out-of-Africa hypothesis, modern humans arose in Africa about 100,000-200,000 years ago and spread throughout the world by replacing the local archaic human populations. By contrast, the second hypothesis posits substantial gene flow between archaic and emerging modern humans. In the last two decades, the young time estimates (between 100,000 and 200,000 years) of the most recent common ancestors for the mitochondrion and the Y chromosome provided evidence in favor of a recent African origin of modern humans. However, the presence of very old lineages for autosomal and X-linked genes has often been claimed to be incompatible with a simple, single origin of modern humans. Through the analysis of a public DNA sequence database, we find, similar to previous estimates, that the common ancestors of autosomal and X-linked genes are indeed very old, living, on average, respectively, 1,500,000 and 1,000,000 years ago. However, contrary to previous conclusions, we find that these deep gene genealogies are consistent with the Out-of-Africa scenario provided that the ancestral effective population size was approximately 14,000 individuals. We show that an ancient bottleneck in the Middle Pleistocene, possibly arising from an ancestral structured population, can reconcile the contradictory findings from the mitochondrion on the one hand, with the autosomes and the X chromosome on the other hand.
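The link between an effective population size of about 14,000 and genealogies more than a million years deep can be checked with the textbook neutral coalescent expectation. The sketch below assumes a constant-size, randomly mating population, an equal sex ratio (so the X chromosome has three quarters of the autosomal effective size), and a generation time of 25 years; the generation time and sample size are assumptions made for this illustration and are not taken from the study.

```python
def expected_tmrca_generations(n_e: float,
                               copies_per_individual: float,
                               sample_size: int) -> float:
    """Expected time to the most recent common ancestor, in generations.

    Standard neutral coalescent result: E[TMRCA] = 2 * M * (1 - 1/n),
    where M = copies_per_individual * n_e is the number of gene copies
    (2 per individual for autosomes, 1.5 for the X with an equal sex ratio)
    and n is the number of sampled sequences.
    """
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    gene_copies = copies_per_individual * n_e
    return 2.0 * gene_copies * (1.0 - 1.0 / sample_size)


def expected_tmrca_years(n_e: float,
                         copies_per_individual: float,
                         sample_size: int,
                         generation_years: float = 25.0) -> float:
    """Same expectation converted to years (generation time is an assumption)."""
    return expected_tmrca_generations(n_e, copies_per_individual, sample_size) * generation_years


if __name__ == "__main__":
    n_e = 14_000   # effective size suggested in the abstract
    n = 1_000      # hypothetical large sample
    print("autosomes:", round(expected_tmrca_years(n_e, 2.0, n)))
    print("X-linked: ", round(expected_tmrca_years(n_e, 1.5, n)))
```

Under these assumptions the expected autosomal and X-linked ancestor ages come out on the order of 1.4 and 1.0 million years, the same order as the averages reported in the abstract.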
The rate of recombination impacts on rates of protein evolution for at least two reasons: it affects the efficacy of selection due to linkage and influences sequence evolution through the process of GC-biased gene conversion (gBGC). We studied how recombination, via gBGC, affects inferences of selection in gene sequences using comparative genomic and population genomic data from the collared flycatcher (Ficedula albicollis). We separately analyzed different mutation categories ("strong-to-weak," "weak-to-strong," and GC-conservative changes) and found that gBGC impacts on the distribution of fitness effects of new mutations and causes the rate of adaptive evolution and the proportion of adaptive mutations among nonsynonymous substitutions to be underestimated by 22-33%. It also biases inferences of demographic history based on the site frequency spectrum. In light of this impact, we suggest that inferences of selection (and demography) in lineages with pronounced gBGC should be based on GC-conservative changes only. Doing so, we estimate that 10% of nonsynonymous mutations are effectively neutral and that 27% of nonsynonymous substitutions have been fixed by positive selection in the flycatcher lineage. We also find that gene expression level, sex-bias in expression, and the number of protein-protein interactions, but not Hill-Robertson interference (HRI), are strong determinants of selective constraint and rate of adaptation of collared flycatcher genes. This study therefore illustrates the importance of disentangling the effects of different evolutionary forces and genetic factors when interpreting sequence data and, from that, inferring the role of natural selection in DNA sequence evolution.
Archaeozoological and genetic data indicate that taurine cattle were first domesticated from local wild ox (aurochs) in the Near East some 10,500 years ago. However, while modern mitochondrial DNA (mtDNA) variation indicates early Holocene founding event(s), a lack of ancient DNA data from the region of origin, variation in mutation rate estimates, and limited application of appropriate inference methodologies have resulted in uncertainty on the number of animals first domesticated. A large number would be expected if cattle domestication was a technologically straightforward and unexacting region-wide phenomenon, while a smaller number would be consistent with a more complex and challenging process. We report mtDNA sequences from 15 Neolithic to Iron Age Iranian domestic cattle and, in conjunction with modern data, use serial coalescent simulation and approximate Bayesian computation to estimate that around 80 female aurochs were initially domesticated. Such a low number is consistent with archaeological data indicating that initial domestication took place in a restricted area and suggests the process was constrained by the difficulty of sustained managing and breeding of the wild progenitors of domestic cattle.
The ratio of nonsynonymous to synonymous substitution rates (ω) is often used to measure the strength of natural selection. However, ω may be influenced by linkage among different targets of selection, that is, Hill-Robertson interference (HRI), which reduces the efficacy of selection. Recombination modulates the extent of HRI but may also affect ω by means of GC-biased gene conversion (gBGC), a process leading to a preferential fixation of G:C ("strong," S) over A:T ("weak," W) alleles. As HRI and gBGC can have opposing effects on ω, it is essential to understand their relative impact to make proper inferences of ω. We used a model that separately estimated S-to-S, S-to-W, W-to-S, and W-to-W substitution rates in 8,423 avian genes in the Ficedula flycatcher lineage. We found that the W-to-S substitution rate was positively, and the S-to-W rate negatively, correlated with recombination rate, in accordance with gBGC but not predicted by HRI. The W-to-S rate further showed the strongest impact on both dN and dS. However, since the effects were stronger at 4-fold than at 0-fold degenerated sites, likely because the GC content of these sites is farther away from its equilibrium, ω slightly decreases with increasing recombination rate, which could falsely be interpreted as a consequence of HRI. We corroborated this hypothesis analytically and demonstrate that under particular conditions, ω can decrease with increasing recombination rate. Analyses of the site-frequency spectrum showed that W-to-S mutations were skewed toward high, and S-to-W mutations toward low, frequencies, consistent with a prevalent gBGC-driven fixation bias.
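The four substitution categories used above follow from treating G and C as "strong" (S) and A and T as "weak" (W). A minimal sketch of that classification is given below; the function names and the example substitutions are hypothetical, and the sketch only counts categories rather than estimating rates as the study's model does.

```python
from collections import Counter
from typing import Iterable, Tuple

STRONG = {"G", "C"}  # three hydrogen bonds
WEAK = {"A", "T"}    # two hydrogen bonds


def classify_substitution(ancestral: str, derived: str) -> str:
    """Return 'S-to-S', 'S-to-W', 'W-to-S' or 'W-to-W' for a single-base change."""
    a, d = ancestral.upper(), derived.upper()
    if a not in STRONG | WEAK or d not in STRONG | WEAK:
        raise ValueError("bases must be one of A, C, G, T")
    if a == d:
        raise ValueError("ancestral and derived bases are identical")
    src = "S" if a in STRONG else "W"
    dst = "S" if d in STRONG else "W"
    return f"{src}-to-{dst}"


def count_categories(changes: Iterable[Tuple[str, str]]) -> Counter:
    """Tally substitution categories over (ancestral, derived) pairs."""
    return Counter(classify_substitution(a, d) for a, d in changes)


def is_gc_conservative(category: str) -> bool:
    """S-to-S and W-to-W changes leave GC content unchanged."""
    return category in {"S-to-S", "W-to-W"}


if __name__ == "__main__":
    # Hypothetical substitutions along a lineage.
    observed = [("A", "G"), ("T", "C"), ("C", "T"), ("G", "C"), ("A", "T")]
    for category, n in sorted(count_categories(observed).items()):
        print(category, n, "GC-conservative" if is_gc_conservative(category) else "")
```

Because gBGC favours W-to-S and disfavours S-to-W changes, the S-to-S and W-to-W classes are the ones expected to be unaffected by it.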
The last common ancestor of the Gammaproteobacteria carried an important 40-kb chromosome section encoding 51 proteins of the transcriptional and translational machinery. These genes were organized into eight contiguous operons (rrnB-tufB-secE-rpoBC-str-S10-spc-alpha). Over 2 Gy of evolution, in different lineages, some of the operons became separated by multigene insertions. Surprisingly, in many Enterobacteriaceae, much of the ancient organization is conserved, indicating a strong selective force on the operons to remain colinear. Here, we show for one operon pair, tufB-secE in Salmonella, that an interruption of contiguity significantly reduces growth rate. Our data show that the tufB-secE operons are concatenated by an interoperon terminator-promoter overlap that plays a significant role regulating gene expression. Interrupting operon contiguity interferes with this regulation, reducing cellular fitness. Six operons of the ancestral chromosome section remain contiguous in Salmonella (tufB-secE-rpoBC and S10-spc-alpha) and, strikingly, each of these operon pairs is also connected by an interoperon terminator-promoter overlap. Accordingly, we propose that operon concatenation is an ancient feature that restricts the potential to rearrange bacterial chromosomes and can select for the maintenance of a colinear operon organization over billions of years.
Although previous studies have failed to detect an association between microsatellite polymorphism and broadscale recombination rates in the human genome, there are several possible reasons why such a relationship could exist. For instance, there might be a direct link if recombination is mutagenic to microsatellite sequences or if polymorphic microsatellites act as recombination signals. Alternatively, recombination could exert an indirect effect by uncoupling natural selection at linked loci, promoting polymorphism. As recombination is concentrated in narrow hot spot regions in the human genome, we investigated the relationship between microsatellite polymorphism and recombination hot spots. By using data from a common allele frequency database, we found several polymorphism estimates to be similar for hot spots and the genomic average. However, this is likely explained by an ascertainment bias because markers with high polymorphism information content are usually selected for genotyping in human populations and pedigrees. In contrast, by using an unbiased set of shotgun sequence data, we found a 14% excess of microsatellite polymorphism in recombination hot spots. However, when other genomic variables are taken into account in a generalized model and using wavelet analysis, the effect is no longer detectable and the only firm predictor of microsatellite polymorphism is the incidence of SNPs and indels. One possible neutral explanation for these observations is that there is a common denominator affecting the local rate of mutation in unique as well as in repetitive DNA, for example, base composition.
Dental calculus, the calcified form of the mammalian oral microbial plaque biofilm, is a rich source of oral microbiome, host, and dietary biomolecules and is well preserved in museum and archaeological specimens. Despite its wide presence in mammals, to date, dental calculus has primarily been used to study primate microbiome evolution. We establish dental calculus as a valuable tool for the study of nonhuman host microbiome evolution, by using shotgun metagenomics to characterize the taxonomic and functional composition of the oral microbiome in species as diverse as gorillas, bears, and reindeer. We detect oral pathogens in individuals with evidence of oral disease, assemble near-complete bacterial genomes from historical specimens, characterize antibiotic resistance genes, reconstruct components of the host diet, and recover host genetic profiles. Our work demonstrates that metagenomic analyses of dental calculus can be performed on a diverse range of mammalian species, which will allow the study of oral microbiome and pathogen evolution from a comparative perspective. As dental calculus is readily preserved through time, it can also facilitate the quantification of the impact of anthropogenic changes on wildlife and the environment.
Many theories favor a fusion of 2 prokaryotic genomes for the origin of the Eukaryotes, but there are disagreements on the origin, timing, and cellular structures of the cells involved. Equally controversial is the source of the nuclear genes for mitochondrial proteins, although the α-proteobacterial contribution to the mitochondrial genome is well established. Phylogenetic inferences show that the nuclearly encoded mitochondrial aminoacyl-tRNA synthetases (aaRSs) occupy a position in the tree that is not close to any of the currently sequenced α-proteobacterial genomes, despite cohesive and remarkably well-resolved α-proteobacterial clades in 12 of the 20 trees. Two or more α-proteobacterial clusters were observed in 8 cases, indicative of differential loss of paralogous genes or horizontal gene transfer. Replacement and retargeting events within the nuclear genomes of the Eukaryotes were indicated in 10 trees, 4 of which also show split α-proteobacterial groups. A majority of the mitochondrial aaRSs originate from within the bacterial domain, but none specifically from the α-Proteobacteria. For some aaRSs, the endosymbiotic origin may have been erased by ongoing gene replacements on the bacterial as well as the eukaryotic side. For others that accurately resolve the α-proteobacterial divergence patterns, the lack of affiliation with mitochondria is more surprising. We hypothesize that the ancestral eukaryotic gene pool hosted primordial "bacterial-like" genes, to which a limited set of α-proteobacterial genes, mostly coding for components of the respiratory chain complexes, were added and selectively maintained.
Owing to its special mode of evolution and central role in the adaptive immune system, the major histocompatibility complex (MHC) has become the focus of diverse disciplines such as immunology, evolutionary ecology, and molecular evolution. MHC evolution has been studied extensively in diverse vertebrate lineages over the last few decades, and it has been suggested that birds differ from the established mammalian norm. Mammalian MHC genes evolve independently, and duplication history (i.e., orthology) can usually be traced back within lineages. In birds, this has been observed in only 3 pairs of closely related species. Here we report strong evidence for the persistence of orthology of MHC genes throughout an entire avian order. Phylogenetic reconstructions of MHC class II B genes in 14 species of owls trace back orthology over tens of thousands of years in exon 3. Moreover, exon 2 sequences from several species show closer relationships than sequences within species, resembling transspecies evolution typically observed in mammals. Thus, although previous studies suggested that the long-term evolutionary dynamics of the avian MHC were characterized by high rates of concerted evolution, resulting in rapid masking of orthology, our results question the generality of this conclusion. The owl MHC thus opens new perspectives for a more comprehensive understanding of avian MHC evolution.
Gene duplication and neofunctionalization are known to be important processes in the evolution of phenotypic complexity. They account for important evolutionary novelties that confer ecological adaptation, such as the major histocompatibility complex (MHC), a multigene family crucial to the vertebrate immune system. In birds, two MHC class II β (MHCIIβ) exon 3 lineages have been recently characterized, and two hypotheses for the evolutionary history of MHCIIβ lineages were proposed. These lineages could have arisen either by 1) an ancient duplication and subsequent divergence of one paralog or by 2) recent parallel duplications followed by functional convergence. Here, we compiled a data set consisting of 63 MHCIIβ exon 3 sequences from six avian orders to distinguish between these hypotheses and to understand the role of selection in the divergent evolution of the two avian MHCIIβ lineages. Based on phylogenetic reconstructions and simulations, we show that a unique duplication event preceding the major avian radiations gave rise to two ancestral MHCIIβ lineages that were each likely lost once later during avian evolution. Maximum likelihood estimation shows that following the ancestral duplication, positive selection drove a radical shift from basic to acidic amino acid composition of a protein domain facing the α-chain in the MHCII αβ-heterodimer. Structural analyses of the MHCII αβ-heterodimer highlight that three of these residues are potentially involved in direct interactions with the α-chain, suggesting that the shift following duplication may have been accompanied by coevolution of the interacting α- and β-chains. These results provide new insights into the long-term evolutionary relationships among avian MHC genes and open interesting perspectives for comparative and population genomic studies of avian MHC evolution.
Endosymbiotic bacteria of aphids, Buchnera aphidicola, and tsetse flies, Wigglesworthia glossinidia, are descendants of free-living γ-Proteobacteria. The acceleration of sequence evolution in the endosymbiont genomes is here estimated from a phylogenomic analysis of the γ-Proteobacteria. The tree topologies associated with the most highly conserved genes suggest that the endosymbionts form a sister group with Escherichia coli, Salmonella sp., and Yersinia pestis. Our results indicate that deviant tree topologies result from high substitution rates and biased nucleotide patterns, rather than from lateral gene transfer, as previously suggested. A reinvestigation of the relative rate increase in the endosymbiont genomes reveals variability among genes that correlates with host-associated metabolic dependencies. The conclusion is that host-level selection has retarded both the loss of genes and the acceleration of sequence evolution in endocellular symbionts.
Analysis of bacterial genomes shows that, whereas diverse species share many genes in common, their linear order on the chromosome is often not conserved. Although rearrangements in gene order could occur by genetic drift, an alternative hypothesis is rearrangement driven by positive selection during niche adaptation (SNAP). Here, we provide the first experimental support for the SNAP hypothesis. We evolved Salmonella to adapt to growth on malate as the sole carbon source and followed the evolutionary trajectories. The initial adaptation to growth in the new environment involved the duplication of 1.66 Mb, corresponding to one-third of the Salmonella chromosome. This duplication was selected to increase the copy number of a single gene, dctA, involved in the uptake of malate. Continuing selection led to the rapid loss or mutation of duplicate genes from either copy of the duplicated region. After 2000 generations, only 31% of the originally duplicated genes remained intact and the gene order within the Salmonella chromosome had been significantly and irreversibly altered. These results experimentally validate predictions made by the SNAP hypothesis and show that SNAP can be a strong driving force for rearrangements in chromosomal gene order.
A central question in evolutionary biology is why some species have more genetic diversity than others and a no less important question is why selection efficacy varies among species. Although these questions have started to be tackled in animals, they have not been addressed to the same extent in plants. Here, we estimated nucleotide diversity at synonymous, pi(S), and nonsynonymous sites, pi(N), and a measure of the efficacy of selection, the ratio pi(N)/pi(S), in 34 animal and 28 plant species using full genome data. We then evaluated the relationship of nucleotide diversity and selection efficacy with effective population size, the distribution of fitness effects, and life history traits. In animals, our data confirm that longevity and propagule size are the variables that best explain the variation in pi(S) among species. In plants, longevity also plays a major role, as does mating system. As predicted by the nearly neutral theory of molecular evolution, the log of pi(N)/pi(S) decreased linearly with the log of pi(S) but the slope was weaker in plants than in animals. This appears to be due to a higher mutation rate in long-lived plants, and the difference disappears when pi(S) is rescaled by the mutation rate. Differences in the distribution of fitness effects of new mutations also contributed to variation in pi(N)/pi(S) among species.
Two-component signaling (TCS) is the primary means by which bacteria sense and respond to the environment. TCS involves two partner proteins working in tandem, which interact to perform cellular functions while limiting interactions with non-partners (i.e., cross-talk). We construct a Potts model for TCS that can quantitatively predict how mutating amino acid identities affects the interaction between TCS partners and non-partners. The parameters of this model are inferred directly from protein sequence data. This approach drastically reduces the computational complexity of exploring the sequence-space of TCS proteins. As a stringent test, we compare its predictions to a recent comprehensive mutational study, which characterized the functionality of 20^4 (160,000) mutational variants of the PhoQ kinase in Escherichia coli. We find that our best predictions accurately reproduce the amino acid combinations found in experiment, which enable functional signaling with its partner PhoP. These predictions demonstrate the evolutionary pressure to preserve the interaction between TCS partners as well as prevent unwanted cross-talk. Further, we calculate the mutational change in the binding affinity between PhoQ and PhoP, providing an estimate of the amount of destabilization needed to disrupt TCS.
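As a rough illustration of how a Potts model scores sequences, the sketch below evaluates the standard Potts energy, H(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), and the energy change caused by a single substitution. It is a minimal sketch under stated assumptions: the fields `h` and couplings `J` are random placeholders rather than parameters inferred from sequence data, and the four-residue `reference` string is an arbitrary example, not a claim about the study's model.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Q = len(AMINO_ACIDS)  # 20 states per position

def potts_energy(seq, h, J):
    """H(seq) = -sum_i h[i, a_i] - sum_{i<j} J[i, j, a_i, a_j]."""
    idx = [AMINO_ACIDS.index(a) for a in seq]
    n = len(idx)
    energy = -sum(h[i, idx[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy -= J[i, j, idx[i], idx[j]]
    return float(energy)

def mutational_delta(seq, pos, new_aa, h, J):
    """Energy change when position `pos` of `seq` is replaced by `new_aa`."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)

# Placeholder parameters (assumption): in practice h and J are inferred from
# a multiple sequence alignment of interacting partner proteins.
rng = np.random.default_rng(0)
L = 4
h = rng.normal(size=(L, Q))
J = rng.normal(size=(L, L, Q, Q))

reference = "AVST"  # arbitrary illustrative sequence
delta = mutational_delta(reference, 1, "L", h, J)
```

With four variable positions and 20 amino acids, exhaustively scoring all 20^4 combinations with `potts_energy` is cheap, which is the kind of sequence-space exploration the abstract refers to.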
Genetic variation is instrumental for adaptation to changing environments but it is unclear how it is structured and contributes to adaptation in pelagic species lacking clear barriers to gene flow. Here, we applied comparative genomics to extensive transcriptome datasets from 20 krill species collected across the Atlantic, Indian, Pacific, and Southern Oceans. We compared genetic variation both within and between species to elucidate their evolutionary history and genomic bases of adaptation. We resolved phylogenetic interrelationships and uncovered genomic evidence to elevate the cryptic Euphausia similis var. armata to species rank. Levels of genetic variation and rates of adaptive protein evolution vary widely. Species endemic to the cold Southern Ocean, such as the Antarctic krill Euphausia superba, showed less genetic variation and lower evolutionary rates than other species. This could suggest a low adaptive potential to rapid climate change. We uncovered hundreds of candidate genes with signatures of adaptive evolution among Antarctic Euphausia but did not observe strong evidence of adaptive convergence with the predominantly Arctic Thysanoessa. We instead identified candidates for cold-adaptation that have also been detected in Antarctic fish, including genes that govern thermal reception such as TrpA1. Our results suggest parallel genetic responses to similar selection pressures across Antarctic taxa and provide new insights into the adaptive potential of important zooplankton already affected by climate change.
Evidence is accumulating that gene flow commonly occurs between recently diverged species, despite the existence of barriers to gene flow in their genomes. However, we still know little about what regions of the genome become barriers to gene flow and how such barriers form. Here, we compare genetic differentiation across the genomes of bumblebee species living in sympatry and allopatry to reveal the potential impact of gene flow during species divergence and uncover genetic barrier loci. We first compared the genomes of the alpine bumblebee Bombus sylvicola and a previously unidentified sister species living in sympatry in the Rocky Mountains, revealing prominent islands of elevated genetic divergence in the genome that colocalize with centromeres and regions of low recombination. This same pattern is observed between the genomes of another pair of closely related species living in allopatry (B. bifarius and B. vancouverensis). Strikingly however, the genomic islands exhibit significantly elevated absolute divergence (d(XY)) in the sympatric, but not the allopatric, comparison indicating that they contain loci that have acted as barriers to historical gene flow in sympatry. Our results suggest that intrinsic barriers to gene flow between species may often accumulate in regions of low recombination and near centromeres through processes such as genetic hitchhiking, and that divergence in these regions is accentuated in the presence of gene flow.
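The absolute divergence statistic d(XY) used above is conventionally the average number of per-site differences between sequences drawn from two different populations. The following sketch computes it for small aligned haplotype sets; the sequences are made up for illustration, and the function assumes equal-length alignments with no missing data, which real genomic pipelines would need to handle.

```python
from itertools import product

def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def dxy(pop_x, pop_y):
    """Mean per-site difference over all between-population sequence pairs."""
    length = len(pop_x[0])
    total = sum(hamming(a, b) for a, b in product(pop_x, pop_y))
    return total / (len(pop_x) * len(pop_y) * length)

# Made-up aligned haplotypes from two hypothetical populations.
pop_x = ["ACGTACGT", "ACGTACGA"]
pop_y = ["ACGAACGT", "TCGAACGT"]

print(dxy(pop_x, pop_y))  # 8 differences over 4 pairs x 8 sites -> 0.25
```

Unlike relative measures of differentiation, this quantity does not depend on within-population diversity; applying `dxy` in sliding windows along a genome is one simple way to look for regions of elevated divergence.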
The Cape bee (Apis mellifera capensis) is a subspecies of the honeybee, in which workers commonly lay diploid unfertilized eggs via a process known as thelytoky. A recent study aimed to map the genetic basis of this trait in the progeny of a single capensis queen where workers laid either diploid (thelytokous) or haploid (arrhenotokous) eggs. A nonsynonymous single nucleotide polymorphism (SNP) in a gene of unknown function was reported to be strongly associated with thelytoky in this colony. Here, we analyze genome sequences from a global sample of A. mellifera and identify populations where the proposed thelytoky allele at this SNP is common but thelytoky is absent. We also analyze genome sequences of three capensis queens produced by thelytoky and find that, contrary to predictions, they do not carry the proposed thelytoky allele. The proposed SNP is therefore neither sufficient nor required to produce thelytoky in A. mellifera.
Dogs exhibit more phenotypic variation than any other mammal and are affected by a wide variety of genetic diseases. However, the origin and genetic basis of this variation is still poorly understood. We examined the effect of domestication on the dog genome by comparison with its wild ancestor, the gray wolf. We compared variation in dog and wolf genes using whole-genome single nucleotide polymorphism (SNP) data. The d(N)/d(S) ratio (omega) was around 50% greater for SNPs found in dogs than in wolves, indicating that a higher proportion of nonsynonymous alleles segregate in dogs compared with nonfunctional genetic variation. We suggest that the majority of these alleles are slightly deleterious and that two main factors may have contributed to their increase. The first is a relaxation of selective constraint due to a population bottleneck and altered breeding patterns accompanying domestication. The second is a reduction of effective population size at loci linked to those under positive selection due to Hill-Robertson interference. An increase in slightly deleterious genetic variation could contribute to the prevalence of disease in modern dog breeds.
Remarkably little is known about the population-level processes leading up to the extinction of the neandertal. To address this, we use mitochondrial DNA sequences from 13 neandertal individuals, including a novel sequence from northern Spain, to examine neandertal demographic history. Our analyses indicate that recent western European neandertals (< 48 kyr) constitute a tightly defined group with low mitochondrial genetic variation in comparison with both eastern and older (> 48 kyr) European neandertals. Using control region sequences, Bayesian demographic simulations provide higher support for a model of population fragmentation followed by separate demographic trajectories in subpopulations than for a null model of a single stable population. The most parsimonious explanation for these results is that of a population turnover in western Europe during early Marine Isotope Stage 3, predating the arrival of anatomically modern humans in the region.
It has been hypothesized that early enzymes are more promiscuous than their extant orthologs. Whether or not this hypothesis applies to the translation machinery, the oldest molecular machine of life, is not known. Efficient protein synthesis relies on a cascade of specific interactions between the ribosome and the translation factors. Here, using elongation factor-Tu (EF-Tu) as a model system, we have explored the evolution of ribosome specificity in translation factors. Employing pre-steady-state fast kinetics using quench flow, we have quantitatively characterized the specificity of two sequence-reconstructed 1.3- to 3.3-Gy-old ancestral EF-Tus toward two unrelated bacterial ribosomes, mesophilic Escherichia coli and thermophilic Thermus thermophilus. Although the modern EF-Tus show clear preference for their respective ribosomes, the ancestral EF-Tus show similar specificity for diverse ribosomes. In addition, despite an increase in the catalytic activity with temperature, the ribosome specificity of the thermophilic EF-Tus remains virtually unchanged. Our kinetic analysis thus suggests that EF-Tu proteins likely evolved from catalytically promiscuous, "generalist" ancestors. Furthermore, compatibility of diverse ribosomes with the modern and ancestral EF-Tus suggests that the ribosomal core probably evolved before the diversification of the EF-Tus. This study thus provides important insights regarding the evolution of modern translation machinery.
In order to characterize the molecular bases of mineralizing cell evolution, we targeted type X collagen, a nonfibrillar network-forming collagen encoded by the Col10a1 gene. It is involved in the process of endochondral ossification in ray-finned fishes and tetrapods (Osteichthyes), but was until now unknown in cartilaginous fishes (Chondrichthyes). We show that holocephalans and elasmobranchs have respectively five and six tandemly duplicated Col10a1 gene copies that display conserved genomic synteny with osteichthyan Col10a1 genes. All Col10a1 genes in the catshark Scyliorhinus canicula are expressed in ameloblasts and/or odontoblasts of teeth and scales, during the stages of extracellular matrix protein secretion and mineralization. Only one duplicate is expressed in the endoskeletal (vertebral) mineralizing tissues. We also show that type X collagen is expressed in teeth of two osteichthyans, the zebrafish Danio rerio and the western clawed frog Xenopus tropicalis, indicating an ancestral jawed vertebrate involvement of type X collagen in odontode formation. Our findings place the origin of the Col10a1 gene before the divergence of osteichthyans and chondrichthyans, and demonstrate its ancestral association with mineralization of both the odontode skeleton and the endoskeleton.
Gene content has been shown to contain a strong phylogenetic signal, yet its usage for phylogenetic questions is hampered by horizontal gene transfer and parallel gene loss and until now required completely sequenced genomes. Here, we introduce an approach that allows the phylogenetic signal in gene content to be applied to any set of sequences, using signature genes for phylogenetic classification. The hundreds of publicly available genomes allow us to identify signature genes at various taxonomic depths, and we show how the presence of signature genes in an unspecified sample can be used to characterize its taxonomic composition. We identify 8,362 signature genes specific for 112 prokaryotic taxa. We show that these signature genes can be used to address phylogenetic questions on the basis of gene content in cases where classic gene content or sequence analyses provide an ambiguous answer, such as for Nanoarchaeum equitans, and even in cases where complete genomes are not available, such as for metagenomics data. Cross-validation experiments leaving out up to 30% of the species show that approximately 92% of the signature genes correctly place the species in a related clade. Analyses of metagenomics data sets with the signature gene approach are in good agreement with the previously reported species distributions based on phylogenetic analysis of marker genes. Summarizing, signature genes can complement traditional sequence-based methods in addressing taxonomic questions.
Novel sequences (NSs), not present in the human reference genome, are abundant and remain largely unexplored. Here, we utilize de novo assembly to study NSs in 1,000 Swedish individuals first sequenced as part of the SweGen project, revealing a total of 46 Mb in 61,044 distinct contigs of sequences not present in GRCh38. The contigs were aligned to recently published catalogs of Icelandic and Pan-African NSs, as well as the chimpanzee genome, revealing a great diversity of shared sequences. Analyzing the positioning of NSs across the chimpanzee genome, we find that 2,807 NSs align confidently within 143 chimpanzee orthologs of human genes. Aligning the whole genome sequencing data to the chimpanzee genome, we discover ancestral NSs common throughout the Swedish population. The NSs were searched for repeats and repeat elements, revealing a majority of repetitive sequence (56%), and enrichment of simple repeats (28%) and satellites (15%). Lastly, we align the unmappable reads of a subset of the 1000 Genomes data to our collection of NSs, as well as the previously published Pan-African NSs, revealing that both the Swedish and Pan-African NSs are widespread, and that the Swedish NSs are largely a subset of the Pan-African NSs. Overall, these results highlight the importance of creating a more diverse reference genome and illustrate that significant amounts of the NSs may be of ancestral origin.
Mitigating trade-offs between different resource-utilization functions is key to an organism's ecological and evolutionary success. These trade-offs often reflect metabolic constraints with a complex molecular underpinning; therefore, their consequences for evolutionary processes have remained elusive. Here, we investigate how metabolic architecture induces resource-utilization constraints and how these constraints, in turn, elicit evolutionary specialization and diversification. Guided by the metabolic network structure of the bacterium Lactococcus cremoris, we selected two carbon sources (fructose and galactose) with predicted coutilization constraints. By evolving L. cremoris on either fructose, galactose, or a mix of both sugars, we imposed selection favoring divergent metabolic specializations or coutilization of both resources, respectively. Phenotypic characterization revealed the evolution of either fructose or galactose specialists in the single-sugar treatments. In the mixed-sugar regime, we observed adaptive diversification: both specialists coexisted, and no generalist evolved. Divergence from the ancestral phenotype occurred at key pathway junctions in the central carbon metabolism. Fructose specialists evolved mutations in the fbp and pfk genes that appear to balance anabolic and catabolic carbon fluxes. Galactose specialists evolved increased expression of pgmA (the primary metabolic bottleneck of galactose metabolism) and silencing of ptnABCD (the main glucose transporter) and ldh (regulator/enzyme of downstream carbon metabolism). Overall, our study shows how metabolic network architecture and historical contingency serve to predict targets of selection and inform the functional interpretation of evolved mutations. The elucidation of the relationship between molecular constraints and phenotypic trade-offs contributes to an integrative understanding of evolutionary specialization and diversification.
Methylation is a common posttranslational modification of arginine and lysine in eukaryotic proteins. Methylproteomes are best characterized for higher eukaryotes, where they are functionally expanded and have evolved complex regulation. However, this is not the case for protist species evolved from the earliest eukaryotic lineages. Here, we integrated bioinformatic, proteomic, and drug-screening data sets to comprehensively explore the methylproteome of Giardia duodenalis, a deeply branching parasitic protist. We demonstrate that Giardia and related diplomonads lack arginine-methyltransferases and have remodeled conserved RGG/RG motifs targeted by these enzymes. We also provide experimental evidence for methylarginine absence in proteomes of Giardia but readily detect methyllysine. We bioinformatically infer 11 lysine-methyltransferases in Giardia, including highly diverged Su(var)3-9, Enhancer-of-zeste and Trithorax proteins with reduced domain architectures, and novel annotations demonstrating conserved methyllysine regulation of eukaryotic elongation factor 1 alpha. Using mass spectrometry, we identify more than 200 methyllysine sites in Giardia, including in species-specific gene families involved in cytoskeletal regulation, enriched in coiled-coil features. Finally, we use known methylation inhibitors to show that methylation plays key roles in replication and cyst formation in this parasite. This study highlights reduced methylation enzymes, sites, and functions early in eukaryote evolution, including absent methylarginine networks in the Diplomonadida. These results challenge the view that arginine methylation is eukaryote conserved and demonstrate that functional compensation of methylarginine was possible preceding expansion and diversification of these key networks in higher eukaryotes.
The nearly neutral theory of molecular evolution predicts that small populations should accumulate deleterious mutations at a faster rate than large populations. The analysis of nonsynonymous (dN) versus synonymous (dS) substitution rates in birds versus mammals, however, has provided contradictory results, questioning the generality of the nearly neutral theory. Here we analyzed the impact of life history traits, taken as proxies of the effective population size, on molecular evolutionary and population genetic processes in amniotes, including the so far neglected reptiles. We report a strong effect of species body mass, longevity, and age of sexual maturity on genome-wide patterns of polymorphism and divergence across the major groups of amniotes, in agreement with the nearly neutral theory. Our results indicate that the rate of protein evolution in amniotes is determined in the first place by the efficiency of purifying selection against deleterious mutations, and this is true of both radical and conservative amino acid changes. Interestingly, the among-species distribution of dN/dS in birds did not follow this general trend: dN/dS was not higher in large, long-lived than in small, short-lived species of birds. We show that this unexpected pattern is not due to a narrower range of life history traits, a lack of correlation between traits and Ne, or a peculiar distribution of fitness effects of mutations in birds. Our analysis therefore highlights the bird dN/dS ratio as a molecular evolutionary paradox and a challenge for future research.
The Sahel/Savannah belt harbors diverse populations with different demographic histories and different subsistence patterns. However, populations from this large African region are notably under-represented in genomic research. To investigate the population structure and adaptation history of populations from the Sahel/Savannah space, we generated dense genome-wide genotype data of 327 individuals, comprising 14 ethnolinguistic groups, including 10 previously unsampled populations. Our results highlight fine-scale population structure and complex patterns of admixture, particularly in Fulani groups and Arabic-speaking populations. Among all studied Sahelian populations, only the Rashaayda Arabic-speaking population from eastern Sudan shows a lack of gene flow from African groups, which is consistent with the short history of this population in the African continent. They are recent migrants from Saudi Arabia with evidence of strong genetic isolation during the last few generations and a strong demographic bottleneck. This population also presents a strong selection signal in a genomic region around the CNR1 gene associated with substance dependence and chronic stress. In Western Sahelian populations, signatures of selection were detected in several other genetic regions, including pathways associated with lactase persistence, immune response, and malaria resistance. Taken together, these findings refine our current knowledge of genetic diversity, population structure, migration, admixture and adaptation of human populations in the Sahel/Savannah belt and contribute to our understanding of human history and health.
Type IV secretion systems (TFSSs) are a multifunctional family of translocation pathways that mediate the transfer of DNA among bacteria and deliver DNA and proteins to eukaryotic cells during bacterial infections. Horizontal transmission has dominated the evolution of the TFSSs, as demonstrated here by a lack of congruence between the tree topology inferred from components of the TFSSs and the presumed bacterial species divergence pattern. A parsimony analysis suggests that conjugation represents the ancestral state and that the divergence from conjugation to secretion of effector molecules has occurred independently at multiple sites in the tree. The result shows that the nodes at which functional shifts have occurred coincide with those of horizontal gene transfers among distantly related bacteria. We suggest that it is the transfer between species that paved the way for the divergence of the TFSSs and discuss the general role of horizontal gene transfers for the evolution of novel gene functions.
Selection on codon usage bias is well documented in a number of microorganisms. Whether codon usage is also generally shaped by natural selection in large organisms, despite their relatively small effective population size (Ne), is unclear. In animals, the population genetics of codon usage bias has only been studied in a handful of model organisms so far, and can be affected by confounding, nonadaptive processes such as GC-biased gene conversion and experimental artefacts. Using population transcriptomics data, we analyzed the relationship between codon usage, gene expression, allele frequency distribution, and recombination rate in 30 nonmodel species of animals, each from a different family, covering a wide range of effective population sizes. We disentangled the effects of translational selection and GC-biased gene conversion on codon usage by separately analyzing GC-conservative and GC-changing mutations. We report evidence for effective translational selection on codon usage in large-Ne species of animals, but not in small-Ne ones, in agreement with the nearly neutral theory of molecular evolution. C- and T-ending codons tend to be preferred over synonymous G- and A-ending ones, for reasons that remain to be determined. In contrast, we uncovered a conspicuous effect of GC-biased gene conversion, which is widespread in animals and the main force determining the fate of AT↔GC mutations. Intriguingly, the strength of its effect was uncorrelated with Ne.
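The separation of GC-conservative from GC-changing mutations described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_mutation` and the category labels are hypothetical, and the sketch assumes that the ancestral and derived alleles of each single-nucleotide change are already known. GC-conservative changes (A↔T, G↔C) leave GC content unchanged and so should be unaffected by GC-biased gene conversion, whereas AT→GC and GC→AT changes are exposed to it.

```python
WEAK = {"A", "T"}    # bases forming two hydrogen bonds
STRONG = {"G", "C"}  # bases forming three hydrogen bonds


def classify_mutation(ancestral: str, derived: str) -> str:
    """Classify a single-nucleotide change by its effect on GC content.

    Returns one of the illustrative labels "AT->GC", "GC->AT" or
    "GC-conservative" (A<->T and G<->C changes).
    """
    a, d = ancestral.upper(), derived.upper()
    valid = WEAK | STRONG
    if a not in valid or d not in valid or a == d:
        raise ValueError(f"not a valid point mutation: {ancestral!r} -> {derived!r}")
    if a in WEAK and d in STRONG:
        return "AT->GC"
    if a in STRONG and d in WEAK:
        return "GC->AT"
    return "GC-conservative"
```

Under these assumptions, a signal of preferred codons that persists among GC-conservative changes alone would be hard to attribute to GC-biased gene conversion, which is the logic the abstract relies on.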
Experimental evolution is a powerful tool to study genetic trajectories to antibiotic resistance under selection. A confounding factor is that outcomes may be heavily influenced by the choice of experimental parameters. For practical purposes (minimizing culture volumes), most experimental evolution studies with bacteria use transmission bottleneck sizes of 5 × 10^6 cfu. We currently have a poor understanding of how the choice of transmission bottleneck size affects the accumulation of deleterious versus high-fitness mutations when resistance requires multiple mutations, and how the outcome relates to clinical resistance. We addressed this using experimental evolution of resistance to ciprofloxacin in Escherichia coli. Populations were passaged with three different transmission bottlenecks, including single cell (to maximize genetic drift) and bottlenecks spanning the reciprocal of the frequency of drug target mutations (10^8 and 10^10). The 10^10 bottlenecks selected overwhelmingly mutations in drug target genes, and the resulting genotypes corresponded closely to those found in resistant clinical isolates. In contrast, both the 10^8 and single-cell bottlenecks selected mutations in three different gene classes: 1) drug targets, 2) efflux pump repressors, and 3) transcription-translation genes, including many mutations with low fitness. Accordingly, bottlenecks smaller than the average nucleotide substitution rate significantly altered the experimental outcome away from genotypes observed in resistant clinical isolates. These data could be applied in designing experimental evolution studies to increase their predictive power and to explore the interplay between different environmental conditions, where transmission bottlenecks might vary, and resulting evolutionary trajectories.
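Why bottleneck size matters relative to the reciprocal of the mutant frequency can be shown with a short Python sketch. It is an illustration under stated assumptions, not the study's analysis: the function name `prob_mutant_transferred` is hypothetical, cells are assumed to be sampled independently at each transfer, and the mutant frequency of 1e-9 in the usage note is an arbitrary illustrative value chosen only because its reciprocal lies between the two larger bottleneck sizes.

```python
import math


def prob_mutant_transferred(bottleneck_size: int, mutant_frequency: float) -> float:
    """Probability that at least one mutant cell passes a transfer bottleneck.

    Assumes each of the `bottleneck_size` transferred cells is independently
    a mutant with probability `mutant_frequency`, so the result is
    1 - (1 - f)^N, evaluated with log1p/expm1 for numerical stability.
    """
    if bottleneck_size < 0 or not 0.0 < mutant_frequency < 1.0:
        raise ValueError("need bottleneck_size >= 0 and 0 < mutant_frequency < 1")
    return -math.expm1(bottleneck_size * math.log1p(-mutant_frequency))


if __name__ == "__main__":
    f = 1e-9  # hypothetical mutant frequency, for illustration only
    for n in (1, 10**8, 10**10):
        print(f"bottleneck {n:>11d}: P(at least one mutant) = "
              f"{prob_mutant_transferred(n, f):.3g}")
```

With these illustrative numbers, a rare mutant is almost never carried through a single-cell bottleneck and almost always carried through a 10^10 bottleneck, which is the intuition behind choosing bottlenecks on either side of the reciprocal of the mutant frequency.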
In the fungal kingdom, the evolution of mating systems is highly dynamic, varying even among closely related species. Rearrangements in the mating-type (mat) locus, which contains the major regulators of sexual development, are expected to underlie the transitions between self-sterility (heterothallism) and self-fertility (homothallism). However, both the genetic mechanisms and the direction of evolutionary transitions in fungal mating systems are under debate. Here, we present new sequences of the mat locus of four homothallic and one heterothallic species of the model genus Neurospora (Ascomycota). By examining the patterns of synteny among these sequences and previously published data, we show that the locus is conserved among heterothallic species belonging to distinct phylogenetic clades, while different gene arrangements characterize the four homothallic species. These results allowed us to ascertain a heterothallic ancestor for the genus, confirming the prediction of the dead-end theory on unidirectional transitions toward selfing. We show that at least four shifts from heterothallism to homothallism have occurred in Neurospora, three of which involve the acquisition of sequences of both mating types into the same haploid genome. We present evidence for two genetic mechanisms allowing these shifts: translocation and unequal crossover. Finally, we identified two novel retrotransposons and suggest that these have played a major role in mating-system transitions, by facilitating multiple rearrangements of the mat locus.
Accessory genes are variably present among members of a species and are a reservoir of adaptive functions. In bacteria, differences in gene distributions among individuals largely result from mobile elements that acquire and disperse accessory genes as cargo. In contrast, the impact of cargo-carrying elements on eukaryotic evolution remains largely unknown. Here, we show that variation in genome content within multiple fungal species is facilitated by Starships, a newly discovered group of massive mobile elements that are 110 kb long on average, share conserved components, and carry diverse arrays of accessory genes. We identified hundreds of Starship-like regions across every major class of filamentous Ascomycetes, including 28 distinct Starships that range from 27 to 393 kb and last shared a common ancestor ca. 400 Ma. Using new long-read assemblies of the plant pathogen Macrophomina phaseolina, we characterize four additional Starships whose activities contribute to standing variation in genome structure and content. One of these elements, Voyager, inserts into 5S rDNA and contains a candidate virulence factor whose increasing copy number has contrasting associations with pathogenic and saprophytic growth, suggesting Voyager's activity underlies an ecological trade-off. We propose that Starships are eukaryotic analogs of bacterial integrative and conjugative elements based on parallels between their conserved components and may therefore represent the first dedicated agents of active gene transfer in eukaryotes. Our results suggest that Starships have shaped the content and structure of fungal genomes for millions of years and reveal a new concerted route for evolution throughout an entire eukaryotic phylum.