Multiple introgressions shape mitochondrial evolutionary history in Drosophila paulistorum and the Drosophila willistoni group

Hybridization and the consequent introgression of genomic elements is an important source of genetic diversity for biological lineages. This is particularly evident in young clades in which hybrid incompatibilities are still incomplete and mixing between species is more likely to occur. Drosophila paulistorum, a representative of the Neotropical Drosophila willistoni subgroup, is a classic model of incipient speciation. The species is divided into six semispecies that show varying degrees of pre- and post-mating incompatibility with each other. In the present study, we investigate the mitochondrial evolutionary history of D. paulistorum and the willistoni subgroup. For that, we perform phylogenetic and comparative analyses of the complete mitochondrial genomes and draft nuclear assemblies of 25 Drosophila lines of the willistoni and saltans species groups. Our results show that the mitochondria of D. paulistorum are polyphyletic and form two non-sister clades that we name α and β. Identification and analyses of nuclear mitochondrial insertions further reveal that the willistoni subgroup has an α-like mitochondrial ancestor and indicate that both the α and β mitochondria of D. paulistorum were acquired through introgression from unknown fly lineages of the willistoni subgroup. We also uncover multiple mitochondrial introgressions across D. paulistorum semispecies and generate novel insight into the evolution of the species.


INTRODUCTION 20
Hybridization is increasingly recognized for its importance in promoting genetic diversity as 21 well as for its consequent influence on adaptation and speciation. Large-scale genome 22 sequencing is showing that the process is not only widespread but also relatively common,

34
The analysis of introgressed genetic elements can contribute to our understanding of past 35 evolutionary events that shaped the present biodiversity (Ottenburghs, 2020). Sequence     Turissini and Matute, 2017). In some cases, up to 10% of the genome is estimated 57 to have originated via introgression (Suvorov et al., 2022). Within Drosophila, young clades 58 represent particularly interesting study systems, as the lineages which form them are more 59 likely to hybridize or to have done so in the recent past (Coyne and Orr, 1989). One such 60 clade is the Drosophila willistoni species group, a recent radiation of Neotropical fruit flies 61 (Zanini et al., 2018) which comprises 24 species split into three subgroups: alagitans, 62 bocainensis, and willistoni. (Bächli, 2019). Among these, the willistoni subgroup is the best 63 studied and comprises several incipient species with varying degrees of inter-and intra-64 specific reproductive isolation (Burla et al., 1949;Ehrman and Powell, 1982;Winge, 1965).

5
In the present study, we investigate the evolutionary history of mitochondria in D. 94 paulistorum. We use complete mitochondrial genomes and draft whole-genome assemblies 95 of 25 Drosophila lines belonging to six species of the willistoni group and three species of 96 the closely related saltans group. By comparing mitochondrial and nuclear phylogenies we 97 uncover a complex evolutionary history involving multiple mitochondrial introgressions, both 98 ancient and recent. We discover that D. paulistorum mitochondria are polyphyletic and form 99 two distinct non-sister clades, which we name a and b. With our analyses of the a and b 100 genomes as well as of NUMTs found in the D. paulistorum genome, we show that the 101 willistoni subgroup has an a-like mitochondrial ancestor. Interestingly, we also find evidence 102 that both current D. paulistorum mitotypes are not native to the species and were likely 103 acquired through introgression. The current a mitotype appears to have introgressed prior to 104 the divergence of D. paulistorum and D. equinoxialis. As for the b mitotype, we suggest that      Freebayes with a range of parameters and converted to ace-format for manual curation in

165
Consed (Gordon et al., 1998). In cases where high-frequency variants existed, we used the 166 majority rule and base quality to select the consensus base.

167
The position and function of genes in the assembled mt-genomes were obtained by 168 extracting the gene sequences from the published D. willistoni mt-genome (GD-H4-1 line) 169 and using them as queries in a blastn search against each of our assembled mt-genomes.

170
The start and stop positions of each gene were subsequently checked manually in Artemis 171 (Rutherford et al., 2000) and

223
Additionally, contigs were classified as NUMTs if part of their sequence had similarity to the 224 Drosophila nuclear genome, as indicated by blast searches against the NCBI nt database.

225
The nuclear coverage of each line was defined as the average coverage of the 10 longest 226 BUSCO markers used in our nuclear phylogenetic analysis (Table S3). We also calculated 227 the percentage of bases of the a and b reference genomes that were covered by mapped 228 reads using SAMtools mpileup. The result of these analyses is summarized in Table S4.   considerably in the number of contigs, total length and quality (Table S1). major clades, which we designate a and b (Fig 1A). The mt-genomes of the two clades are

270
Branches marked by transverse lines were shortened and do not follow the scale bar.

271
We found all deep nodes of the mitochondrial tree to be highly supported, except one (Fig   272   1A). The unsupported node is essential for our understanding of the phylogenetic

280
We tested if recombination between mitotypes could be affecting the phylogenetic 281 signal by generating single gene trees for each of the 13 mitochondrial protein-coding genes 282 and looking for conflicts between their topologies. We found that most of the trees had poor 283 resolution due to the short sequence length of individual genes and low divergence between 284 lines. Only two gene trees (ND2 and ND5) had good support for the node defining the 285 relationship between a+eq and the willistoni clade (Bootstrap values > 80) (Fig S1). The

286
ND2 tree featured α+eq as sister to the willistoni clade (BS = 80%) (Fig S1A) while the ND5 287 tree placed α+eq as sister to the b clade (BS = 92%) (Fig S1B). Among all single gene   300   1A, Fig S2A). Using the first two codon positions of the nucleotide sequences of the protein-301 coding genes also resulted in considerably higher support for a+eq as sister to the willistoni 302 clade (ML 99%, Bayesian 1) (Fig 1A, Fig S2B). In contrast, the tree based only on the third 303 codon position showed high (Bayesian posterior probability 0.99) or fairly good (ML 304 bootstrap 90%) support for a+eq and b being sister clades (Fig 2). Thus, given the 305 conflicting topologies, we conclude that nucleotide composition bias is likely causing the low 306 support of the node defining the relationship between the a+eq and willistoni clades in the 307 mitochondrial phylogeny.

308
Since the third codon position is more prone to bias than the first and second

311
This is the same topology seen in the whole mt-genome tree (Fig 1A). Thus, we confirmed 12 that the mitochondria of D. paulistorum are polyphyletic and that a+eq is sister to the 313 willistoni clade rather than to b (Fig 1A). The result also indicates that the ancestral 314 mitochondrion of the willistoni subgroup is more similar to a (thus "a-like") than to b.  (Fig 1A, Fig S2B). The change in topology is likely a result of

325
To investigate the evolution of D. paulistorum mitochondria in more detail, we also inferred a 326 reliable nuclear phylogeny that reflects the evolutionary history of the species and its 327 semispecies (Fig 1B). The tree was based on amino acid sequences of 692 BUSCO 328 markers ('Diptera' dataset) which were recovered as complete single copies in all of our draft 329 nuclear assemblies ( Table S3). The resulting nuclear phylogeny is highly supported in most

341
equinoxialis occupy within the willistoni subgroup in the nuclear phylogeny (Fig 1). This

349
Finally, we note that D. tropicalis and D. insularis switch positions between the 350 nuclear and mitochondrial phylogenies (Fig 1).

378
We are currently investigating the genomic and biological properties of this NUMT and 379 intend to publish our results in a future publication.

380
The second NUMT of interest spans a genomic region that includes parts of the ND2 381 and CO1 genes. This NUMT was found in 13 of the 14 analyzed D. paulistorum lines and 382 using it for phylogenetic analysis revealed a relationship between lines that is similar to that 383 seen in the nuclear phylogeny (Fig 3, Fig 1B).  398   1B). The ancient origin of this NUMT combined with its phylogenetic position close to a and 399 the willistoni clade (Fig 3) corroborates the hypothesis of an a-like ancestral mitotype in D.

401
Finally, we observed that the genome assembly of D. equinoxialis contains a few b-402 mitotype-related contigs which were not found in D. insularis, D. tropicalis or D. willistoni.

403
The coverage of these contigs is similar to the nuclear average for the line but they do not 404 contain matches to non-mitochondrial sequences. By investigating the coverage of these 405 contigs, we saw that b-related reads present in our D. equinoxialis sample cover the whole 406 extent of a mt-genome and result in a full mt-genome when assembled (Table S4)

504
We also observe some differences between our results and those of previous studies 505 with regards to which mitotype is associated with certain semispecies, as inferred by an

573
The authors declare that they have no competing financial or personal relationships that 574 could have influenced the work reported in this paper.