Predicted genome size was positively correlated with TE content

The disruption in stoichiometry of highly dosage-sensitive components of macromolecular complexes and pathways, across regulatory, signaling and metabolic networks, can negatively affect fitness or be lethal. Thus, partial to complete dominance of one subgenome over the other may help resolve genetic incompatibilities. Previous studies of ancient allopolyploids revealed that one subgenome may be dominantly expressed and, over millions of years, retain a significantly greater number of genes. Subgenome dominance has been observed in many allopolyploids, to varying degrees, but not in all allopolyploids nor in any autopolyploids. Thus, the underlying genetic and/or epigenetic mechanisms driving expression dominance remain poorly understood. Previous studies have shown that densities of transposable elements near genes are predictive of which subgenome is more highly expressed. However, whether and to what extent genetic divergence of the diploid progenitors contributes to subgenome expression dominance has yet to be evaluated in allopolyploids, especially in vertebrates.

An additional whole genome duplication, termed TGD or 3R, occurred in the teleost fish lineage an estimated 225–350 million years ago, at the base of the largest and most diverse group of vertebrates. Some clades, including Salmonidae, Cyprinidae and Corydoradinae, have undergone their own independent fourth rounds of polyploidization. Cyprinids, the carp family, contain roughly 600 polyploid species derived from potentially at least thirteen polyploidization events.

The family is delineated into eleven subfamilies, including Cyprininae, which consists of eleven tribes, seven of which are largely composed of polyploids. Thus, cyprinids are an ideal model family for investigating subgenome evolution following multiple independent polyploid events within vertebrates. To date, to the best of our knowledge, subgenome-resolved assemblies are publicly available for only three allopolyploid species from the Cyprinini tribe: the common carp, goldfish, and the hexaploid Prussian carp. Some evidence for subgenome expression dominance was uncovered from the analysis of both the common carp and goldfish genomes. However, no evidence for subgenome dominance at the transcriptome level was observed in the analysis of the hexaploid Prussian carp genome. Comparative genomic analysis of the Prussian carp did reveal biased retention of certain duplicate genes towards one subgenome. This suggests that the genomes of allopolyploid cyprinine fishes may exhibit subgenome dominance to varying degrees. In this context, the roles of transposable element differences, parental effects and/or genetic divergence of diploid progenitor species in the observed subgenome expression dominance remain poorly understood. Therefore, the evaluation of multiple independently derived cyprinine allopolyploids can provide valuable new insights into the underlying mechanisms of subgenome dominance. A robust phylogenomic framework for the subfamily Cyprininae is needed to phylogenetically localize polyploidy events and investigate the underlying genetic mechanisms contributing to subgenome dominance in allopolyploid fishes. However, the maternal and paternal diploid progenitors of known polyploids in this group remain largely unknown.

A recent study attempted to address this question within this group using three single-copy nuclear loci, but the phylogenetic history of these three genes may not reflect the true history of species relationships within this subfamily. Phylogenomic analyses based on hundreds of orthologous markers from across the genome should reflect a more accurate evolutionary history of the species and are more likely to reveal the diploid progenitors of allopolyploids. In the present study, we thus aim to resolve the phylogenetic relationships among several key Cyprininae species, uncover the polyploid origin of three allopolyploid species, identify the closest extant relatives of their diploid progenitors, and investigate subgenome dominance and its genetic basis in the allopolyploids. To accomplish these goals, we assemble high-quality de novo reference genomes of twenty-one cyprinid fishes from across five subfamilies using PacBio HiFi long reads. Furthermore, we generate transcriptome data from several distinct organs to investigate subgenome expression dominance in three allotetraploids. Our study provides new insights into the evolutionary history of Cyprininae, including the identification of maternal and paternal diploid progenitor lineages of three independently formed allopolyploids, the genetic basis of subgenome dominance in these allopolyploids, and new large-scale genomic resources for the community as a foundation for future studies.

Whole genomes of 21 cyprinid fishes were sequenced with PacBio CCS reads at an average of 32.34-fold coverage and Illumina paired-end 150 bp reads at an average of 66.86-fold coverage, yielding a total of 2.24 trillion base pairs of raw read data. These datasets were de novo assembled using Hifiasm, yielding high-quality genomes with an average contig N50 size of 23 Mb. The new assemblies ranged in size from 0.81 to 1.83 Gb, similar to the estimated genome sizes obtained from k-mer analysis of Illumina reads.
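
For readers unfamiliar with the contig N50 statistic quoted above, the following minimal Python sketch shows how it is conventionally computed from a list of contig lengths. It illustrates the standard definition only; the function name `contig_n50` and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline used in this study.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths in bp, not real data).
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = contig_n50(toy_lengths)
```

Because the statistic depends only on the sorted contig lengths, it can be computed directly from a FASTA index without loading any sequence.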

A high percentage of Illumina reads aligned to the assembled contigs, and BUSCO scores were high, suggesting that most of each genome was assembled. Previous phylogenetic work using three single-copy nuclear loci suggested that three species, Procypris rabaudi, Spinibarbus sinensis and Luciobarbus capito, are likely tetraploids. To generate chromosome-level genomes, high-throughput chromosome conformation capture (Hi-C) reads, at ~100-fold coverage per haplotype, were obtained for each tetraploid and used for scaffolding with the ALLHiC algorithm. In total, 94.43%, 97.56% and 98.83% of all bases of the S. sinensis, P. rabaudi and L. capito genomes, respectively, were assigned to 50 pseudo-molecules after manual curation. Strong contact signals in the Hi-C data for all chromosomes of each genome suggest high-quality chromosome-level scaffolding. Homology-based and RNA sequence-based gene predictions were used to annotate all genomes after masking transposable elements, simple sequence repeats, and tandem repeats. The final annotated gene numbers for the three allopolyploids, P. rabaudi, L. capito and S. sinensis, were 45,857, 43,211 and 49,999, respectively, comparable to those of two well-studied cyprinid fishes, common carp and goldfish. Gene numbers for the remaining eighteen species ranged from 23,658 to 32,381, similar to the 24,770 reported for Onychostoma macrolepis and the 27,263 reported for grass carp. BUSCO analysis was conducted to evaluate the completeness of these annotations, which contain on average 91.6% of BUSCO genes as complete.

The overall TE content in the 21 sequenced species ranged from 40.87% to 59.18%. The most abundant repeat class in all species was DNA transposons, of which Tc1/mariner, hAT, and CMC were the three most enriched superfamilies. Long terminal repeats (LTRs) account for an average of 11.09% of the genomes, which is higher than reported for zebrafish.
Most of our sequenced fishes had long interspersed nuclear element (LINE) content similar to that of zebrafish but fewer short interspersed nuclear elements than zebrafish. We also observed that the median age of DNA transposon families in our sequenced genomes was typically older than that of both LTR and LINE families, as was also found in zebrafish.

Multiple alignments of orthologous genes between each tetraploid and O. macrolepis successfully identified two subgenomes, each of which included 25 chromosomes. To assign each chromosome to a subgenome, we applied a method similar to SubPhaser, a subgenome-phasing algorithm that uses subgenome-specific k-mers as markers. This strategy has previously supported the allopolyploid origin of several known allopolyploid plants as well as of the common carp and the African clawed frog Xenopus laevis. Therefore, repetitive k-mers that are exclusive to or highly enriched in one subgenome were sought for each of the three polyploids. We confirmed that two distinct subgenomes, termed ‘subP’ and ‘subM’, could be determined for each tetraploid based on a suite of 15-mers with unique distribution patterns along each homoeologous chromosome pair, supporting an allotetraploid origin of these three species. To further verify the polyploid origin, we adapted another strategy, based on analyzing TE types and abundances, that has been successfully employed to confirm the polyploid history of the African clawed frog, blueberry, sterlet sturgeon, goldfish and Prussian carp.
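
The logic of marker-based subgenome assignment, whether the markers are repetitive k-mers or TE families, can be illustrated with a small Python sketch. This is a simplified toy under stated assumptions, not the SubPhaser implementation and not the exact procedure used here: the function names, the `min_count` and `fold` thresholds, the seeding on the first homoeologous pair, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def enriched_kmers(counts_a, counts_b, min_count=2, fold=2.0):
    """k-mers that are repetitive in A and at least `fold`-times rarer in B.

    Thresholds are illustrative assumptions, not values from the study.
    """
    return {
        kmer for kmer, c in counts_a.items()
        if c >= min_count and c >= fold * max(counts_b.get(kmer, 0), 1)
    }


def assign_subgenomes(pairs, k=5):
    """Assign each member of every homoeologous pair to 'sub1' or 'sub2'.

    `pairs` is a list of (name_a, seq_a, name_b, seq_b). The first pair
    seeds the two marker sets; each later pair is oriented so that its
    enriched k-mers agree best with the markers collected so far.
    """
    markers = {"sub1": set(), "sub2": set()}
    assignment = {}
    for idx, (name_a, seq_a, name_b, seq_b) in enumerate(pairs):
        ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
        ea, eb = enriched_kmers(ca, cb), enriched_kmers(cb, ca)
        same = len(ea & markers["sub1"]) + len(eb & markers["sub2"])
        swap = len(ea & markers["sub2"]) + len(eb & markers["sub1"])
        # Ties (including the seed pair) keep the given orientation.
        if idx > 0 and swap > same:
            ea, eb = eb, ea
            name_a, name_b = name_b, name_a
        assignment[name_a], assignment[name_b] = "sub1", "sub2"
        markers["sub1"] |= ea
        markers["sub2"] |= eb
    return assignment


# Toy chromosomes (hypothetical): a shared repeat plus one
# subgenome-specific repeat on each homoeolog.
shared = "CCCCCCCC"
toy_pairs = [
    ("chr1a", shared + "ACGTT" * 6, "chr1b", shared + "GGCAT" * 6),
    ("chr2a", shared + "GGCAT" * 4, "chr2b", shared + "ACGTT" * 4),
]
toy_assignment = assign_subgenomes(toy_pairs)
```

In this toy, repeats shared equally by both homoeologs are ignored because they never pass the fold-enrichment filter, so only lineage-specific repeats drive the partition.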

This approach is based on the hypothesis that relics of transposon types and abundances unique to the two parental species can be used as markers to assign each chromosome to a particular subgenome in an allopolyploid. Frequency analyses of TEs identified between 8 and 16 transposon types in each polyploid genome that were differentially enriched in subP and subM. These results collectively support an allopolyploid origin for these three polyploid fishes.

To estimate the divergence time of each subgenome, we established one-to-one orthologous gene sets from two putative diploid ancestors and the subP and subM genomes of the three allotetraploids and calculated the pairwise synonymous substitution rates (Ks). The divergence time of the diploid progenitors, which serves as an upper-bound estimate of the timing of the polyploid event, can be deduced from the Ks age distributions of the orthologous pairs. We found that the two subgenomes of L. capito diverged approximately 7.5 to 13.9 million years ago (Mya), which is the most recent estimate among the allopolyploids examined in this study. In comparison, the divergence of the P. rabaudi subgenomes is estimated at ~15 to 28 Mya. This estimate is similar to previous estimates of the divergence time of the subgenomes of common carp and goldfish. The results from our phylogenetic analyses further confirmed that P. rabaudi, common carp and goldfish likely share a common polyploid event, with the subP and subM of these species each falling in monophyletic clades. Lastly, the divergence of the S. sinensis subgenomes was estimated at 10 to 18.6 Mya. Therefore, these three allopolyploid cases, with varying divergence estimates among subgenomes, provide a suitable framework to examine whether genetic divergence of the diploid progenitors contributes to subgenome expression dominance.

Mitochondrial genomes are almost exclusively inherited from maternal progenitors, whereas nuclear protein-coding genes are biparentally inherited.
Therefore, a comparison of the mtDNA phylogenetic tree and nuclear gene trees enables the identification of maternal and paternal diploid progenitors of allopolyploids. Our phylogenetic analyses using Triplophysa bleekeri or zebrafish as an outgroup provide strongly supported estimates of species relationships and of the monophyly of Cyprininae. Furthermore, these analyses revealed three independent polyploidization events: one shared by P. rabaudi, common carp, and goldfish, one in S. sinensis and one in L. capito, consistent with a previous study. Based on the aforementioned phylogenetic analyses and the mitochondrial tree, the subP and subM of these five species denote the paternal and maternal subgenomes, respectively. These analyses also supported three independent allopolyploid origins. The maternal subM of common carp, goldfish and P. rabaudi is most closely related to Tribe Barbini or Acrossocheilini, and the paternal subP is most closely related to Tribe Labeonini. Similarly, a species closely related to Acrossocheilini could have served as the diploid progenitor of the S. sinensis subM, whereas its subP was the descendant of an ancestral fish much older than Smiliogastrini. The formation of L. capito was probably the result of hybridization between two diploid relatives from Barbini. To further confirm these conclusions, phylogenetic analyses were performed using the whole-genome alignment of 13 species, the fourfold degenerate sites in 1,669 genes, and the CDS of 1,669 individual genes. The topologies of all these trees were congruent with each other. We also observed differences between the overall consensus species tree and individual gene trees, suggesting that these topological conflicts may result from incomplete lineage sorting and introgression.

Generally, there are four major evolutionary fates for duplicated genes derived from polyploidy events: (1) duplicate gene retention due to dosage-balance constraints or selection favoring increased dosage of gene products; (2) gene loss or pseudogenization of one duplicate copy; (3) subfunctionalization, the partitioning of ancestral gene functions between the two duplicate gene copies; and (4) neofunctionalization, the evolution of novel gene functions in one or both duplicate gene copies. To investigate the frequency of each fate among ohnologs, we analyzed expression levels across six tissues for a set of positionally conserved syntenic ohnologs that were present in single copy in the genomes of two diploids and retained in duplicate in all three allotetraploid genomes. We identified 4,884 to 5,345 gene pairs with expression patterns consistent with duplicate retention due to dosage selection, 226 to 348 consistent with non-functionalization, 9 to 14 consistent with subfunctionalization, and 223 to 420 consistent with neofunctionalization. Examples of expression divergence consistent with subfunctionalization and neofunctionalization for each allotetraploid are shown in Supplementary Fig. 20. However, it should be noted that the low level of subfunctionalization inferred could be due to the relatively small number of tissues examined.
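
To make the four categories concrete, the following Python sketch shows one simple way an ohnolog pair could be classified from tissue-level expression. The decision rules, the presence/absence threshold `min_tpm`, and all names are assumptions made for illustration; the exact criteria applied in this study are not reproduced here.

```python
from collections import Counter


def classify_ohnolog_pair(diploid, copy_p, copy_m, min_tpm=1.0):
    """Classify one ohnolog pair from per-tissue expression values.

    `diploid` is the expression of the single-copy diploid ortholog and
    serves as a proxy for the ancestral pattern; `copy_p` and `copy_m`
    are the two retained copies. All rules below are illustrative
    assumptions, including the `min_tpm` expression cutoff.
    """
    if not (len(diploid) == len(copy_p) == len(copy_m)):
        raise ValueError("all expression vectors must cover the same tissues")

    def expressed(values):
        return {i for i, v in enumerate(values) if v >= min_tpm}

    anc, p, m = expressed(diploid), expressed(copy_p), expressed(copy_m)

    # One copy (or both) silent in every tissue examined.
    if not p or not m:
        return "non-functionalization"
    # A copy is expressed in a tissue where the ancestral proxy is not.
    if (p - anc) or (m - anc):
        return "neofunctionalization"
    # Each copy lost part of the ancestral pattern; together they cover it.
    if p < anc and m < anc and (p | m) == anc:
        return "subfunctionalization"
    # Otherwise the pair is treated as retained in duplicate.
    return "retention"


# Toy expression values across six tissues (hypothetical, not real data).
toy_pairs = [
    ([5, 5, 5, 5, 5, 5], [4, 6, 5, 5, 3, 7], [5, 5, 5, 4, 6, 2]),
    ([5, 5, 5, 5, 5, 5], [0, 0, 0, 0, 0, 0], [5, 5, 5, 4, 6, 2]),
    ([5, 5, 5, 0, 0, 0], [5, 5, 0, 0, 0, 0], [0, 0, 5, 0, 0, 0]),
    ([5, 5, 5, 0, 0, 0], [5, 5, 5, 0, 0, 0], [5, 5, 5, 9, 0, 0]),
]
toy_tally = Counter(classify_ohnolog_pair(*pair) for pair in toy_pairs)
```

Under rules of this kind, subfunctionalization can only be detected when the complementary tissues are among those sampled, which is consistent with the caveat above about the small number of tissues examined.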