The cuticular and epidermal layers contain nearly all of the phytonutrients in the fruit, such as anthocyanins, proanthocyanidins, and flavonols. Previous studies on blueberry have reported that these groups of compounds may have diverse health-promoting properties, including controlling diabetes, improving cognitive function, and inhibiting tumor growth. With the growing awareness of the potential health benefits of blueberry and increasing consumer demand, a primary goal of the blueberry research community is to develop cultivars with improved antioxidant levels along with other important fruit quality traits. However, despite its economic importance and health benefit potential, breeding efforts to improve fruit quality traits in blueberry have been slow due, in large part, to the lack of genomic resources. A draft genome for a wild diploid species of blueberry was previously assembled. However, that draft genome consists of a large number of scaffolds and a high percentage of gaps in a ∼393.16 Mb assembly and, most importantly, does not reflect the genome complexity of the economically important, cultivated tetraploid high bush blueberry. Here, we present the first chromosome-scale genome assembly of tetraploid high bush blueberry. The haplotype-phased assembly consists of 48 pseudomolecules with ∼1.68 Gb of assembled sequence, ∼1.29% gaps, and an average of 32,140 protein-coding genes per haplotype. A haplotype is the complete set of DNA within the nucleus of an individual that was inherited from one parent. We leveraged this genome to examine the origin of the polyploid event, gain insights into the underlying genetics of fruit development, and identify candidate genes involved in the biosynthesis of metabolites contributing to superior fruit quality. Furthermore, we examined gene expression patterns among the four haplotypes in high bush blueberry.
This analysis uncovered the presence of spatial-temporal specific dominantly expressed sub-genomes. These findings and the reference genome will serve as a powerful platform to further investigate “sub-genome dominance,” facilitate the discovery and analysis of genes encoding economically important traits, and ultimately enable molecular breeding efforts in blueberry.

Our goal was to obtain a high-quality reference genome for the high bush blueberry cultivar “Draper,” which is widely grown around the world due to its excellent fruit quality. We sequenced the genome using a combination of 10× Genomics and Illumina technologies, totaling 324× coverage of the genome. These data were assembled and scaffolded using the software package DenovoMAGIC3. The genome was further scaffolded to chromosome scale using Hi-C data with the HiRise pipeline. The total length of the final assembly is 1,679,081,592 bases distributed across 48 chromosome-level pseudomolecules. The final assembly size falls within the estimated genome size of “Draper” based on flow cytometry. The genome was annotated using a combination of evidence-based and ab initio gene prediction with the MAKER-P pipeline. RNA sequencing data from 13 different gene expression libraries, representing unique organs, developmental stages, and treatments, together with publicly available transcriptome and expressed sequence tag data of V. corymbosum from the National Center for Biotechnology Information, were used as transcript evidence. Protein sequences from Arabidopsis thaliana, Actinidia chinensis, and the UniProtKB plant database were also used as evidence for genome annotation. We predicted a total of 128,559 protein-coding genes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis v.3 was performed to assess the completeness of the assembly and the quality of the genome annotation. The annotated gene set contains 1,394 out of 1,440 BUSCO genes.
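The headline assembly statistics reported above can be cross-checked with simple arithmetic. The following Python sketch uses only figures stated in the text (assembly length, pseudomolecule count, predicted gene count, and BUSCO counts); the division by four haplotypes follows from tetraploidy, and the variable and function names are illustrative rather than part of any pipeline used in this study.

```python
# Figures below are taken from the text; names are illustrative only.
ASSEMBLY_BASES = 1_679_081_592   # total length of the final assembly
PSEUDOMOLECULES = 48             # chromosome-level pseudomolecules
PREDICTED_GENES = 128_559        # predicted protein-coding genes
HAPLOTYPES = 4                   # tetraploid: four haplotypes
BUSCO_FOUND = 1_394              # BUSCO genes present in the annotation
BUSCO_TOTAL = 1_440              # BUSCO genes searched


def assembly_summary():
    """Derive per-haplotype and percentage figures from the reported totals."""
    return {
        "assembly_gb": ASSEMBLY_BASES / 1e9,
        "chromosomes_per_haplotype": PSEUDOMOLECULES // HAPLOTYPES,
        "genes_per_haplotype": PREDICTED_GENES / HAPLOTYPES,
        "busco_percent": 100 * BUSCO_FOUND / BUSCO_TOTAL,
    }


summary = assembly_summary()
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The derived values agree with the rounded figures quoted earlier: roughly 1.68 Gb of sequence, 12 chromosomes and about 32,140 genes per haplotype, and close to 96.8% of BUSCO genes recovered.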
Functional annotation was assigned using Blast2GO to reference pathways in the Kyoto Encyclopedia of Genes and Genomes database. Comparative genomic analyses assigned genes to 16,909 orthogroups shared by six phylogenetically diverse plant species, including five eudicots, each with distinct fruit types, and Zea mays as the outgroup. Transposable elements (TEs), both Class I and II, were identified and classified in the genome using the protocol described by Campbell et al. Overall, 44.3% of the blueberry genome is composed of TEs. Consistent with previous reports, the most abundant Class I TEs were long terminal repeat retrotransposons (LTR-RTs), specifically the superfamily LTR/Gypsy followed by LTR/Copia, while for Class II transposons, the miniature inverted-repeat superfamily hAT was the most abundant. The quality of the genome was further assessed by examining the assembly continuity of repeat space using the LTR Assembly Index (LAI) deployed in the LTR_retriever package. The adjusted LAI score of this blueberry genome is 14, and based on the LAI classification, this score is within the range of “reference” quality. Estimation of the regional LAI in 3 Mb sliding windows also showed that assembly continuity is uniform and of high quality across the entire genome.

The origin of high bush blueberry from either a single or multiple diploid progenitor species is a long-standing question. Previous reports have suggested that high bush blueberry may be an autotetraploid based on the segregation ratios of certain traits. However, an analysis of chromosome pairing among different cultivars revealed largely bivalent pairing during metaphase I, similar to patterns observed in known allopolyploids. To gain further insights into the polyploid history of high bush blueberry, we calculated sequence similarity and synonymous substitution rates (Ks) between genes in homoeologous regions across the genome.
The average sequence similarity is ∼96.3% among syntenic homoeologous genes. The average Ks divergence between syntenic homoeologous genes is ∼0.036 per synonymous site. The average Ks divergence between homoeologous genes can be used not only to identify polyploid events but also to estimate the divergence of the diploid progenitors from their most recent common ancestor (MRCA).
The Ks divergence between homoeologs in high bush blueberry is six times higher than that between orthologs of two A. thaliana lines that diverged roughly 200,000 years ago. The relatively high Ks between homoeologous regions across the genome suggests that tetraploid blueberry is unlikely to be an autopolyploid formed from somatic doubling or failure during meiosis involving a single individual. Furthermore, comparative genomics revealed that homoeologous regions are highly collinear, except for a few notable chromosome-level translocations. These translocations were manually inspected and verified with both the raw sequence and Hi-C data. Rapid changes among homoeologous chromosomes are known to occur in newly formed allopolyploids. We also assessed the level of similarity and content of LTR transposable elements among the four haplotypes. As the most prevalent transposable elements in plants, LTR-RTs undergo continual “bloat and purge” cycles within most plant genomes, resulting in a unique signature that may distinguish sub-genomes in an allopolyploid. To examine the evolutionary history of LTR-RTs in the high bush blueberry genome, we calculated the mean sequence identity of LTR sequences among each of the four haplotypes. This analysis revealed that the majority of more recent LTRs are sub-genome specific in high bush blueberry. In other words, the data suggest that LTRs proliferated independently in the genomes of each diploid progenitor, following the divergence from their MRCA but prior to polyploidy. The pair-wise LTR difference of the two ancestors is 2.4%–2.6%. With Jukes-Cantor correction and an assumed synonymous substitution rate, the estimated time of divergence of the diploid progenitors from their MRCA is between 0.94 and 1.02 million years ago.
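The dating step above can be illustrated with a short calculation. The sketch below applies the standard Jukes-Cantor correction, K = -(3/4) ln(1 - 4p/3), to an observed pair-wise difference p and converts it to a divergence time with T = K / (2r). The substitution rate r used here, 1.3 × 10⁻⁸ substitutions per site per year, is an assumption: the value is not given in the text above, and this figure is chosen only because it is commonly used for LTR dating in plants. The function names are likewise illustrative.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion of differing sites p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


def divergence_time_years(p, rate_per_site_per_year):
    """T = K / (2r): the two lineages accumulate substitutions independently."""
    return jukes_cantor_distance(p) / (2 * rate_per_site_per_year)


# ASSUMPTION: rate not stated in the text; a value commonly used for plant LTR dating.
ASSUMED_RATE = 1.3e-8

for p in (0.024, 0.026):  # pair-wise LTR difference of the two ancestors
    t = divergence_time_years(p, ASSUMED_RATE)
    print(f"{p:.1%} difference -> {t / 1e6:.2f} million years")
```

Under this assumed rate, the 2.4%–2.6% range maps to roughly 0.94 to 1.02 million years, matching the interval reported above. A different rate would rescale the dates proportionally.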
These date estimates and the average speciation rate for temperate angiosperms suggest that high bush blueberry is either an allopolyploid derived from two closely related species or an autopolyploid derived from the hybridization of two highly divergent populations of a single species. To date the most recent polyploid event in high bush blueberry, we analyzed the unique LTR insertions present in each haplotype. Based on the pair-wise LTR difference between the four haplotypes, which is 0.81%–0.89%, the polyploid event occurred approximately 313 to 344 thousand years ago. The substitution rate of LTR sequences is likely different from that of protein-coding genes. Thus, more accurate date estimates will be possible once the LTR substitution rate in high bush blueberry becomes available from future studies.

After allopolyploidization, one of the parental genomes often emerges with significantly greater gene content and a greater number of more highly expressed genes. The emergence of a dominant sub-genome in an allopolyploid is hypothesized to resolve genetic and epigenetic conflicts that may arise from the merger of highly divergent sub-genomes into a single nucleus. However, classic autopolyploids, formed by somatic doubling, are not expected to face these challenges or exhibit sub-genome dominance since all genomic copies were contributed by a single parent. This was recently supported by genome-wide analyses of a putative ancient autopolyploid. It is important to note that sub-genome expression dominance could still be observed in intraspecific hybrids and autopolyploids formed by parents with highly differentiated genomes. To explore this in high bush blueberry, we compared gene content and expression-level patterns between homoeologous chromosomes.
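One simple way to frame such a comparison is to rank the four homoeologous copies of a chromosome by expression within each library and record which copy ranks first. The sketch below is illustrative only: the library names, copy labels, and expression values are hypothetical placeholders rather than data from this study, and summed expression per copy is an assumed summary statistic.

```python
from collections import Counter

# HYPOTHETICAL summed expression per homoeologous copy, per library.
# None of these numbers come from the study; they only illustrate the ranking.
expression = {
    "leaf":  {"copy1": 520.0, "copy2": 410.0, "copy3": 395.0, "copy4": 300.0},
    "root":  {"copy1": 480.0, "copy2": 450.0, "copy3": 310.0, "copy4": 290.0},
    "fruit": {"copy1": 210.0, "copy2": 430.0, "copy3": 390.0, "copy4": 260.0},
}


def rank_copies(values):
    """Return copy labels sorted from highest to lowest expression."""
    return sorted(values, key=values.get, reverse=True)


def dominant_copy_counts(expr):
    """Count how often each copy is the top-ranked one across libraries."""
    return Counter(rank_copies(values)[0] for values in expr.values())


rankings = {library: rank_copies(values) for library, values in expression.items()}
counts = dominant_copy_counts(expression)

for library, order in rankings.items():
    print(library, "->", " > ".join(order))
print("times dominant:", dict(counts))
```

In this toy input, the copy that leads in leaf and root drops to last place in fruit, which is the kind of tissue-specific switch such a ranking is meant to expose.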
While gene content levels were largely similar among homoeologous chromosomes, with a few notable exceptions, gene expression levels were highest for one of the four chromosome copies in the majority of gene expression libraries. Notably, in the three fruit libraries, the most dominantly expressed copy often became the least expressed among the four homoeologous chromosomes or one of the two lowest expressed copies. The copy most dominantly expressed in other tissues remained so in developing fruit for only two of the chromosomes. These homoeologous chromosome sets have undergone the most structural variation, which may have modified gene expression patterns. These analyses are based on a single biological replicate from a plant grown in a growth chamber. Thus, the findings reported here should be considered preliminary. Future studies should further explore sub-genome expression dominance in high bush blueberry, including at the individual homoeolog level, with additional biological replicates and across multiple environments.

The progression of fruit development in blueberry is marked by visible external and internal morphological changes, including in size and color. We profiled gene expression in fruit across seven developmental stages, from the earliest stage through the final stage, to identify genes differentially expressed during fruit development. Distinctive transitions in gene expression were observed from early fruit growth to the start of color development and from complete color change to ripened fruit. We found that the majority of genes upregulated during early fruit development were involved in phenylpropanoid biosynthesis, nitrogen metabolism, and cutin, suberin, and wax biosynthesis. In contrast, genes involved in starch and sugar metabolism were highly expressed at the onset of and during fruit ripening.
Moreover, principal component analysis showed that the first two components accounted for 84% of the variation and separated the developmental stages into three groups: early developmental stages (petal fall and small green fruit); middle developmental stages (expanding green and pink fruit); and late developmental stages (complete fruit color change, unripe and ripe fruit). Genes associated with cell division, cell wall synthesis, and transport were most highly expressed during the earliest developmental stages, which is consistent with previous work on other fruit species. In addition to genes regulating cell proliferation, defense response-related genes were also highly upregulated during the earliest developmental stages. During the middle developmental stages, genes regulating cell expansion, seed development, and secondary metabolite biosynthesis were highly expressed. During late developmental stages and as the berry transitions to ripening, late embryogenesis, transmembrane transport, defense, secondary metabolite biosynthesis, and abscisic acid related genes were highly overrepresented. Blueberry is considered a climacteric fruit; however, unlike the ethylene-driven fruit ripening in other climacteric species, abscisic acid has been demonstrated to regulate fruit ripening in blueberry. In summary, global gene expression patterns mirror the morphological and physiological changes observed during blueberry development.

We assessed the total antioxidant capacity in mature fruit across a blueberry diversity panel and the abundance of secondary metabolites responsible for its antioxidant activity in developing fruit. A diversity panel, composed of 71 high bush blueberry cultivars and 13 wild Vaccinium species, was evaluated for total antioxidant capacity in mature fruit using the oxygen radical absorbance capacity assay.
Similar to previous reports, we observed a wide range in antioxidant capacity across cultivars, with “Draper” having the highest levels of antioxidants. Consistent with our results, the variation in antioxidants among high bush blueberry cultivars was previously shown not to correlate with fruit weight or size. However, in another study, a correlation between fruit size and total anthocyanin levels was identified within a few select high bush blueberry cultivars but not across other Vaccinium species or blackberry. This inconsistency is likely due to sample size differences between studies.