Superior fruit quality is also associated with sugar levels. During fruit ripening, sugar levels in the endocarp increase through symplastic and/or apoplastic import of hexoses. Sugar transporters, sucrose transporters, and tonoplast sugar transporters have been shown to regulate intercellular sugar transport in the phloem and fruit. To the best of our knowledge, this is the first report on the potential role of these genes during blueberry fruit development. In addition, homologs of A. thaliana TST1 and watermelon ClTST1 and ClTST3 were expressed during fruit ripening in blueberry. A ClTST1 homolog was highly expressed throughout fruit development, whereas the ClTST3 homolog showed very low expression. Another gene that is highly expressed during fruit maturation is vacuolar invertase. As described in other systems, its upregulation during fruit ripening coincided with the breakdown of starch to sucrose or a mixture of glucose and fructose, suggesting that it may be involved in regulating sugar accumulation in blueberry fruit. Vacuolar invertase was previously reported to modulate the hexose-to-sucrose ratio in ripening fruit. In addition, two sugar transport protein homologs exhibited development-specific expression. However, their function remains largely unknown; thus, their potential role in sugar accumulation in the developing berry requires further investigation.

Tandemly duplicated genes arise from unequal crossing over or template slippage during DNA repair, exhibit high birth-death rates, and are typically found in co-regulated clusters in the genome.
Smaller-scale duplications, which include tandem duplicates, are highly biased toward certain gene families, including those involved in specialized metabolism. Furthermore, tandem duplications often result in increased dosage of gene products and may improve metabolic flux through rate-limiting steps in certain biosynthetic pathways. Most genes associated with antioxidant biosynthesis have at least one tandem duplicate in the highbush blueberry genome, with tandem array sizes ranging from 2 to 10 gene copies. The largest tandem arrays were found for the HQT and HCT genes, which are co-regulated and involved in the CGA pathway. Tandem array sizes also differed between homoeologous chromosomes for various genes. For example, the C3H gene, which is involved in CGA biosynthesis, was present on all four homoeologous chromosomes but with varying tandem array sizes: one homoeologous chromosome had two copies of C3H, while the other three had four copies each. This suggests that the C3H copy number differences among subgenomes may be due either to selection for gene duplication or loss or, in the case of allopolyploidy, to preexisting differences in gene content among the diploid progenitor species. Genes in the anthocyanin pathway with other distinctive duplication patterns include CHS, CHI, OMT, and UFGT. CHS, which catalyzes the conversion of 4-coumaroyl-CoA to naringenin chalcone, has two copies, and both have tandem duplicates on at least three of the homoeologous chromosomes. Interestingly, CHI has a single retained tandem duplicate on only one of the homoeologous chromosomes. However, additional copies of CHI were identified farther from the syntenic ortholog on another homoeologous chromosome, likely reflecting a transposition event following tandem duplication.
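This section does not state how tandem arrays were called. The following is a minimal sketch, assuming genes are ordered along each chromosome and that members of the same gene family separated by at most `max_gap - 1` intervening genes belong to one array; it shows how array sizes such as the C3H pattern (two copies on one homoeolog, four on each of the other three) could be tallied. The `Gene` class, the chromosome names, and the gap criterion are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    index: int   # rank of the gene in chromosome order (0, 1, 2, ...)
    family: str  # gene family label, e.g. "C3H" (hypothetical labels)


def tandem_arrays(genes, max_gap=1):
    """Group same-family genes on the same chromosome into tandem arrays.

    Consecutive family members join one array when their gene-order indices
    differ by at most max_gap (max_gap=1 means strictly adjacent genes).
    Only arrays with two or more members are returned.
    """
    arrays = []
    ordered = sorted(genes, key=lambda g: (g.chrom, g.family, g.index))
    for _, members in groupby(ordered, key=lambda g: (g.chrom, g.family)):
        current = []
        for g in members:
            if current and g.index - current[-1].index <= max_gap:
                current.append(g)
            else:
                if len(current) >= 2:
                    arrays.append(current)
                current = [g]
        if len(current) >= 2:
            arrays.append(current)
    return arrays


# Hypothetical example mirroring the C3H pattern on four homoeologous chromosomes.
genes = []
for chrom, n_copies in [("chr1_a", 2), ("chr1_b", 4), ("chr1_c", 4), ("chr1_d", 4)]:
    for i in range(n_copies):
        genes.append(Gene(f"{chrom}_C3H_{i}", chrom, 100 + i, "C3H"))
    genes.append(Gene(f"{chrom}_other", chrom, 200, "OTHER"))

sizes = {arr[0].chrom: len(arr) for arr in tandem_arrays(genes)}
print(sizes)  # {'chr1_a': 2, 'chr1_b': 4, 'chr1_c': 4, 'chr1_d': 4}
```

Raising `max_gap` tolerates a few unrelated genes inside an array, which is one way to accommodate small insertions between duplicates.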
OMT and UFGT have tandem duplicates on all of the homoeologous chromosomes, although with varying array sizes, whereas the ANR gene, involved in the conversion of anthocyanidins to proanthocyanidins, is single copy on all homoeologous chromosomes. The DFR gene, which is involved in the conversion of dihydroquercetin/dihydromyricetin to leucoanthocyanidin, has a single tandem duplicate on only one of the homoeologous chromosomes. These findings suggest that there may have been greater selective pressure to retain tandem duplicates of genes encoding enzymes involved in anthocyanin production than of those involved in conversion to proanthocyanidins. The vast majority of tandem duplicates are eventually lost; however, in rare instances, some may undergo functional diversification. Gene expression analysis revealed that 83.4% of the tandem duplicates were expressed in at least one transcriptome library, and 73.5% were expressed in at least one of the fruit developmental stages. This suggests that subsets of these duplicate genes may have been non-functionalized, subfunctionalized, or neofunctionalized. Future studies with more diverse libraries and additional transcriptome analyses are needed to investigate the functions of these genes more thoroughly.

Despite the economic importance of blueberry, molecular breeding approaches to produce superior cultivars have been greatly hampered by inadequate genomic resources and a limited understanding of the genetics underlying important traits.
As a result, breeders have had to rely solely on traditional approaches to generate new cultivars, which vary widely in fruit quality characteristics. For example, our analysis of a diversity panel of 84 cultivars and wild species revealed that "Draper" has antioxidant levels up to 19 times higher than those of other cultivars. Thus, the "Draper" genome should serve as a powerful resource for the blueberry community in guiding future breeding efforts aimed at improving antioxidant levels, among other important fruit quality traits. Furthermore, to our knowledge, this is not only the first genome assembly of cultivated highbush blueberry but also the first chromosome-scale, haplotype-phased genome for any species in the order Ericales. Ericales includes several other high-value crops and wild species with unique life history traits. Thus, we anticipate that this reference genome, together with the associated datasets, will be useful for a wide variety of evolutionary studies. Here, we also leveraged the genome to identify candidate genes and pathways underlying superior fruit quality in blueberry, including those associated with pigmentation, sugar, and antioxidant levels. Furthermore, we found that genes encoding key biosynthetic steps in various antioxidant pathways are enriched for tandem gene duplicates. For example, tandem gene duplications have expanded gene families involved in anthocyanin biosynthesis. This suggests that, in addition to a recent whole-genome duplication, tandem duplications may have contributed greatly to the metabolic diversity observed in blueberry. These tandem duplicates may have evolved new functions, possibly in the biosynthesis of novel compounds, and/or may have been selected to improve metabolic flux through specific biosynthetic steps, altering the dosage of certain end-point metabolites.
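The text reports that antioxidant biosynthetic genes are enriched for tandem duplicates but does not name the test used. One common way to test such an over-representation is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below uses only the Python standard library; the function name is hypothetical and the counts in the usage line are made up for illustration, not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, total_tandem, pathway_genes, pathway_tandem):
    """One-sided hypergeometric P-value for over-representation of tandem duplicates.

    total_genes    N: all annotated genes
    total_tandem   K: genes that are tandem duplicates genome-wide
    pathway_genes  n: genes in the pathway of interest
    pathway_tandem k: pathway genes that are tandem duplicates
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    denom = comb(total_genes, pathway_genes)
    upper = min(total_tandem, pathway_genes)
    tail = sum(
        comb(total_tandem, x) * comb(total_genes - total_tandem, pathway_genes - x)
        for x in range(pathway_tandem, upper + 1)
    )
    return tail / denom


# Made-up counts: 50 pathway genes, 20 of them tandem duplicates,
# against a genome with 30,000 genes of which 3,000 are tandem duplicates.
p = hypergeom_enrichment_p(30_000, 3_000, 50, 20)
print(p)
```

Exact integer arithmetic keeps the tail sum precise; for many pathways at genome scale, `scipy.stats.hypergeom` offers a faster survival function.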
Future studies are needed to further investigate the possible role of tandem duplications in modifying metabolite levels and composition in wild and cultivated blueberry. Based on sequence divergence, unique transposable element insertions, and subgenome expression patterns, our analyses also revealed that highbush blueberry, a tetraploid, likely arose from the hybridization of two distinct parents, possibly through allopolyploidy. Our analyses further suggest that the subgenomes of highbush blueberry may control distinct sets of genetic programs. The subgenome that is dominantly expressed in most surveyed tissues becomes the lowest expressed during fruit development. This observation is similar to findings in allopolyploid wheat, where developmental and adaptive traits were shown to be controlled by different subgenomes. For example, cell-type- and stage-dependent subgenome expression dominance was observed in the developing wheat grain. We argue that both highbush blueberry and hexaploid wheat, each now with a high-quality reference genome, are excellent systems in which to further investigate the mechanisms underlying subgenome dominance. Subgenome dominance has far-reaching implications for numerous research areas, including breeding. For example, in polyploids that exhibit subgenome dominance, marker-assisted breeding needs to target the correct set of dominant homoeologs for a given trait. Thus, we anticipate that this genome, combined with improved insights into subgenome dominance, will greatly accelerate molecular breeding efforts in cultivated highbush blueberry.

The genome of "Draper" was assembled using the DeNovoMAGIC software platform, a de Bruijn graph-based assembler designed for polyploid, heterozygous, and/or repetitive genomes.
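DeNovoMAGIC's internals are not described here, so the following is only a toy illustration of the general de Bruijn graph principle, not of DeNovoMAGIC itself. Reads are split into k-mers, each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and maximal non-branching paths are spelled out as contigs (unitigs). The reads and the value of k are made up; real assemblers must also handle reverse complements, sequencing errors, and, in a polyploid such as highbush blueberry, the bubbles created by divergent alleles and homoeologs.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Spell out maximal non-branching paths as contig sequences.

    Isolated cycles in which every node has one in-edge and one out-edge
    are not reported by this simplified walk.
    """
    indeg = defaultdict(int)
    for succs in graph.values():
        for s in succs:
            indeg[s] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(n):
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    contigs = []
    for start in nodes:
        if one_in_one_out(start):
            continue  # interior node: reached by a walk from a branching or terminal node
        for nxt in graph.get(start, ()):
            path = start + nxt[-1]
            cur = nxt
            while one_in_one_out(cur):
                cur = next(iter(graph[cur]))
                path += cur[-1]
            contigs.append(path)
    return contigs


# Two overlapping toy reads collapse into a single contig.
print(unitigs(build_de_bruijn(["ATGGCGT", "GGCGTGC"], k=4)))  # ['ATGGCGTGC']
```

A branch in the graph (for example, two alleles differing at one base) ends the current unitig, which is why heterozygous and polyploid genomes yield fragmented graphs that need phasing information such as the 10X and Hi-C data used here.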
The Chromium 10X data were used to phase, elongate, and validate haplotype scaffolds. Four Dovetail Hi-C libraries were prepared as described previously and sequenced on an Illumina HiSeq X system with paired-end 150 bp reads, giving a total of 90.7X physical coverage of the genome. The de novo genome assembly, raw genomic reads, and Dovetail Hi-C library reads were used as input for HiRise, a software pipeline designed specifically for scaffolding genome assemblies with proximity ligation data. Illumina genomic and Dovetail Hi-C library reads were aligned to the draft input assembly using a modified SNAP read mapper. HiRise analyzed the separations of Dovetail Hi-C read pairs mapped within draft scaffolds to produce a likelihood model for the genomic distance between read pairs, and this model was used to identify and break putative misjoins and to make joins that close gaps between contigs.

Plant tissue samples were collected from blueberry cv. Draper grown in a growth chamber. For the fruit developmental series, three biological replicates of berries at each of seven developmental stages were collected from cv. Draper in a field at the Horticulture Teaching and Research Center, Michigan State University, in July 2017. All plant tissues were immediately flash frozen in liquid nitrogen, and total RNA was isolated using the KingFisher Pure RNA Plant kit. Isolated total RNA was quantified using a Qubit 3 fluorometer. All samples were submitted to the Michigan State University Research Technology Support Facility Genomics Core and sequenced with paired-end 150 bp reads on an Illumina HiSeq 4000 system.

The draft genome of V. corymbosum cv. Draper was annotated using the MAKER annotation pipeline. Transcript and protein evidence used in the annotation included protein sequences downloaded from the A. thaliana and UniProtKB plant databases, V. corymbosum ESTs from NCBI, and transcriptome data from different blueberry tissues assembled with StringTie.
A custom repeat library and Repbase were used to mask repetitive regions in the genome with RepeatMasker. Ab initio gene prediction was performed using the gene predictors SNAP and Augustus. The resulting MAKER Max gene set was filtered to retain gene models with a Pfam domain and an annotation edit distance (AED) < 1.0. The filtered gene set was then scanned for transposase coding regions: the amino acid sequences of predicted genes were searched against a transposase database, and alignments caused by low-complexity sequences were filtered out. The total length of each gene matching transposases was calculated from the search output, and any gene with more than 30% of its length aligned to transposases was removed from the gene set. To assess annotation completeness, the V. corymbosum MAKER standard gene set was searched against the BUSCO v3 plant dataset. Genes were annotated with Pfam domains using InterProScan v5.26-65.0.

To identify and classify repetitive elements in the genome, LTR retrotransposon candidates were searched for using LTRharvest and LTR_FINDER and were further identified and classified using LTR_retriever, which also produced a nonredundant LTR library. Miniature inverted-repeat transposable elements (MITEs) were identified using MITE-Hunter. MITEs were manually checked for target site duplications (TSDs) and terminal inverted repeats (TIRs) and classified into superfamilies; those with ambiguous TSDs and TIRs were classified as "unknown." Using the MITE and LTR libraries, the V. corymbosum genome was masked with RepeatMasker. The masked genome was further mined for repetitive elements using RepeatModeler. The resulting repeats were categorized into two groups: sequences with and without identities. Those without identities were searched against the transposase database and, if they had a match, were considered transposons.
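The gene-model filtering described in this section (Pfam domain, AED < 1.0, and removal of genes with more than 30% of their length aligned to transposases) can be sketched as follows. This is a minimal illustration, not the authors' scripts. Assumptions: transposase alignments are given as 1-based inclusive protein coordinates, gene length is measured in amino acids, and overlapping hits are merged before computing the fraction. The `GeneModel` class and the `require_both` flag are hypothetical; the flag exists because the text reads as requiring both a Pfam domain and AED < 1.0, whereas MAKER's standard build is often described as retaining models with AED < 1 or a Pfam domain.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    gene_id: str
    length_aa: int         # protein length in amino acids
    aed: float             # MAKER annotation edit distance (0 = full evidence support, 1 = none)
    has_pfam: bool         # at least one Pfam domain detected
    transposase_hits: list # (start, end) 1-based inclusive protein coordinates


def covered_length(intervals):
    """Total residues covered by possibly overlapping 1-based inclusive intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # merge overlapping or adjacent hits
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def keep_gene(g, require_both=True, max_te_fraction=0.30):
    """Apply the evidence/domain filter, then the transposase-coverage filter."""
    evidence_ok = g.aed < 1.0
    passes = (evidence_ok and g.has_pfam) if require_both else (evidence_ok or g.has_pfam)
    if not passes:
        return False
    # Remove genes with more than 30% of their length aligned to transposases.
    return covered_length(g.transposase_hits) / g.length_aa <= max_te_fraction


genes = [
    GeneModel("g1", 400, 0.2, True, [(1, 100), (50, 150)]),  # 150/400 = 37.5% -> removed
    GeneModel("g2", 400, 0.2, True, [(1, 100)]),             # 100/400 = 25% -> kept
    GeneModel("g3", 400, 1.0, True, []),                     # no evidence support
]
print([g.gene_id for g in genes if keep_gene(g)])  # ['g2']
```

Merging overlapping hits first avoids double-counting residues matched by several transposase alignments, which would otherwise inflate the coverage fraction.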
The repeats were then filtered to exclude gene fragments using ProtExcluder and summarized using the fam_coverage.pl script in the LTR_retriever package. The continuity of the assembly's repeat space was assessed using the LTR Assembly Index (LAI) implemented in the LTR_retriever package. LAI was calculated either in 3 Mb sliding windows with a 300 Kb step or for the whole assembly, as LAI = (intact LTR-RT length / total LTR-RT length) × 100. To account for the dynamics of LTR retrotransposons, LAI was adjusted by the mean identity of LTR sequences in the genome based on an all-versus-all blastn search, which was also performed by the LAI program.

Illumina adapters were removed from the raw reads using Trimmomatic v0.33, and the trimmed reads were filtered using the FASTX-Toolkit. After quality assessment with FastQC, the filtered reads were aligned to the V. corymbosum genome using STAR. For the samples used for annotation, transcripts were assembled de novo using StringTie.
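The raw LAI calculation described above (intact LTR-RT length divided by total LTR-RT length, times 100, in 3 Mb windows with a 300 Kb step) can be sketched as follows. This is a simplified illustration with hypothetical function names, assuming each interval list is half-open and non-overlapping. It does not reproduce the LAI program's identity-based adjustment or its exact handling of chromosome ends and of windows with little LTR content; here the last window is simply truncated at the chromosome end.

```python
def overlap(intervals, win_start, win_end):
    """Bases of [win_start, win_end) covered by half-open, non-overlapping intervals."""
    return sum(max(0, min(e, win_end) - max(s, win_start)) for s, e in intervals)


def raw_lai(intact, total):
    """Whole-assembly raw LAI = intact LTR-RT length / total LTR-RT length * 100."""
    tot = sum(e - s for s, e in total)
    return 100.0 * sum(e - s for s, e in intact) / tot if tot > 0 else None


def raw_lai_windows(chrom_len, intact, total, window=3_000_000, step=300_000):
    """Yield (start, end, raw LAI) for sliding windows along one chromosome.

    intact: intervals of intact LTR-RTs
    total:  intervals of all LTR-RT sequence (intact plus fragmented)
    Windows without any LTR-RT sequence get None.
    """
    start = 0
    while True:
        end = min(start + window, chrom_len)
        tot = overlap(total, start, end)
        lai = 100.0 * overlap(intact, start, end) / tot if tot > 0 else None
        yield start, end, lai
        if end >= chrom_len:
            break
        start += step


# Toy chromosome of 3.6 Mb whose first 1 Mb is LTR-RT sequence, half of it intact.
for row in raw_lai_windows(3_600_000, [(0, 500_000)], [(0, 1_000_000)]):
    print(row)
```

Overlapping windows smooth local LAI estimates, so a single poorly assembled repeat region shows up as a dip spanning several consecutive windows rather than as one isolated value.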