Our results indicate that the evolutionary trajectories of both the core and the pan-genomes allow a bacterial species with an extensive host range to specialize many times over a broad array of plant hosts. We see this system as an example of one that “leaps”: host genera seemingly change not by following phylogenetic signal to related plant hosts, but by switching across large regions of the plant host phylogeny. Prior to this study, we had not been able to trace a pattern of underlying genetic origins of host specificity in X. fastidiosa. Our study shows that phylogeny and gene gain/loss are connected to the adaptations that diversify host specificity in X. fastidiosa.
Phylogenies for the MLST, pan-genome, core-genome, and non-recombinant core-genome data were topologically similar, but not identical. While the subspecies relationships are not important for predicting host range, they are frequently used in management decisions and in discussing outbreaks, so we report our subspecies findings alongside our data on host use. In terms of taxonomic subspecies, the four trees differ in whether the two debated subspecies, X. fastidiosa subsp. morus and subsp. sandyi, are contained within subsp. fastidiosa or subsp. multiplex, or should be considered their own subspecies. While some pairs of strains are consistently close to each other, such as the morus strains MulMD and Mul0034, the uncertainty in their position from phylogeny to phylogeny likely reflects either large gaps in diversity that we have not yet sequenced or horizontal gene transfer affecting the pan-genome and the particular genes used for MLST more intensely than the core genes, which makes it difficult to recover the vertical descent we aim for in a phylogeny.
The subspecies morus has been documented to have up to 15.30% of its core genome undergoing intersubspecific homologous recombination, which could account for its uncertain placement in the four phylogenies.
The two strains that have been described as sandyi-like, CO33 and CFBP8356, both clustered within subspecies fastidiosa, not with the other potential sandyi strains Ann1 and RAAR8_XF70, supporting previous work showing that there is not a strong distinction between subspecies sandyi and fastidiosa. The core genome tree also has very low bootstrap support for subsp. pauca, the most diverse and oldest of the three main subspecies. This could be due to conflicting histories between horizontal and vertical descent, or could alternatively reflect that this group is simply not well supported as one subspecies. Regarding the poor resolution in the OQDS clade, an analysis has recently been conducted to increase resolution within these strains. Given the diversity of subspecies pauca, the Hib4 strain, the outgroup of the subspecies, could be an interesting strain in terms of both function and evolutionary history. It is difficult to know which phylogenies are more accurate than others; however, we assume that the core genome most accurately depicts the descent of this bacterial species and that its topology should be robust to even high levels of recombination. While the non-recombinant core-genome might reduce some issues with horizontal gene transfer, the lack of resolution caused by too many identical sequences makes it difficult to use. While more data are not intrinsically better, there are known issues with the MLST genes used for X. fastidiosa phylogenetics, and a larger set of unbiased homologous regions should provide data to support nodes that are difficult to differentiate using the smaller MLST dataset. Using the core genome phylogeny, we inferred the most likely ancestral host. These results show that the phylogenetic history of X. fastidiosa is significantly correlated with the agricultural plant host from which the strains were isolated.
While the core-genome phylogeny depicts mainly vertical descent within this bacterial species, the pan-genome phylogeny likely combines vertical descent with horizontal gene transfer. This is due to the pan-genome’s inclusion of the accessory genome, the set of genes not shared by all members of the group.
Based on this, we speculate that both adaptation and convergence are depicted in these results. Potentially, both convergent horizontal descent via gene gain and loss and vertical descent in the core lead to our modern distribution of traits. While the ancestral state reconstruction did not show a classic host-parasite story of cospeciation or phylogenetically conserved host specificity, the phylogeny and gene presence/absence are predictive of the hosts from which the strains were isolated and thus, hypothetically, of host specificity as well. While the four ancestral state reconstructions do not show identical histories, they all infer a high likelihood of ancestral hosts at many key branch points of the three subspecies. The pan- and core-genome reconstructions predict the genus Vaccinium as the most likely ancestral host of subspecies multiplex, which supports the overall reliability of the reconstruction because blueberry, like subsp. multiplex, is native to eastern North America. Subsp. pauca, subsp. multiplex, and subsp. fastidiosa all exhibit host shifts from another genus to Prunus, suggesting a potential for increased vulnerability of this genus to infection from varied alternative hosts. All four reconstructions also support the genus Coffea as the most likely ancestral host of the subsp. fastidiosa strains introduced from Central America to California. This supports a previous hypothesis by Nunney et al. that coffee plants imported from Central America to southern California in the mid-1800s might have brought X. fastidiosa subsp. fastidiosa along with them. Given the potential role of imported Coffea in devastating global outbreaks of disease caused by X. fastidiosa, it should be much more carefully monitored or restricted in global trade.
Given the current policy emphasis on eradication and trade restrictions, it is vital to identify genera such as Coffea that are especially relevant to global outbreaks and that should be monitored carefully. Coffea should be further explored as a model host for X. fastidiosa to aid our understanding of the molecular mechanisms of this complex interaction. A potential alternative hypothesis for these nodes is that Coffea and Vaccinium are permissive hosts. From a parsimony perspective, they could be akin to ‘universal hosts’, such that it takes very little change for X. fastidiosa strains to switch to Coffea or Vaccinium from other infected plants. This could be investigated by further interrogating the genes shown to be uniquely absent in Coffea-infecting strains. Phylogenetically, this would reflect deep homology, in which the underlying genetic framework of the pathogens makes it easy to shift from other plant hosts to Coffea or Vaccinium. The two plant genera with genes significantly correlated with them, Vitis and Coffea, had 179 and 20 whole genome sequences, respectively, from diverse sampling regions. The larger clades of Proteales, Asterid, and Rosid were also used to look for convergent gene presence/absence, and again the two groups with the majority of samples, Asterid and Rosid, had genes correlated with them, while Proteales did not. Unfortunately, 20 of these 23 genes encode hypothetical proteins, but the ones with known functions could have very interesting implications for host range. fitB_1 is known to be involved in in-host migration and metal binding; similar genes are also frequently gained and lost in other Xanthomonadaceae and are hypothesized to affect both gene regulation and resistance mechanisms. vhbT is an interbacterial effector protein that facilitates bacterial conjugation, another process with potential for large genomic and functional changes.
Another significant gene contains a helix-turn-helix region, a DNA-binding domain that has been found to control metal resistance in bacteria generally and biofilm growth in X. fastidiosa specifically. These genes should be explored further through fitness tests with and without these non-essential accessory genes in multiple host environments to evaluate whether their presence and absence is adaptive or due to drift. Future research on host range should focus on both convergent gene gain and loss and the adaptive, vertically descended genetics underlying host range. As both genomic assays have identified the pan-genome as linked to host association, pursuing this further would benefit our understanding of host specificity. This study has identified a group of candidate genes associated with particular hosts, and they can be tested in the lab to determine if they are significantly linked to fitness in those hosts.
The study has also identified Coffea as an especially relevant host in global plant trade in terms of spreading infection across borders and oceans. Using these data, we can start identifying patterns of likely host shifts that can inform decisions on when eradication and quarantine are necessary, based on the historical likelihood of host shifts. However, we should also carry out further whole genome sequencing of strains outside of the classic agricultural settings. To truly understand a biological system, we need to understand not only the relevant biological components but also how they interact both inside and outside of agricultural landscapes.
The first step in creating all phylogenies was building a nucleotide or gene alignment of the genomic regions of interest. Four alignments were created, all using the same set of taxa: a core genome alignment, a non-recombinant core alignment, a multilocus sequence typing (MLST) alignment, and a pan-genome alignment. The core genome was built with Roary to identify nucleotide regions shared by at least 99% of all taxa. We ran Roary with the parameters -s -ap to cluster paralogs and allow them in the core genome. The non-recombinant core alignment was based on the core genome, but recombinant sites identified with ClonalFrameML were removed from the alignment using an in-house R script. The MLST alignment was based on a nucleotide alignment of the 7 MLST housekeeping genes commonly used for X. fastidiosa, with reference sequences acquired from the X. fastidiosa MLST database. We then searched each MLST reference sequence against all whole genomes using the Basic Local Alignment Search Tool at an E value of 10^-3 in BLAST+, with a database created for each whole genome. We concatenated all MLST gene sequences for individual taxa and aligned them to all other taxa using MAFFT v. 7.
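The 99% threshold used to define the core genome can be illustrated with a minimal sketch. This is not Roary's implementation; the function name `core_genes`, the `min_percent` parameter, and the toy input are assumptions made for illustration, and the input is taken to be a mapping from each genome to the set of gene clusters found in it.

```python
from typing import Dict, List, Set


def core_genes(presence: Dict[str, Set[str]], min_percent: int = 99) -> List[str]:
    """Return gene clusters present in at least `min_percent` % of genomes.

    `presence` maps a genome name to the set of gene clusters it carries
    (a hypothetical, simplified stand-in for a presence/absence table).
    """
    n_genomes = len(presence)
    if n_genomes == 0:
        return []

    # Count how many genomes carry each gene cluster.
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    # Integer comparison avoids floating-point rounding at the threshold.
    return sorted(
        gene for gene, count in counts.items()
        if count * 100 >= min_percent * n_genomes
    )


# Toy example with three genomes (names are made up).
toy = {
    "strainA": {"g1", "g2"},
    "strainB": {"g1", "g3"},
    "strainC": {"g1", "g2"},
}
core = core_genes(toy)                       # genes shared by >= 99% of genomes
accessory = sorted(set().union(*toy.values()) - set(core))
```

In this toy input only `g1` is carried by every genome, so it is the only core gene, and the remaining clusters form the accessory genome.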
The pan-genome alignment was made using Roary’s gene presence-absence output by constructing a matrix of all genes as characters, with the binary presence or absence of each gene in a strain as the character state. As each character represented a known genetic region and there were no gaps in this matrix, no additional alignment algorithm was used. In total, this alignment contained 17,024 characters, representing the 17,024 total genes that make up the pan-genome of the Xylella spp. sequences. The outgroup used for all trees was Xylella taiwanensis strain Wufong1, isolated in Taiwan in 2014 from Pyrus pyrifolia. We constructed four maximum likelihood phylogenies using RAxML v8.2.11 under a generalized time reversible model. Node support was measured with 1,000 bootstrap replicates. Trees were visualized in FigTree v1.4.4 and the Interactive Tree of Life. Phylogenetic diversity was calculated as the sum of all branch lengths for each phylogeny using the R package adephylo.
To conduct ancestral state reconstructions, we used an extant distribution of characters, in this case the genera of plants from which we isolated the bacteria. Using that distribution, we constructed the most likely history of hosts across the phylogeny at all internal nodes. We are in essence seeking parameter values that maximize the probability of the data given the hypothesis. Based on available data, the host plant from which each strain was isolated in the field is identifiable to at least the genus level. This value is used as a point proxy for the true state of interest: potential host range. Since host range must be experimentally determined, in this study we use the host each strain was isolated from as a point representative of an unknown range of susceptibility. Because of this, our results cannot demonstrate specificity to a given host; they only imply the ability to infect that host.
Because sampling is heavily biased towards symptomatic agricultural crops in the case of X. fastidiosa, we interpret each ancestral state as the most likely agriculturally relevant host from which the pathogen would have been isolated. All taxa were coded based on plant host genus and superorder/order. This included two superorders, one order, and 26 genera that were potential hosts for X. fastidiosa’s hypothetical ancestors at each internal node of the phylogenies.
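To make the idea of likelihood-based ancestral state reconstruction concrete, the sketch below computes the relative support for each host state at the root of a small tree. It is an illustration under stated assumptions, not the software or model used in this study: it assumes an equal-rates Mk model with a fixed (not estimated) transition rate, a flat prior on the root state, and a toy tree written as nested lists of (child, branch length) pairs. All function names, the tree, and the tip labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

# A tree is either a tip name (str) or a list of (subtree, branch_length) pairs.
Tree = Union[str, Sequence[Tuple["Tree", float]]]


def mk_transition(k: int, rate: float, t: float) -> Tuple[float, float]:
    """Equal-rates Mk model: probability of staying in the same state and of
    moving to one specific other state along a branch of length t."""
    decay = math.exp(-k * rate * t)
    same = 1.0 / k + (k - 1.0) / k * decay
    diff = 1.0 / k - decay / k
    return same, diff


def conditional_likelihoods(node: Tree, tip_states: Dict[str, str],
                            states: List[str], rate: float) -> List[float]:
    """Felsenstein pruning: likelihood of the tips below `node` for each
    possible state at `node`."""
    if isinstance(node, str):  # tip: observed host is known with certainty
        return [1.0 if s == tip_states[node] else 0.0 for s in states]
    k = len(states)
    result = [1.0] * k
    for child, length in node:
        child_lik = conditional_likelihoods(child, tip_states, states, rate)
        same, diff = mk_transition(k, rate, length)
        total = sum(child_lik)
        for i in range(k):
            # sum over child states j of P(i -> j) * L_child(j)
            result[i] *= same * child_lik[i] + diff * (total - child_lik[i])
    return result


def root_state_probabilities(tree: Tree, tip_states: Dict[str, str],
                             rate: float = 1.0) -> Dict[str, float]:
    """Normalized likelihood of each state at the root (flat root prior)."""
    states = sorted(set(tip_states.values()))
    cond = conditional_likelihoods(tree, tip_states, states, rate)
    total = sum(cond)
    return {s: lik / total for s, lik in zip(states, cond)}


# Toy example: two closely related strains from Vitis, one distant from Coffea.
toy_tree = [([("s1", 0.1), ("s2", 0.1)], 0.1), ("s3", 0.3)]
toy_hosts = {"s1": "Vitis", "s2": "Vitis", "s3": "Coffea"}
probs = root_state_probabilities(toy_tree, toy_hosts)
```

In a full analysis the transition rate would itself be estimated by maximizing the likelihood, every internal node would be reconstructed (not only the root), and log-likelihoods would be used to avoid numerical underflow on large trees.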