Collections from Italy were made from January to April 2015 in multiple locations.

Among the 1605 phylogenies analyzed, the lowest Smap values were for highly conserved CDSs such as those encoding ribosomal and cell division proteins. On the other hand, the highest Smap values found were ~0.5, belonging to CDSs encoding a TonB-dependent receptor and the hypothetical protein PD0014. Only 9 orthologous CDSs previously identified or predicted to be virulence and pathogenicity factors were among the 100 CDSs with Smap values greater than 0.44 with confidence >90%. These 9 CDSs include two related to adhesion, two related to polysaccharide hydrolysis, two related to polysaccharide synthesis, and three that encode, respectively, a quorum sensing response regulator, a multidrug efflux pump, and a lipase/esterase. However, we reasoned that these medium Smap scores do not provide strong support for considering these CDSs as candidate host specificity determinants.

The enrichment of the accessory genome with mobilome-associated CDSs prompted us to explore the full set of mobile genetic elements (MGEs) in X. fastidiosa strains. Using a combination of prediction tools, we identified a comprehensive set of sequences related to MGEs in the 94 genome assemblies analyzed here. The content of MGEs varies considerably among the strains, ranging from 3.8% to 27.76% of the genome, with a mean value of 13.92% ± 5.77%. Among the strains with the highest MGE content are Dixon, U24D, 3124, Ann-1, MUL0034, and 9a5c. It is important to note that the strains whose genome assemblies are in contigs showed lower percentages of MGE content than the strains with complete genomes, possibly due to a reduced efficiency of the programs in predicting MGEs in fragmented genomes.

Overcoming this limitation will have to wait for the availability of complete versions of these genomes, which in most cases requires resequencing with long-read technologies. X. fastidiosa genome assemblies harbor 11.6 ± 2.71 prophage-related regions. Among the complete genomes, the strains RH1 and LM10 of subspecies multiplex have the greatest number of prophage regions, while those with the fewest prophage regions are the subspecies pauca strains Pr8x, Salento-2, and De Donno. We found 5 intact, 2 incomplete, 1 questionable, and 3 remnant prophages in strain 9a5c, and 4 intact, 5 incomplete, 3 questionable, and 1 remnant prophages in strain Temecula1. The genomes of X. fastidiosa also harbor on average 6.47 ± 2.57 genomic island regions. The strains U24D and 9a5c have the greatest number of genomic islands, while the strains IVIA5235 and Bakersfield-11 have only 5 regions each. We found on average 6 ± 1.53 insertion sequences within certain prophages, genomic islands, chromosomes, or, occasionally, plasmids.

We screened X. fastidiosa for known immunity systems to explore the strategies used by this bacterium to deal with its numerous MGEs. The screening of 94 X. fastidiosa genome assemblies detected only CDSs belonging to Restriction-Modification (R-M), Toxin-Antitoxin (TA), Cyclic-oligonucleotide-based antiphage signaling (CBASS), Gabija, and Wadjet systems. For each detected system, the CDS neighborhood was evaluated. The prediction of R-M systems showed that all strains possess at least one of the three main R-M system types previously reported for the 9a5c and Temecula1 strains. Type II was usually found in multiple operons per genome, while type III was observed in a single operon per genome. R-M types I and II were frequently found in all strains, and in most instances more than one subunit homolog was observed.

In contrast, R-M type III was mainly found among strains of subspecies pauca and fastidiosa. Curiously, the strains lacking R-M type III have more homologs of the R-M type II subunit. The TA type II system was found mainly in the strains of subspecies pauca from South America. This TA system is widely distributed among prokaryotes and has been confirmed to be involved in diverse biological processes including plasmid maintenance, phage inhibition, and stress response. The CBASS phage defense system is composed of an oligonucleotide cyclase, which generates signaling cyclic oligonucleotides in response to phage infection, and an effector that is activated by the cyclic oligonucleotides and promotes cell death. This system was found in strains of subspecies pauca from Europe, and also in strains of subspecies fastidiosa.

The comparative analyses of 94 publicly available whole-genome sequence assemblies of X. fastidiosa strains revealed a pangenome comprising 4549 orthologous CDSs and a core genome of 954 CDSs. These values differ somewhat from those previously reported because we used different algorithms for genome annotation and clustering of orthologous CDSs, as well as a larger number of genomes in the analyses. We found that the vast majority of the CDSs previously identified or predicted to be virulence and pathogenicity factors for X. fastidiosa belong either to the core or soft-core genomes. A core genome-scale phylogeny grouped the 94 X. fastidiosa strains into three major clades defined by strains from the subspecies fastidiosa, multiplex, and pauca, consistent with a previous k-mer-based phylogeny of 72 X. fastidiosa strains as well as with phylogenetic reconstructions from 349 X. fastidiosa genomes. While several of the subclades sharing ST groups are congruent with the country of origin of the strains, the plant species from which strains were isolated are less congruent with these subclades. Although some strains isolated from Citrus, Olea, Vitis, and Morus group in separate subclades, other strains mainly isolated from Coffea, Prunus, and Nerium are distributed across the three distinct major clades.
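As an illustration of how orthologous CDS clusters are typically assigned to the core, soft-core, or accessory genome, the following minimal Python sketch labels clusters by the number of genomes that carry them. It is not the pipeline used in this work: the function name `partition_pangenome`, the toy strain and cluster names, and in particular the 95% soft-core cutoff are assumptions made for the example, since the threshold applied here is not stated in this section.

```python
def partition_pangenome(presence, n_genomes, soft_core_min=0.95):
    """Label each orthologous cluster as core, soft-core, or accessory.

    presence: dict mapping a cluster id to the set of strains carrying it.
    soft_core_min: ASSUMED cutoff (fraction of genomes); not taken from the text.
    """
    labels = {}
    for cluster, strains in presence.items():
        count = len(strains)
        if count == n_genomes:
            labels[cluster] = "core"          # present in every genome
        elif count / n_genomes >= soft_core_min:
            labels[cluster] = "soft-core"     # missing from only a few genomes
        else:
            labels[cluster] = "accessory"     # variably present
    return labels


# Toy input: 20 hypothetical strains and three hypothetical clusters.
strains = [f"strain_{i}" for i in range(20)]
toy_presence = {
    "cluster_A": set(strains),       # present in 20/20 genomes
    "cluster_B": set(strains[:19]),  # present in 19/20 genomes
    "cluster_C": set(strains[:3]),   # present in 3/20 genomes
}
labels = partition_pangenome(toy_presence, n_genomes=len(strains))
print(labels)
```

Under such a scheme, the size of the core genome depends on both the cutoff and the number of genomes included, which is one reason counts can differ between studies.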

It has been shown that citrus and coffee strains from subspecies pauca seem to be limited to their original hosts, despite crop proximity and the presence of insect vectors. In addition, there is experimental evidence of host specialization for certain X. fastidiosa strains. On the other hand, it is known that some strains can infect multiple hosts and that intersubspecific homologous recombination has been associated with X. fastidiosa adaptation to novel hosts. The factors that drive X. fastidiosa host-specificity or adaptation to new hosts have not been clearly elucidated despite recent evidence of a genetic basis for the host range of X. fastidiosa. Here we have explored the soft-core and core genomes for potential candidates related to this trait using comparative genomics, an approach that has been applied to some bacterial pathogens. Using a mapping metric applied to phylogenetic trees for 1605 orthologous CDSs, we found no CDS with an Smap value that would provide strong support for considering it a candidate host specificity determinant. The highest Smap values found were ~0.5, and among these CDSs only a few were related to virulence, including two related to adhesion, two related to polysaccharide hydrolysis, two related to polysaccharide synthesis, and three that encode, respectively, a quorum sensing response regulator, a multidrug efflux pump, and a lipase/esterase, all of which present medium Smap scores. We call attention to CDS PD0815, which is related to LPS biosynthesis. It has been shown that O-antigen delays plant innate immune recognition in grapevine, and as such the heterogeneity of O-antigen composition may be related to X. fastidiosa host range. In summary, the approach we have used did not provide strong supporting evidence for CDSs that would contribute to X. fastidiosa host-specificity. It has been suggested that the X. fastidiosa pangenome is linked to host association, and the presence/absence of a few genes in strains isolated from specific plant genera has been correlated with host-specificity. However, at present some limitations for an experimental study of X. fastidiosa host-specificity should be considered, such as the prompt availability of sequenced isolates and the difficult genetic manipulation of some strains.

Our comparative analyses revealed that the content of MGEs varies among X. fastidiosa strains and includes a considerable diversity of sequences related to prophages, genomic islands (GIs), insertion sequences (ISs), and plasmids of variable sizes. While several MGE sequences are conserved among X. fastidiosa strains, some are unique, belonging to a single strain among the ones we analyzed here. The 94 X. fastidiosa genome assemblies harbor 11.6 ± 2.71 prophage-related regions and 6.47 ± 2.57 genomic island regions.
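For readers who wish to reproduce summaries of this kind, the sketch below shows one plausible way to turn predicted MGE coordinates into a per-genome MGE percentage and then into a mean ± standard deviation across genomes. It rests on several assumptions not stated in the text: coordinates are 1-based and inclusive, overlapping predictions from different tools are merged so that no base is counted twice, and the sample standard deviation is used. The function names and toy coordinates are hypothetical.

```python
from statistics import mean, stdev


def covered_bases(intervals):
    """Count bases covered by 1-based inclusive intervals, merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: close the previous merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def mge_percent(intervals, genome_length):
    """Percentage of the genome covered by predicted MGE regions."""
    return 100.0 * covered_bases(intervals) / genome_length


# Toy data: (predicted MGE intervals, genome length) per hypothetical genome.
toy_genomes = {
    "genome_1": ([(1, 100), (51, 150), (301, 400)], 1000),
    "genome_2": ([(1, 50)], 1000),
    "genome_3": ([(1, 150)], 1000),
}
percents = [mge_percent(iv, length) for iv, length in toy_genomes.values()]
summary = f"{mean(percents):.2f}% ± {stdev(percents):.2f}%"
print(summary)
```

Merging before summing matters when prophage, genomic island, and insertion sequence predictions overlap, as they do when insertion sequences lie within prophages or genomic islands.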

A previous study reported 6 and 8 prophage-like elements in the genomes of strains 9a5c and Temecula1, respectively, and a comparison of 72 X. fastidiosa genomes revealed an average of 9.5, 9.3, and 8.5 prophage regions for strains from subsp. fastidiosa, multiplex, and pauca, respectively. It remains to be investigated whether multiple prophage regions confer any fitness advantage to X. fastidiosa, as has been observed for Pseudomonas aeruginosa, where multiple prophage carriage seems to be beneficial during mixed bacterial infections. It is worth noting that inovirus sequences are found in most of the analyzed strains and that they encode a Zot protein. Inoviruses have a relevant role in the structure of P. aeruginosa biofilms and have been reported to encode Zot in several Vibrio species. The Zot protein seems to play a dual function, as it is essential for inovirus morphogenesis and has also been reported to contribute to Vibrio cholerae pathogenesis. This toxin has been postulated as a virulence factor for plant pathogens, including X. fastidiosa. Interestingly, EB92-1, a proposed X. fastidiosa biocontrol strain, lacks both Zot homologous genes found in strain Temecula1. Moreover, a X. fastidiosa Zot protein was shown to elicit cell death-like responses in the apoplast of some Nicotiana tabacum cultivars. Besides Zot, other prophage-encoded genes may play a role in the biology of X. fastidiosa, as observed in other bacteria, where the so-called “moron” loci have been related to virulence, stress resistance, phage resistance, and host adaptation. More studies are necessary to understand the contribution of “moron” loci, such as Zot genes, as well as of prophage induction events, to X. fastidiosa biology. There is experimental evidence that X. fastidiosa releases phage particles, but the impact of prophage induction on host colonization is unknown. To cope with MGEs, bacteria have developed a diversity of immunity systems.
The numerous immunity systems of some genomes protect the cell from a broad range of MGEs, and the MGEs themselves encode defense systems, which tend to differ across strains of a species. Although X. fastidiosa strains are devoid of most of these systems, R-M systems and one conserved cluster with genes of the Gabija system were found widely distributed among the genome assemblies analyzed in this work. The TA type II and CBASS immunity systems were found only in some strains. It should be mentioned that R-M systems have been reported to impact the stable acquisition of foreign plasmid DNA by X. fastidiosa. The low number and diversity of immunity systems found in X. fastidiosa genomes, with the notable absence of important immune systems, especially CRISPR-Cas, may help explain the high number of MGEs found in this bacterial species. It seems that the R-M, Gabija, and CBASS systems are not enough to protect X. fastidiosa against phage acquisition. For instance, Temecula1, one of the most studied X. fastidiosa strains, carries 12 prophage regions but only three immunity systems. This low number of immunity systems relative to the high number of prophages contrasts with the positive correlation between the number of prophages and the number of families of antiphage systems observed at the species level. Therefore, we do not exclude the possibility that X. fastidiosa genomes might encode immunity systems yet to be discovered. The comprehensive comparative analyses of 94 whole-genome sequences of X. fastidiosa strains from diverse hosts and geographic regions contribute to a better understanding of the diversity of phylogenetically close genomes, explore candidate host specificity determinants for this phytopathogen, and greatly expand the knowledge of its mobile genetic element content and of its immunity systems.

The phylum Negarnaviricota, composed of viruses with negative-stranded RNA (nsRNA) genomes, includes species characterized by non-segmented or segmented genomes, the presence or absence of a membrane enveloping the capsid, and a diverse host range including plants and animals. Examples of nsRNA viruses associated with economically important diseases in plants are rose rosette virus, rice stripe virus, citrus psorosis virus, and blueberry mosaic associated virus. Historically, only a relatively small number of nsRNA viruses infecting plants as their primary host have been reported. Recently, however, more novel viruses infecting plants have been discovered around the world. In the last few years, the use of high-throughput sequencing technology has allowed the identification and characterization of new nsRNA viruses in pistachio, citrus, watermelon, and apple. Interestingly, most of these novel nsRNA viruses were classified under the family Phenuiviridae. To date, there are fifteen recognized genera in the family Phenuiviridae: Banyangvirus, Beidivirus, Goukovirus, Horwuvirus, Hudivirus, Hudovirus, Kabutovirus, Laulavirus, Mobuvirus, Phasivirus, Phlebovirus, Pidchovirus, Tenuivirus, Wenrivirus, and Wubeivirus. Except for members of the genus Tenuivirus, which are plant-infecting viruses, the members of the other genera infect vertebrates, including humans, and arthropods.