Airborne dust was ever-present while we sampled, and despite extensive containment efforts, some exposure of the apparatus to dust occurred. However, the rest of the DOM metagenome is remarkably distinct from the other samples, suggesting that, with the exception of the known contaminants, it is still representative of the deep groundwater. We eliminated all suspected contaminant genomes from further analysis.
We assembled genomic bins using the following pipeline: initial assembly with IDBA_UD; breaking of misassemblies with REAPR; binning of contigs into draft genomes with VizBin, using 5-mer frequency and coverage information as a visual aid; additional manual bin cleanup based on contig coverage distribution; and taxonomic assignment of bins based on RAPSEARCH against the UniProt UniRef100 database. Selected bins were assembled further by PRICE targeted assembly followed by REAPR correction of misassemblies. Genome quality was assessed throughout using CheckM. Further details are in the methods. Overall, we assembled and refined 79 unique genomic bins from the three groundwater samples. An additional 51 bins were made from the LAG sample and were used to determine probable contaminants in other samples, but no refinement of these bins was attempted, as we were interested in the properties of the groundwater communities. Contamination was detected only for the DOM sample, as previously discussed. No evidence of overlap with the surface water was seen in either MW5 or MW6. The most abundant taxa in the partial genomic bins were from the CPR Parcubacteria, followed by the DPANN Woesearchaeota and the CPR Microgenomates. While the OD1 and OP11 lineages have previously been found in association with anammox communities, the high relative abundance and diversity of CPR and DPANN genomes, at over 50% of the community, is notably higher than in previous reports. The next most abundant genomic bins were from the Planctomycete Brocadiaceae family.
We made genomic bins of many of the other key members of the wastewater anammox community, including Omnitrophica, Nitrospirae, Chloroflexi, Chlorobi, Bacteroidales, Acidobacteria, and γ-Proteobacteria. Additionally, we assembled partial genomes from the Nitrospinae/Tectomicrobia group, a Spirochaete, a Planctomycete from the OM190 family, an archaeon from the TACK radiation, a Firmicute that was dominant in the DOM well, a Bacteroidetes, an α-Proteobacterium, and a β-Proteobacterium. Finally, we assembled numerous phage bins.
Only one is included here, but evidence of bacteriophage was abundant.
While many potentially interesting and novel genomes were isolated from this community, we focus here on the Brocadiaceae Planctomycetes, which oxidize ammonium under anaerobic conditions in a specialized organelle called the anammoxosome that protects the cells from the toxic hydrazine intermediate products of the biochemical reaction. Anammox genomes are highly prevalent in the MW5 and MW6 samples and also occur in the DOM sample. We were able to make several near-complete assemblies of these genomes; however, the genome size of 2 Mb for many of these genomes was well below the 4 Mb seen in other members of this family. The coverage of some of these small genomes is rather high, making it plausible that the true genome size is reduced. In an attempt to resolve this discrepancy, we rebinned the MW5 anammox genomes using less stringent criteria and found increased but still incomplete coverage of the Brocadiaceae reference genomes. These composite genomes were multi-strain chimeras, as indicated by conserved single-copy gene occurrences increasing above one. The fact that merging multiple strains of the same species did not give complete coverage of the single-copy genes agrees with the hypothesis that the true genome size is small, but further sampling would be needed to confirm it. In any event, we have not been able to resolve the discrepancy in genome size in the present study. The phylogenetic placement is apparent from homology with the Brocadia sinica and Jettenia caeni reference genomes. The separation of genomic bins shown by pentanucleotide clustering suggests that multiple Brocadia-like genomes coexist in MW5, MW6, and DOM. A 16S phylogeny supports this observation. We refer to these genomic bins herein as MW5-59_1, MW5-59_2, MW6-02, MW6-03, MW6-13, DOM-02, DOM-03, and DOM-40.
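The single-copy gene reasoning above can be illustrated with a minimal sketch. It assumes a bin is described simply by a list of marker-gene hits; the marker names and the toy bins are hypothetical, and the calculation is a deliberate simplification rather than the CheckM computation (CheckM uses lineage-specific, collocated marker sets). A marker seen more than once counts toward redundancy, which is the signal used here to flag multi-strain chimeras.

```python
from collections import Counter

def marker_summary(marker_hits, marker_set):
    """Summarize single-copy marker genes found in one bin.

    marker_hits: list of marker IDs, one entry per hit in the bin
    marker_set:  set of marker IDs expected exactly once per genome
    """
    counts = Counter(m for m in marker_hits if m in marker_set)
    n = len(marker_set)
    present = sum(1 for m in marker_set if counts[m] >= 1)
    # Every copy beyond the first is evidence of mixed strains or contamination.
    extra = sum(counts[m] - 1 for m in marker_set if counts[m] > 1)
    return {
        "completeness": present / n,
        "redundancy": extra / n,
        "missing": sorted(m for m in marker_set if counts[m] == 0),
    }

# Toy data with hypothetical marker names (not real marker genes).
markers = {"m1", "m2", "m3", "m4"}
strict_bin = ["m1", "m2"]                     # stringent binning
relaxed_bin = ["m1", "m1", "m2", "m2", "m3"]  # less stringent rebinning

print(marker_summary(strict_bin, markers))
print(marker_summary(relaxed_bin, markers))
```

In the toy data, relaxing the binning raises completeness from 0.5 to 0.75 while redundancy rises from 0 to 0.5, and one marker is still missing. This mirrors the pattern described above: more markers are recovered, several in multiple copies, yet coverage remains incomplete.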
We also assembled an 8.3 Mb Planctomycete genome with similarity to the Planctomycetaceae family within the Planctomycetes. The larger genome size indicates that the genome is not of the Brocadiaceae, which have genome sizes around 4 Mb. Whole-genome sequence comparison of MW6-09 to the available reference Planctomycetes showed the highest similarity to Singulisphaera acidiphilus; however, the similarity even to Singulisphaera was not especially high, indicating that this genome is truly diverged from the reference genomes. Examining the 16S alignment suggests the genome could be from the OM190 group of Planctomycetaceae, a group with no sequenced genomes. We caution, however, that while we could link the EMIRGE-assembled OM190 16S gene with the MW6-09 genome using targeted assembly, multiple 16S fragments could be linked to the genome; thus, our placement of MW6-09 as an OM190 Planctomycetaceae should be revisited when new, related genomes are discovered.
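The bins discussed in this section were separated largely by 5-mer (pentanucleotide) composition. As a rough illustration of that signal, the sketch below computes a normalized 5-mer frequency vector for one contig. Merging each k-mer with its reverse complement is an assumption made here so that strand does not matter; how VizBin handles strands internally is not stated in this text, and the function names are illustrative only.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))

def kmer_profile(seq, k=5):
    """Normalized canonical k-mer frequencies in a fixed, sorted key order."""
    seq = seq.upper()
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return [counts[key] / total if total else 0.0 for key in keys]

profile = kmer_profile("ACGTTGCAAGGCTTACG")
print(len(profile))  # number of canonical 5-mers
```

Vectors like this, typically combined with per-contig coverage, are what a dimensionality-reduction step would project into two dimensions for visual binning. Contigs from the same genome tend to have similar profiles, which is why overlapping clusters suggest closely related strains.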
To determine whether the anammox strains were unique to their respective bins or overlapping, we used VizBin to perform additional k-mer distribution-based clustering of all the Planctomycete contigs together. Six distinct clusters are apparent, with MW5-59_1, MW5-59_2, and MW6-13 overlapping. We next refined the genomic bins by combining the anammox genomes from MW5, MW6, and DOM and repicking chimeric bins. We then performed a single round of assembly using PRICE in order to merge the contigs. Overall, the bins improved little. However, inter-strain contamination was reduced, and the DOM-02 bin was substantially improved by adding 20% more contigs from MW6 that co-clustered. The 16S phylogeny indicates that the bins come from three distinct lineages. The abundant MW5 and MW6 bins come from a new lineage that is intermediate between Jettenia and Kuenenia. The abundant DOM bins and one of the MW6 bins come from two different lineages within the Brocadiaceae W4 group. As there are no sequenced members of these lineages to use as reference, we aligned our bins to the closest available reference draft genomes of Brocadia, Jettenia, Kuenenia, and Scalindua species. Reflecting the 16S phylogeny, the best homology was to Jettenia for the abundant MW5 and MW6 bins, and lower homology was seen for the DOM bins and MW6-02. As previous reports have noted low diversity of anammox genomes within a given sample, we find it noteworthy that as many as three distinct anammox genomes coexist within a single groundwater well. To confirm that all of these genomes were true anammox metabolizers, we checked for hydrazine conversion genes by BLASTX. 
Confirming that they are indeed anammox organisms, all Brocadiaceae genomes showed good coverage of the hydrazine database, and MW6-09, which is phylogenetically placed as a non-anammox Planctomycete, had no BLASTX hits.
We analyzed the biochemical potential of the genomic bins in two ways, focusing on pathways and modules rather than on individual proteins. This analysis was based on taxonomic placement rather than on the well of origin. First, we mapped the contigs from each draft assembly to the database of KEGG orthologs and used KEGG Mapper to visualize the results. Second, we used antiSMASH to detect potential secondary metabolite biosynthetic gene clusters. The results reveal variation between genomic bins as well as pathways for potential community interactions linking nitrogen and sulfur metabolic pathways in the groundwater.
To determine what functional genes are present in the water microbiota, we first aligned all of our contigs in each of the dairy water samples to the KEGG prokaryote database and evaluated trends at the whole-metagenome level. Overall, we see enrichment of phosphotransferase systems (PTS), two-component systems, ABC transporters, and terpenoid production. PTS genes are particularly enriched in the nutrient-poor DOM sample, consistent with the idea that there is a selective pressure driving acquisition of nutrients in nutrient-poor environments. However, no clear signature of different modes of nitrogen metabolism is indicated when examining the aggregated data for each sample. Thus, we examined the individual genomic bins to get a broad understanding of their biochemical potential. We focused on the KEGG pathways for nitrogen metabolism, sulfur metabolism, flagellar assembly, chemotaxis, ABC transporters, two-component systems, terpenoid synthesis, ATPase family, secretion systems, cofactor F420, and B12 production, since these pathways showed the most variability across the genomic bins. 
We include nucleotide synthesis as a positive control, since all of the complete bins have good coverage of the nucleotide metabolism pathways.
Because many of the genomic bins were partially incomplete, we aggregated KEGG maps from related species in order to get a more coherent picture of the pathway representation as a function of phylogeny. Overall, we see sparse coverage of nitrogen metabolism by the CPR and DPANN genomes, while Methylomirabilis, Omnitrophica, Nitrospira, Brocadia, and Nitrospinae had high coverage. The Bacteroidales also had sparse coverage of nitrogen metabolism, and the Chlorobi had intermediate coverage of the pathway, indicating that not all genomes in the community are directly involved in nitrogen metabolism. The same pattern was true of sulfur metabolism, with the exception that OP11 has the module for assimilatory sulfate reduction, which is consistent with the work of Canfield showing that sulfur and nitrogen redox pathways are coupled in the oxygen minimum zone of the oceans. Methane metabolism, indicated by the presence of coenzyme F420, was present in one DPANN bin, one OD1 bin, Methylomirabilis, and Nitrospinae. Intermediate coverage of this module was seen for Chlorobi, supporting the observations of Speth et al. that species have diverse and overlapping niches within the anammox community and of Shen et al. that methane oxidation co-occurs with anammox. For oxidative phosphorylation, distinct ATPases were seen between the phyla. OP11, OD1, Methylomirabilis, Omnitrophica, Chlorobi, and Nitrospinae have the F-type ATPase, while DPANN has the A-type ATPase, and Nitrospira, Brocadiaceae, and Bacteroidales have both F-type and A-type ATPases.
In terms of acquiring nutrients from the environment, the CPR genomes were deficient in ABC transporters other than the phosphate transporter. The DPANN have slightly more, but the rest of the genomes each have significant coverage of 15 ABC transporters. Coverage of two-component systems was consistent across all genomes for phosphate, while either nitrogen or nitrate was present for all except OP11. 
Twitching motility was indicated for OP11, OD1, and Chlorobi. High coverage of the chemotaxis pathway was seen only in the Nitrospira, Brocadiaceae, and Nitrospinae, with moderate coverage seen in the DPANN, OP11, OD1, Chlorobi, and Bacteroidales. Methylomirabilis and Omnitrophica appear to lack pathways for both chemotaxis and flagellar assembly, whereas Nitrospira, Brocadiaceae, and Nitrospinae have the complete pathways. Chlorobi and Bacteroidales both have most of the chemotaxis pathway, but Chlorobi has the complete flagellar pathway whereas Bacteroidales lack it entirely. In terms of biosynthetic capabilities, Omnitrophica, Nitrospira, the Brocadiaceae, and Nitrospinae have complete vitamin B12 pathways. Chlorobi, Bacteroidales, and one OP11 genome have the latter half of the pathway, while the other OP11 genomes and the OD1 genomes lack B12 production entirely. Methylomirabilis has sparse coverage of the pathway, and the DPANN genomes have genes for converting B12 to the active form. These results indicate that B12 sharing is likely an active part of the anammox community metabolism.
For terpenoid synthesis, Methylomirabilis, Omnitrophica, Nitrospira, Brocadiaceae, Bacteroidales, and Nitrospinae all have only the non-mevalonate pathway, whereas Chlorobi has both the mevalonate and non-mevalonate pathways, and DPANN use either one pathway or the other but not both. Within the CPR, OP11 has the mevalonate pathway, and OD1 has neither pathway. Wide variation was seen in the secretion systems, with the DPANN, OP11, and OD1 using only the SecSRP system, Bacteroidales using the SecSRP and Tat systems, Methylomirabilis and Omnitrophica using the SecSRP, Tat, and Type II systems, and Nitrospira using the SecSRP, Tat, Type I, and Type II systems. The Brocadiaceae and Chlorobi use the SecSRP, Tat, Type II, IV, and VI systems. Finally, Nitrospinae uses the SecSRP, Tat, Type I, II, IV, and VI systems. 
Comparison of KEGG pathway coverages with the available reference genomes showed similar results, supporting the hypothesis of niche specialization in a shared community metabolism.
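The aggregation of KEGG maps across related, partially complete bins can be sketched as follows. This is a minimal illustration under stated assumptions: each bin is reduced to a set of ortholog identifiers, bins are pooled by an assigned lineage, and coverage is the plain fraction of a module's orthologs that are present. The identifiers, bin names, and lineage labels are placeholders, and real KEGG module definitions allow alternative orthologs per step, which this simple fraction ignores.

```python
def module_coverage(orthologs, module):
    """Fraction of a module's orthologs found in a set of orthologs."""
    return len(orthologs & module) / len(module)

def pool_by_lineage(bins, lineage_of):
    """Union the ortholog sets of all bins assigned to the same lineage."""
    pooled = {}
    for name, orthologs in bins.items():
        pooled.setdefault(lineage_of[name], set()).update(orthologs)
    return pooled

# Placeholder data: not real KEGG identifiers, bins, or lineages.
bins = {
    "binA": {"ko1", "ko2"},
    "binB": {"ko2", "ko3"},
    "binC": {"ko9"},
}
lineage_of = {"binA": "lineageX", "binB": "lineageX", "binC": "lineageY"}
module = {"ko1", "ko2", "ko3", "ko4"}

pooled = pool_by_lineage(bins, lineage_of)
for lineage in sorted(pooled):
    print(lineage, module_coverage(pooled[lineage], module))
```

In the placeholder data, each bin in lineageX covers half of the module on its own, while the pooled lineage covers three quarters. Pooling therefore gives a more coherent view for incomplete bins, at the cost of possibly attributing a pathway to a lineage when no single member carries all of it.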