Terpenoid synthesis showed clear segregation into the mevalonate and the MEP/DOXP pathways

Airborne dust was ever-present while we sampled, and despite extensive containment efforts, some exposure of the apparatus to dust occurred. However, the rest of the DOM metagenome is remarkably distinct from the other samples, suggesting that, with the exception of the known contaminants, it is still representative of the deep groundwater. We eliminated all suspected contaminant genomes from further analysis.

We assembled genomic bins using the following pipeline: IDBA_UD initial assembly; REAPR breaking of misassemblies; VizBin binning of contigs into draft genomes, using 5-mer frequency and coverage information as a visual aid; additional manual bin cleanup based on contig coverage distribution; and taxonomic assignment of bins based on RAPSEARCH against the UniProt UniRef100 database. Selected bins were assembled further by PRICE targeted assembly and REAPR correction of misassemblies. Genome quality was assessed throughout using CheckM. Further details are in the methods. Overall, we assembled and refined 79 unique genomic bins from the three groundwater samples. An additional 51 bins were made from the LAG sample and were used to determine probable contaminants in other samples, but no refinement of these bins was attempted, as we were interested in the properties of the groundwater communities. Contamination was only detected for the DOM sample, as previously discussed. No evidence of overlap with the surface water was seen in either MW5 or MW6. The most abundant taxa in the partial genomic bins were from the CPR Parcubacteria, followed by the DPANN Woesearchaeota and the CPR Microgenomates. While the OD1 and OP11 lineages have previously been found in association with anammox communities, the high relative abundance and diversity of CPR and DPANN genomes, at over 50% of the community, is notably higher than in previous reports. The next most abundant genomic bins were from the Planctomycete Brocadiaceae family.
We made genomic bins of many of the other key members of the wastewater anammox community, including Omnitrophica, Nitrospirae, Chloroflexi, Chlorobi, Bacteroidales, Acidobacteria, and γ-Proteobacteria. Additionally, we assembled partial genomes from the Nitrospinae/Tectomicrobia group, a Spirochaete, a Planctomycete from the OM190 family, an archaeon from the TACK radiation, a Firmicute that was dominant in the DOM well, a Bacteroidetes, an α-Proteobacterium, and a β-Proteobacterium. Finally, we assembled numerous phage bins.
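The 5-mer frequency signal used for binning can be illustrated with a minimal sketch. This is not VizBin's implementation; the function names and the choice to merge each k-mer with its reverse complement are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_profile(seq, k=5):
    """Normalized canonical k-mer frequencies for one contig (hypothetical helper).

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    # Fixed, sorted key set so that profiles of different contigs are comparable.
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    return {key: (counts[key] / total if total else 0.0) for key in keys}
```

Contigs from the same genome tend to have similar profiles, which is why such vectors, combined with coverage, can separate bins.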

Only one is included here, but evidence of bacteriophage was abundant.

While many potentially interesting and novel genomes were isolated from this community, we focus here on the Brocadiaceae Planctomycetes, which oxidize ammonium under anaerobic conditions in a specialized organelle called the anammoxosome that protects the cells from the toxic hydrazine intermediate products of the biochemical reaction. Anammox genomes are highly prevalent in the MW5 and MW6 samples and also occur in the DOM sample. We were able to make several near-complete assemblies of these genomes; however, the genome size of 2 Mb for many of these genomes was well below the 4 Mb seen in other members of this family. The coverage of some of these small genomes is rather high, making it seem plausible that the true genome size is reduced. In an attempt to resolve this discrepancy, we rebinned the MW5 anammox genomes using less stringent criteria and found increased but still incomplete coverage of the Brocadiaceae reference genomes. These composite genomes were multi-strain chimeras, as indicated by conserved single-copy gene occurrences increasing above one. The fact that merging multiple strains of the same species did not give complete coverage of the single-copy genes agrees with the hypothesis that the true genome size is small, but further sampling would be needed to confirm it. In any event, we have not been able to resolve the discrepancy in genome size in the present study. The phylogenetic placement is apparent from homology with the Brocadia sinica and Jettenia caeni reference genomes. The separation of genomic bins shown by pentanucleotide clustering suggests that multiple Brocadia-like genomes coexist in MW5, MW6, and DOM. A 16S phylogeny supports this observation. We refer to these genomic bins herein as MW5-59_1, MW5-59_2, MW6-02, MW6-03, MW6-13, DOM-02, DOM-03, and DOM-40.
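The single-copy gene reasoning above can be sketched as follows. This is a deliberately simplified illustration and not CheckM's algorithm, which works with lineage-specific, collocated marker sets; the function name and the marker identifiers are hypothetical.

```python
from collections import Counter


def marker_summary(observed_markers, marker_set):
    """Summarize single-copy marker genes found in one bin (hypothetical helper).

    observed_markers: list of marker identifiers, one entry per gene hit.
    marker_set: set of markers expected exactly once in a complete genome.
    """
    counts = Counter(m for m in observed_markers if m in marker_set)
    # Completeness: fraction of expected markers seen at least once.
    present = sum(1 for m in marker_set if counts[m] >= 1)
    # Redundancy: extra copies beyond the first, which suggest a mixed-strain bin.
    extra = sum(counts[m] - 1 for m in marker_set if counts[m] > 1)
    n = len(marker_set)
    return {"completeness": present / n, "redundancy": extra / n}
```

Under this simplification, a bin whose redundancy rises while completeness stays below one matches the pattern described for the rebinned MW5 genomes: several strains merged, yet markers still missing.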
We also assembled an 8.3 Mb Planctomycete genome with similarity to the Planctomycetaceae family within the Planctomycetes. The larger genome size indicates that the genome is not of the Brocadiaceae, which have genome sizes around 4 Mb. Whole-genome sequence comparison of MW6-09 to the available reference Planctomycetes showed highest similarity to Singulisphaera acidiphila; however, the similarity even to Singulisphaera was not especially high, indicating that this genome is truly diverged from the reference genomes. Examining the 16S alignment suggests the genome could be from the OM190 group of Planctomycetaceae, a group with no sequenced genomes. We caution, however, that while we could link the EMIRGE-assembled OM190 16S gene with the MW6-09 genome using targeted assembly, multiple 16S fragments could be linked to the genome; thus, our placement of MW6-09 as an OM190 Planctomycetaceae should be revisited when new, related genomes are discovered.

To determine whether the anammox strains were unique to their respective bins or overlapping, we used VizBin to perform additional k-mer distribution-based clustering of all the Planctomycete contigs together. Six distinct clusters are apparent, with MW5-59_1, MW5-59_2, and MW6-13 overlapping. We next refined the genomic bins by combining the anammox genomes from MW5, MW6, and DOM and repicking chimeric bins. We then performed a single round of assembly using PRICE in order to merge the contigs. Overall, little improvement in bins was made. However, inter-strain contamination was reduced, and the DOM-02 bin was substantially improved by adding 20% more contigs from MW6 that co-clustered. The 16S phylogeny indicates that the bins come from three distinct lineages. The abundant MW5 and MW6 bins come from a new lineage that is intermediate between Jettenia and Kuenenia. The abundant DOM bins and one of the MW6 bins come from two different lineages within the Brocadiaceae W4 group. As there are no sequenced members of these lineages to use as reference, we aligned our bins to the closest available reference draft genomes of Brocadia, Jettenia, Kuenenia, and Scalindua species. Reflecting the 16S phylogeny, the best homology was to Jettenia for the abundant MW5 and MW6 bins, and lower homology was seen for the DOM bins and MW6-02. As previous reports have noted low diversity of anammox genomes within a given sample, we find it noteworthy that as many as three distinct anammox genomes coexist within a single groundwater well. To confirm that all of these genomes were true anammox metabolizers, we checked for hydrazine conversion genes by BLASTX.
Confirming that they are indeed anammox organisms, all Brocadiaceae genomes showed good coverage of the hydrazine database, and MW6-09, which is phylogenetically placed as a non-anammox Planctomycete, did not have BLASTX hits.

We analyzed the biochemical potential of the genomic bins in two ways, focusing on pathways and modules rather than on individual proteins. This analysis was based on taxonomic placement rather than on the well of origin. First, we mapped the contigs from each draft assembly to the database of KEGG orthologs and used KEGG Mapper to visualize the results. Second, we used antiSMASH to detect potential secondary metabolite biosynthetic gene clusters. The results reveal variation between genomic bins as well as pathways for potential community interactions linking nitrogen and sulfur metabolic pathways in the groundwater.

To determine what functional genes are present in the water microbiota, we first aligned all of our contigs in each of the dairy water samples to the KEGG prokaryote database and evaluated trends at the whole-metagenome level. Overall, we see enrichment of phosphotransferase systems (PTS), two-component systems, ABC transporters, and terpenoid production. The PTS systems are particularly high in the nutrient-poor DOM sample, consistent with the idea that there is a selective pressure driving acquisition of nutrients in nutrient-poor environments. However, no clear signature of different modes of nitrogen metabolism is indicated when examining the aggregated data for each sample. Thus, we examined the individual genomic bins to get a broad understanding of their biochemical potential. We focused on the KEGG pathways for nitrogen metabolism, sulfur metabolism, flagellar assembly, chemotaxis, ABC transporters, two-component systems, terpenoid synthesis, ATPase family, secretion systems, cofactor F420, and B12 production, since these pathways showed the most variability across the genomic bins.
We include nucleotide synthesis as a positive control, since all of the complete bins have good coverage of the nucleotide metabolism pathways.

Because many of the genomic bins were partially incomplete, we aggregated KEGG maps from related species in order to get a more coherent picture of the pathway representation as a function of phylogeny. Overall, we see sparse coverage of nitrogen metabolism by the CPR and DPANN genomes, while Methylomirabilis, Omnitrophica, Nitrospira, Brocadia, and Nitrospinae had high coverage. The Bacteroidales also had sparse coverage of nitrogen metabolism, and the Chlorobi had intermediate coverage of the pathway, indicating that not all genomes in the community are directly involved in nitrogen metabolism. The same pattern was true of sulfur metabolism, with the exception that OP11 has the module for assimilatory sulfate reduction, which is consistent with the work of Canfield showing that sulfur and nitrogen redox pathways are coupled in the oxygen minimum zone of the oceans. Methane metabolism, indicated by the presence of coenzyme F420, was detected in one DPANN bin, one OD1 bin, Methylomirabilis, and Nitrospinae. Intermediate coverage of this module was seen for Chlorobi, supporting the observations of Speth et al. that species have diverse and overlapping niches within the anammox community and of Shen et al. that methane oxidation co-occurs with anammox. For oxidative phosphorylation, distinct ATPases were seen between the phyla. OP11, OD1, Methylomirabilis, Omnitrophica, Chlorobi, and Nitrospinae have the F-type ATPase, while DPANN has the A-type ATPase, and Nitrospira, Brocadiaceae, and Bacteroidales have both F-type and A-type ATPases.

In terms of acquiring nutrients from the environment, the CPR genomes were deficient in ABC transporters other than those for phosphate. The DPANN have slightly more, but the rest of the genomes each have significant coverage of 15 ABC transporters. Coverage of two-component systems was consistent across all genomes for phosphate, while either nitrogen or nitrate was present for all except OP11.
Twitching motility was indicated for OP11 as well as for OD1 and Chlorobi. High coverage of the chemotaxis pathway was seen only in the Nitrospira, Brocadiaceae, and Nitrospinae, with moderate coverage seen in the DPANN, OP11, OD1, Chlorobi, and Bacteroidales. Methylomirabilis and Omnitrophica appear to lack pathways for both chemotaxis and flagellar assembly, whereas Nitrospira, Brocadiaceae, and Nitrospinae have the complete pathways. Chlorobi and Bacteroidales both have most of the chemotaxis pathway, but Chlorobi has the complete flagellar pathway whereas Bacteroidales lack it entirely. In terms of biosynthetic capabilities, Omnitrophica, Nitrospira, the Brocadiaceae, and Nitrospinae have complete vitamin B12 pathways. Chlorobi, Bacteroidales, and one OP11 genome have the latter half of the pathway, while the other OP11 genomes and the OD1 genomes lack B12 production entirely. Methylomirabilis has sparse coverage of the pathway, and the DPANN genomes have genes for converting B12 to the active form. These results indicate that B12 sharing is likely an active part of the anammox community metabolism.

For terpenoid synthesis, Methylomirabilis, Omnitrophica, Nitrospira, Brocadiaceae, Bacteroidales, and Nitrospinae all have only the non-mevalonate (MEP/DOXP) pathway, whereas Chlorobi has both pathways, and DPANN use either one pathway or the other but not both. Within the CPR, OP11 has the mevalonate pathway, and OD1 has neither pathway. Wide variation was seen in the secretion systems, with the DPANN, OP11, and OD1 using only the SecSRP system; Bacteroidales using the SecSRP and Tat systems; Methylomirabilis and Omnitrophica using the SecSRP, Tat, and Type II systems; and Nitrospira using the SecSRP, Tat, Type I, and Type II systems. The Brocadiaceae and Chlorobi use the SecSRP, Tat, Type II, IV, and VI systems. Finally, Nitrospinae uses the SecSRP, Tat, Type I, II, IV, and VI systems.
Comparison of KEGG pathway coverages with the available reference genomes showed similar results, supporting the hypothesis of niche specialization in a shared community metabolism.
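The aggregation of KEGG maps across related, partially incomplete bins can be sketched as a set union followed by a coverage fraction. This is a minimal illustration, not the KEGG Mapper procedure itself; the function names and the ortholog identifiers in the usage note are placeholders rather than real annotations.

```python
def aggregate_bins(bins):
    """Union of KEGG ortholog (KO) sets from related bins (hypothetical helper)."""
    combined = set()
    for kos in bins:
        combined |= set(kos)
    return combined


def pathway_coverage(bin_kos, pathway_kos):
    """Fraction of a pathway's KOs that are present in a bin or bin aggregate."""
    pathway_kos = set(pathway_kos)
    if not pathway_kos:
        raise ValueError("pathway_kos must not be empty")
    return len(pathway_kos & set(bin_kos)) / len(pathway_kos)
```

For example, two incomplete bins holding `{"KO_1", "KO_2"}` and `{"KO_2", "KO_3"}` cover half of a four-step placeholder pathway each, but three quarters of it once aggregated, which is the effect that pooling related genomes is meant to capture.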

Environmental pressures will further limit the possibility for land expansions

The mean annual precipitation is below 250 mm in about 70% of the country, and only 3% of Iran, i.e. 4.7 million ha, receives more than 500 mm year−1 of precipitation. The geographical distribution of Iran’s croplands shows that the majority of Iran’s cropping activities take place in the west, northwest, and northern parts of the country, where annual precipitation exceeds 250 mm. However, irrigated cropping is practiced in regions with precipitation as low as 200 mm year−1, or even below 100 mm year−1. To support agriculture, irrigated farming has been expanded without restraint, which has aggravated the water scarcity problem. The challenge is to provide domestic food to a rapidly growing population on a thirsty land.

When land suitability was evaluated solely based on the soil and topographic constraints, 120 million ha of land was found to have poor or lower suitability ranks. Lands with a medium suitability cover 17.2 million ha, whilst high-quality lands do not exceed 5.8 million ha. The spatial distribution of suitability classes shows that the vast majority of lands in the center, east, and southeast of Iran have a low potential for agriculture irrespective of water availability and other climate variables. As shown in Fig. 2, the potential agricultural productivity in these regions is mainly constrained by the low amount of organic carbon (OC) and high sodium content. Based on soil data, Iran’s soil is poor in organic matter, with 67% of the land area estimated to have less than 1% OC. Saline soils, defined by FAO as soils with electrical conductivity (EC) >4 dS/m and pH<8.2, are common in 41 million ha of Iran. Although many plants are adversely affected by saline soils, there are tolerant crops such as barley and sugar beet that can grow almost satisfactorily in soils with ECs as high as 20 dS/m, which was used as the upper limit of EC in this analysis.

Although sodic soils are less abundant in Iran, soils that have only a high exchangeable sodium percentage (ESP) cover ~30 million ha. We used an ESP of 45% as the upper limit for cropping, since tolerant crops such as sugar beet and olive can produce acceptable yield at such high ESP levels. As shown in Fig. 2, EC is not listed among the limiting factors, even though soil salinity is known to be a major issue for agriculture in Iran. This discrepancy can be explained by the higher prevalence of soils with ESP>45% compared to those with EC>20 dS/m, which can spatially mask saline soils. That is, the total area of soils with EC>20 dS/m was estimated to be about 6.4 million ha, while soils exceeding the ESP threshold of 45% constituted ~12 million ha, i.e. almost double the area of saline soils. Iran’s high-quality lands for cropping are confined to a narrow strip along the Caspian Sea and a few other provinces in the west and northwest. In the latter provinces, the main agricultural limitations are caused by high altitude and steep slopes, as these regions intersect with the two major mountain ranges in the north and west.

Thus far, the land suitability analysis was based on soil and terrain conditions only, reflecting the potential agricultural productivity of Iran without including additional limitations imposed by water availability and climatic variables. However, Iran is located in one of the driest areas of the world, where water scarcity is recognized as the main constraint for agricultural production. Based on the aridity index, our analysis showed that 98% of Iran could be classified as hyper-arid, arid, or semi-arid. August and January are the driest and wettest months in Iran, respectively, as shown in Fig. 3. Over half of the country experiences hyper-arid climate conditions for five successive months starting from June.
This temporal pattern of aridity index has dire consequences for summer grown crops as the amount of available water and/or the rate of water uptake by the crop may not meet the atmospheric evaporative demand during these months, giving rise to very low yields or total crop failure. It must be noted that the high ratio of precipitation to potential evapotranspiration in humid regions could also result from low temperature rather than high precipitation.
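The aridity classification can be illustrated with a short sketch. The index is taken here as the ratio of precipitation to potential evapotranspiration, as described above; the class boundaries are the commonly used UNEP thresholds and are an assumption, since the text does not state which cut-offs were applied. The function names are hypothetical.

```python
def aridity_index(precip_mm, pet_mm):
    """Ratio of precipitation to potential evapotranspiration (PET)."""
    if pet_mm <= 0:
        raise ValueError("PET must be positive")
    return precip_mm / pet_mm


def aridity_class(ai):
    """Map an aridity index to a climate class (assumed UNEP-style thresholds)."""
    if ai < 0.05:
        return "hyper-arid"
    if ai < 0.20:
        return "arid"
    if ai < 0.50:
        return "semi-arid"
    if ai < 0.65:
        return "dry sub-humid"
    return "humid"
```

Because the index is a ratio, a cold month with modest rainfall and very low PET can fall in the humid class, which is the caveat raised at the end of the paragraph above.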

There is a high degree of overlap between regions that experience wetter conditions for most of the year and those identified as suitable for agriculture based on their soil and terrain conditions. This spatial overlap suggests that some of the land features relevant to cropping might be correlated with the climate parameters. In fact, soil organic carbon has been found to be positively correlated with precipitation in several studies. To incorporate climate variables into our land suitability analysis, we used monthly precipitation and PET as measures of both overall availability and temporal variability of water. From the monthly precipitation and PET data, we derived the length of the growing period across Iran. Growing period was defined as the number of consecutive months in which precipitation exceeds half the PET. As shown in Fig. 3, areas where moisture conditions allow a prolonged growing period are predominantly situated in northern, northwestern, and western Iran, with Gilan province exhibiting the longest growing period of 9 months. For over 50% of the lands in Iran, the length of the moist growing period is too short to support any cropping unless additional water is provided through irrigation. However, these areas, located in central, eastern, and southeastern Iran, lack the surface and groundwater resources to support irrigated farming in a sustainable manner. Taking into account daily climate data and all types of locally available water resources can improve the accuracy of the length of growing period estimation. The productivity of rainfed farming is also affected by the selection of planting date, which often depends on the timing of the first effective rainfall events. For this joint soil-terrain-climate analysis, all regions with a growing season of two months or shorter were assigned a suitability value of zero and thus classified as unsuitable for agriculture.
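The growing-period rule defined above can be written as a short sketch. It assumes twelve monthly values per location and, as an additional assumption not stated in the text, lets a run of moist months wrap around the end of the calendar year. The function names are hypothetical.

```python
def growing_period_length(precip, pet):
    """Longest run of consecutive months with precipitation > 0.5 * PET.

    precip, pet: sequences of 12 monthly values (January to December).
    """
    if len(precip) != 12 or len(pet) != 12:
        raise ValueError("expected 12 monthly values for each input")
    moist = [p > 0.5 * e for p, e in zip(precip, pet)]
    if all(moist):
        return 12
    best = run = 0
    # Scan the year twice so that a run spanning December to January is counted.
    for flag in moist + moist:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def growing_period_mask(lgp_months):
    """Return 0 for growing periods of two months or shorter, otherwise 1."""
    return 0 if lgp_months <= 2 else 1
```

Multiplying a soil and terrain suitability value by this mask reproduces the rule that short growing seasons are classified as unsuitable.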
We then evaluated the capacity of land for rainfed farming by using a precipitation cut-off of 250 mm year−1, which is often regarded as the minimum threshold for rainfed farming. As shown in Table 1, the inclusion of the length of growing period and precipitation threshold in the analysis only slightly reduced the total area of high-quality lands, from 5.8 to 5.4 million ha. This implies that most lands with suitable soil and terrain conditions also receive sufficient moisture to sustain rainfed agriculture.

In contrast, the area of unsuitable lands increased from 39.7 to 112.9 million ha when the precipitation and growing season thresholds were superimposed on the soil and topographic constraints. This increase in unsuitable acreage was mainly driven by the demotion of lands from the very poor class to the unsuitable class. The addition of moisture constraints also reduced the area of medium suitability lands by 4.8 million ha. In summary, for the rainfed farming suitability analysis, 125 million ha of Iran’s land might be classified as poor or lower ranks, whilst only 18 million ha meet the required conditions for the medium or higher suitability classes. The geographical distribution of these land classes is mapped in Fig. 4. Almost all of central Iran and the vast majority of land area in the eastern, southeastern, and southern provinces were found to be unsuitable for rainfed farming. Almost half the area of Khuzestan and three-quarters of Fars province were also characterized as unsuitable. Over the entire east, only in the northern part of Khorasan Razavi province is there a belt of marginally suitable lands satisfying the requirements of a potentially prosperous rainfed agriculture.

In the next step of the analysis, the suitability of land was scaled with the annual precipitation over the range of 100 to 500 mm year−1. The lower limit is intended to exclude the desert areas from agricultural use, whilst the upper limit represents a benign moisture environment for the growth of many crops. This last analysis, hereafter referred to as the precipitation scaling method, makes no assumption as to whether the cropping practices rely on rainfall or irrigation to satisfy crop water requirement and may thus represent a more comprehensive approach for agricultural suitability assessment. The same minimum length of growing period and soil/topographic constraints as with the two previous methods were used in this analysis.
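One way to picture the precipitation scaling method is as a ramp between the two limits. The text gives only the 100 and 500 mm year−1 bounds, so the linear form of the ramp and its multiplication with the soil and terrain suitability value are assumptions made for illustration, as are the function names.

```python
def precipitation_factor(precip_mm, low=100.0, high=500.0):
    """Scaling factor in [0, 1] for annual precipitation (assumed linear ramp)."""
    if precip_mm <= low:
        return 0.0
    if precip_mm >= high:
        return 1.0
    return (precip_mm - low) / (high - low)


def scaled_suitability(base_suitability, precip_mm, lgp_months):
    """Combine soil/terrain suitability with moisture constraints (sketch).

    base_suitability: soil and terrain suitability value, assumed in [0, 1].
    lgp_months: length of growing period; two months or shorter gives zero.
    """
    if lgp_months <= 2:
        return 0.0
    return base_suitability * precipitation_factor(precip_mm)
```

Unlike a hard 250 mm cut-off, a ramp of this kind lets land receiving between 100 and 250 mm keep a small non-zero score, which is consistent with lands moving from the unsuitable class into the very poor and poor classes.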
Compared to the rainfed agriculture analysis, the precipitation scaling method mainly changed the distribution of lands within the lower suitability classes. For example, a great proportion of lands within the unsuitable class was shifted up to the very poor and poor classes. This implies that, to a limited extent, irrigation can compensate for below-threshold precipitation. Nevertheless, water availability cannot necessarily justify agriculture in areas with low soil and topographic suitability. This has an important implication for water management in Iran, which has a record of a strong desire to make water available to drier areas through groundwater pumping, water transfer, and dam construction. The majority of high-quality lands, which also retain sufficient levels of moisture, are located in the western and northern provinces of Iran. Kermanshah province accommodates the largest area of such lands, followed by Kurdistan.

High-quality lands were estimated to cover 33% and 21% of these two provinces, respectively. Other provinces with high percentages of high-quality lands were Gilan, Mazandaran, West Azerbaijan, and Lorestan. For 17 provinces, however, high-quality lands covered less than 1% of their total area.

To estimate the total area of croplands within each suitability class, we visually inspected 1.2 million ha of Iran’s land by randomly sampling images from Google Earth. The proportion of land used for cropping increased almost linearly with the suitability values obtained from the precipitation scaling method. Total cropping area in Iran was estimated to be about 24.6 million ha, which is greater than the value reported by Iran’s Ministry of Agriculture. This authority reports the harvested area; hence, fallow or abandoned lands are not included in their calculation of active agricultural area. Our visual method, however, captures all lands that are currently under cultivation or that were used for cropping in the recent past and are now fallow or set aside. The relative distribution of croplands amongst the suitability classes shows that about 52% of the croplands in Iran are located in areas with poor suitability or lower ranks, as identified by the precipitation scaling method. Particularly concerning are the 4.2 million ha of lands that fall within the unsuitable class. Approximately 3.4 million ha of cropping areas occur in good and very good lands. However, no agricultural expansion can be practiced in these areas, as all available lands in these suitability classes have already been fully exploited. Medium-quality lands comprise 12.8 million ha of Iran’s land surface area, of which about 8.6 million ha have already been allocated to agriculture. Nevertheless, due to their sparse spatial distribution and lack of proper access, only a small portion of the unused lands with medium suitability can be practically deployed for agriculture.
Using FAO’s spatial data on rainfed wheat yield in Iran, we estimated the mean yield for wheat cropping areas located within each of the six suitability classes. As shown in Fig. 7, the yield of rainfed wheat increased proportionally with improving suitability index, showing that our suitability index adequately translates to crop yield. Using the observed yield-suitability relationship, we estimated that 0.8 million tons of wheat grain might be produced per year by allocating 1 million ha of the unused lands from the medium suitability class to rainfed wheat cropping.

Whilst the insufficiency of water resources has long been recognized as a major impediment to developing a productive agriculture in Iran, our study highlights the additional limitations caused by the paucity of suitable land resources.

Iran, as a member of the Convention on Biological Diversity, is obliged to fulfil the Aichi Biodiversity Targets, of which Target 11 requires Iran to expand its protected area to 17% by 2020, almost double the size of the current protected areas in Iran.