Some of the pitfalls of this assumption are highlighted in Figure 5.2a. Consider a scenario where we want to compare samples 1-3. An analysis schema that does not account for the chemical relationships among the molecules in these samples will assume that the sugars in samples 2 and 3 are as chemically related to the lipids in sample 1 as they are to each other. This would lead to the naive conclusion that samples 1 and 2 are as distinct from each other as samples 2 and 3, which is not the case from a chemical perspective. On the other hand, if we account for the fact that sugar molecules are more chemically related to one another than they are to lipids, we can obtain a chemically-informed sample-to-sample comparison. Sedio and coworkers developed the chemical structural and compositional similarity (CSCS) metric to account for relationships between molecules based on the similarity of their fragmentation spectra. While CSCS compares samples based on modified cosine scores obtained from molecular networking, we calculate chemical relationships based on structurally-informed molecular fingerprints. We express these relationships in the form of a hierarchy, which enables the use of other tree-based tools for downstream data analyses. For example, in Figure 5.2a, we show that by using a tree of structural relationships between molecules in samples 1-3, we can apply UniFrac, a tree-informed distance metric, and demonstrate that the composition of sample 1 is distinct from that of samples 2 and 3.

The importance of comparing samples by accounting for their molecular relatedness is highlighted when we contrast the results obtained by ignoring the tree structure with those that integrate it. With the structural context provided by Qemistree, the differences between replicates across batches are comparable to the within-batch differences.
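As a toy illustration of the sample 1-3 comparison in Figure 5.2a, the sketch below contrasts a tree-agnostic distance (Jaccard on shared molecules) with a simplified unweighted UniFrac computed over a small hand-written tree. The tree topology, branch lengths, molecule names, and function names are all hypothetical and chosen only for illustration; this is not the implementation used by Qemistree, and a real analysis would rely on an established UniFrac implementation.

```python
# Toy tree: each node maps to (parent, branch length to parent).
# Topology and branch lengths are illustrative assumptions, not real data.
TREE = {
    "sugars": ("root", 0.8),
    "sugar_A": ("sugars", 0.1),
    "sugar_B": ("sugars", 0.1),
    "lipid_A": ("root", 0.9),
}

# Hypothetical presence/absence of molecules in each sample.
SAMPLES = {
    "sample_1": {"lipid_A"},
    "sample_2": {"sugar_A"},
    "sample_3": {"sugar_B"},
}


def branches_covered(molecules):
    """Return the branches (named by their child node) on the paths to the root."""
    covered = set()
    for node in molecules:
        while node in TREE:  # stop once we reach the root, which has no parent
            covered.add(node)
            node = TREE[node][0]
    return covered


def unweighted_unifrac(a, b):
    """Fraction of covered branch length that is unique to one of the two samples."""
    branches_a, branches_b = branches_covered(a), branches_covered(b)
    unique = sum(TREE[n][1] for n in branches_a ^ branches_b)
    total = sum(TREE[n][1] for n in branches_a | branches_b)
    return unique / total if total else 0.0


def jaccard_distance(a, b):
    """Tree-agnostic distance: every molecule is treated as unrelated to every other."""
    return 1.0 - len(a & b) / len(a | b)


for x, y in [("sample_1", "sample_2"), ("sample_2", "sample_3")]:
    print(x, y,
          "jaccard:", jaccard_distance(SAMPLES[x], SAMPLES[y]),
          "unifrac:", round(unweighted_unifrac(SAMPLES[x], SAMPLES[y]), 3))
```

In this toy setting the tree-agnostic distance cannot separate the two comparisons, because no molecule is shared between any pair of samples. The tree-informed distance places samples 2 and 3 closer together, because their sugars share the branch leading to the sugar clade.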
The retention time shift in this dataset leads to a strong technical signal that obscures the biological relationships among the samples (ignoring the tree: pseudo-F = 120.75, p = 0.001; tree-informed: pseudo-F = 18.2239, p = 0.001). We observed and remediated a similar pattern originating from plate-to-plate variation in a recently published study investigating the metabolome and microbiome of captive cheetahs. In this study, placing the molecules in a tree using Qemistree reduced the observed technical variation and highlighted the dietary effect that was expected. These results show how systematic and spurious molecular differences can be mitigated in an unsupervised manner using chemically-informed distance measures based on a tree structure.

As a case study, we used Qemistree to explore chemical diversity in a set of food samples collected as a part of the Global FoodOmics initiative. We selected a diverse range of food ingredients to represent animal, plant, and fungal groupings. We first performed feature-based molecular networking using MZmine to obtain spectral library matches for a subset of the chemical features. Understanding the chemical relationships between different foods is challenging because most molecules within foods are unannotated. Using Qemistree, we collated GNPS spectral library matches and in silico predictions from CSI:FingerID to annotate ~91% of the chemical features with molecular structures. Using ClassyFire, we assigned a chemical taxonomy to 60% of these structures; the remaining 40% returned no ClassyFire taxonomy. These annotations allowed us to retrieve subtrees of distinct chemical classes such as flavonoids, alkaloids, phospholipids, acyl-carnitines, and O-glycosyl compounds in food products. We propagated the ClassyFire annotations of chemical features to each internal node of the tree and labeled each node with a pie chart depicting the distribution of chemical superclasses and classes among its tips.
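The propagation of tip annotations to internal nodes described above can be sketched as a simple recursive tally. The nested-tuple tree, the feature identifiers, the class labels, and the function names below are hypothetical; the sketch only illustrates how the per-node class distribution behind each pie chart could be computed, not how Qemistree implements it.

```python
from collections import Counter

# Hypothetical tree as nested tuples; tips are feature IDs (strings).
TOY_TREE = (("f1", "f2"), ("f3", ("f4", "f5")))

# Hypothetical class label per tip; None marks a structure with no taxonomy.
TIP_CLASS = {
    "f1": "Flavonoids",
    "f2": "Flavonoids",
    "f3": "Alkaloids",
    "f4": "Alkaloids",
    "f5": None,
}


def _count_tips(node, tip_class, counts_by_node, path):
    """Tally tip classes under `node`, storing one Counter per internal node."""
    if isinstance(node, str):  # a tip: contributes exactly one label
        return Counter([tip_class.get(node) or "unclassified"])
    counts = Counter()
    for i, child in enumerate(node):
        counts += _count_tips(child, tip_class, counts_by_node, f"{path}.{i}")
    counts_by_node[path] = counts
    return counts


def node_distributions(tree, tip_class):
    """Map each internal node (named by its path from the root) to class fractions."""
    counts_by_node = {}
    _count_tips(tree, tip_class, counts_by_node, "root")
    return {
        path: {label: n / sum(counts.values()) for label, n in counts.items()}
        for path, counts in counts_by_node.items()
    }


for path, fractions in sorted(node_distributions(TOY_TREE, TIP_CLASS).items()):
    print(path, fractions)
```

Each dictionary of fractions corresponds to one pie chart: a node whose tips all share a class yields a single slice, whereas a node spanning several classes, or unclassified tips, yields a mixed chart.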
The molecular fingerprint-based hierarchy of chemical features agreed well with ClassyFire taxonomy assignment, further demonstrating that molecular fingerprints can meaningfully capture structural relationships among molecules in a hierarchical manner.
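To illustrate how a hierarchy can emerge from molecular fingerprints, the following minimal sketch clusters a few made-up binary fingerprints. The fingerprints, the choice of Jaccard distance, and the average-linkage clustering are assumptions made for this example; this section does not specify the distance or linkage used by Qemistree, and real CSI:FingerID fingerprints are much larger and typically probabilistic.

```python
from itertools import combinations

# Hypothetical fingerprints, written as sets of "on" substructure bits.
FINGERPRINTS = {
    "sugar_A": {1, 2, 3, 4},
    "sugar_B": {1, 2, 3, 5},
    "lipid_A": {7, 8, 9},
    "lipid_B": {7, 8, 10},
}


def jaccard(a, b):
    """Jaccard distance between two sets of on-bits."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def leaves(node):
    """List the tip names under a node of the nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node for tip in leaves(child)]


def average_linkage_tree(fingerprints):
    """Agglomerative clustering; returns the hierarchy as nested tuples."""
    def cluster_distance(x, y):
        pairs = [(p, q) for p in leaves(x) for q in leaves(y)]
        return sum(jaccard(fingerprints[p], fingerprints[q]) for p, q in pairs) / len(pairs)

    clusters = list(fingerprints)
    while len(clusters) > 1:
        # Merge the two clusters with the smallest average pairwise distance.
        x, y = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c != x and c != y] + [(x, y)]
    return clusters[0]


tree = average_linkage_tree(FINGERPRINTS)
print(tree)
```

With these toy inputs the two sugars and the two lipids are each merged before the two groups are joined, mirroring the expectation that structurally related molecules should fall in the same clade.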
Furthermore, Qemistree coupled the chemical tree to sample metadata, revealing distinct chemical classes expected for each sample type. In contrast, honey, although categorized as an animal product, shared most of its chemical space with plant products, reflecting the plant nectar and pollen-based diet of honey bees. We observed a clade of flavonoids in both plant products and honey, but in no other animal-based foods.

While it is expected that a complex food such as blueberry kefir contains molecules from both blueberries and dairy, we can now visualize how individual ingredients and food preparation contribute to the chemical composition of complex foods. We noted that metabolite signatures that stem directly from particular ingredients, such as phosphoethanolamine from eggs, are present in egg scramble, but not in the other two foods highlighted. We can also observe the addition of ingredients that were not listed in the initial set of ingredients. For example, we were able to determine that black pepper is present in the egg scramble with chorizo and in the orange chicken, but that its signal is absent from the blueberry kefir.

We show that our tree-based approach coherently captures chemical ontologies and relationships among molecules and samples in various publicly available datasets. Qemistree depends on representing chemical features as molecular fingerprints, and shares limitations with the underlying fingerprint prediction tool CSI:FingerID. For example, fingerprint prediction depends on the quality and coverage of the MS/MS spectral databases available for training the predictive models, and predictions will improve as databases are enriched with more compound classes. Qemistree is also applicable in negative ionization mode; however, fewer molecular fingerprints can be confidently predicted because fewer reference spectra are publicly available, resulting in less extensive trees.
In summary, we introduce a new tree-based approach for computing and representing chemical features detected in untargeted metabolomics studies. A hierarchy enables us to leverage existing tree-based tools, and can be augmented with structural and environmental annotations, greatly facilitating analysis and interpretation.
We anticipate that Qemistree, as a data organization strategy, will be broadly applicable across fields that perform global chemical analysis, from medicine to environmental microbiology to food science, and well beyond the examples shown here.

We use SIRIUS, ZODIAC, and CSI:FingerID to predict molecular substructures within mass spectrometry features in the MGF files imported as Mass Spectrometry Features. SIRIUS computes fragmentation trees for each molecular formula candidate of a feature and ranks these by score. SIRIUS uses the MS1 spectrum in the MGF file to determine the candidate ion adduct to be used for the fragmentation tree computation of each feature. ZODIAC takes the top SIRIUS candidates as input and re-ranks molecular formula candidates by considering reciprocal compound similarities in the dataset to increase correct molecular formula assignments. Subsequently, CSI:FingerID predicts molecular fingerprints for each feature based on the molecular formula with the highest ZODIAC score. Note that not all spectra provided to the Qemistree pipeline produce a fingerprint: SIRIUS does not compute fragmentation trees for multiply charged compounds, and CSI:FingerID does not predict molecular fingerprints from spectra with fewer than 3 explained peaks. To ensure that high-confidence molecular formulas are used in Qemistree, we only consider compounds with a ZODIAC score above 0.98.

Samples were analyzed using ultra-high-performance liquid chromatography coupled to a quadrupole-Orbitrap mass spectrometer. The quadrupole-Orbitrap mass spectrometer was fitted with an electrospray source operating in positive ionization mode. The source used the following parameters: spray voltage, +3500 V; heater temperature, 437.5°C; capillary temperature, 268.75°C; S-lens RF, 50; sheath gas flow rate, 52.5; and auxiliary gas flow rate, 13.75.
The samples were acquired in non-targeted MS2 acquisition mode, with up to four MS2 scans of the most abundant ions per MS1 scan. The spectra were recorded from 0.48 to 17 min. The following parameters were set for the full MS scan: resolution, automatic gain control (AGC) target, maximum injection time, and scan range. For the data-dependent MS2 scans, the following parameters were set: resolution, AGC target, maximum injection time, loop count, isolation window, and fixed first mass. Higher-energy collision-induced dissociation was performed with a normalized collision energy of 30. The data-dependent settings were as follows: minimum AGC, apex trigger 3 to 15 s, charge exclusion 3-8 and > 8, exclude isotopes, and dynamic exclusion.
Samples were collected and extracted, and MS data were acquired, as a part of the Global FoodOmics project according to the sampling and data acquisition protocols described in Gauglitz et al., 2020 (Food Chemistry). Briefly, 126 food samples were selected from the Global FoodOmics dataset. Of these, 119 simple food samples were selected to cover a broad spectrum of fruits, vegetables, meat, and fungi. Each food was represented in at least triplicate in the data subset. Additionally, 7 complex samples were selected that contained simple foods from the simple food subset in their ingredient lists. The complex foods were from two separate meals of orange chicken, a cooked cucumber and the sauce from a meal, sour cream, blueberry kefir, and egg scramble with chorizo. Sample metadata describes the food samples based on a food hierarchy beginning with plant vs. animal vs. fungus and increasing in detail down to, for example, Persian cucumber vs. cherry tomato. Samples were extracted in 95% LC-MS grade ethanol and 5% LC-MS grade water. Samples were analyzed using the same LC-MS/MS setup and software as described above for the maXis II QTOF mass spectrometer, using a Phenomenex Kinetex C18 1.7 µm 100 x 2.1 mm column equipped with a guard cartridge. The instrument tuning and internal calibrant remained the same as described above. MS spectra were acquired in positive ion mode in the range m/z 50–1,500. The mobile phases consisted of A and B, the flow rate was set to 0.5 µL/min throughout the experiment, and the column was maintained at 40°C.

AT conceived the concept and managed the project. AT and YVB developed the algorithm and wrote the code for Qemistree. AT and YVB contributed equally to the work. LFN, RK, and PCD supervised method implementation. KD, MW, JJJvdH, ME, DM, and AG tested the method and provided suggestions on how to improve it. MW managed the deployment of Qemistree on GNPS. AT and MW developed the GNPS-Qemistree Dashboard.
DA and AT wrote the documentation for the GNPS-Qemistree workflow. YVB, QZ, and AT developed the Qemistree-iTOL visualization. LFN and MNE performed the mass spectrometry for the evaluation dataset. AT, YVB, and LFN analyzed and interpreted the evaluation data. JMG performed mass spectrometry of the Global FoodOmics samples. AT and JMG analyzed and interpreted the Global FoodOmics data. KD, MF, ML, and SB supported the integration of SIRIUS, ZODIAC, and CSI:FingerID. AT, YVB, PCD, and RK wrote the manuscript. LFN, JMG, MNE, JJJvdH, ME, KD, QZ, DM, AG, JH, MF, ML, SB, and RK improved the manuscript. The co-authors listed above supervised or provided support for the research and have given permission for the inclusion of the work in this dissertation.

Flavor chemicals in electronic cigarette (EC) fluids, which may negatively impact human health, have been studied in a limited number of countries/locations. To gain an understanding of how the composition and concentrations of flavor chemicals in ECs are influenced by product sale location, we evaluated refill fluids manufactured by one company and purchased worldwide. Flavor chemicals were identified and quantified using gas chromatography-mass spectrometry. We then screened the fluids for their effects on cytotoxicity and proliferation and tested authentic standards of specific flavor chemicals to identify those that were cytotoxic at concentrations found in refill fluids. One hundred twenty-six flavor chemicals were detected in 103 bottles of refill fluid, and their number per bottle ranged from 1 to 50 based on our target list. Two products had none of the flavor chemicals on our target list, nor did they have any non-targeted flavor chemicals. Twenty-eight flavor chemicals were present at concentrations ≥ 1 mg/mL in at least one product, and 6 of these were present at concentrations ≥ 10 mg/mL. The total flavor chemical concentration was ≥ 1 mg/mL in 70% of the refill fluids and ≥ 10 mg/mL in 26%.
For sub-brand duplicate bottles purchased in different countries, flavor chemical concentrations were similar and induced similar responses in the in vitro assays. The levels of furaneol, benzyl alcohol, ethyl maltol, ethyl vanillin, corylone, and vanillin were significantly correlated with cytotoxicity. Margin of exposure calculations showed that pulegone and estragole levels were high enough in some products to present a non-trivial calculated risk for cancer.