A recent commentary by Giera et al. (link) sparked a discussion about in-source fragmentation (ISF). The authors claimed that “ISF could account for over 70% of the peaks observed in typical LC–MS/MS metabolomic datasets”. Other outlets echoed the claim and even went as far as suggesting that “over 70% of the dark metabolome could be an experimental artefact” (link).
Many metabolomics practitioners reacted with skepticism. We shared some of these doubts but could not reproduce the results, as neither the methodological details nor representative data nor the spectral database used in the study were available.
From what we can gather, a catalogue of potential ISFs was created using MS2 spectra from nearly 1 million standards acquired at an energy level of 0 V (as described by Hoang et al.). The authors then determined how many detectable MS1 features in a typical metabolomics dataset matched these fragments in terms of m/z values.
If our understanding is correct, the “>70%” claim becomes trivial because of the metabolites’ modular and scaffolded structure. Some colleagues pointed out that the analysis seemed to overlook basic checks on intensities, coelution, and MS2, which are all commonly used to identify ISFs or dereplicate features in real experiments. As the fraction of features that pass these disambiguation filters typically exceeds 50% of the detected ones, the estimate provided by Giera et al. seems inflated.
The path is the goal
The exact number is of marginal interest. Focusing on it misses the larger issue: the methodology behind counting these features. Different algorithms can produce very different numbers. What really matters is how we estimate the impact of unwanted fragmentation on the features detected in complex, natural samples.
Together with Adriano Rutz (who did all the analyses), we tried to estimate the nature of the centroids detected in MS1 scans, i.e. to classify them as isotopes, adducts, fragments of coeluting species, or unique features. Our goal wasn’t to provide a definitive answer but to contribute to a transparent discussion. To that end, we’ve made all our scripts and results publicly available in a repository (link). Large portions of the analyses were done with a reusable MZmine module (link). We have to thank Robin Schmid and Steffen Heuckeroth from MZio for their support in coding, testing, and sharing.
How to distinguish unwanted artefacts from true, intact chemicals and metabolites?
One key realization from our analysis was the importance of accounting for coelution when quantifying how many MS1 peaks are likely due to unwanted fragments. To avoid the bias of spurious MS1-MS2 matches across unrelated scans, we compared only matching pairs of MS1 and DDA-MS2 scans.
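To illustrate the pairing step, here is a minimal sketch (in Python, not the authors’ MZmine-based implementation): each DDA MS2 scan is assigned to the MS1 survey scan of the same duty cycle, so that MS1 centroids are only compared against fragments of coeluting precursors. The `Scan`, `Cycle`, and `pair_scans` names are hypothetical helpers introduced for this example only.

```python
from dataclasses import dataclass, field

@dataclass
class Scan:
    level: int                         # 1 for MS1 survey, 2 for DDA MS2
    rt: float                          # retention time (min)
    mz: list[float]                    # centroid m/z values
    intensity: list[float]             # centroid intensities
    precursor_mz: float | None = None  # set for MS2 scans only

@dataclass
class Cycle:
    ms1: Scan
    ms2: list[Scan] = field(default_factory=list)

def pair_scans(scans: list[Scan]) -> list[Cycle]:
    """Group each DDA MS2 scan with the preceding MS1 survey scan."""
    cycles: list[Cycle] = []
    for scan in scans:                 # scans assumed to be in acquisition order
        if scan.level == 1:
            cycles.append(Cycle(ms1=scan))
        elif cycles:                   # MS2 scans before the first MS1 are dropped
            cycles[-1].ms2.append(scan)
    return cycles
```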
For each MS1 scan, we enumerated the MS1 centroids and analyzed how many could be isotopes, adducts, etc. For this step, we used the isotope module from MZmine and the six most frequently occurring mass shifts corresponding to isotopes or neutral adducts, as reported by Nash et al. (link). For matching peaks between MS1 and MS2 centroids, we used a tolerance equal to the maximum of 0.01 Da and 20 ppm.
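The tolerance rule fits in a few lines; the hypothetical helper below (consistent with the tolerance quoted above) is reused by the later sketches:

```python
def mz_match(mz_a: float, mz_b: float) -> bool:
    """Return True if two m/z values agree within max(0.01 Da, 20 ppm)."""
    tol = max(0.01, 20e-6 * mz_a)      # ppm tolerance computed on the first value
    return abs(mz_a - mz_b) <= tol
```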
For each MS1 centroid, we can test whether (i) it can be an adduct of another MS1 centroid in the same scan, (ii) it can be an isotope of a neighbouring centroid, or (iii) it matches an MS2 centroid of a coeluting species. It is important to note that the classification is not mutually exclusive. The question is how to count MS1 centroids that have multiple possible annotations.
We reasoned that isotope detection carries the highest confidence because isotopic peaks are highly predictable, both in terms of m/z and intensity (green shaded area in the figure below). Given the simplicity of detecting isotopic peaks, one could even opt to deisotope both MS1 and MS2 spectra before matching. We decided to keep them so that the analysis covers all centroids detected in MS1 scans.
Next in terms of confidence is the assignment of adducts (in red) and MS2 fragments (blue). Both groups represent a “worst case scenario” that includes all possible candidates when matching solely by m/z. Both sets could be pruned by setting additional constraints on abundance, e.g. by checking that the abundance of the MS1 precursor that supposedly generates an MS2 fragment is sufficient to observe the fragment, or by verifying that the abundance of a putative adduct falls within a plausible range relative to the protonated ion, and so on. We wanted to be inclusive and considered only m/z to obtain an upper bound. The remaining features outside any set are “unexplained”. These are the unique features; they contain known and unknown monoisotopic peaks that cannot be matched to an MS2 fragment.
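Put together, the per-scan counting logic looks roughly like the sketch below. It builds on the `Scan`, `Cycle`, and `mz_match` helpers from the earlier snippets and flattens the overlapping annotations into a single label by precedence (isotope > adduct > fragment), which is a simplification of the non-exclusive classification described above. The mass shifts are illustrative examples only, not necessarily the six shifts from Nash et al. used in the actual analysis.

```python
from collections import Counter

# Illustrative mass shifts in Da (assumptions for this sketch):
ISOTOPE_SHIFTS = [1.00336, 2.00671]    # +13C1, +13C2
ADDUCT_SHIFTS  = [21.98194, 17.02655]  # Na-H, NH4-H

def classify_centroids(cycle: Cycle) -> Counter:
    """Count the MS1 centroids of one scan by their most confident explanation."""
    counts: Counter = Counter()
    ms1 = cycle.ms1
    fragment_mzs = [mz for scan in cycle.ms2 for mz in scan.mz]
    for mz in ms1.mz:
        if any(mz_match(mz, other + s) for other in ms1.mz for s in ISOTOPE_SHIFTS):
            counts["isotope"] += 1       # highest confidence: isotopic peak
        elif any(mz_match(mz, other + s) for other in ms1.mz for s in ADDUCT_SHIFTS):
            counts["adduct"] += 1        # possible adduct of a lighter centroid
        elif any(mz_match(mz, frag) for frag in fragment_mzs):
            counts["fragment"] += 1      # candidate in-source fragment (ISF)
        else:
            counts["unexplained"] += 1   # unique, unexplained monoisotopic peak
    return counts
```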
Benchmarking data
Our analysis requires datasets in which MS2 spectra were collected for a large portion of MS1 centroids. We analyzed 3 such datasets from different labs, sample types, and instruments:
- DI-OT: A direct-infusion Tribrid Orbitrap dataset collected by Corinna Brungs, including 626 files of the MCE Scaffold data in both ionization modes (link). Direct infusion made it possible to obtain MS2 data for a large fraction of the MS1 centroids. The use of concentrated standards enabled the observation of low-abundance fragments.
- DI-TOF: An internal MS2 library collected by Mario Povoa Correa by direct infusion on a SCIEX 7600 ZenoTOF system, using untargeted DDA. The library includes MS2 spectra for 2582 chemically diverse standards at different activation voltages. As in the case of DI-OT, direct infusion made it possible to obtain MS2 data for a large fraction of the MS1 centroids, and the use of concentrated standards enabled the observation of low-abundance fragments.
- LC-AT: On Yasin’s recommendation, we integrated data acquired by Bashar Amer from Thermo Fisher on an Orbitrap Astral (MSV000093526). Given the blazing speed of the Astral detector, this dataset offers deep MS2 coverage of chromatographic peaks.
A big thanks to the scientists who acquired and made the data available!
A note on the depth of MS2 data: for DI-TOF and LC-AT datasets, we obtained MS2 data for precursors that, together, account for >75% of the TIC of the MS1 scan. For the DI-OT study, the coverage is about 50% of the TIC.
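As a rough illustration of how such a coverage figure can be computed, the hypothetical function below (building on the `Cycle` and `mz_match` helpers above, not the authors’ actual script) sums the intensity of the MS1 centroids whose m/z matches a selected precursor and divides it by the MS1 TIC:

```python
def ms2_tic_coverage(cycle: Cycle) -> float:
    """Fraction of the MS1 TIC carried by precursors that received an MS2 scan."""
    tic = sum(cycle.ms1.intensity)
    covered = sum(
        inten
        for mz, inten in zip(cycle.ms1.mz, cycle.ms1.intensity)
        if any(s.precursor_mz and mz_match(mz, s.precursor_mz) for s in cycle.ms2)
    )
    return covered / tic if tic else 0.0
```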
A note on activation energy: we decided to base our analysis on MS2 data generated at a low activation energy, typically 20 V. This deviates from the approach of Giera et al., who focused on data collected with an activation energy of 0 V (Hoang). It must be noted that “0 V CE” is merely a nominal value that hides numerous voltage drops that are necessary to isolate and move ions toward the detector. For example, most instruments employ a baseline difference of 5-7 V between the ends of collision cells to prevent crosstalk. Most instruments also employ lenses and potential drops to split non-covalent clusters. The specifics vary depending on instrument tuning and source settings. We wanted to consider the worst-case scenario and included all fragments observed with low activation, i.e. 20 V.
ISFs account for no more than 30% of the detectable MS1 centroids.
If we apply this logic to hundreds of MS1 spectra (and thousands of associated MS2 spectra), we obtain the following results:
Across all datasets and with the aforementioned metrics, we observe that the unique features account, on average, for at least 20-39% of the MS1 centroids (pink data). At most 7-34% of the MS1 centroids (on average) could be associated with an MS2 fragment and thus classified as ISFs (green data).
The result stays mostly the same if we introduce intensity cutoffs to remove the lawn of small centroids close to the detection limit. For the record, if we neglect the existence of adducts and isotopes, the fraction of MS1 centroids that can be matched to MS2 fragments is 18-60%. Considering that we used data obtained at 20 V and applied no threshold on the abundance of MS1 and MS2 peaks, the obtained numbers are conservative estimates.
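For completeness, dataset-wide fractions like those above can be obtained by aggregating the per-scan counts, optionally after an intensity cutoff. The sketch below again uses the hypothetical helpers from the previous snippets and is not the authors’ actual pipeline:

```python
def class_fractions(cycles: list[Cycle], min_intensity: float = 0.0) -> dict[str, float]:
    """Aggregate per-scan counts into dataset-wide fractions,
    ignoring MS1 centroids below an optional intensity cutoff."""
    totals: Counter = Counter()
    for cycle in cycles:
        kept = [(mz, i) for mz, i in zip(cycle.ms1.mz, cycle.ms1.intensity)
                if i >= min_intensity]
        filtered = Cycle(
            ms1=Scan(1, cycle.ms1.rt, [m for m, _ in kept], [i for _, i in kept]),
            ms2=cycle.ms2,
        )
        totals += classify_centroids(filtered)
    n = sum(totals.values())
    return {label: count / n for label, count in totals.items()} if n else {}
```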
Our take
Our analysis of matching MS1 and MS2 spectra indicates that less than 34% of MS1 peaks can be attributed to MS2 fragments, regardless of whether it’s a known or unknown feature (note: we don’t need to know and didn’t try to elucidate structures). Given the procedure we adopted, we believe that this is a very conservative upper limit. Based on these data, we conclude that the contribution of unwanted fragmentation in the MS is marginal and can’t justify the multitude of features we detect in an LC-MS experiment.
Our result is in contrast with the claim made by Giera et al. that 70% or more of the features commonly found in a metabolomics experiment are likely fragments, or with statements such as “this finding disrupts the prevailing assumption that the majority of peaks in mass spectra correspond to unique metabolites”. The difference, however, is easy to explain and resides in the fact that, next to m/z, we considered coelution a necessary precondition for peak matching.
This conversation is far from over, and there’s plenty of room for further refinement. We encourage the community to join in, contributing datasets and insights to advance the discussion. To foster this transparency, we’ve shared our complete results, scripts, and figures in the following GitHub repository: https://github.com/zamboni-lab/ion-type-analysis.
Thanks for sharing this. Could this much lower number be due to the particular set you used? Like too similar classes of standards, or only positive-mode acquisition (I haven't looked at the three datasets). Design seems great indeed. Just questioning the million standards against your numbers.
No, beyond a certain point size doesn’t matter. In fact, analyzing a library has pros and cons, but it’s not a precondition as we see with the LC-ASTRAL dataset. The crux is how you do the counting, and having matching MS1 and MS2.
All data sets we used are chemically very heterogeneous, in particular the libraries which were picked exactly because of their chemodiversity. We repeated the analysis on data generated in 3 different labs using very different MS analyzers and fragmentation techniques. We always come to the same conclusion. We are not sharing everything here, and I hope this will become clearer once we have the manuscript.
Nice work Nicola et al. I look forward to reading the full manuscript! I do have one bone of contention:
“Focusing on the exact percentage misses the larger issue: the methodology behind counting these features”. I like the first half of this sentence, but the last half seems off point. The methods are important, of course. But the largest issue is ensuring that we get the correct annotation answer with the highest frequency possible. Whether it is 10% or 70%, we need to know the problem exists and use methods to recognize ISF when it occurs. Also worth noting that there are other in-source phenomena which deviate from the canonical ESI model and can add to the false-positive annotation rate.