On the benefits and role of electron-induced dissociation in lipidomics

NOTE: After the initial publication of this post, we added more details and results on the correct (!) annotation of lipids with multiple double bonds. It’s at the end.

One of the grand challenges in metabolomics using mass spectrometry (MS) is the limited and somewhat redundant information provided by collision-induced dissociation (CID). CID is incredibly versatile and quick but tends to fragment weaker bonds, such as C-O bonds. This bias allows us to easily observe fragments indicative of specific functional groups—such as ethanolamine, phosphatidic groups, and glucuronic conjugates—that are diagnostic for certain compound classes.

Unfortunately, CID struggles to break stronger bonds like C-C or C=C, which are crucial for resolving the structure in parts of the molecule dominated by hydrocarbons, partially hydroxylated substances, and potentially ring formations. Practically, this means that no fragments are available to distinguish between isomers within certain compound classes, such as steroids or oxidised lipids.

Two complementary approaches can bridge this information gap. The first is chemical derivatisation, which “activates” strong bonds so they fragment more readily under CID. Examples include the analysis of C=C bonds, which can be opened photochemically through the Paternò-Büchi reaction or cleaved by ozonolysis.

The second approach involves using alternative dissociation techniques. Over the past two decades, mainly two such techniques have emerged: photodissociation by UV light (UVPD, at 193 and 213 nm) and electron-induced dissociation (EID), which operates through various mechanisms depending on the kinetic energy imparted to the electrons. While a low energy of a few eV is sufficient to dissociate multiply charged cations (as in ECD and ETD), more is necessary for singly charged organic molecules (as in EIEIO, see below). Both techniques are orthogonal to CID, producing informative fragments that are not observed with CID. We were highly intrigued by the unique possibilities offered by these techniques and have invested in commercial systems capable of both CID/UVPD and CID/EID.

Over the past two years, we have been using the SCIEX ZenoTOF 7600 system equipped with Electron-Activated Dissociation (EAD). For those unfamiliar, the EAD cell is positioned before the CID cell and allows for adjusting the electron beam’s kinetic energy across a broad range (0-25 eV). This system’s sensitivity and scan rates are excellent, making data acquisition a mostly enjoyable process. We have been using the 7600 productively for untargeted lipidomics and metabolomics primarily via CID but have also conducted extensive testing with EID.

The heart of our ZenoTOF: Q1, EAD cell, CID quad, and Zeno trap on the far left.

We initially focused on lipids because they are conceptually more straightforward (no need to model rearrangements) and conducted thousands of analyses using both standards and real samples. The credit for this pioneering work goes essentially to three lab members: Vincen Wu, Abraham Moyal, and, in the early stages, Alaa Othman.

Meanwhile, I have delivered about a dozen talks and webinars on this subject. I must admit that the content and the key takeaways changed significantly every six months. This evolution reflects our journey of continuous learning and adaptation. I dare say that we now have a much better understanding of both the benefits and the limitations of the EAD system in lipid analysis, and of how to combine CID and EID effectively.

Since much of this story won’t be published in traditional paper form, I wanted to share some key lessons we’ve learned throughout our journey with lipids.

Early expectations

The power of the EAD system has been showcased in many publications by Dr. Takashi Baba from SCIEX. I’d like to highlight two of them that are relevant to this story. The first is the iconic publication by Campbell and Baba, Analytical Chemistry 2015, 87, 11, 5837-5845 DOI. In short, they show how an electron beam of 8-15 eV can fragment virtually all C-C bonds in the fatty acid chain of singly charged lipids. This process is called electron impact excitation of ions from organics (EIEIO).

In a saturated fatty acyl chain, the EIEIO spectrum is characterized by a ladder with steps of 14.016 Th, corresponding to successive -CH2- units. Two main fragments are produced at each bond: the radical peak from homolytic cleavage (referred to as 0) and a non-radical peak with one hydrogen lost (-1). The radical ion tends to dominate below 10 eV, and the -1 loss increases considerably above 12 eV. In some circumstances, we can also observe a +1 proton gain, even for the even-electron fragment, but its abundance is generally much lower.

If a double bond is present, there will be a shift of 2H, split among the two carbon atoms at the C=C bond. This will be reflected in two consecutive, shorter shifts of the radical fragments in the spectrum by 13.008 Th (instead of 14 Th). As the abundance of fragments with cleavage at the double bond is lower, it is sometimes easier to observe a larger gap of 26.016 Th (instead of 28 Th).
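The step sizes above follow directly from monoisotopic masses. As a quick check, here is a minimal Python sketch (the helper name `formula_mass` is ours, for illustration only) that computes the ladder steps from the exact masses of 12C and 1H; for singly charged fragments, a mass difference in u equals a spacing in Th.

```python
# Monoisotopic masses (u); for z = 1, a mass difference equals a spacing in Th
C_MASS = 12.000000
H_MASS = 1.00782503

def formula_mass(n_c: int, n_h: int) -> float:
    """Monoisotopic mass of a hydrocarbon unit CnHm."""
    return n_c * C_MASS + n_h * H_MASS

# Ladder steps discussed in the text: (carbons, hydrogens)
STEPS = {
    "CH2": (1, 2),   # one step in a saturated chain
    "CH": (1, 1),    # one of the two shorter steps at the C=C
    "C2H2": (2, 2),  # gap across the C=C when the middle fragment is missing
    "C2H4": (2, 4),  # the corresponding gap in a saturated chain
}
deltas = {name: formula_mass(*units) for name, units in STEPS.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:.3f} Th")
```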

A double bond also affects the fragments’ intensity. It is lower for the cleavage at the double bond and higher for the C-C bonds that are two “steps” away because of the double bond’s stabilising effect on radicals on neighbouring carbon atoms. If the spectrum is of good quality and has low noise, these effects manifest as a V shape in the spectrum (as shown in Figure 1).

Additionally, EIEIO can cleave the C1-C2 bond in the glycerol backbone, which makes it possible to distinguish regioisomers at positions sn-1 and sn-2.

Figure 1. EIEIO spectrum of LysoPC 18:1(9Z). Reprinted with permission from Analytical Chemistry. 2015, 87, 11, 5837-5845. Copyright 2015 American Chemical Society

These findings were generalized to a wide array of lipid classes and ions (DOI), demonstrating the terrific power of EIEIO in characterizing the structure of glycerophospholipids, sphingolipids, and acylglycerols in complex samples. Overall, these publications (and many more) hold promise for the localization of C=C bonds in lipids.

We could easily confirm all of the above in our lab, but we wanted to establish EIEIO in our routine LC-MS workflows. For this to happen, two key questions had to be addressed:
• Is it possible to collect informative EAD data in LC-MS?
• Is it possible to automate the interpretation of EAD spectra?

Collecting informative EAD spectra on an LC-MS timescale

Compared to CID, EAD/EIEIO poses additional challenges. One is its low fragmentation efficiency, which is proportional to the square of the ion’s charge and is therefore lowest for singly charged metabolites and lipids. Because of the low yield, it’s quite common in EIEIO spectra that (i) a large fraction of the precursor ion remains unfragmented (as in Figure 1), and (ii) the fragments pertaining to the fatty acyl chains are abysmally weak (100-500x less intense). For visualization, the spectra have to be rescaled to display all fragments.
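Such rescaling can be as simple as normalizing to the most intense fragment after excluding the precursor. A minimal sketch under our own assumptions: peaks are `(m/z, intensity)` tuples, and anything within `tol` Th of the precursor m/z is treated as unfragmented precursor. The function name and the toy values are hypothetical.

```python
def rescale_fragments(peaks, precursor_mz, tol=0.5):
    """Express fragment intensities in % of the most intense fragment.

    The precursor peak (within `tol` Th of precursor_mz) is dropped first, so
    that it no longer compresses the weak acyl-chain fragments to near zero.
    """
    fragments = [(mz, inten) for mz, inten in peaks if abs(mz - precursor_mz) > tol]
    if not fragments:
        return []
    base = max(inten for _, inten in fragments)
    return [(mz, 100.0 * inten / base) for mz, inten in fragments]

# Made-up toy spectrum: a dominant precursor and two weak ladder peaks
toy = [(522.36, 1.0e6), (184.07, 5.0e4), (507.33, 2.0e3), (493.32, 1.0e3)]
for mz, rel in rescale_fragments(toy, precursor_mz=522.36):
    print(f"{mz:.2f}  {rel:.1f} %")
```

A log scale, as used for the experimental spectrum in Figure 4, is the other common option.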

Most demonstrations rely on two tricks to compensate for the low efficiency: acquiring MS2 spectra for several minutes by direct infusion, and using concentrated solutions of lipid standards. These tricks, however, don’t apply to the routine analysis of real samples by LC-MS. We tend to use short LC gradients of 2-8 minutes, which produce chromatographic peaks of 1-2 seconds followed by a tail in which fragmentation is inconvenient. Even assuming that we could perfectly time MS2 acquisition by DDA or narrow-window DIA, is this enough to collect high-quality EAD spectra of lipids?

This question gains even more relevance if we consider that EAD (and UVPD) take considerably longer than CID to activate and dissociate metabolites, typically tens to hundreds of milliseconds. On the ZenoTOF system, this is called “reaction time” and has a minimum duration of 30 ms. After the reaction, it takes an additional 5 ms to analyze the fragments in the TOF section. The instrument can improve the signal-to-noise ratio by summing/averaging multiple cycles. In the SCIEX world, this is called “accumulation time”, and it is a multiple of (reaction time + 5 ms). For 3 cycles, the accumulation time is ca. 105 ms. The number of repetitions can be adjusted easily (yay!), but not in real time or in a data-dependent mode (yes, this is an official feature request).
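The timing arithmetic in this paragraph can be written down explicitly. The sketch below only encodes the relationship stated above (a minimum reaction time of 30 ms plus 5 ms of TOF analysis per cycle); the function name is ours and is not a SCIEX API.

```python
MIN_REACTION_MS = 30  # minimum EAD reaction time on the ZenoTOF, as stated above
TOF_READOUT_MS = 5    # time to analyze the fragments in the TOF after each reaction

def accumulation_time_ms(reaction_time_ms: float, n_cycles: int) -> float:
    """Accumulation time = n_cycles x (reaction time + TOF readout)."""
    if reaction_time_ms < MIN_REACTION_MS:
        raise ValueError(f"reaction time must be >= {MIN_REACTION_MS} ms")
    if n_cycles < 1:
        raise ValueError("at least one cycle is required")
    return n_cycles * (reaction_time_ms + TOF_READOUT_MS)

print(accumulation_time_ms(30, 1))  # 35: the fastest possible EAD scan
print(accumulation_time_ms(30, 3))  # 105: the 3-cycle example from the text
```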

Side note: with CID, the ZenoTOF acquires MS2 scans at 200 Hz, i.e., 5 ms per scan. The fastest we can do by EAD is 35 ms, which is 7 times slower. Importantly, this apparently large difference has only minor consequences in a real-life DDA sequence because (i) there is a considerable and constant overhead for measuring and processing the MS1 scan (50-100 ms), and (ii) even with 35 ms, the instrument remains blazingly fast and fragments almost everything we have in our complex samples. In our test with a complex lipid extract and the shortest LC gradient of 2.2 min, the gap between CID and EAD is only 20% (Figure 2, last two lines). The difference is likely to be even less pronounced with a slower gradient. Another way of interpreting this is that scanning at 200 Hz/5 ms is unnecessarily fast for metabolomics/lipidomics… but this is a discussion for another day.
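A back-of-the-envelope model illustrates point (i). It assumes a Top7 cycle (as in the DDA-Top7-EAD method mentioned later) and the 50-100 ms MS1 overhead quoted above, and it ignores dynamic exclusion and precursor availability, so it does not reproduce the 20% difference in Figure 2. It only shows how a fixed overhead dilutes the 7x per-scan ratio to roughly 2.6-3.5x per cycle.

```python
def dda_cycle_ms(ms1_overhead_ms: float, top_n: int, ms2_time_ms: float) -> float:
    """Duration of one DDA cycle: MS1 scan plus processing, then top_n MS2 scans."""
    return ms1_overhead_ms + top_n * ms2_time_ms

TOP_N = 7
for overhead in (50, 100):
    cid = dda_cycle_ms(overhead, TOP_N, 5)   # CID MS2 at 200 Hz
    ead = dda_cycle_ms(overhead, TOP_N, 35)  # fastest EAD MS2
    print(f"MS1 overhead {overhead} ms: CID {cid} ms, EAD {ead} ms, ratio {ead / cid:.1f}x")
```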

Figure 2. Comparison of DDA runs for the analysis of a complex lipid mix with a 2.2 min LC-MS method. The lipid mix consists of 4-5 different extracts of different sample types obtained from Avanti, and is extremely rich in features. Analysis by Alaa Othman.

Back to the key question: can informative EIEIO spectra be collected in LC-MS? The analysis above does not answer it, because we were interested in the diagnostic fragments that are unique to EIEIO, and we also needed to know the ground truth. To address this question, Vincen performed hundreds of analyses of a lipid standard (Avanti’s LightSPLASH), varying multiple factors: (i) the concentration of lipids, (ii) the properties of the electron beam (filament current and kinetic energy), (iii) the reaction time, from 30 ms to 200 ms, and (iv) the total EAD scan time, up to ca. 1000 ms. He acquired all data in DDA using a 2.2 min LC gradient. He then went through all the files, spectra, and lipid classes to verify whether the diagnostic peaks described by Baba, Campbell, et al. (DOI) could be recovered, and created dozens of heatmaps to identify which settings were critical to obtaining the desired information. An example is shown below; the analysis was extended to representatives of most glycerophospholipid classes.

Figure 3. Examples of recovery of diagnostic fragments in EAD spectra of PC 15:0/18:1(9Z) collected by LC-MS. Data and slide are courtesy of Vincen Wu.
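Each cell of such a heatmap boils down to a recovery rate: the fraction of expected diagnostic fragments found in a spectrum. A minimal sketch, assuming a ppm tolerance for matching; the function name, the tolerance, and the toy peak list are our own illustrations and not the settings used in the study. The expected m/z values are our monoisotopic estimates for the three fragments around the C=C of LysoPC 18:1(9).

```python
def recovered_fraction(expected_mz, observed_mz, ppm_tol=20.0):
    """Fraction of expected diagnostic m/z values found within ppm_tol."""
    if not expected_mz:
        return 0.0

    def found(target):
        return any(abs(obs - target) / target * 1e6 <= ppm_tol for obs in observed_mz)

    return sum(found(mz) for mz in expected_mz) / len(expected_mz)

# Toy example: the three fragments around the C=C of LysoPC 18:1(9)
expected = [409.2224, 396.2146, 383.2067]
observed = [409.2230, 383.2050, 184.0733]  # made-up peak list, 396 missing
print(recovered_fraction(expected, observed))               # 2 of 3 recovered
print(recovered_fraction(expected, observed, ppm_tol=2.0))  # stricter tolerance
```

Computing this value for every combination of settings (concentration, beam energy, reaction time, accumulation time) fills one heatmap.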

We are finalizing a paper with all the results, but the key lessons are:

  • Concentration is important, but it’s possible to obtain the key diagnostic fragments also at low concentrations.
  • A reaction time of 30 to 60 ms is optimal. Longer intervals create additional, hard-to-interpret fragments.
  • Increasing accumulation time with multiple repetitions provides marginal improvements.
  • The filament current and kinetic energy are quite flexible: they can be adjusted over a wide range without critical outcomes.
  • Overall: yes, we obtain the key regiospecific information over a wide range of concentrations and lipid classes in as short as 35 ms.

So far, great news! Next, we wanted to test whether the same EIEIO spectra contained the information necessary to correctly locate the double bond position. To fairly evaluate the information content of the spectrum, we had to automate the analysis of the ladder to infer the C=C position. This was the objective of a more tortuous adventure…

Automatic interpretation of EIEIO spectra

Our main interest was originally to learn from lipid analysis before moving on to metabolites and natural products. Therefore, we approached the problem of elucidation from spectra with the classical metabolomics paradigm: enumerating all candidates in full structural detail, scoring the affinity of the hypothetical candidate with the measured spectrum, ranking, and thresholding.

1st generation: combinatorial fragmenters

The first attempt was made with a combinatorial fragmenter, namely MetFrag. We thought that the approach was well suited because rearrangements in lipids are negligible, and we could easily generate SMILES of candidates for rapid testing. The idea was that if a lipid is predicted to have a given number of double bonds, we would test all isomers and identify a thresholding procedure to retain all candidates compatible with the spectrum.

A side note on the number of isomers: we neglect stereoisomers, but consider all chemically plausible variants, including C=C and sn-positions. For LPC 18:1/0:0, there are 16 possible isomers to test. The number would be smaller if we introduced prior knowledge on naturally occurring forms, for example, that a double bond can only be at predefined positions. This is what MS-DIAL 5.1 (DOI) does to restrict the options.

The inclusion of prior knowledge is debatable, and risks limiting new discoveries. To avoid any controversy or bias, we decided to stick to a totally agnostic approach and test all chemically plausible isomers, with the hope that the data would indicate that only specific positions are found in natural extracts. After all, EAD is advocated to be able to resolve any C=C location, and therefore, we accepted the challenge. This was also motivated by the need to think of solutions that would also work for different classes of compounds.

We needed to adapt the native scoring scheme that MetFrag uses to determine the likelihood of a fragment, because it depends heavily on bond strength, which doesn’t apply to EAD/EIEIO. We modified the code in several ways to promote C-C cleavage but kept struggling with speed. Speed was an issue because we were aiming for a setup capable of analysing thousands of spectra and testing up to thousands of isomers for each spectrum. Importantly, the speed problem was not caused by a bad implementation but by the design at the core of a combinatorial fragmenter, which explores an immense tree of variants. This problem was further amplified by making C-C breaks more likely. Therefore, we had to think of a simpler strategy that doesn’t start from scratch every time.

2nd generation: the SMARTS way

To bypass the fully combinatorial approach, we switched to a deterministic approach that exploits the well-defined fragmentation reactions documented by Baba et al. Technically, this is done by translating the rules into a syntax that can be applied to structures. One such syntax is given by SMARTS, which defines a language to identify substructures in SMILES strings and create the putative products. We created two dictionaries for EAD reactions: one with the class-specific reactions involving the headgroups, and a second with the generic rules for acyl chains. The former was particularly tedious because we had to include the whole head group to avoid the rule being applied to the wrong class. The latter was just three reactions: the homolytic cleavage, the heterolytic split, and the additional proton transfer that happens in the proximity of a double bond. Importantly, and unlike common practice when dealing with SMARTS in chemistry, we had to explicitly model hydrogen atoms to correctly model radical reactions.

The SMARTS dictionary for EAD allowed us to simulate fragments for virtually any lipid and to perform the calculation only once, providing a massive speed-up compared to the repeated on-the-fly computation done by MetFrag. The beauty of this approach is that it is fully compatible with any kind of isotopic tracing (2H or 13C), and it generates full structural identifiers for every predicted fragment.

We generated a long list of lipids by permuting lipid types, chain lengths, chain positions, and double bond numbers and positions. For each, we had a SMILES string, which was processed by RDKit to obtain the list of putative fragments, including their structures. Comparing measured with simulated spectra of standards yields a very good match.
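The pipeline described here uses SMARTS and RDKit. As a much simpler stand-in that needs only the Python standard library, the sketch below precomputes the acyl-chain ladder from hydrogen counts: each chain carbon carries two H, the terminal carbon three, and each carbon of a C=C one fewer. It encodes only three variants per cut (radical 0, H loss -1, H gain +1), with no intensities and no headgroup fragments, and the caching via `functools.lru_cache` mimics the “compute once” idea. The precursor m/z of 522.3554 is our monoisotopic value for LPC 18:1 [M+H]+ (C26H53NO7P+); all function names are hypothetical.

```python
from functools import lru_cache

C_MASS = 12.0
H_MASS = 1.00782503

@lru_cache(maxsize=None)
def lost_masses(chain_len, double_bonds):
    """Neutral mass lost when the acyl chain is cut between C_k and C_k+1.

    double_bonds is a tuple of positions p (a C=C between C_p and C_p+1).
    Returns one mass per cut, for k = 1 .. chain_len - 1. Cached, so each
    (chain, C=C pattern) is computed only once and reused across spectra.
    """
    def h_count(c):
        h = 3 if c == chain_len else 2  # CH3 terminus, CH2 elsewhere
        h -= sum(1 for p in double_bonds if c in (p, p + 1))  # sp2 carbons carry one H less
        return h

    return tuple(
        sum(C_MASS + h_count(c) * H_MASS for c in range(k + 1, chain_len + 1))
        for k in range(1, chain_len)
    )

def predicted_ladder(precursor_mz, chain_len, double_bonds, offsets=(0, -1, 1)):
    """Map (k, offset) -> m/z for radical (0), H-loss (-1) and H-gain (+1) fragments."""
    losses = lost_masses(chain_len, tuple(sorted(double_bonds)))
    return {
        (k, off): precursor_mz - loss + off * H_MASS
        for k, loss in enumerate(losses, start=1)
        for off in offsets
    }

ladder = predicted_ladder(522.3554, 18, (9,))
for k in (17, 16, 10, 9, 8):
    print(k, f"{ladder[(k, 0)]:.2f}")
```

The printed radical fragments (507.33, 493.32, 409.22, 396.21, 383.21) correspond to the ladder quoted later for LysoPC 18:1(9), including the two Δ13 steps at the double bond.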

Figure 4. Comparison of experimental and SMARTS-predicted fragmentation spectrum for LysoPC 17:0 (d5). Note that the experimental spectrum is plotted in log scale to visualize all fragments. Green bars indicate predicted peaks that matched experimental fragments. Red bars are predicted fragments that were not found in the experimental data. The intensity is determined by heuristic rules.

We then started annotating features by comparing the predicted spectrum with the measured one. We did so also for spectra that were incomplete because of the precursor’s low abundance or the short accumulation time. Because this scenario is common in the LC-MS analysis of real samples, we wanted a method that correctly deals with missing and non-ideal data. Unless the expected fragments with the expected mass gaps of 13 or 26 Th can be detected, the uncertainty in the precise localization of the double bond should be reflected in the report. One such example is reported in Figure 5.

Figure 5. Example of matching between the spectrum of an unknown lipid and the predicted spectra of all suitable candidates from the library. The higher score of the two candidates with 18:2(2,4) over all of the others (9,11), (9,12), (10,12), … suggests that the double bonds are located close to the headgroup.

In principle, everything works and is blazingly fast, but we found it prohibitively difficult to come up with a scoring scheme that allowed us to derive thresholds, FDRs, or any measure of confidence. The challenge was that we were trying to match simultaneously (a) a few large fragments that are associated with the class of the head group and the intact acyl chains and (b) a multitude of small peaks that indicate the position of C=C.

The canonical similarity measures that are employed for CID spectra (like the modified cosine similarity) fail to reward the fatty acyl fragments. We tried several hybrid scores, weightings, etc., but never got around the problem that tiny differences in the large fragments, in either the measured or the simulated spectrum, had a strong influence on the total score.
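A toy calculation shows why a cosine-type score struggles here. In the made-up spectra below (nominal m/z bins as keys), two large peaks stand in for the headgroup and intact-chain fragments, and three weak peaks stand in for the part of the ladder that carries the C=C information. A candidate whose C=C-specific peaks are all at the wrong position still scores almost as high as the correct one. We use the plain cosine rather than the modified cosine for simplicity; the intensities are illustrative, not measured.

```python
import math

def cosine(spec_a, spec_b):
    """Cosine similarity of two spectra given as {m/z bin: intensity}."""
    dot = sum(inten * spec_b.get(mz, 0.0) for mz, inten in spec_a.items())
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

large = {184: 1000.0, 504: 300.0}              # class and chain fragments
true_ladder = {409: 3.0, 396: 3.0, 383: 3.0}   # weak C=C-diagnostic peaks
wrong_ladder = {410: 3.0, 397: 3.0, 384: 3.0}  # same pattern, wrong position

measured = {**large, **true_ladder}
right = cosine(measured, {**large, **true_ladder})
wrong = cosine(measured, {**large, **wrong_ladder})
print(f"correct candidate: {right:.5f}, wrong candidate: {wrong:.5f}")
```

The two scores differ only in the fifth decimal, so any small error in the large peaks easily outweighs the C=C evidence.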

3rd generation: hierarchical annotation

The logical consequence was that, instead of trying to solve the full structure, we opted to address the problem in sequential, specialized steps. This approach allows using different scoring at each step, and is well suited to tackle lipid annotation. It comes with the disadvantage that it’s harder to determine confidence scores or FDRs, but this didn’t seem an insurmountable issue.

The resulting workflow (Figure 6) uses m/z and class-specific fragments to initially determine the class and the number of carbon atoms and double bonds. At this level, one can use either the EAD-specific rules from the SMARTS dictionary or one of the many fragment libraries that are available for CID, like LipidBlast, LipiDex, AdipoAtlas, etc. Next, we screen for fragments that indicate acyl chain lengths. As EAD is typically run in positive mode (negative is possible, but won’t be discussed here), this step boils down to searching for neutral losses or acylium fragments of the fatty acids. In some cases, we might already have fragments that indicate sn-positions. Because of the principles mentioned above, we avoid relying on prior knowledge or databases to narrow down the possibilities.

The hierarchical approach outlined in Figure 6 ensures that the annotation of EAD spectra is at least as detailed as if one used CID. Only when this level is guaranteed do we seek to increase the granularity of annotation by exploiting EAD-specific peaks in two subsequent stages. The first fixes, if possible, the fatty acid chain at position sn-2 by relying on the fragmentation that can occur between C1 and C2 (and between C2 and C3 in TGs). If the fragment is found, it is possible to determine all sn-positions for glycerophospholipids and only some of them for glycerolipids.

The final stage was the search for C=C positions. Based on the information inherited from the previous steps, and regardless of whether sn-regioisomers could be determined, we created all possible isomers that exist given the chains and number of double bonds. Again, the only limit we imposed on the position of double bonds is that they can’t be consecutive. This is very conservative but will come in handy in the calculation of confidence. For the example shown in Figure 6, PE 18:0/22:6, there is a total of 5005 isomers to be evaluated.
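The isomer counts in this post can be reproduced with a short enumeration. We assume that position p denotes a C=C between C_p and C_p+1 and that p runs from 2 to n-1 for an n-carbon chain (C1 being the carbonyl carbon). Together with the no-consecutive rule above, this yields 16 isomers for an 18:1 chain, 5005 for a 22:6 chain, and 1365 for the 20:4 chain of the PE 18:0/20:4 case discussed later. Because the only constraint is “no two adjacent positions”, a closed form C(n - 1 - d, d) for d double bonds also exists. Function names are ours.

```python
from itertools import combinations
from math import comb

def cc_isomers(chain_len, n_db):
    """Yield every placement of n_db C=C bonds on one acyl chain.

    Assumption: position p means a C=C between C_p and C_p+1, with p in
    2 .. chain_len - 1. The only chemical constraint applied is that no two
    double bonds sit on consecutive positions (no cumulated C=C=C).
    """
    for combo in combinations(range(2, chain_len), n_db):
        if all(b - a > 1 for a, b in zip(combo, combo[1:])):
            yield combo

def n_isomers_closed_form(chain_len, n_db):
    """Non-adjacent choices of n_db out of (chain_len - 2) positions."""
    return comb(chain_len - 1 - n_db, n_db)

for chain_len, n_db in [(18, 1), (22, 6), (20, 4)]:
    n = sum(1 for _ in cc_isomers(chain_len, n_db))
    print(f"{chain_len}:{n_db} -> {n} isomers (closed form {n_isomers_closed_form(chain_len, n_db)})")
```

The 22:6 enumeration contains the natural (4,7,10,13,16,19) isomer, but nothing in the rule favours it, which is exactly the agnostic design described above.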

Figure 6. Hierarchical annotation of EAD spectra.

We generated the list of fragments expected to originate from the fatty acid chains for all isomers according to the SMARTS rules defined above. All spectra were precalculated for all chain lengths, C=C numbers, and positions. This substantially increased speed, such that the matching of all candidates could be kept below 10 s, even when 10’000 candidates had to be tested. Without going into details: we obtained a matching score for each candidate and applied an adaptive cutoff to select all candidates that matched the experimental data. Finally, we counted how frequently each bond of the acyl chain was a double bond across all matching candidates. This produced percentages that indicate the confidence we have in the assignment.
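The aggregation step can be sketched as follows. The adaptive cutoff is not described in detail here, so we substitute a simple relative one (keep candidates scoring at least `rel_cutoff` times the best score, assuming positive scores); the function name and the toy scores are hypothetical.

```python
from collections import Counter

def cc_position_confidence(scored, rel_cutoff=0.9):
    """Percentage of retained candidates with a C=C at each position.

    scored: iterable of (positions, score) pairs with positive scores.
    Candidates scoring >= rel_cutoff * best are retained; this relative
    cutoff is a stand-in for the adaptive cutoff mentioned in the text.
    """
    scored = list(scored)
    best = max(score for _, score in scored)
    kept = [positions for positions, score in scored if score >= rel_cutoff * best]
    counts = Counter(p for positions in kept for p in positions)
    return {p: 100.0 * n / len(kept) for p, n in sorted(counts.items())}

# Made-up scores for four 18:2 candidates
toy_scores = [((9, 12), 0.95), ((9, 13), 0.90), ((8, 12), 0.88), ((10, 12), 0.60)]
for position, pct in cc_position_confidence(toy_scores).items():
    print(f"C=C at {position}: {pct:.0f}%")
```

Positions shared by all retained candidates approach 100%, while positions on which the retained candidates disagree get split percentages.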

As a sanity check, we looked into spectra of poor quality due to very low abundance. One such example is shown in Figure 7 for a standard that we injected at low concentration. The “ladder” is hardly visible, and the spectrum seems contaminated by noise. Despite the low quality of the spectrum (compared to the ideal cases obtained by direct infusion), 4 out of 6 positions were correctly identified (10, 13, 16, 19). For two of them (10, 16), we also found high scores for neighbours (11, 17), which could be expected if the precise fragments were missing. For the remaining two positions (4 and 7), we found only a neighbour. We will explain later why low positions are more difficult to assign. Overall, the results with non-ideal spectra were very encouraging and suggested that the entire workflow was correctly quantifying confidence.

Figure 7 – C=C position prediction from a crappy EAD spectrum of PE 18:0/22:6

We were finally happy with the speed, flexibility, and “fairness” of the workflow. We started using it for the unattended analysis of LC-MS datasets. In the case of the complex lipid mix analyzed with an aggressive 2.2 min gradient with DDA-Top7-EAD, with a short 35 ms accumulation time, the full computational analysis of > 1000 MS2 spectra took < 5 min on a single CPU. The species could be assigned in 60% of the cases, sn-position in 15% of the cases, and we performed the C=C estimation for the ca. 550 cases in which the acyl chains could be determined in the initial stages.

Figure 8 – Annotation rate for LC-MS analysis of lipid mix in 2.2 min LC-DDA-EAD-MS method.

The problem with proton transfer

Did the 3rd generation workflow solve the problem of systematically annotating EAD spectra? No, it didn’t. Even for high-quality EIEIO spectra of lipid standards with only 1 double bond, we never really obtained high confidence in the C=C position (<50%). Why are we so uncertain in double bond assignment? The reason will hopefully be obvious by the end of this section, but it deserves a careful explanation.

The 3rd generation workflow was instrumental in understanding the underlying issue: since we are testing all isomers, a low confidence (%) indicates that numerous false candidates with the double bond at the wrong position had spectra that are just as compatible with the measured one. Given that the simulation had seemed fine since the 2nd generation, we wondered what could have misled the scoring.

The first step was to identify which aspects of an EIEIO spectrum are important for C=C localization. In principle, a double bond can be identified either by (i) a double, consecutive shift of 13 Th between radical fragments (0) or by (ii) a shift of 2 Th “downstream” of the C=C. The 2 Th shift is presented frequently in primary literature and brochures to visually convey how easily different chains can be resolved. This should be taken with a grain of salt because the cases shown look nice but have little to do with the localization of double bonds. Specifically, they are often cherry-picked glycerophospholipids with one fully saturated and one unsaturated chain, like PC 18:0/18:1(9). In this scenario, the spectral differences between the two chains are always striking because half of the spectrum is shifted. The real-life problem, however, is not to compare chains with different numbers of double bonds but to discern whether the double bond in FA 18:1 is at position …, 6, 7, 8, 9, 10, …. As the FAs of all candidates have the same degree of unsaturation, the shifts remain local, and the precise determination of the C=C position depends on the ability to detect every single 2 Th shift in the proximity of the double bond. I hope this is clear without an extensive graphical explanation. For now, it’s just important to recognize that to determine C=C positions, it’s key to look at local shifts of 1 and 2 protons across the full spectrum.

This is what Vincen did for all compounds in the LightSPLASH mix over a wide range of settings. To reduce noise, he used long accumulation times and averaged across many MS2 spectra. I illustrate his results for the simplest case of LysoPC 18:1(9) [M+H]+ (m/z 522, Figure 9). We extracted the intensities of all homolytic radical fragments over the full chain (m/z -15 = 507, -29 = 493, -43 = 479, … in green bars). They form a series of Δ14 Th steps, which at the double bond is replaced by two consecutive Δ13 Th steps (409, 396, 383). In addition, we extracted the intensities of proton losses (orange for -1 and blue for -2) and proton gains (red for +1 and violet for +2). We show the results for an electron beam of 8, 10, 12, and 24 eV.
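Extracting these intensities amounts to looking up, for each radical fragment, the observed signal at fixed hydrogen offsets. A minimal sketch, assuming a peak list of `(m/z, intensity)` tuples and an absolute tolerance of 0.01 Th (our choice); the function name is hypothetical, and the toy peaks are made up and centred on the m/z 409 fragment of LysoPC 18:1(9).

```python
H_MASS = 1.00782503  # mass of one hydrogen atom

def offset_intensities(peaks, radical_mz, offsets=(-2, -1, 0, 1, 2), tol=0.01):
    """For each radical fragment, sum observed intensity at each H offset.

    Returns {radical m/z: {offset: intensity}}; offset -1 is one H lost,
    +1 one H gained relative to the radical (0) fragment.
    """
    table = {}
    for target in radical_mz:
        table[target] = {
            off: sum(i for mz, i in peaks if abs(mz - (target + off * H_MASS)) <= tol)
            for off in offsets
        }
    return table

toy_peaks = [(409.2226, 50.0), (408.2150, 80.0), (410.2300, 10.0)]
print(offset_intensities(toy_peaks, [409.2224]))
```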


We learned that:

  • Proton losses (-1) are inevitable. The radical fragment (0, green) is predominant up to 10 eV, but (-1) takes over in almost all lipids at 12 eV and beyond.
  • Proton gains (+1) are visible close to the double bond or close to the ester bond at low energy (8 eV). They increase overall with energy and become dominant at very high energy (25 eV).
  • At the double bond position, we observe -2 losses (blue bars). Double gains are observed only at 25 eV, in line with the trend that (+1) is also more abundant.
  • It’s very hard to predict the intensity of any fragment. Even at the lowest tested energy, the measured intensities of the radical fragments vary a lot. Across all tested compounds (not shown here), we observe that fragment intensity depends on the position in the chain, chain length, lipid type, distance to double bonds, proximity to the ester bond, and beam energy.

Without flooding this post with detailed simulations, I hope it is intuitive to recognize that the numerous proton transfers we observe in all spectra are a problem for the correct assignment of double bonds. To cite just one example, the abundant (-1) proton loss fragment causes an “overestimation” of the double bond position because it introduces an additional Δ13-Δ13 pattern between the (0)=(0)-(-1) fragments, next to the correct Δ13-Δ13 pattern of radical fragments (0)-(0)=(0). For example, an n-6 is confused with an n-5.
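The effect can be reproduced in a simplified hydrogen-counting model of the ladder (our assumption, not the simulation used in the study). Each peak is described by how many carbons and hydrogens it has lost; a Δ13 step means exactly one more C and one more H lost. With radical peaks only, the search below finds a single Δ13-Δ13 run at the double bond of 18:1(9). Once the (-1) peaks are added, further runs appear, including one where a (-1) fragment followed by two radical fragments mimics a double bond one position further from the headgroup. Runs are printed as (k, offset) triples, where k is the last chain carbon retained; all function names are hypothetical.

```python
def lost_h(chain_len, double_bonds, k):
    """H atoms leaving with carbons C_k+1 .. C_n when the chain is cut after C_k."""
    total = 0
    for c in range(k + 1, chain_len + 1):
        h = 3 if c == chain_len else 2
        h -= sum(1 for p in double_bonds if c in (p, p + 1))
        total += h
    return total

def peaks(chain_len, double_bonds, offsets=(0, -1)):
    """Peaks as (k, offset, carbons lost, H lost); a -1 peak loses one extra H."""
    return [
        (k, off, chain_len - k, lost_h(chain_len, double_bonds, k) - off)
        for k in range(1, chain_len)
        for off in offsets
    ]

def delta13_runs(peak_list):
    """Two consecutive 13.008 Th steps, i.e. one extra C and one extra H lost per step."""
    by_composition = {(c, h): (k, off) for k, off, c, h in peak_list}
    runs = []
    for (c, h), first in by_composition.items():
        second = by_composition.get((c + 1, h + 1))
        third = by_composition.get((c + 2, h + 2))
        if second and third:
            runs.append((first, second, third))
    return sorted(runs)

# 18:1(9): radical peaks only, then radical plus (-1) peaks
print(delta13_runs(peaks(18, (9,), offsets=(0,))))
print(delta13_runs(peaks(18, (9,))))
```

In this toy model, the run ((11, -1), (10, 0), (9, 0)) has the same Δ13-Δ13 shape as a C=C at position 10, although the true double bond is at position 9.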

The bottom line is that the uncertainty is caused by proton transfer and even-electron fragments that occur naturally at energies as low as 8 eV. This is neither new nor unexpected. For example, it is described in detail by Campbell and Baba (Figure 2b in their paper). The consequences, however, are often neglected. Reports that use standards of known structure and list only the expected diagnostic radical fragments do not show whether the spectra are informative enough to identify the structure (or double bond position). The information content of a spectrum can only be evaluated by testing for alternative matches and quantifying sensitivity-specificity or precision-recall.

How can we improve the confidence?

One possibility is to mitigate the occurrence of unwanted fragments, in particular the proton loss fragment (-1), which is as abundant as the radical ions and, therefore, hard to filter. In our experience, this is only possible by using 8 eV and the shortest reaction time of 30 ms. Any other setting substantially increases the neighbouring proton-transfer peaks. This obviously comes at a cost: efficiency. If we compare the average intensities of fragments, at 8 eV we obtain 3x fewer ions than at 10 eV, and ~10x fewer than at 12 eV (Figure 9). This is important because it can only be compensated by a longer acquisition time, which further aggravates the challenges of performing EIEIO on a chromatographic timescale.

The second, more promising opportunity is to account for the peaks that emerge from proton transfer when simulating EIEIO spectra. This is more than a mitigation strategy to prevent biases in the interpretation of spectra. We can actually demonstrate that adding proton transfers to the simulation brings a tangible benefit to the identification of the correct double bond position, both in theory and in practice. This is the result of the relentless work of Abraham Moyal over the past six months. The challenge in formally accounting for proton transfer is that we can easily predict which fragments are expected, but none of their intensities. This matters less than it may seem, because the measurement during LC-MS is noisy and incomplete anyway. Therefore, it’s advisable not to rely too much on intensities for the scoring.
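One way to follow this advice is an intensity-free score. The sketch below (our illustration, not the scoring used in the study) counts the share of observed peaks that a candidate’s predicted fragments explain; predicted peaks that were not observed are deliberately not penalized, because LC-MS spectra are often incomplete. The function name, the tolerance, and the toy values are hypothetical.

```python
def explained_fraction(predicted_mz, observed_mz, tol=0.01):
    """Share of observed peaks explained by a candidate; intensities are ignored."""
    if not observed_mz:
        return 0.0
    explained = sum(
        any(abs(obs - pred) <= tol for pred in predicted_mz) for obs in observed_mz
    )
    return explained / len(observed_mz)

# Made-up example: the prediction that includes proton-transfer peaks explains more
observed = [409.222, 396.215, 383.207, 408.215]
radicals_only = [409.222, 396.215, 383.207]
with_h_loss = radicals_only + [408.214, 395.207, 382.199]
print(explained_fraction(radicals_only, observed))  # 3 of 4 peaks explained
print(explained_fraction(with_h_loss, observed))    # all 4 peaks explained
```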

Add-on: 3rd generation scoring

I was asked about the scoring of candidates. There are two components: the simulation of EIEIO spectra for candidates and the similarity metric. We played extensively with both. The simulation can be derived from the knowledge gained with standards. For example, the lessons derived from LPC 18:1(9) (Figure 9) can be translated into heuristic rules that account for the effect of double bonds on radical ions and proton transfers. If we apply this scoring to the spectrum collected at 12 eV, which has a good yield but is crowded with proton transfers (Figure 9), we obtain an almost perfect match (Figure 10). Not surprisingly, we can correctly identify the position of single double bonds in most cases between 8 and 12 eV. At 25 eV, we are only 50% correct because the spectrum is drastically shifted toward proton gains.

Figure 10 – EIEIO spectrum of LPC 18:1(9). The experimental spectrum obtained at 12 eV is shown at the top, and the simulated spectrum in the mirrored part.

This was the easy part: mapping a single double bond on a LysoPC with a single chain. The real challenges emerge when (i) searching for a double bond on lipids with multiple chains, or (ii) mapping multiple double bonds on the same chain. In these scenarios, we have to consider overlapping peaks and interactions between double bonds, which generate a heterogeneous ensemble of peak intensities. Do the rules derived from a single C=C apply to these more complex problems?

We show this for the example of PE 18:0/20:4(5,8,11,14) in Figure 11. Our 3rd generation software simulated and scored 1365 isomers. We obtain the top score for PE 18:0/20:4(5,9,11,14), which indicates a double bond at position 9 instead of 8. Of course, knowing the structure of arachidonic acid would “correct” the issue, but we want to rely on MS2 data.

What is the source of the error? Let’s compare the match between the measured (top, in blue/black) and simulated (bottom, in green/red) spectra. Objectively, the similarity is really good: most peaks are matched, and intensities correlate very well. We added the match for the correct candidate PE 18:0/20:4(5,8,11,14) in the bottom panel. The differences are minor: m/z 576, 591, and 616. The predicted intensities exceed reality because we don’t correctly model the interaction of double bonds. The overall score is 10% lower, and the correct candidate ranks at position 15. I hope this clarifies that we can’t consider double bonds (and possibly oxidation) independently.

Figure 11: Putative annotation of the EIEIO spectrum of PE 18:0/20:4(5,8,11,14) at 12 eV. The top panel shows the match of the isomer with the best score. The bottom panel shows the correct isomer, at rank 15. The black/red mismatch at ca. m/z 630 is a single peak that is 0.050 Th off.
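The candidate counts in this post (1,365 isomers for the 20:4 chain, and the 5,005 isomers quoted for the 22:6 chain of PE 18:0/22:6) are consistent with a simple combinatorial model. This is our reconstruction, not necessarily how the software enumerates candidates, and it rests on two assumptions: C1 is the ester carbonyl, so a Δ position runs from 2 to n-1, and no two double bonds share a carbon (no cumulated systems). Under these assumptions the count has the closed form C(n - k - 1, k) for k double bonds on n carbons. The function names below are hypothetical.

```python
from itertools import combinations
from math import comb


def double_bond_isomers(n_carbons, n_db):
    """Yield tuples of Δ positions for n_db C=C bonds on an acyl chain of n_carbons.

    Assumptions (for illustration): C1 is the ester carbonyl, so positions run
    from 2 to n_carbons - 1, and no two double bonds share a carbon (no allenes).
    """
    for combo in combinations(range(2, n_carbons), n_db):
        # Positions i and i+1 would share carbon i+1, so require a gap of at least 2.
        if all(b - a >= 2 for a, b in zip(combo, combo[1:])):
            yield combo


def n_isomers_closed_form(n_carbons, n_db):
    """Non-adjacent k-subsets of m = n_carbons - 2 positions: C(m - k + 1, k)."""
    return comb(n_carbons - n_db - 1, n_db)


print(sum(1 for _ in double_bond_isomers(20, 4)))  # 1365
print(n_isomers_closed_form(22, 6))                # 5005
```

The closed form makes the combinatorial growth explicit: every additional double bond (or, later, oxidation site) multiplies the number of candidates that have to be simulated and scored.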

Add-on 2: Refined, interdependent 3rd generation scoring

Not satisfied, we went further by (i) refining the model that describes fragment intensities for up to 6 carbons next to a single double bond, (ii) adding logic to describe the crosstalk between double bonds, and (iii) readjusting the scoring. This allows us to clearly identify the correct isomer at rank #1, both for LPC 18:1(9) and for PE 18:0/20:4(5,8,11,14) (Figure 12).

Figure 12: Putative annotation of LPC 18:1(9) (top) and PE 18:0/20:4(5,8,11,14) (bottom) fragmented by EIEIO at 12 eV. The intensity prediction was refined on the basis of the single-double-bond case shown at the top, and we implemented additional rules to describe the interaction between double bonds and the ester bond.

Finally, we also tested the scoring with PE 18:0/22:6(4,7,10,13,16,19) fragmented by EIEIO at 12 eV. This is the compound shown in Figure 7, which has 5,005 possible isomers. Based on the score, five candidates match the measured spectrum well. If we aggregate all top results, the predicted localization of double bonds is PE 18:0/22:6(4~100%, 7~100%, 13~80%, 16~80%, 19~80%, 9~40%, 10~40%, …), which means that five C=C are correctly identified with the highest probability (80-100%), and there is a close call for the last C=C between positions 9 and 10.
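An aggregated localization like the one above can be read as per-position support across the top-ranked candidates. A minimal sketch, assuming each candidate is a tuple of Δ positions with an optional weight (for example its score); the function name and the three candidate lists are made up for illustration and are not our actual top hits.

```python
from collections import Counter


def position_support(candidates, weights=None):
    """Fraction of (optionally weighted) candidates placing a C=C at each Δ position.

    Returns a dict ordered from the best-supported position to the least supported.
    """
    if weights is None:
        weights = [1.0] * len(candidates)
    total = float(sum(weights))
    support = Counter()
    for positions, w in zip(candidates, weights):
        for p in set(positions):
            support[p] += w
    ordered = sorted(support, key=lambda p: (-support[p], p))
    return {p: support[p] / total for p in ordered}


# Hypothetical top candidates for a 22:6 chain (illustration only):
top = [(4, 7, 9, 13, 16, 19), (4, 7, 10, 13, 16, 19), (4, 7, 10, 13, 17, 20)]
for pos, frac in position_support(top).items():
    print(f"Δ{pos}: {frac:.0%}")
```

Weighting by score is an alternative to counting each hit equally; which one is more informative likely depends on how sharply the scores fall off after the top hit.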

Frankly, this is working much better than we could have hoped. Notably, the measured spectrum (blue bars in Figure 13) is far from an ideal EIEIO spectrum and has many missing values. Nevertheless, the top hit is off by a single double bond: 9 instead of 10. We are impressed.

Figure 13: Annotation of EIEIO spectrum of PE 18:0/22:6(4,7,10,13,16,19) at 12 eV. We show the match with the top-ranked candidate, which is PE 18:0/22:6(4,7,9,13,16,19).

To summarize: despite the challenges of noisy data, the progress made with scoring indicates that, while proton transfer can’t and should not be mitigated, we can engineer the score to improve predictions beyond radical ions, i.e., for spectra acquired at 12 eV. Abraham is working on this extensively and has managed to modify the scoring to robustly determine double-bond positions in LC-MS spectra across a wide range of dilutions (Table 1). Details will be shared later, but we wanted to finish on a positive note because, yes, it really seems plausible to automatically and fairly analyze complex lipid samples by EIEIO with LC-MS!

| Dataset | Precision | FDR | Recall | Specificity | Accuracy |
|---|---|---|---|---|---|
| Training data, 210 ms accumulation | 0.91 | 0.09 | 0.70 | 0.99 | 0.96 |
| Test, 535 ms accumulation | 0.91 | 0.09 | 0.90 | 0.99 | 0.98 |
| Test, 10x dilution, 210 ms accumulation | 0.90 | 0.10 | 0.52 | 0.99 | 0.94 |
| Test, 100x dilution, 210 ms accumulation | 1.00 | 0.00 | 0.15 | 1.00 | 0.93 |
Table 1 – Performance of new scoring approach for determination of double-bond position in LightSPLASH.
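The columns of Table 1 follow the standard confusion-matrix definitions; for instance, FDR equals 1 - precision in every row. A minimal sketch, assuming each possible double-bond position is scored as a binary call (true/false positive, true/false negative); the function name and the counts in the example are invented. The 100x-dilution row (recall 0.15, accuracy 0.93) also suggests that negative positions far outnumber positive ones, so accuracy stays high even when most double bonds are missed.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "fdr": fp / (tp + fp),               # false discovery rate = 1 - precision
        "recall": tp / (tp + fn),            # also called sensitivity
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Invented counts, only to show the arithmetic:
for name, value in classification_metrics(tp=9, fp=1, fn=1, tn=89).items():
    print(f"{name}: {value:.2f}")
```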

Summary

Yes, it is really possible to automatically and honestly analyze complex lipid samples by EIEIO with LC-MS! We demonstrated rapid identification of class, chain lengths and saturation, sn-isomerism, and double-bond position with >90% accuracy, using spectra collected with 2.2-min LC-MS methods or with MS2 accumulation times of 210 ms.

Although this contribution spends many words on the identification of C=C bonds, I want to stress that this is not the criterion by which to judge the power of EAD or EIEIO. Several methods (PB, OzID) are clearly superior to EAD on this specific task because they are specialized in fragmenting C=C bonds and create a strong signal in an otherwise rather clean spectrum. EAD will always suffer from the limitation that the information is encoded in 10-20 peaks and, hence, it requires more material and effort. The true appeal of EAD is its versatility: it is great and reliable in assigning sn-positions, it assists in localizing double bonds, and it also indicates oxidation and other modifications. It’s pretty amazing to obtain everything with a single box and method… and much of what we discovered here holds true for other classes of compounds.

I won’t spend more time on software: it’s crucial, but this is nothing new. One of the key lessons is not to extrapolate from an analysis done with a known standard. The key question is how the data will be used to generate hypotheses, and answering it requires an inverse, quantitative, large-scale assessment. We have to keep in mind that the answer to the most important question is unlikely to be found in an Analytical Chemistry paper or a marketing brochure.

We still have not decided whether it is better to (i) run DDA exclusively with CID and 5-10 ms scans (top 10 or so) and collect EAD in a second pass with ca. 200-1000 ms accumulation time, to obtain the most informative spectra on the features that are indeed oxidized or have a double bond, or (ii) run DDA with EAD at 35 ms accumulation (top 5-7). It will need more testing. Ideally, we would love to decide in real time what to acquire and for how long, but the DDA possibilities and performance offered by the instrument are still very limiting. This is the legacy of 10 years of DIA/SWATH. I might write about DDA on a different occasion; it is also a great source of food for thought.
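Part of this trade-off is cycle-time arithmetic: one survey scan plus the top-N MS2 scans has to repeat often enough to sample each chromatographic peak. A rough sketch, assuming a 100 ms survey scan, no per-scan overhead and 3 s wide peaks; all three numbers and both function names are hypothetical placeholders, not instrument specifications.

```python
def dda_cycle_ms(survey_ms, top_n, ms2_ms, overhead_ms=0.0):
    """Approximate DDA cycle time: one survey scan plus top_n MS2 scans.

    overhead_ms is a hypothetical per-MS2 switching cost; real values are instrument-specific.
    """
    return survey_ms + top_n * (ms2_ms + overhead_ms)


def points_across_peak(peak_width_s, cycle_ms):
    """Number of cycles (survey scans) that fall across a peak of the given width."""
    return peak_width_s * 1000.0 / cycle_ms


# Placeholder values: 100 ms survey scan, no overhead, 3 s peaks.
cid_cycle = dda_cycle_ms(100, top_n=10, ms2_ms=10)  # option (i), first (CID) pass
ead_cycle = dda_cycle_ms(100, top_n=6, ms2_ms=35)   # option (ii), EAD-only DDA
print(cid_cycle, ead_cycle)
print(points_across_peak(3, cid_cycle), points_across_peak(3, ead_cycle))
```

With these placeholders, option (ii) lengthens the cycle by roughly half compared with the CID pass of option (i), before any overhead is added, while the second EAD pass of option (i) spends 200-1000 ms only on features that deserve it.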

What to expect in the future?

Vincen is wrapping up a paper on the analysis of the fragments at the LC-MS timescale, and the limiting step is me.

Regarding the software, we will publish the container that annotates spectra as described here, but this will take a few more months. Given the flexibility of EAD advocated above, we decided to extend the whole framework to concomitantly identify double bonds and oxidation in acyl chains. At the core of these activities, there is a problem that I find particularly interesting also beyond lipids: how to avoid combinatorial explosion? Abraham’s method offers a linear solution and is well suited to solve all problems at once. We “just” need some additional data to validate everything, polish the algorithms, and then package everything into 5th-generation software for public use. Then we will move to other types of molecules…

This Post Has 3 Comments

  1. Kevin He

    This is an amazing journey. So informative. Thanks for sharing.

  2. JEAN-BAPTISTE VINCENDET

    Amazing work and knowledge building, it paves the way for the future.

  3. Evelyn Rampler

    Thanks for this valuable insight in your ongoing EAD work! Looking forward to see your paper on the fragment analysis published soon! We are also continuing our journey to use the ZenoTOF and EAD for LC-MS based structural assignment of (glyco)lipids… All the best from Vienna, Evelyn
