Imagine running a study with 100 samples by untargeted LC-MS (polar or lipids, you pick!). Data acquisition by DIA/DDA is complete in 1-2 days, and then comes the step of data analysis.
The minimum expectation for the software is to find all peaks across samples (rare and frequent, high and low, symmetric and tailing peaks), take care of shifts in m/z and RT, group isotopologues and adducts, and eventually output a table with peak areas/volumes for all features. The luxury version also delivers consolidated MS2 spectra, optionally some recursive gap filling, and maybe a mzTab.
Common challenges in processing untargeted LC-MS/MS data
What software can accomplish all of this? The options are to use commercial software (if one can afford it), or one of the usual suspects: XCMS, openMS/KNIME, mzMine, MS-DIAL, … We tried everything we had access to – repeatedly over years – and the experience has been frustrating. Some recurring problems we encountered:
- The software crashed (reproducibly) with mid-sized LC-MS studies. In some instances, it computed for 7 days and eventually triggered blue screens or odd exceptions.
- Some software was speedy, but that was never a good sign. In our experience, if LC-MS software is too fast, it is because it is cutting corners: leaving peaks behind, misaligning retention times across samples, reporting plenty of missing values, and so on.
- Parameter-less algorithms did poorly on real LC-MS samples. Parameter-dense algorithms were tunable, but in most cases, it was a manual, trial-and-mostly-error procedure. Some parameters are just not trivial to understand and to adjust.
- Plenty of coding was necessary to obtain the full result. Importantly, we had to switch between R, Python, and maybe Java to get everything done.
- The coherence of the results produced by different tools on the same data was surprisingly low. What should we trust?
Many of these issues, and new ones, emerged with an increasing number of samples. Admittedly, it could be that we were simply not skilled enough to get the software to work. It could also be that there is commercial software that solves all of these problems and scales, but we could not afford it. Nevertheless, we think our experience is fairly typical, and most metabolomics users interested in mid-to-large-sized studies (hundreds to thousands of samples) have faced, or will face, similar issues. Please feel free to share your experience in the comments.
Introducing SLAW: a scalable and self-optimizing LC-MS/MS analysis package
We are proud to present a tool that tackles most of the problems we experienced when processing untargeted LC-MS data. The credit goes to Alexis Delabrière, an extremely skilled and knowledgeable Postdoc with a PhD in Computer Science who spent two years developing a unifying system that got us past all frustrations. The result is SLAW. The name stands for scalable LC-MS analysis workflow, and it hides plenty of functionalities:
- SLAW performs peak picking by any of the most used and top-performing algorithms (featureFinderMetabo from openMS, ADAP from mzMine, CentWave from XCMS)
- SLAW includes a novel sample alignment procedure that scales to thousands of individual LC-MS files. The algorithm works on a desktop computer but can automatically make use of high-performance computing infrastructure.
- SLAW includes a novel parameter optimization procedure that efficiently optimizes peak picking and alignment parameters for each dataset (i.e. LC method and MS instrument). In our tests, 5-7 parameters with complex, interdependent effects on the results were optimized simultaneously. This means that all the key parameters can be automatically tuned by SLAW without the user’s intervention for any of the embedded peak pickers. The optimization procedure employs a metric that combines two terms. The first is sensitivity (number of features detected), the second is robustness (frequency of detection across replicate samples). Thereby, parameter optimization ensures that, in large studies, peak picking and alignment do not degenerate into a myriad of noisy and sporadic features but instead yield robust and reproducible entities. Of course, the objective function could be modified at will, but the default form is effective for both small and large studies.
- SLAW groups isotopologues and adducts into features by exploiting both intra- and inter-sample correlations.
- SLAW fills missing values by raw data recursion, optimizing the extraction windows to minimize biases between recursed and non-recursed data. [This feature is only relevant for TOF data. For Orbitrap data, missing values have to be guessed by imputation.]
- SLAW scales for thousands of LC-MS files. We tested several datasets with > 1000 LC-MS files, and all went fine.
- SLAW digs into raw data to extract isotopic patterns from the best possible samples independent of peak picking.
- Regardless of the number of LC-MS files, SLAW consolidates all MS2 spectra collected across a study (in top-n or iterative procedures) in consensus spectra for each collision energy.
- SLAW outputs data in tables (csv), consensus MS2 spectra (mgf), as well as the individual MS2 spectra (mgf) and peaks table (csv).
- If optimization is included, SLAW returns optimal parameter values that can be used in future runs with similar measurements without repeating parameter searches.
- SLAW requires zero coding skills. SLAW comes as a Docker container and takes care of the communication between different languages (Python, R, C++, …).
- SLAW can be installed in a minute via Docker and immediately run.
SLAW in practice
SLAW requires (i) centroided files in mzML or mzXML format, and (ii) a text file that describes the sample type for each file: BLANK, SAMPLE, QC, MS2. That’s it. Optionally, some fundamental settings (e.g. the peak picking algorithm) or preferred parameter ranges can be passed with a text file provided by SLAW. The whole process is started with a simple command-line statement.
Comment: QC files are quality controls, typically pooled study samples that have been measured at regular intervals during the sequence. QC files are crucial for parameter optimization. For short sequences, 3-4 QC samples are sufficient. For long sequences, more is recommended. MS2 indicates files dedicated to fragmentation spectra, for example, as obtained by iterative MS2 methods.
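Before launching a long run, it can help to sanity-check the sample annotation. The following is a minimal sketch in Python, not part of SLAW: the function name `check_sample_types`, the in-memory file-to-type mapping, and the `min_qc` threshold are all hypothetical and only mirror the requirements described above (mzML/mzXML files, four sample types, at least 3-4 QC files). The actual layout of SLAW's text file is documented in the wiki and is not reproduced here.

```python
from collections import Counter

# Sample types named in the text above.
ALLOWED_TYPES = {"BLANK", "SAMPLE", "QC", "MS2"}


def check_sample_types(sample_types, min_qc=3):
    """Return a list of human-readable problems in a file -> type mapping.

    `sample_types` maps a file name to its declared type. Both the mapping
    and the `min_qc` threshold are illustrative assumptions, not SLAW inputs.
    """
    problems = []
    for name, stype in sample_types.items():
        # SLAW expects centroided mzML or mzXML files.
        if not name.lower().endswith((".mzml", ".mzxml")):
            problems.append(f"{name}: expected an mzML or mzXML file")
        if stype not in ALLOWED_TYPES:
            problems.append(f"{name}: unknown sample type {stype!r}")
    # QC files drive the parameter optimization, so count them explicitly.
    counts = Counter(sample_types.values())
    if counts["QC"] < min_qc:
        problems.append(
            f"only {counts['QC']} QC file(s); at least {min_qc} recommended"
        )
    return problems


if __name__ == "__main__":
    example = {
        "blank_01.mzML": "BLANK",
        "qc_01.mzML": "QC",
        "qc_02.mzML": "QC",
        "serum_001.mzML": "SAMPLE",
        "serum_002.raw": "SAMPLE",   # wrong file format
        "pool_ms2.mzML": "ms2",      # type is case-sensitive in this sketch
    }
    for problem in check_sample_types(example):
        print(problem)
```

The check is deliberately strict about upper-case type names; whether SLAW itself is case-sensitive is not something this sketch knows.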
Comment: any optimization procedure requires a metric. In SLAW, the metric combines two terms: sensitivity and robustness. Both can be measured based on QC samples, assuming they are representative of study samples. Notably, we don’t make use of spike-ins or any other ground truth. This was a decision we took early in development because (i) it adheres better to the idea of untargeted methods to capture possibly many features of different classes, and (ii) it preserves general and retroactive applicability (inclusion of pooled study samples is common practice, use of spike-ins is rarer).
Comment: the robustness term gains importance with the increasing number of samples. Pushing (only) sensitivity rewards the detection of noise. In large studies, this leads to detecting huge amounts of features in only a reduced number of samples. This degeneration can hardly be compensated in later filtering steps. The inclusion of a competing robustness term balances the optimization.
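To make the trade-off concrete, here is a toy illustration in Python. It is not SLAW's objective function: the name `toy_score`, the way the two terms are multiplied, and the `robustness_weight` exponent are assumptions chosen only to show how a robustness term can offset the reward for sporadic features. Each row is a feature, each column a QC injection, and `None` marks a missing detection.

```python
def detection_frequency(row):
    """Fraction of QC injections in which a feature was detected."""
    return sum(value is not None for value in row) / len(row)


def toy_score(qc_table, robustness_weight=2.0):
    """Return (sensitivity, robustness, combined score) for a QC peak table.

    sensitivity: number of features detected in at least one QC injection
    robustness:  mean detection frequency of those features
    The combination sensitivity * robustness**weight is an assumption made
    for this illustration only.
    """
    detected = [row for row in qc_table if any(v is not None for v in row)]
    if not detected:
        return 0, 0.0, 0.0
    sensitivity = len(detected)
    robustness = sum(detection_frequency(row) for row in detected) / sensitivity
    return sensitivity, robustness, sensitivity * robustness ** robustness_weight


if __name__ == "__main__":
    n_qc = 4
    # 100 features found in every QC injection.
    robust = [[1.0] * n_qc for _ in range(100)]
    # 300 extra features found in only one QC injection (likely noise).
    sporadic = [[1.0] + [None] * (n_qc - 1) for _ in range(300)]

    for label, table in (("strict", robust), ("permissive", robust + sporadic)):
        n, r, s = toy_score(table)
        print(f"{label}: features={n} robustness={r:.2f} score={s:.1f}")
```

In this example the permissive settings report four times as many features, so a sensitivity-only criterion would prefer them, while the combined score prefers the strict settings.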
We tested SLAW on dozens of studies spanning different MS brands, number of samples, gradient lengths, etc., with satisfying results. In our lab, it has become a workhorse in routine analysis, both for polar and lipid extracts. As it helped us a lot, we think it might be of general interest for scientists that face similar problems.
We just submitted a paper that describes the different components (e.g. parameter optimization, alignment) and the re-coding done to make it scale. The paper compares the pros and cons of SLAW against the only two workflows that were found to scale: XCMS/IPO and openMS. The paper focuses on scalability (up to 2500 LC-MS files). We hope to be soon able to share the article. In the meantime, we are glad to share some exemplary results from tests that are not part of the manuscript.
Comparison to vendor software on a small dataset
We used SLAW to analyze a complex lipidomics sample by DDA. We analyzed the same sample repeatedly on a Thermo QExactive HF-X and on an Agilent 6546 QTOF. Both datasets were analyzed with the vendor’s software. In SLAW, we used the out-of-the-box configuration and let the automated optimization find the best parameters individually for both datasets. Our evaluation was based on (i) number of peaks identified, (ii) CV of the identified peaks, (iii) number of lipid features left after declustering, deisotoping, and a crude annotation (MS1 and RT).
On QExactive data and compared to Compound Discoverer 3.1, SLAW (i) detects 2x more feature groups (8436 vs 4231), (ii) achieves better reproducibility (CV) over the whole range of intensities (13% vs 17%), (iii) finds features more frequently across samples (83% vs 26%), (iv) finds more putative species (1733 vs 1125).
On QTOF data and compared to ProFinder 10, SLAW (i) detects 2x more feature groups (18222 vs 8444), (ii) achieves similar reproducibility (CV ~ 13%), (iii) finds features more frequently across samples (98% vs 83% after gap filling; the pre-gap-filling number is not available), (iv) finds more putative species (2462 vs 1460).
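For readers who want to compute comparable figures on their own replicate injections, the sketch below shows one common way to do it in Python. It is an assumption-laden illustration, not the script behind the numbers above: the function names are hypothetical, CV is taken as sample standard deviation divided by mean per feature, the summary uses the median CV, and missing values are represented by `None`.

```python
from statistics import mean, median, stdev


def feature_cv(areas):
    """Coefficient of variation (SD / mean) over the detected replicates.

    Returns None when fewer than two replicates were detected or when the
    mean is zero, because the CV is undefined in those cases.
    """
    values = [a for a in areas if a is not None]
    if len(values) < 2:
        return None
    m = mean(values)
    if m == 0:
        return None
    return stdev(values) / m


def summarize(peak_table):
    """Summarize a peak table (rows = features, columns = replicates)."""
    cvs = [cv for cv in (feature_cv(row) for row in peak_table) if cv is not None]
    freqs = [sum(a is not None for a in row) / len(row) for row in peak_table]
    return {
        "n_features": len(peak_table),
        "median_cv": median(cvs) if cvs else None,
        "mean_detection_frequency": mean(freqs) if freqs else None,
    }


if __name__ == "__main__":
    table = [
        [100.0, 110.0, 90.0],   # detected in all replicates
        [50.0, None, 50.0],     # one missing value
        [10.0, None, None],     # detected once, CV undefined
    ]
    print(summarize(table))
```

Whether CV is computed before or after gap filling changes the result noticeably, so it is worth stating which table was used when comparing tools.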
Large scale data
As an example, we show the analysis of ca. 1200 serum lipidome samples by LC-MSn, both on TOF and Orbitrap. In both cases, SLAW completed the analysis in less than 12 hours. Unfortunately, we can’t compare it to vendor software because the latter crashes… Despite the sheer number of samples, the number of feature groups is consistent with expectations. Most features have been detected in the majority of samples. The CV (SD) is still large, but it was computed before any type of normalization.
Are you interested in using or testing SLAW? The simplest way is to install Docker and pull the latest SLAW version:
docker pull adelabriere/slaw:latest
You’ll find more information on GitHub https://github.com/adelabriere/SLAW (check the wiki). More will be disclosed with the publication of the companion paper.
How can you help us?
Please try it, challenge it, and report your experience. If you like it, just commenting below with a like or a thank you (to Alexis) will be enough to recognize that it is well-received.
If not, tell us what’s bad. Our main interest is in improving the algorithmic part, and we are eager to hear about scalability and parameter optimization. For instance, some TOF data are particularly challenging because of a dependency of peak shape on intensity, MCP ringing, and other issues. If you have good data to share (it doesn’t have to be large scale), it might help.
Comment: We don’t have the resources to troubleshoot individual data sets, in particular if the problem is poor chromatography (massive tailing, noise, drifting RTs, overloaded columns, carryover, etc.), wrong instrument tuning, or a metabolite that you expect at a known m/z but for which there is simply no peak ;-). For this, we apologize in advance.
Comment: SLAW does not include any GUI or fancy visualization. Such functionalities are not planned. SLAW will remain a stand-alone tool that focuses on the processing of untargeted LC-MS data. In our environment, SLAW is embedded in automated workflows. We have modules that precede SLAW and downstream modules that take SLAW’s output to visualization, analysis, annotation, etc.
Comment: SLAW does not include functionalities for annotation (identification) of features (i.e. by RT, AM, MS2, …). This is a work in progress and will be part of a different module. Therefore, for now, you’ll have to annotate by yourself.