MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra
Mass spectrometry data are a key source of information in many workflows in medicine and across the life sciences. Mass fragmentation spectra are generally considered characteristic signatures of the chemical compounds they originate from, yet the chemical structure itself usually cannot be easily deduced from the spectrum. Spectral similarity measures are often used as a proxy for structural similarity, but this approach is strongly limited by the generally poor correlation between the two metrics. Here, we propose MS2DeepScore: a novel Siamese neural network that predicts the structural similarity between two chemical structures solely from their MS/MS fragmentation spectra. Using a cleaned dataset of more than 100,000 mass spectra of about 15,000 unique known compounds, we trained MS2DeepScore to predict structural similarity scores for spectrum pairs with high accuracy. In addition, Monte Carlo dropout is used to sample different model variants, which further improves the predictions and allows us to assess the model's prediction uncertainty. On 3600 spectra of 500 unseen compounds, MS2DeepScore identifies highly reliable structural matches and predicts Tanimoto scores for pairs of molecules from their fragment spectra with a root mean squared error of about 0.15. The prediction uncertainty estimate can also be used to select a subset of predictions with a root mean squared error of about 0.1. We further demonstrate that MS2DeepScore outperforms classical spectral similarity measures in retrieving chemically related compound pairs from large mass spectral datasets, illustrating its potential for spectral library matching. Finally, MS2DeepScore can be used to create chemically meaningful mass spectral embeddings that could be used to cluster large numbers of spectra. We believe that machine learning-supported mass spectral similarity measures, such as MS2DeepScore and the recently introduced unsupervised Spec2Vec metric, have great potential for a range of metabolomics data processing pipelines.
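The central idea of the abstract (a Siamese network whose shared encoder embeds two spectra, a comparison of the embeddings that is trained to match the Tanimoto score of the two molecules, and Monte Carlo dropout to obtain an uncertainty estimate) can be illustrated with a minimal NumPy sketch. This is not the ms2deepscore package API and not the authors' implementation. The following are illustrative assumptions: treating spectra as already-binned vectors, using cosine similarity as the embedding comparison, the layer sizes, the independent dropout masks per branch, and the mean/standard-deviation summary. The names `BaseNetwork`, `siamese_score`, `mc_dropout_score` and `select_confident` are hypothetical. The weights are random rather than trained, so the scores carry no chemical meaning.

```python
import numpy as np


def tanimoto(fp_a: np.ndarray, fp_b: np.ndarray) -> float:
    """Tanimoto (Jaccard) score of two binary fingerprints: |A and B| / |A or B|."""
    a, b = fp_a.astype(bool), fp_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return float(np.logical_and(a, b).sum() / union)


class BaseNetwork:
    """Hypothetical, untrained dense encoder shared by both Siamese branches."""

    def __init__(self, n_inputs, n_hidden, n_embed, dropout_rate=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(n_inputs), (n_inputs, n_hidden))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_embed))
        self.dropout_rate = dropout_rate

    def embed(self, x, rng=None):
        """Map a binned spectrum to an embedding; dropout is applied only if rng is given."""
        h = np.maximum(x @ self.w1, 0.0)  # ReLU hidden layer
        if rng is not None:
            keep = rng.random(h.shape) >= self.dropout_rate
            h = h * keep / (1.0 - self.dropout_rate)  # inverted dropout
        return h @ self.w2


def cosine(u, v):
    """Cosine similarity, clipped to [-1, 1] to absorb rounding error."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0:
        return 0.0
    return float(np.clip(u @ v / denom, -1.0, 1.0))


def siamese_score(net, x_a, x_b, rng=None):
    """Predicted structural similarity: both spectra go through the same encoder."""
    return cosine(net.embed(x_a, rng), net.embed(x_b, rng))


def mc_dropout_score(net, x_a, x_b, n_samples=10, seed=1):
    """Monte Carlo dropout: keep dropout active at inference and summarize the samples."""
    rng = np.random.default_rng(seed)
    samples = np.array([siamese_score(net, x_a, x_b, rng) for _ in range(n_samples)])
    return float(samples.mean()), float(samples.std())


def select_confident(means, stds, max_std):
    """Boolean mask of predictions whose dropout spread is at most max_std."""
    return np.asarray(stds) <= max_std


def rmse(pred, target):
    """Root mean squared error between predicted and true Tanimoto scores."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Usage: score one random pair of toy vectors standing in for binned spectra.
rng = np.random.default_rng(42)
net = BaseNetwork(n_inputs=50, n_hidden=32, n_embed=8)
spec_a, spec_b = rng.random(50), rng.random(50)
mean_score, spread = mc_dropout_score(net, spec_a, spec_b, n_samples=20)
```

In a trained model of this kind, the predicted score for each pair would be compared against the Tanimoto score of the corresponding molecular fingerprints (for example with `rmse`), and `select_confident` mirrors the idea of keeping only low-uncertainty predictions. The threshold is a free parameter in this sketch, not a value taken from the paper.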
Main Authors: | Huber, Florian; van der Burg, Sven; van der Hooft, Justin J.J.; Ridder, Lars |
---|---|
Format: | Article/Letter to editor |
Language: | English |
Subjects: | Deep learning, Mass spectrometry, Metabolomics, Spectral similarity measure, Supervised machine learning |
Published in: | Journal of Cheminformatics 13 (2021) 1 |
ISSN: | 1758-2946 |
DOI: | 10.1186/s13321-021-00558-4 |
Online Access: | https://research.wur.nl/en/publications/ms2deepscore-a-novel-deep-learning-similarity-measure-to-compare- |
Full text (PDF): | https://edepot.wur.nl/557087 |
License: | https://creativecommons.org/licenses/by/4.0/ |
Institution: | Wageningen University & Research |