https://masst.ucsd.edu/masstplus/
MASST+ is an improved version of the GNPS Mass Spectrometry Search Tool (MASST). It provides fast, error-tolerant search of metabolomics mass spectrometry data and reduces search time by two orders of magnitude. It can query databases of billions of mass spectra, which was not feasible with MASST. Like MASST, MASST+ is publicly available as a web service on GNPS.
If you know the Universal Spectrum Identifier (USI) of the spectrum you want to search with MASST+, you can enter it directly at https://masst.ucsd.edu/masstplus/.
(a) First, navigate to the spectrum of interest in the GNPS library. Here, a Malyngamide C spectrum is viewed. (b) Next, click the "MASST+" link. (c) This opens the MASST+ tab, which runs a mass spectral search and presents the results.
(a) Start by submitting a new molecular networking job on GNPS (this will require you to be logged in to a GNPS account). (b) When the job has completed, click "View All Clusters With IDs". (c) This will open a new tab, where you can click "Advanced MASST" and then "MASST+ Search" (or "MASST+ Analog Search") in order to start a new MASST+ search. (d) This will open a new tab for MASST+, where the search results will display after a few seconds.
We performed molecular networking (both clustering and spectral networking) using NETWORKING+ on the entirety of GNPS. We stored the results of CLUSTERING+ and PAIRING+ in tsv format.
We split the GNPS library into 9 divisions according to precursor mass range and executed CLUSTERING+ on each of them. We provide the cluster assignment of each spectrum and the centers of all clusters.
The output is in tsv format. Each row of the tsv output represents a spectrum from the GNPS library. The columns of the output represent:
cluster_idx is a unique ID assigned to each cluster in the division
scan is a unique ID assigned to each spectrum in the division
mz is the precursor mass of the spectrum
RTINSECONDS is the retention time of the spectrum
MSV_source is the MSV library the spectrum belongs to
Filename is the GNPS source file of the spectrum inside the MSV library
Local_scan is the spectrum's scan number inside its GNPS source file
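The column layout above maps naturally onto a dictionary keyed by cluster_idx. The following is a minimal sketch, assuming the file begins with a header row carrying the column names listed above (the source does not confirm this). The sample rows, including the MSV_EXAMPLE identifier and the example file names, are hypothetical and only illustrate the shape of the data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The header row is an assumption; real
# CLUSTERING+ output files are far larger.
SAMPLE_TSV = (
    "cluster_idx\tscan\tmz\tRTINSECONDS\tMSV_source\tFilename\tLocal_scan\n"
    "9\t101\t53.0051\t336.875\tMSV000083789\tpos_Cd10MYY_33.mgf\t1069\n"
    "9\t102\t53.0049\t340.100\tMSV_EXAMPLE\texample_a.mgf\t1102\n"
    "12\t103\t53.0210\t120.500\tMSV_EXAMPLE\texample_b.mgf\t55\n"
)


def group_by_cluster(handle):
    """Map each cluster_idx to the list of rows (spectra) assigned to it."""
    clusters = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        clusters[int(row["cluster_idx"])].append(row)
    return dict(clusters)


clusters = group_by_cluster(io.StringIO(SAMPLE_TSV))
# Number of spectra per cluster in this division.
cluster_sizes = {idx: len(rows) for idx, rows in clusters.items()}
print(cluster_sizes)
```

For a downloaded division file, the same function can be called on an open file handle instead of the in-memory sample. Reading row by row keeps memory use proportional to what is stored, which matters for files of this size.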
The CLUSTERING+ output files for all 9 divisions can be downloaded via the following links:
CLUSTERING+ output for division 0
CLUSTERING+ output for division 1
CLUSTERING+ output for division 2
CLUSTERING+ output for division 3
CLUSTERING+ output for division 4
CLUSTERING+ output for division 5
CLUSTERING+ output for division 6
CLUSTERING+ output for division 7
CLUSTERING+ output for division 8
We write the representative spectrum of each cluster into an mgf file, one file per division.
Each representative spectrum contains the following fields:
CLUSTERINDEX is the cluster index in the division
CLUSTERSIZE is the number of spectra in the cluster
MSV_LIB is the source MSV library of the representative spectrum
FILENAME is the source GNPS file of the representative spectrum
LOCAL_SCAN is the scan number of the representative spectrum inside the GNPS source file
PEPMASS is the precursor mass of the representative spectrum
RTINSECONDS is the retention time of the representative spectrum
BEGIN IONS
CLUSTERINDEX=9
CLUSTERSIZE=282
MSV_LIB=MSV000083789
FILENAME=pos_Cd10MYY_33.mgf
LOCAL_SCAN=1069
PEPMASS=53.0051
RTINSECONDS=336.875
31.991 36
38.0024 36
38.0076 36
49.9917 111
51.9917 36
52.8466 75
53.0038 1338
53.0203 40
67.9882 72
END IONS
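A record like the one above can be read with a few lines of Python. This is a minimal sketch of an mgf reader, not the parser used by MASST+ or GNPS: it assumes every record is delimited by BEGIN IONS and END IONS, that header lines have the form KEY=value, and that peak lines hold an m/z value followed by an intensity. The function name parse_mgf is hypothetical.

```python
# The record shown above, reproduced as a string for the example.
EXAMPLE_MGF = """BEGIN IONS
CLUSTERINDEX=9
CLUSTERSIZE=282
MSV_LIB=MSV000083789
FILENAME=pos_Cd10MYY_33.mgf
LOCAL_SCAN=1069
PEPMASS=53.0051
RTINSECONDS=336.875
31.991 36
38.0024 36
38.0076 36
49.9917 111
51.9917 36
52.8466 75
53.0038 1338
53.0203 40
67.9882 72
END IONS
"""


def parse_mgf(text):
    """Parse mgf text into a list of {'params': dict, 'peaks': list} records."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line, split on the first '=' only.
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z followed by intensity.
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


spectra = parse_mgf(EXAMPLE_MGF)
center = spectra[0]
# Most intense peak of the representative spectrum.
base_peak = max(center["peaks"], key=lambda peak: peak[1])
print(center["params"]["CLUSTERINDEX"], len(center["peaks"]), base_peak)
```

Header values are kept as strings so that the caller decides how to convert them, for example int for CLUSTERSIZE and float for PEPMASS.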
The spectrum file for each division can be downloaded via the following links:
cluster centers for division 0
cluster centers for division 1
cluster centers for division 2
cluster centers for division 3
cluster centers for division 4
cluster centers for division 5
cluster centers for division 6
cluster centers for division 7
cluster centers for division 8
We apply PAIRING+ to the clusters resulting from CLUSTERING+ to compute the molecular network. The network is stored in two files.
The first output file stores general information for the nodes of the GNPS molecular network in tsv format. The network contains over 8M nodes, one for each non-singleton cluster resulting from CLUSTERING+. Each row of the tsv output represents a node in the network. The columns of the output represent:
scan_number_among_centers is a unique ID assigned to each cluster in the network
component_index is a unique ID assigned to each connected component in the network
source_division is the division this cluster came from (ranges from division0 to division8)
cluster_index_in_division is the index of this cluster in its source division
cluster_size is the size of the cluster
center_MSV is the source MSV library of the representative spectrum
center_source_file is the source GNPS file of the representative spectrum
center_scan_in_source_file is the scan number of the representative spectrum inside the GNPS source file
center_pepmass is the precursor mass of the representative spectrum
center_RT is the retention time of the representative spectrum
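Because every node row carries its component_index and cluster_size, per-component statistics can be computed in a single pass over the node file. The sketch below assumes a header row with the column names listed above. All sample rows are hypothetical, including the node IDs, the MSV_EXAMPLE identifier, and the exact spelling of the source_division values.

```python
import csv
import io
from collections import defaultdict

# Hypothetical node rows; the header row is an assumption.
NODES_TSV = (
    "scan_number_among_centers\tcomponent_index\tsource_division\t"
    "cluster_index_in_division\tcluster_size\tcenter_MSV\t"
    "center_source_file\tcenter_scan_in_source_file\tcenter_pepmass\tcenter_RT\n"
    "1\t1\tdivision0\t9\t282\tMSV000083789\tpos_Cd10MYY_33.mgf\t1069\t53.0051\t336.875\n"
    "2\t1\tdivision0\t14\t5\tMSV_EXAMPLE\texample_a.mgf\t12\t53.0210\t120.5\n"
    "3\t2\tdivision4\t7\t3\tMSV_EXAMPLE\texample_b.mgf\t88\t412.2301\t95.0\n"
)


def summarize_components(handle):
    """Count nodes and clustered spectra in each connected component."""
    summary = defaultdict(lambda: {"nodes": 0, "spectra": 0})
    for row in csv.DictReader(handle, delimiter="\t"):
        entry = summary[int(row["component_index"])]
        entry["nodes"] += 1
        entry["spectra"] += int(row["cluster_size"])
    return dict(summary)


components = summarize_components(io.StringIO(NODES_TSV))
print(components)
```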
The second output file stores general information for the edges of the GNPS molecular network in tsv format. Each row of the tsv output represents an edge in the network. The columns of the output represent:
connected_component_index is a unique ID assigned to each connected component in the network
first_center_scan_number is the unique ID of the first node connected by the edge
second_center_scan_number is the unique ID of the second node connected by the edge
product is the dot-product similarity score between the two nodes
product_shared is the contribution of shared peak matches to the similarity score
product_shifted is the contribution of shifted peak matches to the similarity score
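The edge file can be turned into an adjacency list for graph traversal or neighbor lookup. The following is a minimal sketch, assuming a header row with the column names listed above. The sample rows and the min_product threshold of 0.7 are hypothetical and chosen only for illustration; the source does not state which score cutoff PAIRING+ applies.

```python
import csv
import io
from collections import defaultdict

# Hypothetical edge rows; the header row is an assumption.
EDGES_TSV = (
    "connected_component_index\tfirst_center_scan_number\t"
    "second_center_scan_number\tproduct\tproduct_shared\tproduct_shifted\n"
    "1\t10\t11\t0.82\t0.60\t0.22\n"
    "1\t11\t12\t0.71\t0.71\t0.00\n"
    "2\t20\t21\t0.65\t0.40\t0.25\n"
)


def build_adjacency(handle, min_product=0.0):
    """Build an undirected adjacency list from edges scoring >= min_product."""
    adjacency = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["product"]) < min_product:
            continue
        first = int(row["first_center_scan_number"])
        second = int(row["second_center_scan_number"])
        # Each edge is stored in both directions.
        adjacency[first].append(second)
        adjacency[second].append(first)
    return dict(adjacency)


adjacency = build_adjacency(io.StringIO(EDGES_TSV), min_product=0.7)
print(adjacency)
```

Storing each edge in both directions doubles memory use but makes neighbor lookup a single dictionary access, which is convenient when exploring the neighborhood of one node.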