Everyone knows the feeling of having a few lines of a song stuck in your head and wanting to know which song they came from. You open your browser and type the lines into your favorite search engine, which instantly gives you not only the song but also the artist, year, album, and a multitude of other information. Researchers have a perhaps less familiar but still pressing desire to do the same whenever they have a molecule they are interested in, but until now they could not simply punch the data into a search engine to learn more. Now scientists at the University of California San Diego have reported in the latest issue of Nature Biotechnology on a new type of tool that makes the same kind of search possible for mass spectrometry (MS) data instead of text. This new web-enabled small-molecule MS search engine, called MASST (short for Mass Spectrometry Search Tool), is a starting point for data-driven discovery in public mass spectrometry data and enables the reuse of public data and its related knowledge.
Mingxun Wang, Ph.D., the first author on the paper, programmed MASST to let users search public datasets for spectra that are identical or similar to a query spectrum; the spectra serve as proxies for the same or similar molecules. The results are returned as a report of matching datasets together with their metadata, that is, information associated with each dataset such as sample type, organism, organ, tissue, disease associations, geographic location, altitude, and depth. This metadata enables researchers to contextualize the results and draw new connections.
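To make the idea of searching by spectrum more concrete, the following is a minimal sketch of one common way to compare two spectra: a cosine score over matched peaks. It is not MASST's actual implementation. The function names, the m/z tolerance, the score threshold, and the in-memory dataset layout are all assumptions made for illustration.

```python
import math


def cosine_score(query, reference, tolerance=0.02):
    """Greedy cosine similarity between two peak lists of (m/z, intensity).

    The tolerance (in m/z units) is an illustrative assumption.
    """
    # Candidate peak pairs whose m/z values agree within the tolerance.
    pairs = [
        (q_int * r_int, i, j)
        for i, (q_mz, q_int) in enumerate(query)
        for j, (r_mz, r_int) in enumerate(reference)
        if abs(q_mz - r_mz) <= tolerance
    ]
    # Take the strongest pairs first; each peak may be matched only once.
    pairs.sort(reverse=True)
    used_q, used_r, dot = set(), set(), 0.0
    for product, i, j in pairs:
        if i not in used_q and j not in used_r:
            used_q.add(i)
            used_r.add(j)
            dot += product
    norm_q = math.sqrt(sum(inten ** 2 for _, inten in query))
    norm_r = math.sqrt(sum(inten ** 2 for _, inten in reference))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)


def search(query, datasets, min_score=0.7):
    """Return (score, metadata) pairs for spectra scoring at least min_score.

    `datasets` is a hypothetical list of dicts with "spectrum" and "metadata"
    keys; a real repository search would not hold everything in memory.
    """
    hits = []
    for entry in datasets:
        score = cosine_score(query, entry["spectrum"])
        if score >= min_score:
            hits.append((score, entry["metadata"]))
    # Best matches first; sort on the score only, since metadata are dicts.
    return sorted(hits, key=lambda hit: hit[0], reverse=True)
```

Calling `search` with a query peak list and a list of dataset entries returns the metadata of every entry whose spectrum is similar enough, which mirrors, in a very reduced form, the report of datasets and metadata described above.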
“While the goal of MASST is to become as efficient as a Google search engine, at this stage, it operates more like an Alta Vista with dial-up connection, but it is still a huge improvement over the index cards we had to use for library searches before such search engines came around,” said one of the paper’s authors, Pieter Dorrestein, Ph.D., Director of the Collaborative Mass Spectrometry Innovation Center at the University of California San Diego.
Like Google, MASST is not aimed only at chemists, so anyone interested in molecules should be able to find value in the new tool. In the report, the authors highlight ten use cases, ranging from identifying connections between mouse models and human biology and across clinical studies, to tracking human exposure to molecules in food and the environment, including toxins, drugs, and sunscreen. The tool could affect a vast number of fields, including medicine, chemistry, molecular biology, genomics, microbiology, agriculture, ocean science, and ecology. In principle, it could have as much impact as BLAST and PSI-BLAST, which were transformational to the genomic revolution and have been cited over 150,000 times. The team believes that MASST, combined with the GNPS/MassIVE repository, could be similarly transformational for the characterization of small molecules.
While the foundation of a single spectrum search, what they are now calling MASST, was operational years ago, there was not enough public data to make the tool useful until now. With more public data and the emergence of controlled vocabularies for metadata (supported by ReDU), the tool has become practical, and it should grow more useful to the community as more data enters the public domain and as MASST’s speed and capabilities increase.
As MASST continues to grow and evolve, the team is actively working to improve its efficiency. A search of a few thousand datasets for a single molecule currently takes between 10 and 20 seconds, and the team expects improvements to let it scale up dramatically. Eventually, the team hopes to compare all the molecules in an entire data file or project at once. According to Dorrestein, future rollouts could also add relative quantitative information from the studies to the results, which could further aid their interpretation.
More information on the tool can be found at masst.ucsd.edu.