Overview:  Systematic Process for the Identification of Unknowns
by GC-MS and LC-MS

Introduction:  Over the last 46 years, we have developed a systematic process for identifying “known unknowns” in commercial products by GC-MS and LC-MS.  We define “known unknowns” as non-targeted species that are known in the chemical literature or mass spectrometry reference databases but unknown to the investigator.  The process is shown in the following simplified flowchart:

[Figure: simplified flowchart of the identification process]


This process is described in detail in the February 2013 issue of LCGC, “MS-The Practical Art.”  The article is entitled “Identifying ‘Known Unknowns’ in Commercial Products by Mass Spectrometry.”  A copy with associated ads is shown below:

LCGC PDF with Advertisements

The article originated from work presented at Pittcon in 2012.  The approach uses either NIST libraries or “spectra-less” databases such as the CAS Registry or ChemSpider.  The “spectra-less” databases are discussed further down in this post.

NIST Search of EI and CID Spectra:  The initial step in our process utilizes computer searches of EI (GC-MS) or MSMS (LC-MS) spectra against reference libraries.  I have developed two courses that detail the use of the NIST search with EI and MSMS libraries and Wiley KnowItAll for EI libraries for the identification of unknowns.

Link to NIST Courses

Link to Wiley KnowItAll Course

Recently, “delta-mass” search algorithms have been developed for the NIST and Wiley KnowItAll searches.  They are named the Hybrid and Adaptive searches, respectively.  This approach greatly expands the utility of all EI and MSMS libraries.  These novel and powerful searches are discussed in detail in the courses linked above.
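To illustrate the delta-mass idea, here is a minimal sketch.  It is not the NIST Hybrid or Wiley Adaptive algorithm itself.  It assumes unit-mass spectra stored as dictionaries of m/z to intensity, and it lets each library peak match the query either at its own m/z or shifted by the difference in molecular mass between the query and the library compound.  The function names, the plain cosine score, and the example spectra are all assumptions made for illustration; the spectra are invented, not measured.

```python
import math

def cosine(a, b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    dot = sum(inten * b.get(mz, 0.0) for mz, inten in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def delta_mass_score(query, library, delta):
    """Hypothetical delta-mass score (illustrative only).

    Each library peak is placed either at its own m/z (fragment without
    the modification) or at m/z + delta (fragment carrying the
    modification), whichever position has more intensity in the query.
    The rearranged library spectrum is then compared to the query.
    """
    aligned = {}
    for mz, inten in library.items():
        shifted = mz + delta
        target = shifted if query.get(shifted, 0.0) > query.get(mz, 0.0) else mz
        aligned[target] = aligned.get(target, 0.0) + inten
    return cosine(query, aligned)

# Invented unit-mass spectra: the query differs from the library
# compound by +14 Da (for example, one extra CH2 group).
library_spectrum = {51: 20.0, 77: 40.0, 91: 100.0}
query_spectrum = {51: 20.0, 77: 40.0, 105: 100.0}

print("direct score:    ", round(cosine(query_spectrum, library_spectrum), 3))
print("delta-mass score:", round(delta_mass_score(query_spectrum, library_spectrum, 14), 3))
```

In this invented example, the direct comparison scores poorly because the base peak has moved by 14 Da, while the delta-mass comparison recovers the match.  That is the sense in which a delta-mass search can find library compounds that are structurally related to an unknown even when the unknown itself is not in the library.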

The computer EI searches normally work better than the MSMS ones, but the latter are still very useful.  We employ purchased, proprietary, and other specialty databases.

The NIST search interfaces easily with a wide variety of manufacturers’ data processing and structural drawing programs:

[Figure: NIST search interfaces with manufacturers’ data processing and structural drawing programs]


“Spectra-Less” Database Searching:  If the NIST search is not successful, then accurate mass data is used to obtain a molecular formula (MF), a monoisotopic mass, or an average molecular weight.  This data is used to search very large databases such as the CAS Registry or ChemSpider via web interfaces.  We define them as “spectra-less” databases because they contain no computer-searchable mass spectral data.  We originally used this approach to search the TSCA database or our Eastman Chemical plant material listing.  The original artwork featured on the cover of our JASMS article is shown below:

[Figure: original cover artwork from our JASMS article]
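As a worked illustration of the three search keys mentioned above (molecular formula, monoisotopic mass, and average molecular weight), the following minimal sketch computes both masses from a simple formula and builds a ppm search window around a mass.  It is not part of the CAS Registry or ChemSpider interfaces.  The helper names, the small element tables, and the 5 ppm tolerance are assumptions chosen for illustration, and the parser handles only flat formulas without parentheses or charges.

```python
import re

# Monoisotopic (most abundant isotope) and average atomic masses
# for a few common elements; extend as needed.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "S": 31.9720707, "Cl": 34.9688527}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007,
           "O": 15.999, "S": 32.06, "Cl": 35.45}

def parse_formula(formula):
    """Return {element: count} for a flat formula such as 'C8H10N4O2'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def formula_mass(formula, table):
    """Sum atomic masses from the given table over the parsed formula."""
    return sum(table[element] * n for element, n in parse_formula(formula).items())

def ppm_window(target_mass, ppm):
    """Return the (low, high) mass range within +/- ppm of target_mass."""
    half_width = target_mass * ppm / 1e6
    return target_mass - half_width, target_mass + half_width

# Example: caffeine, C8H10N4O2
mono = formula_mass("C8H10N4O2", MONOISOTOPIC)
avg = formula_mass("C8H10N4O2", AVERAGE)
low, high = ppm_window(mono, 5)  # 5 ppm is an assumed tolerance
print(f"monoisotopic mass: {mono:.4f}")
print(f"average MW:        {avg:.2f}")
print(f"5 ppm window:      {low:.4f} to {high:.4f}")
```

The monoisotopic mass is the value to compare with an accurate mass measurement (after correcting for the adduct or charge carrier), while the average molecular weight is the value usually listed in catalogs and databases.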

The candidate structures from the CAS Registry or ChemSpider searches are prioritized by either the number of associated references or key words.  Other ancillary information, such as mass spectral fragments in EI or MSMS spectra, isotopic abundances, UV spectra, types of ion adducts, CI data, number of exchangeable protons, and substructure, is used to narrow the list to one structure.  This website has many screenshots (SciFinder1, SciFinder2, ChemSpider) that illustrate these approaches with many examples.
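The prioritization step can be sketched in a few lines.  This is a minimal illustration, not the ranking performed by SciFinder or ChemSpider.  The Candidate class, its field names, and the placeholder entries are hypothetical.  Candidates are sorted first by how many of the investigator's key words they match, then by the number of associated references; with no key words supplied, the order depends on reference count alone.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical record for one structure returned by a database search."""
    name: str
    reference_count: int
    keywords: tuple = ()

def prioritize(candidates, wanted_keywords=()):
    """Sort candidates by key word hits, then by reference count (both descending)."""
    wanted = {k.lower() for k in wanted_keywords}

    def sort_key(candidate):
        hits = sum(1 for k in candidate.keywords if k.lower() in wanted)
        return (-hits, -candidate.reference_count)

    return sorted(candidates, key=sort_key)

# Placeholder candidates sharing one molecular formula (invented values).
hits = [
    Candidate("Candidate A", 12, ("pharmaceutical",)),
    Candidate("Candidate B", 950, ("plasticizer", "polymer additive")),
    Candidate("Candidate C", 3),
]

for c in prioritize(hits):
    print(c.name, c.reference_count)

for c in prioritize(hits, wanted_keywords=["pharmaceutical"]):
    print(c.name, c.reference_count)
```

Sorting by reference count reflects the assumption that a compound found in a commercial product is more likely to be a widely cited one, while key words let knowledge about the sample move a less cited but more relevant candidate to the top.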

Model EI and CID Spectra from NIST Structure Search:  The NIST MS Search program ranks model compounds by structure searching of both our commercial and in-house databases.  This is particularly valuable for finding model compounds for a proposed structure found in searches of “spectra-less” databases such as the CAS Registry and ChemSpider.  We use the NIST MS Interpreter program to automatically correlate fragment ions in the EI and CID spectra with the component’s substructure.

“No Results” from Process:  Some non-targeted species in the sample will be “unknown unknowns,” meaning they are not found in any reference libraries or “spectra-less” databases.  A few thoughts on their identification are discussed in another section.

Future Improvements Needed:  The approach works well for the majority of our samples which are fairly simple and contain components with molecular weights <500 daltons.  On the other hand, improvements are needed for complex samples and components with molecular weights >500 daltons.
