Examination of the Effect of Data Resolution on IR Spectral Database Search Results
Background
The pursuit of better search results has inspired innovation after innovation in the field of spectroscopy. It is commonly accepted that higher resolution in both reference databases and query spectra offers better comparison results for identification [1]. In the mid-1960s, Sadtler’s original Spec-Finder product encoded up to 13 peaks at 11 intensity levels; punch card limitations and computing power restricted the options. Today, technological advances have relaxed the disk space and computation time limitations that constrained computer search methods, even compared with the recent past.
Storage space was a very significant concern only a decade ago. Hard disk drives averaged 10 GB or less, and storing and searching Sadtler’s databases in high resolution would have been a strain. However, 4 to 8 GB of high-resolution data will not stress even the lowest-end computer on the market today. The average storage cost per gigabyte has dropped from over $10 to less than $0.10 over the same period [2].
Computing power has also seen massive improvements. In the not-too-distant past, searching hundreds of thousands of spectra required nearly half an hour. Today’s multi-core processors running a multi-threaded application like Bio-Rad’s KnowItAll Informatics System [3] can perform the same task in only a few seconds. While storage cost has decreased by a factor of 100 in the last decade, the average computer’s computational cost has decreased by a factor of more than 1,000.
Resolution
Confusion occasionally occurs when discussing resolution because the term is loosely interchanged with the data point spacing of a spectrum. Resolution is the minimum separation at which two closely spaced peaks can still be distinguished. Since a peak requires three data points to be distinguished, the resolution is actually twice the data point spacing of the spectrum. In a spectrum with 0.96 cm⁻¹ data point spacing, any peaks that are at least 1.92 cm⁻¹ (its resolution) apart can be distinguished.
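The relationship between data point spacing and resolution described above can be written as a short calculation. This is only a sketch of the rule of thumb stated in the text; the function names are hypothetical and the peak positions are made-up values for illustration.

```python
def resolution_from_spacing(spacing_cm1: float) -> float:
    """Resolution (cm^-1) implied by a data point spacing, per the 2x rule above."""
    return 2.0 * spacing_cm1


def can_distinguish(peak_a: float, peak_b: float, spacing_cm1: float) -> bool:
    """True if two peak positions (cm^-1) are at least one resolution apart."""
    return abs(peak_a - peak_b) >= resolution_from_spacing(spacing_cm1)


# Illustrative peak positions (assumed values, not taken from the study).
print(resolution_from_spacing(0.96))          # resolution for 0.96 cm^-1 spacing
print(can_distinguish(1700.0, 1702.0, 0.96))  # 2.0 cm^-1 apart at 0.96 spacing
print(can_distinguish(1700.0, 1702.0, 4.0))   # same peaks at 4 cm^-1 spacing
```

With 4 cm⁻¹ spacing the implied resolution is 8 cm⁻¹, so the same pair of peaks that is separable at 0.96 cm⁻¹ spacing no longer meets the criterion.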
The relationship between resolution and the valuable information that can be extracted from a spectrum is not linear. There is a point of diminishing returns, and where it falls may differ depending on the application. The difference between a spectrum with 32 cm⁻¹ data point spacing and one with 1 cm⁻¹ data point spacing is visually apparent, so the perceived value is significant. The differences between a spectrum with 4 cm⁻¹ data point spacing and one with 1 cm⁻¹ data point spacing are not visually evident. However, the differences can be observed mathematically, and their impact on search results can be traced.
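One way to see such differences numerically is to resample a spectrum onto a coarser grid and back, then measure what was lost. The sketch below uses a synthetic two-band spectrum and linear interpolation; both are assumptions made for illustration and are not the procedure or data used in this study.

```python
def lorentzian(x, center, width, height=1.0):
    """Lorentzian band with full width at half maximum `width`."""
    half = width / 2.0
    return height * half * half / ((x - center) ** 2 + half * half)


def make_axis(start, stop, spacing):
    """Evenly spaced wavenumber axis from start to stop inclusive."""
    n = int(round((stop - start) / spacing)) + 1
    return [start + i * spacing for i in range(n)]


def interpolate(x_src, y_src, x_new):
    """Linear interpolation; x_src and x_new ascending, x_new within x_src."""
    out, j = [], 0
    for x in x_new:
        while j < len(x_src) - 2 and x_src[j + 1] < x:
            j += 1
        t = (x - x_src[j]) / (x_src[j + 1] - x_src[j])
        out.append(y_src[j] + t * (y_src[j + 1] - y_src[j]))
    return out


# Synthetic spectrum (assumed): two overlapping bands on a 1 cm^-1 grid.
fine_x = make_axis(1000.0, 1100.0, 1.0)
fine_y = [lorentzian(x, 1049.0, 3.0) + lorentzian(x, 1054.0, 3.0, 0.8)
          for x in fine_x]

# Degrade to 4 cm^-1 spacing, then map back onto the 1 cm^-1 grid.
coarse_x = make_axis(1000.0, 1100.0, 4.0)
coarse_y = interpolate(fine_x, fine_y, coarse_x)
restored_y = interpolate(coarse_x, coarse_y, fine_x)

# Largest point-by-point intensity difference caused by the round trip.
max_diff = max(abs(a - b) for a, b in zip(fine_y, restored_y))
print(f"max absolute difference: {max_diff:.3f}")
```

Narrow bands that fall between coarse grid points lose the most intensity, which is why the difference is concentrated near the band maxima rather than in the baseline.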
Conclusion
Altering an original spectrum’s resolution has a significant impact on spectral comparison search results. Resolution changes do not need to be visually obvious to influence search results. Simply changing from 0.96 cm⁻¹ to 4 cm⁻¹ data point spacing will alter the top three results in as many as one out of every five searches.
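The mechanism can be illustrated with a toy search. The sketch below ranks library entries by cosine similarity, which is used here only as a stand-in because the comparison algorithm is not specified in this section. The spectra, the compound names, and the point-dropping `decimate` helper are all hypothetical and deliberately contrived so that the top hit changes; the example does not reproduce the one-in-five figure.

```python
import math


def cosine_similarity(a, b):
    """Normalized dot product of two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_hits(query, library):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(name, cosine_similarity(query, spectrum))
              for name, spectrum in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def decimate(spectrum, step):
    """Crude resolution loss: keep every `step`-th data point."""
    return spectrum[::step]


# Toy 8-point spectra (assumed values, for illustration only).
library = {
    "compound_a": [0.2, 1.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0],
    "compound_b": [0.1, 0.0, 0.0, 0.0, 0.9, 0.0, 1.0, 0.0],
}
query = [0.0, 1.0, 0.0, 0.0, 0.3, 0.0, 0.1, 0.0]

full_ranking = rank_hits(query, library)

coarse_library = {name: decimate(s, 4) for name, s in library.items()}
coarse_ranking = rank_hits(decimate(query, 4), coarse_library)

print("full spacing:  ", full_ranking)
print("coarse spacing:", coarse_ranking)
```

At full spacing the sharp band shared by the query and `compound_a` dominates the score. Once that band is no longer sampled, the remaining points favor `compound_b`, so the ranking flips even though nothing about the compounds has changed.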
There is a point of diminishing returns with respect to how high the resolution must be for any specific project. Changes in the top three search results are not significant when the correct compound is in the database. A well-run query spectrum can identify an exact match at very low resolutions with most comparison algorithms [5]. However, when a match is not in the database, interpreting results becomes an issue.
Some software is designed only to display or analyze the top few results from spectral comparisons. Industry best practices recommend identifying the separation between search result quality indices to aid in classification. Any resolution degradation clearly has some level of impact on quality index separation between search results.
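Separation between quality indices can be inspected with a few lines of code. The sketch below assumes a quality index where higher is better; the scores and the minimum gap are hypothetical values chosen for illustration, not recommended thresholds.

```python
def hit_separations(scores):
    """Gaps between consecutive hits after sorting from best to worst."""
    ordered = sorted(scores, reverse=True)
    return [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]


def top_hit_is_distinct(scores, min_gap):
    """True if the best hit leads the runner-up by at least `min_gap`."""
    gaps = hit_separations(scores)
    return bool(gaps) and gaps[0] >= min_gap


# Hypothetical quality indices for the top four hits of one search.
scores = [0.91, 0.98, 0.72, 0.90]

print(hit_separations(scores))
print(top_hit_is_distinct(scores, min_gap=0.05))
```

A large first gap followed by small ones suggests a single clear candidate, while uniformly small gaps indicate that the top hits cannot be told apart on score alone.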