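Researchers from East China University of Science and Technology and the Shanghai Center for Bioinformation Technology recently published a bioinformatics research paper entitled "Feature-matching pattern-based support vector machines for robust peptide mass fingerprinting" in Molecular & Cellular Proteomics (MCP; 2010 SCI impact factor 8.35), a leading international proteomics journal.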
Feature-matching pattern-based support vector machines for robust peptide mass fingerprinting
Youyuan Li, Pei Hao, Siliang Zhang and Yixue Li
Peptide mass fingerprinting (PMF), despite having become complementary to tandem mass spectrometry (MS/MS) for protein identification, is still the subject of in-depth study because of its higher sample throughput, higher specificity for single peptides and lower sensitivity to unexpected post-translational modifications. In this study, we propose, implement and evaluate a uniform approach that uses support vector machines (SVMs) to incorporate individual concepts and conclusions for accurate PMF. We focus on the inherent attributes and critical issues of the theoretical spectrum, the experimental spectrum and spectrum alignment. Eighty-one feature-matching patterns (FMPs), derived from the cleavage type, uniqueness and variable masses of theoretical peptides together with the intensity rank of experimental peaks, were proposed to characterize the matching profile of the PMF procedure. We developed a new strategy to handle shared peak intensities, and 440 parameters were generated to digitize each FMP. The optimal multi-criteria SVM approach, which used 491 final features selected from a vector of 35,640 normalized features through cross training and validation on a publicly available "gold standard" PMF dataset of 1733 items, achieved high performance on an evaluation dataset of 137 items. Compared with Mascot, MS-Fit, ProFound and Aldente, the FMP algorithm has a greater ability to identify correct proteins, with the highest values for sensitivity (82%), precision (97%) and F1-measure (89%). Several conclusions have been reached via this research. Firstly, inherent attributes showed comparable or even greater robustness than other explicit attributes. One inherent attribute, peak intensity, should receive considerable attention during protein identification. Secondly, alignment between intense experimental peaks and properly digested, unique or non-modified theoretical peptides is very likely to occur in positive PMFs. Finally, normalization by several types of harmonic factors, including missed cleavages and mass modification, can make important contributions to the performance of the procedure.
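The figures quoted in the abstract can be checked against one another with a little arithmetic. The short Python sketch below is illustrative only and is not taken from the paper: it assumes that the F1-measure is the usual harmonic mean of precision and sensitivity, and that the 35,640-feature vector is simply the 81 FMPs multiplied by the 440 parameters generated for each. The names `f1_measure`, `N_FMPS` and `PARAMS_PER_FMP` are hypothetical.

```python
def f1_measure(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (assumed F1 definition)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Counts quoted in the abstract; the product rule is an assumption.
N_FMPS = 81            # feature-matching patterns
PARAMS_PER_FMP = 440   # parameters generated for each FMP
feature_vector_size = N_FMPS * PARAMS_PER_FMP

# Precision and sensitivity reported for the FMP algorithm.
reported_f1 = f1_measure(precision=0.97, sensitivity=0.82)

print(feature_vector_size)    # 35640
print(round(reported_f1, 2))  # 0.89
```

Under these assumptions both values agree with the abstract: 81 × 440 gives the 35,640 normalized features, and a precision of 97% with a sensitivity of 82% gives an F1-measure of about 89%.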