Scientists Develop a New PMF Algorithm

2011-08-28 12:11:25

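Researchers from East China University of Science and Technology and the Shanghai Center for Bioinformation Technology recently published a bioinformatics paper titled "Feature-matching pattern-based support vector machines for robust peptide mass fingerprinting" in Molecular & Cellular Proteomics (MCP; 2010 SCI impact factor 8.35), a leading international proteomics journal.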

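The corresponding author is Professor Siliang Zhang of East China University of Science and Technology. He graduated in antibiotic manufacturing engineering from the East China Institute of Chemical Technology and has long worked on microbial reactions and fermentation engineering, achieving a series of major breakthroughs in production technology for biopharmaceutical products. He has won the National Science and Technology Progress Award (Second Class) three times, as well as several provincial- and ministerial-level science and technology awards, and has made major contributions to technological progress in China's biopharmaceutical and related industries. He has published more than 100 papers, over 20 of them indexed by SCI.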

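Peptide mass fingerprinting (PMF) is a key protein identification method in proteomics. Compared with tandem mass spectrometry (MS/MS), it offers higher throughput, high specificity for single peptides and low sensitivity to post-translational modifications. This study set out to improve the accuracy and robustness of PMF algorithms. The researchers divided protein identification into three separate but related objects (the theoretical spectrum, the experimental spectrum and their alignment) and, by addressing the specific attributes and key issues of each, derived 35,640 features in total. They then trained support vector machines (SVMs), a machine-learning method, on a gold-standard dataset of 1733 items. Compared with four existing PMF identification algorithms (Mascot, MS-Fit, ProFound and Aldente), the new algorithm achieved significantly higher sensitivity, precision and robustness. The team also built a dedicated protein identification website based on the new algorithm. The reviewers considered the study conceptually novel and of considerable practical value.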
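To make the three objects concrete (theoretical spectrum, experimental spectrum, spectrum alignment), here is a minimal Python sketch of the basic matching step that PMF search rests on. It is not the authors' FMP/SVM algorithm: the trypsin rule, the 50 ppm tolerance, the hypothetical helper names (`tryptic_peptides`, `mh_mass`, `match_peaks`) and the toy sequence and peaks are all illustrative assumptions, and modifications are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (termini)
PROTON = 1.00728   # charge carrier for singly charged [M+H]+ ions


def tryptic_peptides(seq, max_missed=1):
    """Theoretical spectrum, step 1: in-silico trypsin digest (hypothetical helper).

    Cuts after K or R unless the next residue is P, then joins up to
    `max_missed` adjacent fragments to model missed cleavages.
    Returns a list of (peptide, missed_cleavages) pairs.
    """
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            fragments.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # C-terminal fragment without K/R
    peptides = []
    for i in range(len(fragments)):
        for missed in range(max_missed + 1):
            if i + missed < len(fragments):
                peptides.append(("".join(fragments[i:i + missed + 1]), missed))
    return peptides


def mh_mass(peptide):
    """Theoretical spectrum, step 2: [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_peaks(seq, peaks, tol_ppm=50.0, max_missed=1):
    """Spectrum alignment: match experimental (m/z, intensity) peaks.

    Returns (peptide, missed_cleavages, intensity_rank) for every peptide
    whose [M+H]+ lies within `tol_ppm` of a peak; rank 1 = most intense.
    """
    order = sorted(range(len(peaks)), key=lambda k: peaks[k][1], reverse=True)
    rank_of = {k: r + 1 for r, k in enumerate(order)}
    hits = []
    for peptide, missed in tryptic_peptides(seq, max_missed):
        theo = mh_mass(peptide)
        for k, (mz, _intensity) in enumerate(peaks):
            if abs(mz - theo) / theo * 1e6 <= tol_ppm:
                hits.append((peptide, missed, rank_of[k]))
    return hits


# Toy example: a made-up sequence and two made-up peaks.
print(tryptic_peptides("AGKPLRDEK"))  # [('AGKPLR', 0), ('AGKPLRDEK', 1), ('DEK', 0)]
print(match_peaks("AGKPLRDEK", [(391.1823, 500.0), (1000.0, 900.0)]))  # [('DEK', 0, 2)]
```

A real PMF engine would also handle fixed and variable modifications, score the hits statistically and rank many candidate proteins. According to the abstract, the FMP approach goes further by turning properties like the ones recorded here (cleavage type, uniqueness, variable mass, intensity rank) into SVM features.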

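This work was supported by the National 973 Program project "Principles and Methods for Scaling Up Biochemical Reaction Processes" (2007CB714303) and by an open project grant from the State Key Laboratory of Bioreactor Engineering.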

DOI:10.1074/mcp.M110.005785

Feature-matching pattern-based support vector machines for robust peptide mass fingerprinting

Youyuan Li, Pei Hao, Siliang Zhang and Yixue Li

Peptide mass fingerprinting (PMF), although it has become complementary to tandem mass spectrometry (MS/MS) for protein identification, is still the subject of in-depth study because of its higher sample throughput, higher specificity for single peptides and lower sensitivity to unexpected post-translational modifications. In this study, we propose, implement and evaluate a uniform approach that uses support vector machines (SVMs) to incorporate individual concepts and conclusions for accurate PMF. We focus on the inherent attributes and critical issues of the theoretical spectrum, the experimental spectrum and spectrum alignment. Eighty-one feature-matching patterns (FMPs), derived from the cleavage type, uniqueness and variable masses of theoretical peptides together with the intensity rank of experimental peaks, were proposed to characterize the matching profile of the PMF procedure. We developed a new strategy to handle shared peak intensities, and 440 parameters were generated to digitize each FMP. By cross-training and validating on a publicly available "gold standard" PMF dataset of 1733 items, the optimal multi-criteria SVM approach achieved high performance on an evaluation dataset of 137 items, using 491 final features out of a feature vector of 35,640 normalized features. Compared with Mascot, MS-Fit, ProFound and Aldente, the FMP algorithm has a greater ability to identify correct proteins, with the highest sensitivity (82%), precision (97%) and F1-measure (89%). Several conclusions were reached. First, inherent attributes showed comparable or even greater robustness than other, explicit attributes; one inherent attribute, peak intensity, deserves considerable attention during protein identification. Second, alignment between intense experimental peaks and properly digested, unique or non-modified theoretical peptides is very likely to occur in positive PMFs. Finally, normalization by several types of harmonic factors, including missed cleavages and mass modification, can make important contributions to the performance of the procedure.
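The headline figures in the abstract can be cross-checked with a few lines of Python: the F1-measure is the harmonic mean of precision and sensitivity (recall), and 81 FMPs with 440 parameters each give exactly the 35,640-dimensional feature vector. This sketch only re-derives the reported numbers; the function and variable names are ours, not the paper's.

```python
def f1_measure(precision, recall):
    """F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract (sensitivity is the same quantity as recall).
precision, sensitivity = 0.97, 0.82
print(f"F1 = {f1_measure(precision, sensitivity):.3f}")  # F1 = 0.889

# Feature-vector size: 81 FMPs, each digitized by 440 parameters.
n_fmp, params_per_fmp = 81, 440
print(n_fmp * params_per_fmp)  # 35640
```

Rounded to a percentage, 0.889 matches the reported F1-measure of 89%.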

