Integrated Software Environment for NMR-Based Metabolomics Analysis

Conference Poster

An Integrated Software Environment for NMR-Based Metabolomics Data Analysis - Leap to Biomarker Identification from PCA

Introduction

Computer technologies have matured to the point where they can convert instrumental measurements of patient samples into a collection of spectra and then prepare those spectra for sophisticated multivariate analysis tools. The next steps, recognizing altered metabolite concentrations and identifying the affected metabolic pathways, remain a challenge. We have developed two technologies that, combined with an integrated informatics system, accomplish these critical steps.

Using this integrated informatics approach, raw NMR spectra are processed and then transferred to an integrated chemometrics data analysis tool. The outcome is a Principal Component Analysis (PCA) of the spectra, which includes a graphical display called a Scores Plot. In the Scores Plot, each NMR spectrum from a metabolomics experiment is represented as a single point. Similar spectra appear closer to one another, and dissimilar spectra appear farther apart. A related diagnostic display called a Loadings Plot indicates the regions of the NMR spectra that account for the most variance in the data set. The two plots are linked: because the Loadings Plot identifies the spectral regions that vary most across the experimental samples, it also determines where each sample's point falls in the Scores Plot.
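The poster does not describe how the PCA is computed. As a minimal sketch, assuming the processed spectra have been binned into a samples × bins matrix, the scores and loadings can be obtained from a singular value decomposition (SVD) of the mean-centered data. The function name `pca_scores_loadings` and the synthetic two-class data set are hypothetical and for illustration only.

```python
import numpy as np


def pca_scores_loadings(spectra, n_components=2):
    """PCA of binned spectra via SVD (hypothetical helper, not the poster's tool).

    spectra: array of shape (n_samples, n_bins); each row is one processed spectrum.
    Returns (scores, loadings, explained_variance_ratio).
    """
    # Mean-center each spectral bin so the components describe variance, not the mean spectrum.
    centered = spectra - spectra.mean(axis=0)
    # Thin SVD: centered = U @ diag(s) @ Vt
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Scores: one row per spectrum, i.e. one point per spectrum in the Scores Plot.
    scores = u[:, :n_components] * s[:n_components]
    # Loadings: weight of each spectral bin on each component, i.e. the Loadings Plot.
    loadings = vt[:n_components].T
    # Fraction of the total variance captured by each component.
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


# Synthetic example (hypothetical data): 12 spectra x 10 bins, two classes of 6 samples.
rng = np.random.default_rng(0)
spectra = rng.normal(loc=1.0, scale=0.05, size=(12, 10))
spectra[6:, 3] += 1.0  # simulate one metabolite signal elevated in the second class

scores, loadings, explained = pca_scores_loadings(spectra, n_components=2)
top_bin = int(np.argmax(np.abs(loadings[:, 0])))  # bin that dominates PC1
print("PC1 variance fraction:", round(float(explained[0]), 3), "top bin:", top_bin)
```

Because the scores equal the centered spectra multiplied by the loadings, the bins with large loadings are exactly the ones that push the two classes apart along PC1 in the Scores Plot.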

A Loadings Plot can be used as a query to search a reference metabolite database and identify a list of putative metabolites whose concentrations change over the course of the experiment. Alternatively, the reference metabolite database can be projected into the same space as the PCA Loadings Plot and filtered, so that the putative metabolites can be identified visually in the Loadings Plot. In this second approach, the list of filtered metabolites is used to create a rank-ordered list of putative biological pathways based on the co-occurrence of the filtered metabolites in each pathway. Direct internet links from the rank-ordered list to the KEGG database allow the researcher to explore each pathway in more detail.
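The poster does not state the similarity measure, the database format, or the ranking formula. The sketch below assumes that reference spectra are binned on the same chemical-shift grid as the loadings, scores each reference by cosine similarity with the absolute loading vector (an assumed choice, since peaks can load positively or negatively), and ranks pathways by how many of the matched metabolites they contain. The function names, the toy reference spectra, and the pathway names `pathway_A` to `pathway_C` are hypothetical; they are not KEGG content.

```python
import numpy as np


def match_loading_to_references(loading, references, threshold=0.5):
    """Rank reference metabolites by similarity to one PCA loading vector (hypothetical).

    loading: shape (n_bins,), one column of the loadings matrix.
    references: dict of metabolite name -> reference spectrum on the same bin grid.
    Returns (name, score) pairs with score >= threshold, best match first.
    """
    query = np.abs(np.asarray(loading, dtype=float))  # compare magnitudes only
    q_norm = np.linalg.norm(query)
    hits = []
    for name, ref in references.items():
        ref = np.asarray(ref, dtype=float)
        denom = q_norm * np.linalg.norm(ref)
        score = float(query @ ref / denom) if denom > 0 else 0.0  # cosine similarity
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda h: h[1], reverse=True)


def rank_pathways(metabolites, pathways):
    """Rank pathways by co-occurrence of the matched metabolites (hypothetical scoring).

    pathways: dict of pathway name -> set of member metabolite names.
    Sorted by number of hits, then by the fraction of the pathway's members hit.
    """
    found = set(metabolites)
    ranked = []
    for pathway, members in pathways.items():
        common = found & members
        if common:
            ranked.append((pathway, len(common), len(common) / len(members)))
    return sorted(ranked, key=lambda r: (r[1], r[2]), reverse=True)


# Toy data (made up for illustration): 6 bins, 3 reference spectra, 3 pathways.
loading = np.array([0.0, 0.7, -0.7, 0.0, 0.1, 0.0])
references = {
    "lactate": [0, 1, 1, 0, 0, 0],
    "glucose": [0, 0, 0, 1, 1, 1],
    "alanine": [0, 1, 0, 0, 0, 0],
}
pathways = {
    "pathway_A": {"lactate", "alanine", "pyruvate"},
    "pathway_B": {"glucose", "lactate"},
    "pathway_C": {"citrate"},
}

hits = match_loading_to_references(loading, references)
ranked = rank_pathways([name for name, _ in hits], pathways)
print(hits)
print(ranked)
```

A real system would link each ranked pathway to its KEGG entry; here the ranking stops at the pathway names.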

Conclusion

It is important to obtain a PCA in which the classes of samples are clearly separable. The subsequent peak search and database projection are both viable methods for discovering the compounds that contribute to the class separation. This system completes the workflow from a PCA to biomarker identification and, finally, to suggestions of possible metabolic pathways.