Citation: A Ji-ye, HE Jun, SUN Run-bin. Multivariate statistical analysis for metabolomic data: the key points in principal component analysis[J]. Acta Pharmaceutica Sinica, 2018, 53(6): 929-937. doi: 10.16438/j.0513-4870.2017-1288

Multivariate statistical analysis for metabolomic data: the key points in principal component analysis

Abstract: Metabolomics studies generate multivariate data that are commonly processed and evaluated by principal component analysis (PCA). PCA involves abstract spatial models, complex theoretical calculations, and delicate data transformations, so an accurate understanding of the underlying algorithms and their properties is essential. This article explains, in concise and accessible language, the key and difficult points of the PCA-based methods commonly used for metabolomics data under ten headings: principal components; principal component scores; principal component loadings; scaling and weighting; partial least squares projection to latent structures (PLS) and partial least squares discriminant analysis (PLS-DA); orthogonal projections to latent structures (OPLS); bidirectional orthogonal projections to latent structures (O2PLS); the S-plot; the shared and unique structures (SUS) plot; and model validation. The aim is to help metabolomics researchers become familiar with these methods so that they can choose an appropriate data processing mode, standardize their data processing procedures, interpret the results competently, and draw reliable conclusions.
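As a rough illustration of how scaling, scores, and loadings relate to one another, the following minimal Python sketch computes a PCA by singular value decomposition. It is not taken from the article: the function name `pca_scores_loadings`, the scaling labels (`"ctr"`, `"uv"`, `"pareto"`), and the random test matrix are assumptions made for the example, and every variable is assumed to have non-zero variance.

```python
import numpy as np


def pca_scores_loadings(X, n_components=2, scaling="uv"):
    """Hypothetical helper: PCA via SVD on a samples-by-variables matrix.

    scaling: "ctr" (mean-centering only), "uv" (unit variance) or
    "pareto" (divide by the square root of the standard deviation).
    Assumes no variable has zero variance.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)

    if scaling == "ctr":
        scaled = centered
    elif scaling == "uv":
        scaled = centered / sd
    elif scaling == "pareto":
        scaled = centered / np.sqrt(sd)
    else:
        raise ValueError(f"unknown scaling: {scaling!r}")

    # scaled = U * diag(s) * Vt; rows of Vt are the component directions.
    U, s, Vt = np.linalg.svd(scaled, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # variable weights
    explained = (s ** 2) / np.sum(s ** 2)             # fraction of variance
    return scores, loadings, explained[:n_components]


# Synthetic data: 12 samples, 5 variables on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5)) * np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

scores, loadings, r2x = pca_scores_loadings(X, n_components=2, scaling="uv")
print(scores.shape, loadings.shape)
```

With `scaling="ctr"` the variable with the largest numerical range tends to dominate the first component, whereas unit-variance scaling gives each variable equal weight; this is the kind of choice the scaling and weighting topic concerns.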

     
