Application of the gene-encoded natural diversity components repository (GNDC) in traditional Chinese medicine research
YU Zhi-yin, CHEN Wei, LENG Liang, SUN Dan, LIU Hao, GONG Rui-ze, CONG Zhao-tong, LIU Cheng-cheng, YANG Xiu-ping, BIN Hua-chao, LU Jun, ZHANG San-yin, SONG Chi
Abstract
Natural components are crucial sources for drug discovery. Based on the central dogma of molecular biology, we propose a novel paradigm for classifying natural products that divides natural components into "direct gene-encoded components" (nucleic acids and peptides) and "indirect gene-encoded components" (primary and secondary metabolites). Using multi-omics and artificial intelligence technologies, we systematically analyzed the nuclear and organellar genomes of 1,037 medicinal species sourced from eight authoritative global pharmacopeias and integrated multidimensional data resources to establish the gene-encoded natural diversity components repository (GNDC). This paper describes: (1) the construction methodology of GNDC, including data integration strategies, standardized annotation pipelines, and rigorous quality control; (2) its core features and data scale: GNDC is currently the world's largest repository of medicinal natural products, housing over 234 million directly or indirectly gene-encoded natural components across four specialized sub-databases: HerbalMDB (2.32 million secondary metabolites), HerbalPDB (229 million small peptides), HerbalRDB (2.38 million small RNAs), and HerbalCDB (0.26 million carbohydrates); and (3) the application prospects of GNDC in the modernization of traditional medicine research. By providing an unprecedentedly expansive "chemical space" for drug discovery, GNDC is expected to help shift drug development from "experience-oriented" to "big data-driven" approaches, offering a transformative framework for traditional medicine research.