Short Communication
Yisong Liu, Qi Tang, Pi Cheng, Mingfei Zhu, Hui Zhang, Jiazhe Liu, Mengting Zuo, Chongyin Huang, Changqiao Wu, Zhiliang Sun, Zhaoying Liu. Whole-genome sequencing and analysis of the Chinese herbal plant Gelsemium elegans[J]. Acta Pharmaceutica Sinica B, 2020, 10(2): 374-382

Whole-genome sequencing and analysis of the Chinese herbal plant Gelsemium elegans
Yisong Liu (a,b,c), Qi Tang (b), Pi Cheng (b), Mingfei Zhu (d), Hui Zhang (d), Jiazhe Liu (d), Mengting Zuo (c), Chongyin Huang (c), Changqiao Wu (b), Zhiliang Sun (a,b,c), Zhaoying Liu (a,b,c)
a Hunan Engineering Technology Research Center of Veterinary Drugs, Hunan Agricultural University, Changsha 410128, China;
b Hunan Key Laboratory of Traditional Chinese Veterinary Medicine, Hunan Agricultural University, Changsha 410128, China;
c College of Veterinary Medicine, Hunan Agricultural University, Changsha 410128, China;
d Nextomics Biosciences Institute, Wuhan 430000, China
Gelsemium elegans (G. elegans) (2n=2x=16) is a flowering plant species belonging to the Gelsemiaceae family. Here, we generated a high-quality genome assembly using the Oxford Nanopore Technologies (ONT) platform and high-throughput chromosome conformation capture (Hi-C). A total of 56.11 Gb of raw ONT reads were generated on the GridION X5 platform (6.23 Gb per cell). After filtering, 53.45 Gb of clean reads were obtained, giving approximately 160× coverage depth. The de novo assembly was 335.13 Mb, close to the 338 Mb estimated by k-mer analysis, with a contig N50 of 10.23 Mb. The vast majority (99.2%) of the assembled G. elegans sequence was anchored onto 8 pseudo-chromosomes. Genome completeness was then evaluated, and 1338 of the 1440 conserved genes (92.9%) were found in the assembly. Genome annotation revealed that 43.16% of the G. elegans genome is composed of repetitive elements and 23.9% is composed of long terminal repeat elements. We predicted 26,768 protein-coding genes, of which 84.56% were functionally annotated. The genomic sequence of G. elegans could be a valuable resource for comparative genomic analysis in the Gelsemiaceae family and will be useful for understanding the phylogenetic relationships underlying indole alkaloid metabolism.
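The headline figures in the abstract can be sanity-checked with simple arithmetic. The following is a minimal sketch, not the authors' pipeline: the function names are illustrative, the toy contig lengths are hypothetical, and only the numeric inputs marked as quoted come from the abstract. It shows how coverage depth, conserved-gene completeness, and a contig N50 are conventionally computed.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the total length."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def coverage_depth(total_bases, genome_size):
    """Mean sequencing depth: sequenced bases divided by genome size."""
    return total_bases / genome_size


def completeness_pct(found, total):
    """Percentage of conserved genes recovered in the assembly."""
    return 100.0 * found / total


# Figures quoted in the abstract.
clean_bases = 53.45e9      # 53.45 Gb of clean ONT reads
assembly_size = 335.13e6   # 335.13 Mb assembly

depth = coverage_depth(clean_bases, assembly_size)
conserved = completeness_pct(1338, 1440)

# Hypothetical contig lengths, used only to illustrate the N50 definition.
toy_n50 = n50([10, 8, 6, 4, 2])

print(f"depth ~ {depth:.1f}x, completeness = {conserved:.1f}%, toy N50 = {toy_n50}")
```

Dividing by the assembly size gives a depth slightly under 160, and dividing by the 338 Mb k-mer estimate gives a slightly lower value still, so the reported 160× is best read as a rounded figure.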
Key words: Gelsemium elegans; Nanopore sequencing; Genome assembly; Hi-C; Genome annotation; Monoterpene indole alkaloid
Received: 2019-04-15     Revised: 2019-06-27
DOI: 10.1016/j.apsb.2019.08.004
Funds: This study was financially supported by the Hunan Provincial Natural Science Foundation of China (grants 2017JJ1017 and 2018JJ2172), the National Key R&D Program of China (grant 2017YFD0501403), and the National Natural Science Foundation of China (grant 31400275).
Corresponding authors: Zhiliang Sun, Zhaoying Liu
