Algorithmic and computational comparison of metagenome assemblers

ANU SHARMA; DWIJESH CHANDRA MISHRA; NEERAJ BUDHLAKOTI; ANIL RAI; SHASHI BHUSHAN LAL; SANJEEV KUMAR

doi:10.56093/ijas.v90i5.104327

Authors

ANU SHARMA ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
DWIJESH CHANDRA MISHRA ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
NEERAJ BUDHLAKOTI ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
ANIL RAI ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
SHASHI BHUSHAN LAL ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India
SANJEEV KUMAR ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110 012, India

https://doi.org/10.56093/ijas.v90i5.104327

Keywords:

Assembler, Algorithm, Computation, Metagenome

Abstract

Assembly of genome sequences of a microbial community is computationally challenging and complex than its single genome counterparts. Keeping in view the volume, diversity and varied abundance of different microbes, number of metagenome assemblers have been developed addressing specific associated computational issues mainly following De Bruijn Graph (DBG) and Overlap Layout Consensus (OLC) approaches. It is very pertinent to understand different computational approaches and issues of metagenomic assembly to further improve them with respect to time and computational resource requirements. Therefore, the main objective of this article is to discuss various metagenomics assemblers with respect to their development addressing major computational issues. Initially the computational perspective of single genome assemblers based on OLC and DBG graph construction approaches was described. This is followed by review of metagenomic assemblers with respect to the algorithm implemented for addressing issues in metagenome assembly. Further, performance of some of the popular metagenome assemblers were empirically evaluated with respect to their run time and memory requirements by taking diversified benchmark metagenomics data at ICAR-IASRI, New Delhi in 2019. It was concluded that performance of assemblers varied considerably on these datasets and there is further need to make an effort to develop new tools or to modify the existing ones using efficient algorithms and data structures.

Downloads

Download data is not yet available.

References

Afiahayati S K and Akakibara Y S. 2013. An extended genovo metagenomic assembler by incorporating paired-end information. Peer J 1: e196. DOI: https://doi.org/10.7717/peerj.196

Afiahayati S K and Sakakibara Y. 2015. MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning. DNA Research 22(1): 69–77. DOI: https://doi.org/10.1093/dnares/dsu041

Antipov D, Korobeynikov A, McLean J S and Pevzner P A. 2016. HYBRIDSPADES: an algorithm for hybrid assembly of short and long reads, Bioinformatics 32(7): 1009–1015. DOI: https://doi.org/10.1093/bioinformatics/btv688

Boisvert S, Raymond F, Godzaridis É, Laviolette F and Corbeil J. 2012. Ray Meta: scalable de novo metagenome assembly and profiling. Genome Biology 13: R122. DOI: https://doi.org/10.1186/gb-2012-13-12-r122

Bokhari S H and Sauer J R. 2005. A parallel graph decomposition algorithm for DNA sequencing with nanopores. Bioinformatics 21(7): 889-896. DOI: https://doi.org/10.1093/bioinformatics/bti129

Boucher C, Bowe A, Gagie T, Puglisi S J and Sadakane K. 2015. Variable-order de Bruijn graphs. (In) Proceedings of the Data Compression Conference, Snowbird, Utah, USA, April 7-9, pp 383–392. DOI: https://doi.org/10.1109/DCC.2015.70

Breitwieser F P, Lu J and Salzberg S L. 2017. A review of methods and databases for metagenomic classification and assembly. Briefings in Bioinformatics bbx120. DOI: https://doi.org/10.1093/bib/bbx120

Chatterji S, Yamazaki I, Bai Z et al. 2008. CompostBin: a DNA composition-based algorithm for binning environmental shotgun reads. Research in Computational and Molecular Biology (LNCS), Vol 4955, pp 17-28. Vingron M and Wong L (Eds). Springer, Berlin, Heidelberg. DOI: https://doi.org/10.1007/978-3-540-78839-3_3

Chevreux B, Pfisterer T, Drescher B, Driesel A J, Müller et al. 2004. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Research 14(6): 1147–59. DOI: https://doi.org/10.1101/gr.1917404

Chikhi R and Rizk G. 2013. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithms for Molecular Biology 8: 22. DOI: https://doi.org/10.1186/1748-7188-8-22

Conway T C and Bromage A J. 2011. Succinct data structures for assembling large genomes. Bioinformatics 27(4): 479–86. DOI: https://doi.org/10.1093/bioinformatics/btq697

Dinh H and Rajasekaran S. 2011. A memory-efficient data structure representing exact-match overlap graphs with application for next-generation DNA assembly. Bioinformatics 27(14): 1901–7. DOI: https://doi.org/10.1093/bioinformatics/btr321

El-Metwally S, Hamza T, Zakaria M and Helmy M. 2013. Next- Generation sequence assembly: Four stages of data processing and computational challenges. PLoS Computational Biology 9(12): e1003345. DOI: https://doi.org/10.1371/journal.pcbi.1003345

Ghurye J S, Cepeda-espinoza V and Pop M. 2016. Metagenomic assembly: Overview, challenges and applications. Yale Journal of Biology and Medicine 89: 353–62.

Gonnella G and Kurtz S. 2012. Readjoiner: a fast and memory efficient string graph-based sequence assembler. BMC Bioinformatics 13: 82. DOI: https://doi.org/10.1186/1471-2105-13-82

Haider B, Ahn T, Bushnell B, Chai J, Copeland A and Pan C. 2014. Omega: an overlap-graph de novo assembler for metagenomics. Bioinformatics 30(19): 2717–22. DOI: https://doi.org/10.1093/bioinformatics/btu395

Huang X, Wang J, Aluru S, Yang SP and Hillier L. 2003. PCAP: a whole-genome assembly program. Genome Research 13(9): 2164–70. DOI: https://doi.org/10.1101/gr.1390403

Kashtan N et al., 2014. Single-cell genomics reveals hundreds of coexisting subpopulations in wild. Prochlorococcus Science 344: 416–20. DOI: https://doi.org/10.1126/science.1248575

Kececioglu J D and Myers E W. 1995. Combinatorial algorithms for DNA sequence assembly. Algorithmica 13(1-2): 7-51. DOI: https://doi.org/10.1007/BF01188580

Kleftogiannis D, Kalnis P and Bajic VB. 2013. Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures. PlosONE 8(9): 1–11. DOI: https://doi.org/10.1371/journal.pone.0075505

Laserson J, Jojic V and Koller D. 2011. Genovo: De Novo assembly for metagenomes. Journal of Computational Biology 18(3): 48–53 DOI: https://doi.org/10.1089/cmb.2010.0244

Leung H C M, Yiu S M and Yang B et al. 2011. A robust and accurate binning algorithm for metagenomic sequences with arbitrary species abundance ratio. Bioinformatics 27(11): 1489–95. DOI: https://doi.org/10.1093/bioinformatics/btr186

Li D, Liu C, Luo R, Sadakane K and Lam T. 2015. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31(10): 1674–6. DOI: https://doi.org/10.1093/bioinformatics/btv033

Li D, Luo R, Liu C, Leung C, Ting H et al. 2016. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 102: 3–11. DOI: https://doi.org/10.1016/j.ymeth.2016.02.020

Li R, Zhu H, Ruan J, Qian W, Fang X et al. 2010. De novo assembly of human genomes with massively parallel short read sequencing. Genome Research 20(2): 265. DOI: https://doi.org/10.1101/gr.097261.109

Lin Y and Pevzner P A. 2014. Manifold de Bruijn Graphs. Algorithms in Bioinformatics: 14th International Workshop Lecture Notes in Computer Science, Vol 8701, pp 296–310. DOI: https://doi.org/10.1007/978-3-662-44753-6_22

Frith M, Pedersen C N S(Eds). Springer, Berlin. Lin Y, Yuan J, Kolmogorov M, Shen M W, Chaisson M and Pevzner P A. 2016. Assembly of long error-prone reads using de Bruijn graphs. Proceeding of National Academy of Sciences of United States of America, Vol 113, pp E8396–E8405. Waterman M S (Eds). USA. DOI: https://doi.org/10.1073/pnas.1604560113

Liu C, Luo R and Lam T. 2014. GPU-accelerated BWT construction for large collection of short reads. arXiv , 1401: 7457.

Namiki T, Hachiya T, Tanaka H and Sakakibara Y. 2012. MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads. Nucleic Acids Research 40(20): e155. DOI: https://doi.org/10.1093/nar/gks678

Nurk S, Meleshko D, Korobeynikov A and Pevzner P A. 2017. metaSPAdes: a new versatile metagenomic assembler. Genome Research 27: 824–34. DOI: https://doi.org/10.1101/gr.213959.116

Mande S S, Mohammed M H and Ghosh T S. 2012. Classification of metagenomic sequences: methods and challenges. Briefing in Bioinformatics 13(6): 669–81. DOI: https://doi.org/10.1093/bib/bbs054

Miller J R, Koren S and Sutton G. 2010. Assembly algorithms for next-generation sequencing data. Genomics 95: 315–27. DOI: https://doi.org/10.1016/j.ygeno.2010.03.001

Myers E W, Sutton G G, Delcher A L, Dew I M, Fasulo D P. et al. 2000. A whole-genome assembly of Drosophila. Science 287(5461): 2196–204. DOI: https://doi.org/10.1126/science.287.5461.2196

Myers E W. 2005. The fragment assembly string graph. Bioinformatics 21(2): ii79-ii85. DOI: https://doi.org/10.1093/bioinformatics/bti1114

Peng Y, Leung H C M, Yiu S M and Chin F Y L. 2011. Meta-IDBA: a de Novo assembler for metagenomic data. Bioinformatics 27: i94–i101. DOI: https://doi.org/10.1093/bioinformatics/btr216

Peng Y, Leung H C M, Yiu S M and Chin F Y. 2012. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28: 1420–1428. DOI: https://doi.org/10.1093/bioinformatics/bts174

Pevzner P A, Tang H and Waterman M S. 2001. An Eulerian path approach to DNA fragment assembly. Proceedings of National Academy of Sciences of United States of America, Vol 98, pp 9748–9753. Waterman M S (Eds). USA. DOI: https://doi.org/10.1073/pnas.171285098

Pevzner P A and Tang H. 2001. Fragment assembly with double barreled data. Bioinformatics 17(1): S225-S233. DOI: https://doi.org/10.1093/bioinformatics/17.suppl_1.S225

Pevzner P, Tang H and Tesler G. 2004. De novo repeat classification and fragment assembly. Genome Research 14: 1786–96. DOI: https://doi.org/10.1101/gr.2395204

Prjibelski A D et al. 2014. ExSPAnder: a universal repeat resolver for DNA fragment assembly. Bioinformatics 30: i293–i301. DOI: https://doi.org/10.1093/bioinformatics/btu266

Ruby J G, Bellare P and DeRisi J L. 2013. PRICE: Software for the targeted assembly of components of (Meta). Genomic Sequence Data 3: 865-880. DOI: https://doi.org/10.1534/g3.113.005967

Sharon I, Kertesz M, HugL A, Pushkarev D, Blauwkamp T A et al. 2015. Accurate, multi-kb reads resolve complex populations and detect rare microorganisms. Genome Research 14(2): 55. DOI: https://doi.org/10.1101/gr.183012.114

Salikhov K, Sacomoto G and Kucherov G. 2014. Using cascading Bloom filters to improve the memory usage for de Brujin graphs. Algorithms for Molecular Biology 9(2): 48–65. DOI: https://doi.org/10.1186/1748-7188-9-2

Simpson J T and Durbin R. 2010. Efficient construction of an assembly string graph using the FM-index. Bioinformatics 26: 367–73. DOI: https://doi.org/10.1093/bioinformatics/btq217

Simpson J T, Wong K, Jackman S D, Schein J E, Jones S J M and Biro I. 2009. ABySS: A parallel assembler for short read sequence data. Genome Research 19: 1117–23. DOI: https://doi.org/10.1101/gr.089532.108

Teeling H, Waldmann J, Lombardot T et al. 2004. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics 5: 163. DOI: https://doi.org/10.1186/1471-2105-5-163

Vasilinetc I, Prjibelski A D, Gurevich A, Korobeynikov A and Pevzner P. 2015. Assembling short reads from jumping libraries with large insert sizes. Bioinformatics 31: 3262–8. DOI: https://doi.org/10.1093/bioinformatics/btv337

Vollmers J, Wiegand S and Kaster A K. 2017. Comparing and evaluating metagenome assembly tools from a microbiologist’s perspective – Not only size matters! PLoSONE 12(1): e0169662. DOI: https://doi.org/10.1371/journal.pone.0169662

Wu Y W and Ye Y. 2011.A novel abundance-based algorithm for binning metagenomic sequences using l tuples. Journal of Computational Biology 18(3): 523–34. DOI: https://doi.org/10.1089/cmb.2010.0245

Ye C, Ma Z S, Cannon C H, Pop M and Yu D W. 2011. Sparse assembler: de novo assembly with the sparse de Bruijn graph. arXiv preprint arXiv, 1106.2603.

Zerbino D R and Birney E. 2008. Velvet: algorithms for de novo short read assembly using de bruijn graphs. Genome Research 18: 821–9. DOI: https://doi.org/10.1101/gr.074492.107