JCSE, vol. 15, no. 2, pp.59-71, 2021
DOI: http://dx.doi.org/10.5626/JCSE.2021.15.2.59
Compression Techniques for DNA Sequences: A Thematic Review
Rosario Gilmary, Akila Venkatesan, and Govindasamy Vaiyapuri
Department of Computer Science and Engineering, Pondicherry Engineering College, Pondicherry, India
Department of Information Technology, Pondicherry Engineering College, Pondicherry, India
Abstract: Deoxyribonucleic acid (DNA) is the basic entity that carries genetic instructions. This information drives the evolution,
development, and improvement of all species. It is estimated that 10 CD-ROMs are required to store the genomic
data of a single individual. As DNA sequencing equipment becomes more widespread, an enormous volume of genomic data is being generated.
The growth of DNA data in public databases is outpacing the growth in storage capacity, raising
significant concerns about data storage, transmission, retrieval, and search. To reduce storage requirements and
cost, lossless compression procedures have been applied. Conventional compression methods do not perform well when
compressing biological data. Hence, several specialized and contemporary lossless compression mechanisms have been used
to achieve an improved compression ratio on biological sequences. Here, we review the diverse existing compression
procedures that are suitable for DNA sequences. The efficiency of the algorithms is compared in
terms of compression ratio, the relative size of the compressed file, and compression/decompression time.
The main challenges and future research directions in DNA compression are also presented, with particular emphasis on
references to contemporary techniques.
Keywords:
DNA sequences; Lossless compression; Genomic sequence compression; Horizontal compression; Vertical compression