Abstract
Biosequences such as proteins, RNA, and DNA are made up of sequences of amino acids or nucleotides. The binding of biosequences to one another governs many biological processes in living organisms. These bindings are maintained by short segments of the biosequences, known as functional elements. Because of their importance, functional elements are well conserved throughout evolution, which allows them to be discovered as patterns. As sequencing technologies continue to improve, biosequence data are becoming available in abundance. It is therefore convenient and cost-effective to discover functional elements from biosequence data computationally, in an unsupervised manner, without the need for prior knowledge or costly preprocessing. In this paper, we give a brief review of an unsupervised pattern discovery tool known as Aligned Pattern Clustering (and its software implementation, WeMine™). It was developed to facilitate the discovery and analysis of patterns in biosequences, and has been applied to 1) unsupervised identification of protein binding sites; 2) revealing the characteristics of functional subgroups; and 3) identification of intra-protein, inter-protein, and protein-DNA binding sites. In the era of ever-expanding biosequence data, we believe that this unsupervised pattern discovery approach offers a reliable, robust, and scalable method for scientific discovery and applications by leveraging the growing volume of biosequences.
Citation
Lee EA, Sze-To A, Wong AKC and Stashuk D. Unsupervised Pattern Discovery in Biosequences Using Aligned Pattern Clustering. SM J Bioinform Proteomics. 2016; 1(2): 1008.