
SM Bioinformatics and Proteomics

Unsupervised Pattern Discovery in Biosequences Using Aligned Pattern Clustering

[ ISSN : 3068-0921 ]


Received: 21-Jul-2016

Accepted: 10-Aug-2016

Published: 12-Aug-2016

En-Shiun Annie Lee*, Antonio Sze-To, Andrew KC Wong and Daniel Stashuk 

Department of Systems Design Engineering, Centre for Pattern Analysis and Machine Intelligence, University of Waterloo, Canada

Corresponding Author:

En-Shiun Annie Lee, Department of Systems Design Engineering, Centre for Pattern Analysis and Machine Intelligence, University of Waterloo, Canada, Email: annie.lee@uwaterloo.ca

Abstract

Biosequences such as proteins, RNA and DNA are made up of sequences of amino acids or nucleotides. The binding of biosequences to one another governs many biological processes in a living organism. These bindings are maintained by short segments of the biosequences, known as functional elements. Because functional elements are so important, they are well conserved throughout evolution, which allows them to be discovered as patterns. As sequencing technologies continue to improve, biosequence data have become abundant. It would thus be convenient and cost-effective to discover functional elements from biosequence data computationally, in an unsupervised manner, without the need for prior knowledge or costly preprocessing. In this paper, we give a brief review of an unsupervised pattern discovery tool known as Aligned Pattern Clustering (implemented in the software WeMine™). It was developed to facilitate the discovery and analysis of patterns in biosequences, and has been applied in 1) unsupervised identification of protein binding sites; 2) revealing functioning subgroup characteristics; and 3) identification of intra-protein, inter-protein and protein-DNA binding sites. In the era of ever-expanding biosequence data, we believe that this unsupervised pattern discovery approach can provide a reliable, robust, and scalable method for scientific discovery and applications.

Citation

Lee EA, Sze-To A, Wong AKC and Stashuk D. Unsupervised Pattern Discovery in Biosequences Using Aligned Pattern Clustering. SM J Bioinform Proteomics. 2016; 1(2): 1008.