SM Journal of Biometrics & Biostatistics

Interactive Big Data Analytics Platform for Healthcare and Clinical Services

[ ISSN : 2573-5470 ]

Received: 31-May-2018

Accepted: 11-Jun-2018

Published: 15-Jun-2018

Chrimes D¹*, Kuo MH², Moa B³ and Kushniruk AW²

¹ Database Integration and Management, IMIT Quality Systems, Vancouver Island Health Authority, Canada

² School of Health Information Science, University of Victoria, Canada

³ Compute Canada/WestGrid/University Systems, University of Victoria, Canada

Corresponding Author:

Chrimes D, Database Integration and Management, IMIT Quality Systems, Vancouver Island Health Authority, Canada

Keywords

Adaptable Architectures; Big Data; Data Mining; Distributed File System; Distributed Data Structures; Healthcare Informatics; Hospital Systems; Metadata; Relational Database

Abstract

The study objective is to establish an interactive Big Data Analytics (BDA) platform with Hadoop/MapReduce technologies distributed over HBase (key-value NoSQL database storage) and to generate hospitalization metadata on the platform. Performance tests retrieved results from simulated patient records using Apache tools in Hadoop’s ecosystem. At optimized iteration, Hadoop distributed file system (HDFS) ingestion with HBase sustained database integrity over hundreds of iterations; however, completing the bulk load via MapReduce into HBase and validating the queries required a month. Generating the HBase data files took a week for one billion (10 TB) files and a month for three billion (30 TB) files. Inconsistencies in MapReduce limited the capacity to generate and replicate data efficiently. Dependencies among the data elements could be expressed via “family” primary keys set in code, with Apache Phoenix as the database generator. Modeling a hospital system on a patient-encounter-centric database was very difficult because the data profiles fully represented complex relationships. Apache Spark and Apache Drill showed high performance. Recommendations on key-value storage should be considered when analyzing large volumes of healthcare data securely.
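As a rough check on the data-generation figures reported in the abstract, the average throughput implied by the two runs can be computed directly. This is a minimal sketch, assuming a week is 7 days, a month is 30 days, and the reported sizes are decimal terabytes; the function names are illustrative only and are not part of the study.

```python
# Implied average HBase data-generation rates, from the figures in the abstract.
# Assumptions: "a week" = 7 days, "a month" = 30 days, and sizes are decimal
# terabytes (1 TB = 1,000,000 MB) exactly as reported (10 TB and 30 TB).

DAYS_PER_WEEK = 7
DAYS_PER_MONTH = 30  # assumed; calendar months vary from 28 to 31 days
SECONDS_PER_DAY = 24 * 3600

runs = {
    "1 billion (10 TB)": {"size_tb": 10, "days": DAYS_PER_WEEK},
    "3 billion (30 TB)": {"size_tb": 30, "days": DAYS_PER_MONTH},
}


def rate_tb_per_day(size_tb: float, days: float) -> float:
    """Average throughput in TB per day."""
    return size_tb / days


def rate_mb_per_second(size_tb: float, days: float) -> float:
    """Same average throughput expressed in MB/s (decimal units)."""
    return size_tb * 1_000_000 / (days * SECONDS_PER_DAY)


for label, r in runs.items():
    print(f"{label}: {rate_tb_per_day(r['size_tb'], r['days']):.2f} TB/day, "
          f"{rate_mb_per_second(r['size_tb'], r['days']):.1f} MB/s")
# prints:
# 1 billion (10 TB): 1.43 TB/day, 16.5 MB/s
# 3 billion (30 TB): 1.00 TB/day, 11.6 MB/s
```

Under these assumptions, tripling the volume more than quadrupled the elapsed time, so the larger run averaged about 30% lower throughput, which would be consistent with the MapReduce limitations the abstract reports.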

Citation

Chrimes D, Kuo MH, Moa B and Kushniruk AW. Interactive Big Data Analytics Platform for Healthcare and Clinical Services. SM J Biometrics Biostat. 2018; 3(2): 1030.