GOBii, a scalable genomics data management system with rapid data extract times and integration with downstream genomic selection analysis pipelines

The Genomic Open-Source Breeding informatics initiative (GOBii) has built a highly scalable genomics data management system designed for fast data extraction from large genomics data files. We benchmarked several open-source SQL and NoSQL data management systems for managing large-scale genomics data and found that HDF5 outperformed the other systems in both load and extract times. To also accommodate metadata management, we designed and developed a hybrid system that uses Postgres for sample and marker metadata and HDF5 for the large genomics files. The HDF5 genomics files are stored in two orientations to enable rapid extraction in either sample-fast or marker-fast format. The system is flexible enough to be used across different crops and with diverse marker and sequence-based platforms. We are now working to integrate the genomics data extracts with downstream genomic selection applications in Galaxy.
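
The hybrid layout described above can be illustrated with a minimal sketch. It is not GOBii's implementation: it assumes flat binary files as a stand-in for the two HDF5 orientations and an in-memory SQLite database as a stand-in for Postgres, and every name in it (`write_orientations`, `read_row`, `build_metadata`, `extract`, the table and file names, the toy data) is hypothetical. The point it shows is that a metadata lookup resolves a name to an index, after which a marker or a sample can each be read as one contiguous row from the orientation that suits the request.

```python
import os
import sqlite3
import tempfile

# Toy genotype matrix (hypothetical data): one row per marker, one byte per sample call.
MARKERS = ["m1", "m2", "m3"]
SAMPLES = ["s1", "s2", "s3", "s4"]
CALLS = [
    b"ACGT",  # m1 calls for s1..s4
    b"AAGG",  # m2
    b"TTCC",  # m3
]


def write_orientations(directory):
    """Write the same matrix twice: one marker per row, and one sample per row."""
    marker_path = os.path.join(directory, "by_marker.bin")
    sample_path = os.path.join(directory, "by_sample.bin")
    with open(marker_path, "wb") as f:
        for row in CALLS:
            f.write(row)
    with open(sample_path, "wb") as f:
        for j in range(len(SAMPLES)):
            # Transpose: gather column j of every marker row into one sample row.
            f.write(bytes(row[j] for row in CALLS))
    return marker_path, sample_path


def read_row(path, index, row_length):
    """Read one contiguous row with a single seek and a single read."""
    with open(path, "rb") as f:
        f.seek(index * row_length)
        return f.read(row_length)


def build_metadata():
    """Relational stand-in for the metadata store: names mapped to matrix indices."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE marker (name TEXT PRIMARY KEY, idx INTEGER)")
    db.execute("CREATE TABLE sample (name TEXT PRIMARY KEY, idx INTEGER)")
    db.executemany("INSERT INTO marker VALUES (?, ?)",
                   [(name, i) for i, name in enumerate(MARKERS)])
    db.executemany("INSERT INTO sample VALUES (?, ?)",
                   [(name, i) for i, name in enumerate(SAMPLES)])
    return db


def extract(db, paths, kind, name):
    """Resolve the name to an index, then read from the matching orientation."""
    marker_path, sample_path = paths
    if kind == "marker":
        (idx,) = db.execute("SELECT idx FROM marker WHERE name = ?", (name,)).fetchone()
        return read_row(marker_path, idx, len(SAMPLES))
    (idx,) = db.execute("SELECT idx FROM sample WHERE name = ?", (name,)).fetchone()
    return read_row(sample_path, idx, len(MARKERS))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        paths = write_orientations(workdir)
        db = build_metadata()
        print(extract(db, paths, "marker", "m2"))  # all samples for one marker
        print(extract(db, paths, "sample", "s2"))  # all markers for one sample
```

Storing the matrix twice doubles the disk footprint, but neither kind of extract has to stride across the file to collect one value per row. That is the trade-off the two-orientation design appears to accept.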

Bibliographic Details
Main Authors: Nti-Addae, Yaw; Ulat, Victor Jun; Matthews, Dave; Sempere, Guilhem; Guignon, Valentin; Larmande, Pierre; Renner, Jon; Petel, Adrien; Jones, Elizabeth; Robbins, Kelly
Format: conference_item biblioteca
Language: eng
Published: PAG
Online Access: http://agritrop.cirad.fr/600979/
http://agritrop.cirad.fr/600979/1/ID600979.pdf