Big Data Supervised Pairwise Ortholog Detection in Yeasts
Date
2018-02-01
Authors
Galpert, Deborah
del Río García, Sara
Herrera, Francisco
Ancede-Gallardo, Evys
Antunes, Agostinho
Agüero Chapin, Guillermin
Publisher
IntechOpen
Abstract
Orthologs are genes in different species that evolved from a common ancestor. Ortholog
detection is essential to study phylogenies and to predict the function of unknown genes.
Scaling both the pairwise gene (or protein) comparisons and the classification
process is a challenge given the ever-increasing number of sequenced genomes.
Ortholog detection algorithms based only on sequence similarity tend to misclassify gene pairs,
particularly in Saccharomycete yeasts, which show rampant paralogies and gene losses. This
book chapter proposes a new classification approach that combines pairwise similarity
measures in a decision system that accounts for the extreme imbalance
between ortholog and non-ortholog pairs. Some new gene pair similarity measures are
defined based on protein physicochemical profiles, gene pair membership to conserved
regions in related genomes, and protein lengths. The efficiency and scalability of the
calculation of these measures are analyzed in order to propose their implementation for big data. In
conclusion, the evaluated supervised algorithms for big, imbalanced data
were highly effective on Saccharomycete yeast genomes.
Keywords
ortholog detection, similarity measures, big data supervised classification, scalability