Reconstruction of Ancestral Genetic Sequences Using CUDA (original title: Reconstrucción de secuencias genéticas ancestrales usando CUDA)
Date
2013-07-04
Authors
Fuentes Alba, Roddy
Publisher
Universidad Central “Marta Abreu” de Las Villas
Abstract
Ancestral sequence reconstruction (ASR) is an increasingly important technique in molecular evolutionary biology, comparative genomics, and bioinformatics. Its in silico implementation has been widely investigated and applied. The need to process ever larger volumes of data with greater accuracy often makes the process computationally expensive, so research into new techniques applicable to this task is of great value.
The widespread availability of graphics processing units (GPUs) in modern computers offers vast potential for parallelism. General-purpose computing on GPUs is a field in active research and development, and the use of CUDA (Compute Unified Device Architecture) for this purpose in particular has grown in recent years. In this work, the most expensive phases of the algorithms used in the overall ASR process are implemented in parallel using CUDA: determining the parameters of the evolutionary model, computing per-site mutation rates, and reconstructing the ancestral sequences. These implementations were then incorporated into an application that performs the whole process.
The application was tested on two GPUs, a GeForce GT 520 and a Quadro 2000, in a total of eight tests using two sequence alignments of different sizes. The results were validated against an existing sequential application built for the same purpose, which also served as the baseline for comparing execution times. The tests showed that using CUDA substantially reduces the execution time of the ASR process.
Keywords
Ancestral Genetic Sequences (ASR), Reconstruction, Graphics Processing Units (GPU), Compute Unified Device Architecture (CUDA)