A New Method for Learning Bayesian Networks. Extensions to Weka
Date
2007-07-02
Authors
Moya Cruz, Iosvany
Publisher
Universidad Central “Marta Abreu” de Las Villas
Abstract
This research responds to the needs of the Bioinformatics Group at the Universidad Central de Las Villas concerning classification problems solved with Bayesian networks. It proposes a general methodology for adding new Bayesian network learning methods or algorithms to Weka, with emphasis on implementing a new algorithm that is more advantageous than those previously developed by the group and those included in Weka itself. The new algorithm, based on Chi-square significance values, makes essential changes to the way the population is segmented, which determines the topology of the network, and it uses Weka's facilities to delimit the essential problems. The work also shows the usefulness of incorporating the operational facilities of the JavaBayes software as an interface for Weka, and the advantages this offers for extending Weka's handling of Bayesian networks. Finally, the results are validated on a Bioinformatics problem: the identification of true donors and acceptors, key sites for locating introns and exons and therefore genes in a genome, in particular the human genome. The results are shown to be better than those previously obtained by the Bioinformatics Group, at least with learning algorithms based on decision trees. Conclusions and recommendations are drawn. The recommendations promise to improve the efficiency of the new algorithm on this problem, using facilities that are already implemented and only await experimentation; they reinforce the idea of building multi-classifiers for a comprehensive solution to the problem; and they suggest extending the approach to the genomes of other species of special interest besides Homo sapiens.
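The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of the kind of test a Chi-square-based approach could rely on: the Pearson Chi-square statistic for independence between a discrete attribute and the class, computed from a contingency table of counts. The class name `ChiSquareSketch`, its methods, and the example counts are hypothetical and are not taken from the thesis or from Weka.

```java
// Illustrative sketch only: not the thesis algorithm and not part of Weka.
class ChiSquareSketch {

    /** Pearson Chi-square statistic for a table of observed counts (rows = attribute values, columns = classes). */
    static double chiSquare(long[][] observed) {
        int rows = observed.length;
        int cols = observed[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += observed[i][j];
                colSum[j] += observed[i][j];
                total += observed[i][j];
            }
        }
        double statistic = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // Expected count under independence of attribute and class.
                double expected = rowSum[i] * colSum[j] / total;
                if (expected > 0.0) {
                    double diff = observed[i][j] - expected;
                    statistic += diff * diff / expected;
                }
            }
        }
        return statistic;
    }

    /** Degrees of freedom of the independence test: (rows - 1) * (columns - 1). */
    static int degreesOfFreedom(long[][] observed) {
        return (observed.length - 1) * (observed[0].length - 1);
    }

    public static void main(String[] args) {
        // Hypothetical counts: attribute value (rows) versus class (columns).
        long[][] counts = { {30, 10}, {10, 30} };
        double statistic = chiSquare(counts);
        // 3.841 is the standard critical value for 1 degree of freedom at the 0.05 level.
        boolean dependent = statistic > 3.841;
        System.out.println("chi-square = " + statistic
                + ", df = " + degreesOfFreedom(counts)
                + ", dependent at 0.05: " + dependent);
    }
}
```

In a structure-learning setting, a statistic of this kind could be used to decide whether an attribute is significantly associated with the class (or with another attribute) within a given segment of the population. How the thesis uses such significance values to build the network topology is not stated in this record.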
Keywords
Algorithm, Machine Learning, Bayesian Networks, Weka Extensions, Bioinformatics Group, Universidad Central de Las Villas