k-ary Biomolecular Similarity Measures Based on Generalizability Theory
Date
2013-07-04
Authors
Castillo Fernández, Emilia María
Publisher
Universidad Central “Marta Abreu” de Las Villas
Abstract
A formal framework based on Generalizability Theory (GT) is introduced into cheminformatics to improve the mathematical analysis and interpretation of biomolecular similarity relationships. The formulas of GT measures incorporate information about the admissible transformations of metric measurement scales, which makes it possible to model biomolecular resemblance as similarity between functionally transformed representation vectors. GT-based similarities also have associated statistical distributions, so one can test whether a particular similarity score is significant. The usefulness of pairwise (2-way) GT measures for cheminformatics problems is shown through a statistical analysis of their relative performance in similarity-searching experiments comprising a broad range of previously reported proximity measures, medium-to-large screening data sets for validation and application, informative numerical descriptors, a feature selection stage, and a suitable performance metric. The results indicate that GT-based measures perform comparably to, and in some cases better than, the state-of-the-art measures, demonstrating their effectiveness in early retrieval. Their interpretation and other potential applications are then considered. Next, multivariate (k-way) GT measures are presented as the natural generalization of the bivariate case to multiple biomolecules. They are shown to be equivalent to the weighted average of their corresponding 2-way counterparts, and three of them are entirely new in the literature. The k-way GT measures are then computed on modified data sets to assess their power to discriminate between active and inactive compounds. Their integration into operational virtual screening tools and their generalization to higher-degree polynomial relationships are discussed in the final part of this work.
Keywords
Drug Discovery, Generalizability Theory, Virtual Screening Techniques, Cheminformatics, Mathematical Analysis, Similarity Relationships