Word Sense Disambiguation
Date
2014-06-27
Authors
Nadal Pérez, Ariel
Santana González, Katherine
Publisher
Universidad Central “Marta Abreu” de Las Villas
Abstract
Natural language is ambiguous: many words can be interpreted in several ways depending on the context in which they occur. For this reason, the computational identification of word meaning in context is currently an active area of development. Unsupervised word sense disambiguation methods have the advantage of not requiring previously labeled texts; however, they still have shortcomings, since they achieve low disambiguation effectiveness and need a large number of iterations to disambiguate. The general objective of this research is therefore to develop more effective and efficient methods for unsupervised word sense disambiguation, based on clustering and rough set theory. The main results are: (1) existing supervised and unsupervised word sense disambiguation algorithms were identified, with emphasis on unsupervised graph-based ones; (2) the method proposed by Anaya-Sánchez et al. (2007) was transformed and the RST-Disambiguation method was created; (3) the UnsupervisedWSD library was created, integrating the modified methods and the new one; and (4) the accuracy and precision of the proposed algorithm were validated with Semcor, showing good results in the disambiguation of terms.
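For readers unfamiliar with rough set theory, which the abstract names as one basis of the work, the following is a minimal sketch of its core idea: the lower and upper approximations of a set under an indiscernibility relation. It is not the RST-Disambiguation method or the UnsupervisedWSD API, neither of which is described in this record. The function names, the sense identifiers, and the toy descriptions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(objects, key):
    """Group objects that are indiscernible, i.e. share the same description."""
    classes = defaultdict(set)
    for obj in objects:
        classes[key(obj)].add(obj)
    return list(classes.values())


def approximations(objects, key, target):
    """Return the rough-set (lower, upper) approximations of `target`.

    lower: union of the classes fully contained in the target
    upper: union of the classes that overlap the target at all
    """
    target = set(target)
    lower, upper = set(), set()
    for cls in equivalence_classes(objects, key):
        if cls <= target:
            lower |= cls
        if cls & target:
            upper |= cls
    return lower, upper


# Hypothetical toy data: candidate senses of "bank", each described by one
# coarse topic. Senses with the same topic cannot be told apart.
descriptions = {
    "bank#1": "money",
    "bank#2": "money",
    "bank#3": "river",
    "bank#4": "river",
    "bank#5": "slope",
}

# Hypothetical set of senses judged compatible with some context.
compatible = {"bank#1", "bank#2", "bank#3"}

lower, upper = approximations(descriptions, descriptions.get, compatible)
print(sorted(lower))  # senses certainly in the target
print(sorted(upper))  # senses possibly in the target
```

In this toy case the "money" class lies entirely inside the target, so it forms the lower approximation, while the "river" class only overlaps it and therefore appears in the upper approximation alone. The gap between the two sets is the boundary region, where the available description is too coarse to decide.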
Keywords
Natural Language, Disambiguation Methods, Computational Linguistics