Creación y perfeccionamiento de herramientas para la minería de opinión en idioma Español (Creation and Improvement of Tools for Opinion Mining in Spanish)
Date
2014-06-28
Authors
Borroto Escalante, Claudia Milissen
Publisher
Universidad Central “Marta Abreu” de Las Villas
Abstract
The tools currently used in Opinion Mining (OM) are generally focused on the English language, present processing difficulties, are widely scattered, and in some cases are distributed in obsolete formats. Tools for Spanish that deliver better results are therefore needed, and it would also be useful to gather as many tools as possible into a single library. Hence, the general objective of this work is to develop a Java library that provides new opinion-mining tools together with updated and improved versions of existing ones. The main results are: (1) the format of the Intralinguistic Index was transformed to make its use in OM more efficient; (2) SentiWordNet 3.0 was modified by transforming its format and applying four stages that yielded better scores for its terms; (3) SpanishSentiWordNet was created, which will significantly facilitate OM in Spanish; (4) the PolarityDetection library was created, encapsulating the created and modified resources to facilitate OM; and finally, (5) experiments on 200 positive and 200 negative reviews, with English and Spanish equally represented, yielded approximately 90% correctly classified reviews.
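The abstract does not state how PolarityDetection scores an opinion. As a rough illustration only, the following Java sketch shows a common lexicon-based approach built on SentiWordNet 3.0's tab-separated format (POS, ID, PosScore, NegScore, SynsetTerms, Gloss): each lemma is assumed to score the mean of PosScore minus NegScore over its senses, and an opinion is labeled by the sign of its summed token scores. The class and method names (`LexiconPolaritySketch`, `loadLexicon`, `classify`), the per-lemma averaging, the letter-based tokenization, the tie-breaking rule and the toy lexicon rows are all hypothetical; this is not the thesis's four-stage scoring or the PolarityDetection API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class LexiconPolaritySketch {

    // Hypothetical loader for SentiWordNet 3.0-style rows:
    // POS \t ID \t PosScore \t NegScore \t SynsetTerms \t Gloss
    // Each lemma gets the mean of (PosScore - NegScore) over all rows that list it.
    static Map<String, Double> loadLexicon(Iterable<String> lines) {
        Map<String, Double> sum = new HashMap<>();
        Map<String, Integer> count = new HashMap<>();
        for (String line : lines) {
            if (line.startsWith("#") || line.trim().isEmpty()) continue; // skip comments/blank lines
            String[] f = line.split("\t");
            if (f.length < 5) continue; // malformed row
            double score = Double.parseDouble(f[2]) - Double.parseDouble(f[3]);
            for (String term : f[4].split(" ")) {
                int hash = term.lastIndexOf('#'); // strip the "#sense" suffix, e.g. good#1 -> good
                String lemma = (hash > 0 ? term.substring(0, hash) : term).toLowerCase(Locale.ROOT);
                sum.merge(lemma, score, Double::sum);
                count.merge(lemma, 1, Integer::sum);
            }
        }
        Map<String, Double> lexicon = new HashMap<>();
        for (Map.Entry<String, Double> e : sum.entrySet()) {
            lexicon.put(e.getKey(), e.getValue() / count.get(e.getKey()));
        }
        return lexicon;
    }

    // Sums lexicon scores of letter-only tokens (\p{L} also keeps Spanish accented letters).
    // Assumption: a total of exactly 0 is labeled "positive" to keep the output binary.
    static String classify(String opinion, Map<String, Double> lexicon) {
        double total = 0.0;
        for (String token : opinion.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            total += lexicon.getOrDefault(token, 0.0);
        }
        return total >= 0 ? "positive" : "negative";
    }

    public static void main(String[] args) {
        // Toy rows in the SentiWordNet 3.0 layout; IDs and scores are invented.
        List<String> sample = List.of(
            "# POS\tID\tPosScore\tNegScore\tSynsetTerms\tGloss",
            "a\t00001\t0.75\t0\tgood#1\ttoy gloss",
            "a\t00002\t0\t0.625\tbad#1\ttoy gloss",
            "a\t00003\t0.5\t0.125\tgood#2 fine#1\ttoy gloss");
        Map<String, Double> lexicon = loadLexicon(sample);
        System.out.println(classify("The film was good", lexicon)); // prints positive
        System.out.println(classify("A bad, bad ending", lexicon)); // prints negative
    }
}
```

Averaging over senses is only a simple stand-in for word-sense disambiguation; a Spanish lexicon such as SpanishSentiWordNet could be loaded the same way if it kept the same column layout, which is itself an assumption.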
Keywords
Tool Improvement, Opinion Mining, Spanish Language, Java Library