Application of the word2vec model to sentiment analysis of English-language tweets
Date
2019-06-20
Authors
Guerrero Alvarez, Ruddy
Publisher
Universidad Central “Marta Abreu” de Las Villas
Abstract
Social networks such as Twitter and Facebook produce large amounts of information that is of great importance to both governments and companies. The vector space model, although one of the most widely used representation models, cannot adequately represent the semantic content of documents, which matters for applications such as opinion mining. More recently developed models, such as word2vec, capture the context of a word in a document and, through it, the semantic similarity between words. This work presents a comparative study of traditional representation models and word2vec to determine their effectiveness for sentiment analysis of English-language tweets. To this end, three transformations of the word2vec model are implemented so that it can be used with incremental classification algorithms, since Twitter follows a data stream model in which tweets are obtained online. The results show that word2vec obtains better results than the vector space model on the data sets used in the experiments.
Keywords
Sentiment Analysis, Machine Learning, Word Embedding