Spam detection using incremental classifiers
Date
2018-06-20
Authors
Díaz Benítez, Asiel
Publisher
Universidad Central “Marta Abreu” de Las Villas
Abstract
Email, despite its age, is still a popular form of communication because it is inexpensive and easy to use. Most of the email sent today is spam, which wastes time and money for users and companies. Tools have been created to detect spam automatically, but no method detects it with complete accuracy, because spammers keep changing the form of their messages to confuse anti-spam filters. Machine learning, however, offers algorithms that can recognize these changes (a phenomenon known as concept drift) and adapt to them. The objective of this work was to detect spam emails efficiently using incremental (online) classifiers, some of which can adapt to concept drift. To this end, a model for personalized spam detection was designed, and experiments were carried out to evaluate the performance of the algorithms to be used in the model. The results were favorable and led to the conclusion that this type of classifier is effective for spam detection, with high prediction accuracy. The FASE classifier stood out from the rest and was selected to be part of the final proposed tool.
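The abstract does not describe the model's internals. As a minimal sketch of what "incremental" means in this setting, assuming a bag-of-words representation and a multinomial naive Bayes learner (a swapped-in technique, not FASE and not the thesis's model), the code below updates the classifier one email at a time and scores it with test-then-train (prequential) evaluation, a common protocol for data streams. The `decay` forgetting factor is a hypothetical, simple way to let old evidence fade so the model can follow concept drift; the class and function names are illustrative.

```python
import math
import re
from collections import defaultdict


class IncrementalNaiveBayes:
    """Multinomial naive Bayes updated one email at a time (illustrative, not FASE).

    decay < 1.0 scales down all previous counts before each update, a simple
    hypothetical forgetting mechanism for concept drift; 1.0 means no forgetting.
    """

    def __init__(self, decay=1.0, alpha=1.0):
        self.decay = decay  # hypothetical forgetting factor
        self.alpha = alpha  # Laplace smoothing constant
        self.doc_counts = defaultdict(float)  # class -> weighted number of emails
        self.word_counts = defaultdict(lambda: defaultdict(float))  # class -> word -> weighted count
        self.total_words = defaultdict(float)  # class -> weighted number of tokens
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def predict(self, text):
        if not self.doc_counts:
            return "ham"  # nothing learned yet: default guess
        n_docs = sum(self.doc_counts.values())
        v = max(len(self.vocab), 1)  # avoid a zero denominator
        best_label, best_score = None, -math.inf
        for label, count in self.doc_counts.items():
            score = math.log(count / n_docs)  # log prior
            denom = self.total_words[label] + self.alpha * v
            for w in self.tokenize(text):
                # Smoothed log likelihood; .get avoids inserting unseen words
                score += math.log((self.word_counts[label].get(w, 0.0) + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def learn(self, text, label):
        if self.decay < 1.0:
            # Fade old evidence so recent emails weigh more
            for c in self.doc_counts:
                self.doc_counts[c] *= self.decay
                self.total_words[c] *= self.decay
                for w in self.word_counts[c]:
                    self.word_counts[c][w] *= self.decay
        self.doc_counts[label] += 1
        for w in self.tokenize(text):
            self.word_counts[label][w] += 1
            self.total_words[label] += 1
            self.vocab.add(w)


def prequential_accuracy(model, stream):
    """Test-then-train: predict each email before learning from its true label."""
    correct = total = 0
    for text, label in stream:
        correct += model.predict(text) == label
        model.learn(text, label)
        total += 1
    return correct / total if total else 0.0
```

Each email is classified before its label is revealed and only then used for training, which loosely resembles a per-user filter that learns from the labels that user supplies (an assumption; the abstract does not describe the feedback loop). Rescaling every count on each update costs time proportional to the vocabulary and is kept here only for readability.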
Keywords
Email, Spam Detection, Incremental Classifiers, Artificial Intelligence