Desarrollo de clasificadores multinstancia para aplicaciones textuales basados en la fórmula de Rocchio

Díaz Figueredo, Carlos Arturo

dc.contributor.advisor	Sánchez Tarragó, Dánel
dc.contributor.author	Díaz Figueredo, Carlos Arturo
dc.coverage.spatial	Santa Clara	en_US
dc.date.accessioned	2018-03-12T22:26:01Z
dc.date.available	2018-03-12T22:26:01Z
dc.date.issued	2016-06-26
dc.description.abstract	La clasificación multinstancia, como parte del aprendizaje automático, tiene como objetivo construir, a partir de un conjunto de ejemplos, un modelo matemático que permita clasificar objetos descritos por múltiples vectores de atributos. Específicamente, la clasificación textual es una tarea de la clasificación multinstancia, donde la idea es asignar etiquetas semánticas a los documentos. La rama de la clasificación textual es ampliamente utilizada en un sinnúmero de campos de aplicación. Dentro de los algoritmos destacados en el área de la clasificación textual se encuentra el algoritmo clasificador simpleinstancia de Rocchio. Recientemente fue publicado el algoritmo clasificador MIRocchio, que adecua la fórmula de Rocchio al enfoque multinstancia, con buenos resultados en el área de clasificación textual. Sin embargo, este algoritmo presenta ciertas limitaciones durante el proceso de aprendizaje. Estas limitantes se analizan en profundidad junto con el funcionamiento de este algoritmo. Con el objetivo de mitigar estas limitaciones, el presente trabajo propone tres variantes del algoritmo MIRocchio que intentan mejorar tanto la eficiencia como la eficacia del mismo. Se propone además en esta tesis una nueva hipótesis basada en la frontera de decisión entre clases, la cual se incorpora en el diseño de dos de las propuestas. Todos los algoritmos obtenidos en este trabajo están enfocados específicamente al área de la clasificación textual. La validez de las hipótesis propuestas durante esta investigación se comprueba experimentalmente para los problemas de la recomendación de páginas web índice y TREC9. Los experimentos arrojaron resultados favorables para dos de los tres algoritmos propuestos, siendo capaces de mejorar el desempeño del clasificador MIRocchio y siendo competitivos con algoritmos del estado del arte.	en_US
dc.description.abstract	Multi-instance classification, a branch of machine learning, aims to build, from a set of examples, a mathematical model that can classify objects described by multiple attribute vectors. Specifically, text classification is a task within multi-instance classification in which the goal is to assign semantic labels to documents. Text classification is widely used across many application fields. Among the prominent algorithms in this area is Rocchio's single-instance classifier. The recently published MIRocchio classifier adapts the Rocchio formula to the multi-instance approach, with good results in text classification. However, this algorithm has certain limitations during the learning process. These limitations are analyzed in depth together with the operation of the algorithm. To mitigate them, this work proposes three variants of the MIRocchio algorithm that attempt to improve both its efficiency and its effectiveness. The thesis also proposes a new hypothesis based on the decision boundary between classes, which is incorporated into the design of two of the proposals. All the algorithms obtained in this work target text classification specifically. The validity of the hypotheses proposed during this research is tested experimentally on the web index page recommendation (WIR) and TREC9 problems. The experiments yielded favorable results for two of the three proposed algorithms, which improved on the performance of the MIRocchio classifier and were competitive with state-of-the-art algorithms.	en_US
dc.description.sponsorship	Facultad de Matemática, Física y Computación. Departamento de Ciencias de la Computación	en_US
dc.description.status	non-published	en_US
dc.identifier.uri	https://dspace.uclv.edu.cu/handle/123456789/8892
dc.language.iso	es	en_US
dc.publisher	Universidad Central “Marta Abreu” de Las Villas	en_US
dc.rights	This document is the patrimonial property of the Universidad Central “Marta Abreu” de Las Villas. Users may use this work under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License	en_US
dc.subject	Development	en_US
dc.subject	Multi-instance Classifiers	en_US
dc.subject	Textual Applications	en_US
dc.subject	Rocchio Formula	en_US
dc.subject	Machine Learning	en_US
dc.subject.other	Classification Algorithms	en_US
dc.subject.other	Classification Model	en_US
dc.subject.other	Classifiers	en_US
dc.subject.other	Document Classification	en_US
dc.subject.other	Machine Learning	en_US
dc.subject.other	Artificial Intelligence	en_US
dc.title	Desarrollo de clasificadores multinstancia para aplicaciones textuales basados en la fórmula de Rocchio	en_US
dc.type	Thesis	en_US
dc.type.thesis	bachelor	en_US

Files

Name: Carlos Arturo Diaz Figueredo.pdf
Size: 2.13 MB
Format: Adobe Portable Document Format

Collections

Tesis de Pregrado - Licenciatura en Ciencias de la Computación