Implementation of a Machine Learning Algorithm in Apache Spark

Sánchez Alba, Ricardo

dc.contributor.advisor	Morell Pérez, Carlos
dc.contributor.author	Sánchez Alba, Ricardo
dc.coverage.spatial	7004624	en_US
dc.date.accessioned	2018-01-22T16:17:43Z
dc.date.available	2018-01-22T16:17:43Z
dc.date.issued	2017-07-07
dc.description.abstract	Analyzing large amounts of data and extracting useful knowledge from them is currently a challenge: the volume of generated information grows rapidly every day, and programs are needed that can perform this task in a short time. For several years, open-source frameworks have been used to apply machine learning techniques to small volumes of data, but the growing needs of industry have driven an evolution in distributed computing, giving rise to tools such as Apache Hadoop and Apache Spark, the latter being between 10 and 100 times faster than its predecessor. This work proposes a general procedure for adding new machine learning algorithms to the Apache Spark framework and implements a linear regression algorithm to validate the proposed methodology. A series of experiments on the implemented software made it possible to assess the advantages of the Apache Spark framework in significantly reducing execution times when this type of algorithm is applied to massive amounts of data.	en_US
dc.description.sponsorship	Facultad de Matemática, Física y Computación. Departamento de Ciencias de la Computación	en_US
dc.description.status	non-published	en_US
dc.identifier.uri	https://dspace.uclv.edu.cu/handle/123456789/8505
dc.language.iso	es	en_US
dc.publisher	Universidad Central “Marta Abreu” de Las Villas	en_US
dc.rights	This document is the patrimonial property of the Universidad Central “Marta Abreu” de Las Villas. Users may use this work under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License	en_US
dc.subject	Algorithm	en_US
dc.subject	Implementation	en_US
dc.subject	Machine Learning	en_US
dc.subject	Linear Regression	en_US
dc.subject	Distributed Computing	en_US
dc.subject	Apache Spark	en_US
dc.subject.other	Algorithm	en_US
dc.subject.other	Machine Learning	en_US
dc.subject.other	Framework	en_US
dc.subject.other	Software Development	en_US
dc.title	Implementación de un algoritmo de aprendizaje automático en Apache Spark	en_US
dc.type	Thesis	en_US
dc.type.thesis	bachelor	en_US

Files

Name: Trabajo de Diploma Ricardo.pdf
Size: 917.14 KB
Format: Adobe Portable Document Format

Collections

Undergraduate Theses - Bachelor's Degree in Computer Science