Regression Training using Model Parallelism in a Distributed Cloud

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Machine learning requires a significant amount of computational resources and is usually executed in high-capacity centralized cloud infrastructures (e.g., data centers). In such infrastructures, resources are shared in a scalable manner through the instantiation and orchestration of multiple virtualized services. Emerging trends in machine learning are the distribution and parallelization of model training, which allow training tasks to be executed across multiple distributed computational domains with the aim of reducing the overall training time. A possible drawback of decentralizing machine learning is that latency issues may arise when the training computation is distributed to geographically distant nodes. One way to reduce latency is to utilize edge computing infrastructure, i.e., to place computation near the origin of the request. As edge resources can be scarce, it is important to orchestrate model training in a parallelized manner. To this end, in order to effectively ease the use of parallelization in both centralized and distributed scenarios, we propose and implement a concept that we refer to as the Intelligent Agent (IA). An IA is responsible for instantiating and scheduling machine learning tasks (e.g., model training) and for deriving inferences. In our solution, model training is distributed to multiple IAs in parallel. Each IA is packaged into a Linux container in order to take advantage of container portability across heterogeneous deployments and to reuse existing container orchestration tools. We validate our proposal by deploying and instantiating multiple IAs across a distributed cloud environment, where each IA is allocated a fixed amount of computational resources.
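The abstract does not specify how the regression model is partitioned across IAs. The sketch below illustrates one common way to realize model parallelism for a linear model: the coefficient vector is split into blocks, each worker owns one block, partial predictions are summed, and each worker then updates only its own coefficients with full-batch gradient descent. This is an assumption for illustration, not the paper's implementation. Threads stand in for containerized IAs. The function names (`partial_predictions`, `block_gradient_step`, `train_model_parallel`, `make_demo_data`), the contiguous column partitioning, and the learning rate are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def partial_predictions(X_block, w_block):
    """One agent's contribution X_k @ w_k to every prediction."""
    return [sum(x * w for x, w in zip(row, w_block)) for row in X_block]


def block_gradient_step(X_block, w_block, residual, lr):
    """Gradient-descent update of one agent's coefficient block.

    For the loss (1 / 2n) * sum(residual**2), the gradient with respect to
    block k is X_k^T @ residual / n, which only needs this agent's columns.
    """
    n = len(residual)
    return [
        w - lr * sum(X_block[i][j] * residual[i] for i in range(n)) / n
        for j, w in enumerate(w_block)
    ]


def train_model_parallel(X, y, n_agents=2, lr=0.5, epochs=300):
    """Full-batch linear regression with coefficients partitioned across agents."""
    d = len(X[0])
    # Hypothetical partitioning: contiguous column blocks, one per agent.
    bounds = [(k * d // n_agents, (k + 1) * d // n_agents) for k in range(n_agents)]
    X_blocks = [[row[a:b] for row in X] for a, b in bounds]
    w_blocks = [[0.0] * (b - a) for a, b in bounds]
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        for _ in range(epochs):
            # Each agent computes its partial predictions in parallel.
            parts = list(pool.map(partial_predictions, X_blocks, w_blocks))
            # Aggregation step: sum the partial predictions and form residuals.
            residual = [sum(p[i] for p in parts) - y[i] for i in range(len(y))]
            # Each agent updates only the coefficients it owns.
            w_blocks = list(pool.map(block_gradient_step, X_blocks, w_blocks,
                                     [residual] * n_agents, [lr] * n_agents))
    # Concatenate the blocks back into one coefficient vector.
    return [w for block in w_blocks for w in block]


def make_demo_data(n=100, seed=0):
    """Noise-free synthetic data: y = 1 + 2*x1 - 3*x2 (leading 1 = intercept)."""
    rng = random.Random(seed)
    X = [[1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    y = [1.0 + 2.0 * x1 - 3.0 * x2 for _, x1, x2 in X]
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    print(train_model_parallel(X, y, n_agents=2))
```

On the noise-free demo data, the printed coefficients should approach `[1.0, 2.0, -3.0]` regardless of how many agents share the work. This is because the blocked update is mathematically the same full-batch gradient step. Only the summation of partial predictions requires communication between agents. In a containerized deployment, that exchange would presumably travel over the network rather than through shared memory.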

URI

https://doi.org/10.1109/DASC/PiCom/CBDCom/CyberSciTech.2019.00139

Host publication

2019 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech)

ISBN

978-1-7281-3024-8

OKM publication type

A4 Article in conference proceedings

Keywords

intelligent agents; Big Data; Intelligent cloud; Model parallelism; Regression training
