DistDL: A Distributed Deep Learning Service Schema with GPU Accelerating

Abstract

Deep learning is a rapidly developing topic in both industry and academia, integrating the broad field of artificial intelligence with the deployment of deep neural networks in the big data era. Recently, the capability to train large neural networks has yielded state-of-the-art performance in many domains such as computer vision, speech recognition, recommender systems, natural language processing, drug discovery, and behavioural analysis. However, existing deep learning systems scale poorly, especially on current cloud infrastructures where nodes are distributed across multiple geographical locations. There is a growing consensus that deep learning must be optimized jointly by the machine learning and systems communities. In this paper, we present DistDL, a novel distributed deep learning service schema that reduces training time and communication overhead while achieving a high degree of both data and model parallelism. Additionally, we incorporate GPUs into DistDL, leveraging their remarkable capacity for heterogeneous computation. Our experiments on standard benchmarks suggest that DistDL adapts to various scaling patterns with no loss of accuracy while reducing training time when GPUs are adopted, achieving up to an 80% speedup.
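As a rough illustration of the synchronous data parallelism the abstract describes (not DistDL's actual API; all names below are hypothetical), the core pattern is: each worker holds a replica of the model, computes gradients on its own shard of a batch, and the gradients are averaged before a single shared update is applied everywhere.

```python
# A minimal, framework-free sketch of synchronous data-parallel SGD.
# In a real deployment each "worker" would run on a separate GPU/node
# and the averaging would be an all-reduce over the network; that
# communication step is the overhead a schema like DistDL targets.
import numpy as np

rng = np.random.default_rng(0)

def linear_grad(w, X, y):
    """Gradient of mean squared error for a linear model y ~ X @ w."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)

def data_parallel_step(w, X, y, num_workers=4, lr=0.1):
    """One synchronous step with the batch sharded across workers."""
    shards_X = np.array_split(X, num_workers)
    shards_y = np.array_split(y, num_workers)
    # Each worker computes a local gradient on its own shard.
    local_grads = [linear_grad(w, Xi, yi)
                   for Xi, yi in zip(shards_X, shards_y)]
    # "All-reduce": average gradients so every replica applies the
    # same update and the model copies stay in sync.
    global_grad = np.mean(local_grads, axis=0)
    return w - lr * global_grad

# Toy regression problem: recover w_true from noisy observations.
w_true = np.array([2.0, -3.0])
X = rng.normal(size=(256, 2))
y = X @ w_true + 0.01 * rng.normal(size=256)

w = np.zeros(2)
for _ in range(200):
    w = data_parallel_step(w, X, y)
print(w)  # converges toward [2.0, -3.0]
```

Because the shards are equal-sized, the averaged shard gradients equal the full-batch gradient, so the parallel update matches the sequential one; the speedup comes from computing the shard gradients concurrently.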

Type
Publication
17th Asia-Pacific Web Conference: Web Technologies and Applications