Federated Deep Learning

Challenge of training deep learning models

Training deep learning models has always been a challenge. This is not only because of the large quantity of training data, or the large number of model parameters to optimize, but also because training a deep learning model means finding a way through a huge variety of related paths: many different combinations of hyper-parameters and model topologies have to be tested, with no certainty of success, guided only by intuition, human knowledge and expertise.

As a result, not only can a single training run be very long, even on many GPUs, but many such runs are needed before a good path is found. Overall, the cost of training deep learning models, in time, human resources and money, is very high.

To speed up this process, the common trend is to train models on high-performance computing clusters equipped with powerful latest-generation GPUs. But such clusters are extremely costly, and become outdated after only a few years. This is why another paradigm is currently emerging under the name federated deep learning.

Distributed training

Let us first review the various strategies to train deep learning models, centralized or distributed:

On a single GPU on a single machine: this is the easiest option nowadays, thanks to the efficient implementation of CUDA operations in PyTorch and TensorFlow.
Across GPUs on a single machine: this is a bit more difficult to achieve, because it often involves modifying the model code to distribute it efficiently over multiple GPUs. More and more libraries, however, try to automate this distribution.
Across GPUs on multiple nodes in a single cluster: this is more difficult, because main memory is no longer shared and latency between nodes is higher.
Across low-end devices: this is so-called federated deep learning. Latency is a major issue here, and synchronous SGD is no longer an option. The idea is therefore to train on each device for multiple epochs using only the data hosted locally, before updating the global model. This is also presented as a way to protect privacy, since device-specific data is not transferred to another node.
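
The last strategy in the list above can be illustrated with a minimal sketch. It assumes a toy linear model y = w*x + b, three simulated devices, and a size-weighted average of the locally trained parameters as the global update (in the spirit of federated averaging). The averaging rule, the function names (local_train, federated_round, make_device) and all hyper-parameter values are assumptions made for illustration; the text above does not specify them.

```python
import random

def local_train(w, b, data, epochs=5, lr=0.05):
    """Run several epochs of plain SGD using only one device's local data."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error on one sample
            w -= lr * err * x       # gradient step on the squared error
            b -= lr * err
    return w, b

def federated_round(w, b, devices):
    """One communication round: local training, then weighted averaging.

    Only model parameters leave a device; the raw samples never do.
    """
    total = sum(len(d) for d in devices)
    new_w, new_b = 0.0, 0.0
    for data in devices:
        wi, bi = local_train(w, b, data)
        share = len(data) / total   # weight by local dataset size (assumption)
        new_w += share * wi
        new_b += share * bi
    return new_w, new_b

def make_device(rng, n, lo, hi):
    """Hypothetical local dataset: y = 2x + 1 plus a little noise."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]

rng = random.Random(0)
# Each simulated device sees a different range of inputs.
devices = [
    make_device(rng, 20, -1.0, 0.0),
    make_device(rng, 30, 0.0, 1.0),
    make_device(rng, 10, -0.5, 0.5),
]

w, b = 0.0, 0.0                     # initial global model
for _ in range(50):                 # 50 communication rounds
    w, b = federated_round(w, b, devices)
print(f"global model after 50 rounds: w={w:.3f}, b={b:.3f}")
```

Each round exchanges only two numbers per device, while every device performs five full passes over its own data in between. This is the trade-off described above: fewer, coarser synchronizations in exchange for tolerance to high latency.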

Federated, but not in the sense of OLKi

The term federated deep learning, in the sense given above, involves a central model, and every node still works to update this central model. In that sense, it fundamentally differs from the use of federated in the federated OLKi platform, where there is no central model and every node has its own objectives, but still voluntarily shares knowledge with other nodes in its community.