Challenges in Training TinyML Models

When ML engineers train models, they usually do not think about where the model will be deployed. Their models are deployed on large multi-core systems with support for many instruction sets, compilers, and toolkits that abstract away the complications of the underlying hardware. Moreover, if a model does execute slowly, it is easy to either add more compute resources (like more CPU cores or GPUs) or to parallelize its execution across multiple devices to improve latency.

However, this is not true in the case of TinyML, where the final hardware is resource constrained and has limited instruction set support. For TinyML, the target hardware is therefore a bottleneck that we need to keep in mind when training our models.

Target Hardware

An advantage of TinyML is that there are many microcontrollers available to deploy our models to. Each of them has different memory, sensors, power requirements, and connectivity. This means you have a wide range of devices to choose from and can pick one whose characteristics match the requirements of your system. The flip side of this coin is that these devices support different instruction sets and toolkits and have different amounts of compute power (many don't even have hardware for floating-point arithmetic). Unfortunately for us, the constraints of the deployment hardware create a domino effect that affects not only our final model's accuracy, but also the kinds of layers we can use and how secure and robust we can make our model.

Model Accuracy

The choice of the target hardware can affect the accuracy of the model we deploy to it. Remember that TinyML devices have limited computational power, support only a subset of operations, and have very little memory. This means that the models we deploy to TinyML devices need to be small, efficient, and simple. However, these qualities are the opposite of what tends to make models accurate, and this is one of the biggest challenges in training TinyML models: how do we make models tiny while still making sure they are accurate enough to perform our tasks?

Model Size

To execute a neural network on a TinyML device, we need to load the model architecture and weights into memory and also store intermediate activations while the model runs. Since TinyML devices usually have very little memory, we need to make sure that the combined size of the model, its parameters (weights, biases, and other internal values), and the largest generated activation fits in memory at the same time. While larger model architectures tend to be more accurate, they also have more parameters and generate larger activations. A larger model not only takes more time to load into memory, but also takes longer to execute, since more operations need to be performed. So even if a model fits into memory, a larger model will have higher latency. When we train our model, we therefore need to keep it small enough to load onto the final deployment hardware while keeping its latency low.
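To get a feel for this constraint before ever touching the device, we can do a rough back-of-the-envelope estimate. The sketch below (using Keras; the 256 KB RAM budget and float32 values are illustrative assumptions, not properties of any particular board) adds up the parameter bytes and the largest single activation produced by any layer:

```python
import numpy as np
import tensorflow as tf

RAM_BYTES = 256 * 1024      # assumed RAM budget of the target device (illustrative)
BYTES_PER_VALUE = 4         # float32; roughly 1 byte per value after int8 quantization

def rough_memory_estimate(model):
    """Very rough estimate: parameter bytes plus the largest activation tensor."""
    param_bytes = model.count_params() * BYTES_PER_VALUE
    largest_activation_bytes = 0
    for layer in model.layers:
        # Approximate the activation size from the layer's output shape,
        # ignoring the (unknown) batch dimension.
        dims = [d for d in layer.output.shape if d is not None]
        activation_bytes = int(np.prod(dims)) * BYTES_PER_VALUE
        largest_activation_bytes = max(largest_activation_bytes, activation_bytes)
    total = param_bytes + largest_activation_bytes
    return param_bytes, largest_activation_bytes, total <= RAM_BYTES

# Example with a small toy model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
print(rough_memory_estimate(model))
```

This is not a substitute for profiling on the device itself (runtimes add their own overhead), but it quickly rules out architectures that clearly cannot fit.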

Model Architecture

Another factor affecting latency is the architecture of the model. A model's architecture includes the kinds of operations it uses (for instance, multiply, ReLU, and so on), the number of layers, and how data and activations flow through those layers. Layers with complex data flows, such as skip connections (like those in ResNets), may need their intermediate activations kept around (increasing both latency and memory use) and may not be supported at all. Complex operations like lookup tables and fused operations may also be unsupported by the target hardware. These operations do not even have to be complex: even simple and commonly used operations like sigmoid, softmax, and resize may not be supported on the target hardware and its associated toolkits.
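One practical way to surface unsupported operations early, assuming a TensorFlow/TensorFlow Lite workflow (just one possible toolchain), is to attempt conversion while restricting the converter to the builtin op set, so the failure happens on the development machine rather than on the device. A minimal sketch:

```python
import tensorflow as tf

# A small toy model standing in for whatever architecture we are evaluating.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Allow only TFLite builtin ops (no TensorFlow fallback ops), which is closer
# to what microcontroller runtimes typically support; convert() raises an
# error if the model contains an operation outside this set.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```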

On the other hand, some operations or layers might be supported, but they could increase the latency or computational complexity of your final model. For instance, replacing a max pooling operation with average pooling may result in faster execution, and increasing the stride of a convolutional layer reduces its computational complexity. We will learn more about these techniques in the next section.
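As a quick illustration of the stride example, the back-of-the-envelope calculation below (with made-up input and filter sizes, assuming "same" padding) shows how doubling a convolution's stride shrinks both its output activation and its multiply-accumulate (MAC) count by roughly a factor of four:

```python
def conv_output_and_macs(h, w, c_in, c_out, k, stride):
    """Output size and MAC count of a conv layer, assuming 'same' padding."""
    out_h = -(-h // stride)   # ceiling division
    out_w = -(-w // stride)
    macs = out_h * out_w * c_out * (k * k * c_in)
    return out_h, out_w, macs

for stride in (1, 2):
    out_h, out_w, macs = conv_output_and_macs(h=96, w=96, c_in=16, c_out=32, k=3, stride=stride)
    print(f"stride={stride}: output {out_h}x{out_w}x32, MACs={macs:,}")
# stride=1: 96x96 output, ~42.5M MACs; stride=2: 48x48 output, ~10.6M MACs
```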

Model Security and Robustness

Finally, many TinyML devices are deployed in remote or open places where they can be affected by environmental factors like sunlight, temperature, and dust, and where they are also prone to physical attacks (for instance, to steal model details). For example, a model that checks for wildlife in a jungle needs to work not only in the morning and at night, but also in low light, changing shadows, rain, dust, and so on. Techniques have been developed to train models so that they are more robust to such environmental changes and to sensor or battery degradation.
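One common approach, shown here as a hedged sketch using Keras preprocessing layers (the factors and noise level are illustrative assumptions, not recommended values), is to simulate lighting changes and sensor noise as data augmentation in front of the model during training:

```python
import tensorflow as tf

# Augmentation layers that are only active during training.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomBrightness(factor=0.3),  # simulate dawn/dusk lighting
    tf.keras.layers.RandomContrast(factor=0.3),    # simulate glare or haze
    tf.keras.layers.GaussianNoise(stddev=0.05),    # simulate sensor degradation
])

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(96, 96, 3)),
    augment,
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Because the augmentation layers are inactive at inference time, they add nothing to the deployed model's size or latency.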

Since many deployed models may have been trained on sensitive data or may contain valuable intellectual property, they can become targets for attackers looking to steal the model architecture and weights. Recent work on side-channel attacks has shown that this is not only possible with high accuracy, but can also be done quickly and with cheap, easily available tools. Thankfully, there are also ways to create models that make such attacks either difficult or impossible.

We will talk more about model security and robustness in a later chapter, but for now let's take a look at how we can train models for TinyML while addressing these challenges.