Techniques for Training TinyML Models

Previously we looked at some of the challenges of training TinyML models. In short, we need to train our models while recognizing the constraints of the hardware we are deploying to and the overall system requirements like latency and accuracy.

As engineers, if we train our models with our final hardware limitations in mind, we can take steps to reduce the drop in accuracy and performance when we deploy them. Most of this drop happens at one of two steps in our TinyML pipeline: when we optimize our model and when we deploy it.

Optimization reduces either the size of our model or the number of operations it performs, and it is the main place where our model’s accuracy can drop. Deployment can reduce our model’s performance due to hardware constraints that are mostly out of our control, like the size and speed of the memory, the presence of a floating point unit, and the types of operations supported by the target hardware. However, we can take these optimizations and hardware constraints into account when we build and train our model architecture to make our system more efficient.
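
To make the first kind of drop concrete, here is a minimal sketch of post-training quantization with the TensorFlow Lite converter. It converts the same trained Keras model twice, once without optimization and once with the default dynamic-range quantization, and compares the resulting sizes. The model file name is a placeholder, not something from a specific project.

```python
import tensorflow as tf

# Load an already-trained Keras model (placeholder path; use your own model).
model = tf.keras.models.load_model("wake_word_model.keras")

# Baseline conversion: weights stay in float32.
baseline_converter = tf.lite.TFLiteConverter.from_keras_model(model)
baseline_tflite = baseline_converter.convert()

# Optimized conversion: dynamic-range quantization stores weights as int8.
quant_converter = tf.lite.TFLiteConverter.from_keras_model(model)
quant_converter.optimizations = [tf.lite.Optimize.DEFAULT]
quant_tflite = quant_converter.convert()

print(f"float32 model:   {len(baseline_tflite) / 1024:.1f} KB")
print(f"quantized model: {len(quant_tflite) / 1024:.1f} KB")
```

The quantized file is typically a fraction of the original size, but its accuracy should be re-checked on a held-out set before deployment; closing that gap is exactly what the techniques in this chapter aim to do.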

The Look-Ahead Method

I like to call this technique of being aware of our entire pipeline the “Look-Ahead” method. Being able to look ahead when building our system is an extremely powerful tool. However, it is only possible if we have at least planned for and finalized the broad requirements and steps in our TinyMLOps pipeline.

Some of these steps will be constant for almost all pipelines. For instance, we will almost always have to optimize our trained model for TinyML with algorithms like quantization or pruning, so we can train our initial model in a way that reduces the accuracy drop after optimization. We will look at these techniques in a bit.
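
As a preview, here is a minimal sketch of one such technique, quantization-aware training with the tensorflow_model_optimization package: the model is wrapped with fake-quantization ops so that it already experiences int8-like rounding during training, which typically shrinks the accuracy drop when the model is later converted. The tiny architecture and input shape below are placeholders, not a recommended design.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# A placeholder model; substitute your own architecture and data.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(49, 40, 1)),   # e.g. audio spectrogram frames
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),
])

# Insert fake-quantization ops so training "sees" int8 rounding and clipping.
qat_model = tfmot.quantization.keras.quantize_model(model)

qat_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Train (or fine-tune) as usual, then convert with the TFLite converter:
# qat_model.fit(train_ds, epochs=3)
# converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
# converter.optimizations = [tf.lite.Optimize.DEFAULT]
# tflite_model = converter.convert()
```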

Some steps in our TinyMLOps pipeline may depend on factors like client requirements and the choice of hardware. In these cases it is not possible to look ahead and take into account the constraints the system may have when training our model. The easiest way around this is to finalize these requirements and decisions before training any models.

You could go the other way around and choose your hardware based on how big your model is, but this is not always the right decision in the long run. This is mostly because models are software and can be updated more easily and frequently than hardware (we will talk more about this in a later chapter). New model architectures, data preprocessing methods, and optimization techniques that will improve the performance of your model are invented every few weeks. Moreover, as you run your system and collect more data, you can retrain your model to improve its performance and eventually train smaller architectures that run more efficiently.

However, if you choose more powerful hardware that consumes more power or has a larger form factor than the requirements call for, then updating your hardware later can be a very painful process, especially if you already have a fleet of TinyML devices deployed. Moreover, to make your system efficient, your software will be tightly coupled with your device, and changing the device could force you to rethink your model and other software systems as well.

Optimization Look-Ahead

Many techniques have been proposed to reduce the accuracy drop after optimizing models. One of the earliest solutions is to fine-tune your optimized model or to use a calibration dataset to check for accuracy drops while optimizing. However these are techniques that