Deploying neural networks on microcontroller units (MCUs) presents
substantial challenges due to their constrained computation and memory
resources. Previous researches have explored patch-based inference as a
strategy to conserve memory without sacrificing model accuracy. However, this
technique suffers from severe redundant computation overhead, leading to a
substantial increase in execution latency. A feasible solution to address this
issue is mixed-precision quantization, but it faces the challenges of accuracy
degradation and time-consuming search. In this paper, we propose
QuantMCU, a novel patch-based inference method that utilizes value-driven
mixed-precision quantization to reduce redundant computation. We first utilize
value-driven patch classification (VDPC) to maintain the model accuracy. VDPC
classifies patches into two classes based on whether they contain outlier
values. For patches containing outlier values, we apply 8-bit quantization to
the feature maps on the dataflow branches that follow. In addition, for patches
without outlier values, we utilize value-driven quantization search (VDQS) on
the feature maps of their following dataflow branches to reduce search time.
Specifically, VDQS introduces a novel quantization search metric that takes
into account both computation and accuracy, and it employs entropy as an
accuracy representation to avoid additional training. VDQS also adopts an
iterative approach to determine the bitwidth of each feature map to further
accelerate the search process. Experimental results on real-world MCU devices
show that QuantMCU can reduce computation by 2.2x on average while maintaining
comparable model accuracy compared to the state-of-the-art patch-based
inference methods.

Exploring QuantMCU: A Novel Patch-based Inference Method for Deploying Neural Networks on MCUs

Deploying neural networks on microcontroller units (MCUs) comes with its fair share of challenges due to the limited computation and memory resources available. Previous research has looked into patch-based inference as a strategy to conserve memory without compromising model accuracy. However, this approach often leads to redundant computation overhead, resulting in increased execution latency. To address these concerns, a new and innovative solution called QuantMCU is proposed.
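To make the memory/latency trade-off concrete, here is a minimal sketch of patch-based inference for a single-channel 3x3 convolution. It is an illustration of the general technique, not the paper's implementation: the feature map is processed one small tile (plus a one-pixel halo for the receptive field) at a time, so peak memory is bounded by the tile size rather than the full map, at the cost of recomputing the overlapping halo regions.

```python
import numpy as np

def conv3x3(x, w):
    # Naive 'same'-padded 3x3 convolution on a single-channel map (illustration only).
    h, wd = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * w)
    return out

def patch_based_conv(x, w, patch=8, halo=1):
    # Run the convolution tile by tile so only one (patch + 2*halo)^2 region
    # is live at a time, instead of the whole feature map.
    h, wd = x.shape
    out = np.zeros_like(x)
    for i in range(0, h, patch):
        for j in range(0, wd, patch):
            # Extract the patch plus a halo covering the 3x3 receptive field.
            i0, j0 = max(i - halo, 0), max(j - halo, 0)
            i1, j1 = min(i + patch + halo, h), min(j + patch + halo, wd)
            tile = conv3x3(x[i0:i1, j0:j1], w)
            # Keep only the valid interior of the tile; slicing clips at edges.
            out[i:i + patch, j:j + patch] = \
                tile[i - i0:i - i0 + patch, j - j0:j - j0 + patch]
    return out
```

The halo is exactly the redundant computation the abstract refers to: every tile recomputes a border that its neighbors also compute, which is where the latency overhead of patch-based inference comes from.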

The Challenges and Proposed Solution

One of the main issues with patch-based inference is the excessive redundant computation it entails. This problem can be mitigated through mixed-precision quantization, which aims to reduce redundancy while maintaining accuracy. However, traditional mixed-precision quantization methods pose challenges in terms of accuracy degradation and lengthy search times. QuantMCU introduces value-driven mixed-precision quantization as a solution to these challenges.
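The trade-off that mixed-precision quantization exploits can be seen with a generic symmetric uniform quantizer (a textbook sketch, not necessarily the paper's exact scheme): lower bitwidths shrink memory and compute, but increase quantization error, which is why a per-tensor bitwidth search is needed.

```python
import numpy as np

def quantize(x, bits):
    # Symmetric uniform quantization to a given bitwidth
    # (a generic sketch; the paper's exact scheme may differ).
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = float(np.max(np.abs(x))) / qmax
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q.astype(np.int32), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Lower bitwidths save memory and compute but raise reconstruction error:
rng = np.random.default_rng(0)
x = rng.standard_normal(1000).astype(np.float32)
errors = {bits: float(np.mean(np.abs(dequantize(*quantize(x, bits)) - x)))
          for bits in (8, 4, 2)}
```

On Gaussian-like data the mean reconstruction error grows as the bitwidth drops, which is the accuracy-degradation risk the search procedure has to manage.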

The Value-driven Approach

QuantMCU leverages value-driven patch classification (VDPC) to ensure model accuracy is maintained. VDPC categorizes patches into two classes based on whether they contain outlier values. For patches containing outlier values, 8-bit quantization is applied to the feature maps of the subsequent dataflow branches. For patches without outlier values, value-driven quantization search (VDQS) is applied to the corresponding feature maps to reduce search time.
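A VDPC-style classification step could look like the following sketch. The outlier criterion here (more than k standard deviations from the global mean) is a stand-in assumption; the paper's exact definition of an outlier value may differ.

```python
import numpy as np

def classify_patches(patches, k=3.0):
    # Split patches into 'outlier' and 'normal' groups. A value counts as an
    # outlier if it lies more than k standard deviations from the mean over
    # all patches -- an illustrative criterion, not the paper's exact rule.
    flat = np.concatenate([p.ravel() for p in patches])
    mu, sigma = flat.mean(), flat.std()
    outlier_idx, normal_idx = [], []
    for idx, p in enumerate(patches):
        if np.any(np.abs(p - mu) > k * sigma):
            outlier_idx.append(idx)   # -> fixed 8-bit quantization downstream
        else:
            normal_idx.append(idx)    # -> bitwidth chosen by search (VDQS)
    return outlier_idx, normal_idx
```

The point of the split is that patches carrying extreme values are quantized conservatively at 8 bits to protect accuracy, while the remaining patches are safe candidates for more aggressive, searched bitwidths.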

The Importance of Value-driven Quantization Search

VDQS introduces a unique quantization search metric that takes into account both computation and accuracy. It utilizes entropy as a representation of accuracy, eliminating the need for additional training. VDQS also adopts an iterative approach to determine the bitwidth of each feature map, further accelerating the search process. By considering both computation and accuracy, VDQS ensures that the resulting quantization maintains model accuracy while reducing redundant computations.
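A VDQS-style search could be sketched as below. The metric's exact form and the weighting factor alpha are illustrative assumptions: entropy of the quantized value histogram rewards retained information (the training-free accuracy proxy), while a term proportional to the bitwidth penalizes computation. The paper's iterative per-feature-map procedure is more elaborate than this single scan.

```python
import numpy as np

def entropy(q, bits):
    # Shannon entropy (in bits) of the quantized value histogram, used as a
    # training-free stand-in for how much information the feature map keeps.
    hist = np.bincount(q - q.min(), minlength=2 ** bits).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def search_bitwidth(x, candidates=(2, 4, 6, 8), alpha=0.1):
    # Choose the bitwidth maximizing (entropy - alpha * bits): the first term
    # is the accuracy proxy, the second a computation penalty. Both the
    # metric's form and alpha are assumptions, not the paper's formula.
    best_bits, best_score = candidates[0], -np.inf
    for bits in candidates:
        qmax = 2 ** (bits - 1) - 1
        peak = float(np.max(np.abs(x)))
        scale = peak / qmax if peak > 0 else 1.0
        q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int64)
        score = entropy(q, bits) - alpha * bits
        if score > best_score:
            best_bits, best_score = bits, score
    return best_bits
```

Because entropy is computed directly from the feature-map values, no fine-tuning or validation pass is needed to score a candidate bitwidth, which is what keeps the search cheap.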

Real-world MCU Device Experimentation

QuantMCU was evaluated on real-world MCU devices. In these experiments it reduced computation by an average of 2.2x compared to state-of-the-art patch-based inference methods, while maintaining comparable model accuracy.

QuantMCU stands as an innovative solution to the challenges faced in deploying neural networks on MCUs. By implementing value-driven mixed-precision quantization and introducing value-driven patch classification and value-driven quantization search, QuantMCU effectively reduces redundant computation while preserving model accuracy. With its promising experimental results, QuantMCU opens up new possibilities for optimizing the deployment of neural networks on resource-constrained MCUs.

This research is significant as it addresses the challenges of deploying neural networks on resource-constrained MCUs. By introducing value-driven mixed-precision quantization and utilizing patch-based inference, QuantMCU offers a practical solution for reducing memory usage and computation overhead. The use of VDPC and VDQS techniques further enhances the efficiency of the method by accurately classifying patches and reducing search time.

The results obtained from real-world MCU devices validate the effectiveness of QuantMCU in achieving a balance between computational efficiency and model accuracy. This research opens up possibilities for deploying neural networks on MCUs with limited resources, enabling applications in areas such as Internet of Things (IoT) devices, embedded systems, and edge computing.

Moving forward, it would be interesting to explore the scalability of QuantMCU to larger and more complex neural network models. Additionally, investigating the impact of QuantMCU on other performance metrics such as power consumption and energy efficiency would provide a comprehensive evaluation of its practicality in real-world deployments. Overall, this research contributes valuable insights and techniques for overcoming the challenges of deploying neural networks on resource-constrained MCUs.