STM32N6 NPU – Why does ROM size increase for some models after Neural-ART compilation?
Hello,
I noticed an interesting behavior while comparing STM32N6 CPU execution vs NPU execution using ST Edge AI + Neural-ART.
For a simple Dense/Fully Connected model (ad.tflite), enabling the NPU did not significantly change the weights size:
- CPU target: weights (ro) ~270 KB
- NPU target: weights (ro) ~269 KB
So the compiled NPU representation stayed almost identical in size.
However, for a VWW/MobileNet-like model using many DEPTHWISE_CONV_2D layers, I observed a large ROM increase after enabling the NPU:
- CPU target: weights (ro) ~42 KB
- NPU target: weights (ro) ~227 KB
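To put the two cases side by side, here is a minimal Python sketch that computes the growth from the approximate figures reported above. The dictionary keys and the `growth` helper are just illustrative names, and the sizes are the rounded values from the ST Edge AI reports, so the ratios are approximate too.

```python
# Weights (ro) sizes in KB, rounded values taken from the reports above.
reported_kb = {
    "Dense (ad.tflite)": {"cpu": 270, "npu": 269},
    "VWW / MobileNet-like": {"cpu": 42, "npu": 227},
}

def growth(cpu_kb, npu_kb):
    """Return (absolute delta in KB, NPU/CPU size ratio)."""
    return npu_kb - cpu_kb, npu_kb / cpu_kb

for name, sizes in reported_kb.items():
    delta, ratio = growth(sizes["cpu"], sizes["npu"])
    print(f"{name}: {delta:+d} KB, x{ratio:.2f}")
```

With these inputs the Dense model is essentially unchanged (about -1 KB), while the VWW model grows by about 185 KB, roughly 5.4 times its CPU size.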
At first I thought this was caused by the quantization format conversion:
- CPU: model_fmt: ss/sa per channel
- NPU: model_fmt: ss/sa per tensor
But I also tested a ResNet model, where the same quantization conversion occurs, and saw no major ROM increase.
So the quantization format change alone does not seem to be the main reason.
My questions
Can ST confirm whether:
- Neural-ART internally repacks/duplicates depthwise convolution weights for NPU scheduling?
- Some MobileNet-style architectures inherently require more ROM overhead on STM32N6 NPU?
- The per-channel -> per-tensor conversion contributes significantly to this behavior, or is the main factor actually the depthwise memory layout optimization?
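To make the first hypothesis concrete, here is a rough Python sketch of how weight counts would compare if a depthwise layer were stored in a dense, standard-convolution layout. This is purely an assumption for illustration: I do not know that Neural-ART does this, and the helper names, the 3x3 kernel, and the 64-channel layer are hypothetical, not taken from my model. With int8 weights, one weight is one byte.

```python
def depthwise_weights(k, channels, multiplier=1):
    """Weight count of a k x k depthwise conv: one k x k filter per channel."""
    return k * k * channels * multiplier

def dense_conv_weights(k, in_ch, out_ch):
    """Weight count of a standard k x k conv: one k x k filter per (in, out) pair."""
    return k * k * in_ch * out_ch

# Hypothetical layer: 3x3 kernel, 64 channels (illustrative values only).
k, c = 3, 64
dw = depthwise_weights(k, c)
dense = dense_conv_weights(k, c, c)
print(f"depthwise: {dw} weights, dense layout: {dense} weights, x{dense // dw}")
```

A full dense expansion would inflate such a layer by a factor equal to its channel count, which is much more than the roughly 5x I measured overall, so if repacking is involved it is presumably partial (for example padding or grouping) or limited to some layers.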
Thanks!