Getting the same result for all samples from the NPU of the NUCLEO-N657X0-Q
Hello ST Community,
I am trying to run a TFLite model on the Neural-ART NPU of the NUCLEO-N657X0-Q board using STM32CubeIDE + STM32CubeMX and STM32Cube AI Studio. I followed the guide here: https://community.st.com/t5/stm32-mcus/how-to-build-an-ai-application-from-scratch-on-the-nucleo-n657x0/ta-p/828502 but with my own TFLite model.
Model details:
- Architecture: 5x FullyConnected layers (41→32→32→16→8→1) with ReLU activations
- Input: int8[1,41], scale=0.0717, zero_point=-24
- Output: int8[1,1], scale=0.0354, zero_point=1
- Task: binary classification
Problem: I have 31 test samples. Every sample produces the same output value 53 regardless of input. I verified the inputs are correctly quantized and different for each sample.
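For reference, the quantization parameters above imply the usual affine mapping real = scale * (q - zero_point), assuming the standard TFLite int8 scheme. Below is a minimal sketch of that arithmetic. The helper names are hypothetical and not part of any ST or TFLite API, and the rounding mode (half away from zero) is also an assumption.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of any ST or TFLite API.
 * Assumes the affine scheme real = scale * (q - zero_point). */

/* Quantize a real value to int8, rounding half away from zero and saturating. */
static int8_t quantize_int8(float x, float scale, int zero_point) {
    float scaled = x / scale;
    int q = (int)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f)) + zero_point;
    if (q < -128) q = -128;
    if (q > 127) q = 127;
    return (int8_t)q;
}

/* Map an int8 value back to real units. */
static float dequantize_int8(int8_t q, float scale, int zero_point) {
    return scale * (float)((int)q - zero_point);
}
```

Under that assumption, dequantize_int8(53, 0.0354f, 1) gives 0.0354 * (53 - 1) ≈ 1.84, so the constant output 53 corresponds to the same real-valued result for every sample.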
What I investigated: Through per-epoch debugging I found the following epoch flow:
EP0 (HW): reads input
EP1 (HW): continuation
EP2 (SW): DequantizeLinear
EP3 (SW): Conv float → writes correct floats
EP4 (hybrid): outputs ALL ZEROS
EP5 (HW): reads zeros → always produces same output 53
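A check like the one described for EP4 can be sketched with a small buffer summary that counts non-zero bytes and computes a checksum, so each epoch's output can be compared across samples. This is only an illustrative helper in plain C: the names are hypothetical, it is not part of the ST runtime, and how the address and size of each epoch's output buffer are obtained is left open here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debugging helper; not part of the ST runtime. */
typedef struct {
    size_t   nonzero;   /* number of non-zero bytes in the buffer */
    uint32_t checksum;  /* 32-bit FNV-1a hash of the buffer contents */
} buf_summary_t;

/* Summarize a buffer so outputs of the same epoch can be compared across samples. */
static buf_summary_t summarize_buffer(const uint8_t *buf, size_t len) {
    buf_summary_t s = { 0u, 2166136261u };  /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0u) {
            s.nonzero++;
        }
        s.checksum = (s.checksum ^ buf[i]) * 16777619u;  /* FNV-1a prime */
    }
    return s;
}
```

A result with nonzero == 0 flags an all-zero buffer, and an identical checksum for different input samples shows where the data stops depending on the input. On the target, the D-cache lines covering a buffer written by the NPU would presumably need to be invalidated before the CPU reads it.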
I also ran the same TFLite model and dataset in STM32Cube AI Studio and still got the same output value 53 regardless of input.
This is my inference code:
int aiRun(void) {
    LL_ATON_RT_RetValues_t ret = LL_ATON_RT_DONE;

    /* Reset the network instance and attach the user I/O buffers. */
    LL_ATON_RT_Reset_Network(&NN_Instance_network);
    LL_ATON_Set_User_Input_Buffer_network(0, stai_input_data, 41);
    LL_ATON_Set_User_Output_Buffer_network(0, stai_output_data, 1);

    /* Write the input back to memory and drop stale output lines before the NPU runs. */
    SCB_CleanDCache_by_Addr((uint32_t*)stai_input_data, 64);
    SCB_InvalidateDCache_by_Addr((uint32_t*)stai_output_data, 64);

    /* Run epoch blocks until the runtime reports completion. */
    do {
        ret = LL_ATON_RT_RunEpochBlock(&NN_Instance_network);
        if (ret == LL_ATON_RT_WFE)
            LL_ATON_OSAL_WFE();
    } while (ret != LL_ATON_RT_DONE);

    /* Invalidate again so the CPU reads what the NPU wrote. */
    SCB_InvalidateDCache_by_Addr((uint32_t*)stai_output_data, 64);
    return 0;
}
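The hard-coded 64 in the cache maintenance calls is the 41-byte input rounded up to a whole number of cache lines, assuming the 32-byte D-cache line of the Cortex-M55. A minimal sketch of that rounding, with hypothetical helper names and the line size stated as an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte D-cache line (Cortex-M55). Adjust if the core differs. */
#define DCACHE_LINE_SIZE 32u

/* Hypothetical helper: round a byte count up to a whole number of cache lines. */
static size_t cache_align_up(size_t nbytes) {
    return (nbytes + DCACHE_LINE_SIZE - 1u) & ~(size_t)(DCACHE_LINE_SIZE - 1u);
}

/* Hypothetical helper: non-zero if the pointer starts on a cache-line boundary. */
static int is_cache_aligned(const void *p) {
    return ((uintptr_t)p & (DCACHE_LINE_SIZE - 1u)) == 0u;
}
```

With these, cache_align_up(41) is 64 and cache_align_up(1) is 32. The sizes only cover the intended data if stai_input_data and stai_output_data themselves start on a cache-line boundary, which is_cache_aligned can check at startup.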
Environment:
- Board: NUCLEO-N657X0-Q
- ST Edge AI Studio: v4.0.0
- OS: Ubuntu 24