[BUG] STM32 lwIP Ethernet driver Rx deadlock
This bug is present in F1, F2, F4 and F7 series examples and CubeMX generated code with RTOS and is one of the biggest flaws in ST's lwIP integration.
Problem
The lwIP core (also known as the "tcpip_thread") calls low_level_output(), which calls HAL_ETH_TransmitFrame(). While the latter is executing, the Ethernet input thread (the ethernetif_input() function) can call low_level_input(), which calls HAL_ETH_GetReceivedFrame_IT(). Because both of those HAL functions use HAL's ingeniously stupid "lock" mechanism on the same handle, HAL_ETH_GetReceivedFrame_IT() returns HAL_BUSY. That in turn makes low_level_input() return NULL and ethernetif_input() go back to waiting for the semaphore.
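To make the failure mode concrete, here is a small host-side model of the lock behaviour described above. It is only a sketch: every model_* name is hypothetical and stands in for the HAL handle and functions, and it assumes the lock simply fails with "busy" instead of waiting, which is how I understand the HAL's non-RTOS lock macro to behave.

```c
#include <stdbool.h>

/* Host-side model only: every model_* name is hypothetical, not HAL API. */
typedef enum { MODEL_OK, MODEL_BUSY } model_status_t;
typedef struct { bool locked; } model_eth_handle_t;

/* Assumed lock behaviour: fail immediately when already locked, never wait. */
static model_status_t model_try_lock(model_eth_handle_t *h)
{
    if (h->locked) {
        return MODEL_BUSY;
    }
    h->locked = true;
    return MODEL_OK;
}

static void model_unlock(model_eth_handle_t *h)
{
    h->locked = false;
}

/* Stand-in for HAL_ETH_GetReceivedFrame_IT(): takes the handle lock on entry. */
static model_status_t model_get_received_frame(model_eth_handle_t *h)
{
    if (model_try_lock(h) != MODEL_OK) {
        return MODEL_BUSY;
    }
    /* ... the Rx descriptor scan would happen here ... */
    model_unlock(h);
    return MODEL_OK;
}

/* Returns the status the Rx path sees while a transmit still holds the lock. */
static model_status_t model_rx_during_tx(void)
{
    model_eth_handle_t h = { false };
    model_status_t tx = model_try_lock(&h);            /* Tx path is inside the HAL */
    model_status_t rx = model_get_received_frame(&h);  /* Rx thread preempts it */
    model_unlock(&h);                                  /* Tx path finishes */
    (void)tx;
    return rx;
}
```

In this model model_rx_during_tx() yields MODEL_BUSY: the Rx call is not delayed until the transmit finishes, it is simply rejected, and the caller has to cope with that.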
Consequences
- Received frames are not processed and Rx buffers are not released to DMA.
- When the next frames are received, the code will iterate and try to process all previous frames up to the current one. But again, if HAL_BUSY is returned at some iteration, the rest of the frames will be left unprocessed.
- If no more frames arrive from the network, the frames waiting in the Rx buffers will never be processed.
- If all Rx descriptors have been used (OWN bit cleared) when the semaphore is acquired and HAL_BUSY is returned on processing the first frame, no more Rx complete interrupts will be generated. Consequently the semaphore will not be released and the Ethernet input thread will be stuck forever waiting for it.
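The last consequence can be reproduced with a toy model of the descriptor ring. This is a deliberately simplified sketch, not the driver: the ring size, the model_* names and the single "pending" flag standing in for the binary semaphore are all assumptions made for illustration.

```c
#include <stdbool.h>

#define MODEL_RX_DESC_COUNT 4   /* hypothetical ring size */

/* Toy model of the Rx ring; every model_* name is hypothetical. */
typedef struct {
    bool owned_by_dma[MODEL_RX_DESC_COUNT]; /* models the OWN bit */
    int  dma_index;    /* next descriptor the DMA will fill */
    int  cpu_index;    /* next descriptor the driver will read */
    bool sem_pending;  /* binary semaphore given by the Rx complete ISR */
    bool hal_locked;   /* true while a transmit holds the HAL lock */
} model_eth_t;

static void model_init(model_eth_t *e)
{
    for (int i = 0; i < MODEL_RX_DESC_COUNT; i++) {
        e->owned_by_dma[i] = true;
    }
    e->dma_index = 0;
    e->cpu_index = 0;
    e->sem_pending = false;
    e->hal_locked = false;
}

/* A frame arrives. The DMA can store it only in a descriptor it owns;
 * otherwise nothing is stored and no Rx complete interrupt is raised. */
static bool model_frame_arrives(model_eth_t *e)
{
    if (!e->owned_by_dma[e->dma_index]) {
        return false;
    }
    e->owned_by_dma[e->dma_index] = false;
    e->dma_index = (e->dma_index + 1) % MODEL_RX_DESC_COUNT;
    e->sem_pending = true;
    return true;
}

/* One wake-up of the input thread. Returns the number of frames processed,
 * or -1 if the thread would stay blocked on the semaphore. */
static int model_input_thread_wakeup(model_eth_t *e)
{
    if (!e->sem_pending) {
        return -1;
    }
    e->sem_pending = false;
    int processed = 0;
    while (!e->owned_by_dma[e->cpu_index]) {
        if (e->hal_locked) {
            break;  /* "busy" -> NULL -> back to waiting on the semaphore */
        }
        e->owned_by_dma[e->cpu_index] = true;  /* give the buffer back to DMA */
        e->cpu_index = (e->cpu_index + 1) % MODEL_RX_DESC_COUNT;
        processed++;
    }
    return processed;
}

/* Full ring + one busy result on wake-up = permanently stuck input thread. */
static bool model_reproduce_stall(void)
{
    model_eth_t e;
    model_init(&e);
    for (int i = 0; i < MODEL_RX_DESC_COUNT; i++) {
        model_frame_arrives(&e);                   /* ring fills up */
    }
    e.hal_locked = true;                           /* transmit in progress */
    int processed = model_input_thread_wakeup(&e); /* nothing gets processed */
    e.hal_locked = false;                          /* transmit done, too late */
    bool stored = model_frame_arrives(&e);         /* no free descriptor */
    int next = model_input_thread_wakeup(&e);      /* no semaphore, blocked */
    return processed == 0 && !stored && next == -1;
}
```

In the model, once the ring is full and the single wake-up is wasted on a busy result, nothing is left to give the semaphore again, even though the lock has long been released.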
Solution
This code:
    if (HAL_ETH_GetReceivedFrame_IT(&EthHandle) != HAL_OK)
        return NULL;
must be replaced with this code:
    HAL_StatusTypeDef status;
    LOCK_TCPIP_CORE();
    status = HAL_ETH_GetReceivedFrame_IT(&EthHandle);
    UNLOCK_TCPIP_CORE();
    if (status != HAL_OK) {
        return NULL;
    }
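One thing worth checking when applying this: in lwIP, LOCK_TCPIP_CORE() and UNLOCK_TCPIP_CORE() only take a real mutex when LWIP_TCPIP_CORE_LOCKING is enabled; otherwise they expand to nothing and the change above has no effect. The default for that option differs between lwIP versions, so the fragment below is a suggested safeguard rather than part of the original fix. Where exactly the guard goes (ethernetif.c is assumed here) is up to you.

```c
/* lwipopts.h: make sure the core lock is a real mutex. */
#define LWIP_TCPIP_CORE_LOCKING 1

/* ethernetif.c (assumed location): fail the build instead of silently
 * compiling the lock/unlock pair down to nothing. */
#if !LWIP_TCPIP_CORE_LOCKING
#error "LOCK_TCPIP_CORE() is a no-op: enable LWIP_TCPIP_CORE_LOCKING in lwipopts.h"
#endif
```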