I am developing an application that uses lwIP's sockets interface. Since I needed a non-blocking connect, I implemented one; see patch #6860 (https://savannah.nongnu.org/patch/index.php?6860).
I tested my system using Intel's Thread Checker, which detected a thread-safety problem in lwIP's code regarding the handling of conn->err. Many netconn functions (e.g. netconn_send, netconn_getaddr) use this field as their return value. However, the field can also be changed asynchronously by network events. Since access to it is not synchronized at all, the return values of these functions could be undefined. Additionally, on further examination of the code, it seems that this variable serves two very different purposes: it carries "return value" information from the tcpip_thread back to the calling thread, and it holds "connection global error state" information. As a result, calling netconn_getaddr on a non-connected UDP socket sets conn->err to ERR_CONN (which is considered "fatal"), causing subsequent calls to netconn_recv to fail. This is clearly an unwanted side effect that should be prevented.
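To make the side effect concrete, here is a minimal standalone model of the behaviour I am describing. It is not lwIP code: every name in it (model_conn, model_getaddr_peer, model_recv, the MODEL_ERR_* values) is a hypothetical stand-in, and it only assumes that a single err field is both returned by each call and checked as a sticky fatal state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for lwIP's err_t values; not the real definitions. */
typedef int model_err_t;
#define MODEL_ERR_OK    0
#define MODEL_ERR_CONN (-1)

/* Simplified connection: one field doubles as "result of the last call"
 * and as "global error state of the connection". */
struct model_conn {
    int connected;
    model_err_t err;
};

/* In this model every non-OK value counts as fatal. */
static int model_err_is_fatal(model_err_t e)
{
    return e != MODEL_ERR_OK;
}

/* Asking for the peer address of an unconnected socket fails, and the
 * failure is written into the shared field. */
model_err_t model_getaddr_peer(struct model_conn *c)
{
    assert(c != NULL);
    c->err = c->connected ? MODEL_ERR_OK : MODEL_ERR_CONN;
    return c->err;
}

/* Receive refuses to run once the shared field holds a fatal value. */
model_err_t model_recv(struct model_conn *c)
{
    assert(c != NULL);
    if (model_err_is_fatal(c->err)) {
        return c->err;
    }
    return MODEL_ERR_OK;
}
```

In this model, a connection that could receive fine before the failed model_getaddr_peer call can no longer receive afterwards, although nothing happened on the network. The race itself is not shown; the model is single-threaded and only illustrates the mixing of the two roles.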
It seems to me that the proper solution would be to use the api_msg_msg struct to convey the error/return code of each API call, and to keep conn->err as the global error state of the connection only, guarded by the SYS_ARCH_PROT_* wrappers so that it presents a coherent connection-wide status for use by netconn_recv and netconn_send. Additionally, there should be a way for netconn_recv and netconn_accept to report error details to the caller. For the time being they can only return either a valid pointer or NULL to indicate an error, and the details of the error are left for the caller to guess.
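A rough sketch of the split I have in mind follows. Again this is a standalone model and not a patch: the names (model_api_msg, model_set_fatal, model_recv and so on) are hypothetical, a pthread mutex stands in for the SYS_ARCH_PROT_* wrappers, and the receive function returning an error code with the data in an output parameter is only one possible way to report error details.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for lwIP's err_t values; not the real definitions. */
typedef int model_err_t;
#define MODEL_ERR_OK    0
#define MODEL_ERR_CONN (-1)  /* this call failed: not connected */
#define MODEL_ERR_RST  (-2)  /* connection-wide failure: reset */

/* Stand-in for the protection wrappers (assumption: a plain mutex). */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
#define MODEL_PROTECT()   pthread_mutex_lock(&model_lock)
#define MODEL_UNPROTECT() pthread_mutex_unlock(&model_lock)

struct model_conn {
    int connected;
    model_err_t last_err;  /* sticky connection-wide state only */
    const char *pending;   /* one queued payload, NULL if none */
};

/* Per-call message: the result travels here, not in the connection. */
struct model_api_msg {
    struct model_conn *conn;
    model_err_t err;
};

/* Worker side: records the outcome of this call in the message. */
void model_do_getaddr(struct model_api_msg *msg)
{
    msg->err = msg->conn->connected ? MODEL_ERR_OK : MODEL_ERR_CONN;
}

/* Called on asynchronous network events that kill the connection. */
void model_set_fatal(struct model_conn *c, model_err_t e)
{
    MODEL_PROTECT();
    c->last_err = e;
    MODEL_UNPROTECT();
}

model_err_t model_get_fatal(struct model_conn *c)
{
    model_err_t e;
    MODEL_PROTECT();
    e = c->last_err;
    MODEL_UNPROTECT();
    return e;
}

/* Caller side: returns the per-call result, leaves last_err untouched. */
model_err_t model_getaddr(struct model_conn *c)
{
    struct model_api_msg msg = { c, MODEL_ERR_OK };
    model_do_getaddr(&msg);
    return msg.err;
}

/* Receive reports the error code directly; data comes back through out. */
model_err_t model_recv(struct model_conn *c, const char **out)
{
    model_err_t e;
    assert(c != NULL && out != NULL);
    *out = NULL;
    e = model_get_fatal(c);
    if (e != MODEL_ERR_OK) {
        return e;
    }
    *out = c->pending;
    c->pending = NULL;
    return MODEL_ERR_OK;
}
```

With this split, a failed model_getaddr on an unconnected socket no longer affects a later model_recv, while a real connection-wide error set through model_set_fatal still does, and the caller sees which error it was.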
What is your opinion on this subject? Is there a synchronization mechanism that we missed which prevents said problems from happening?