Plain sanic runs much faster than the uvicorn-ASGI-sanic stack used in the benchmark, and the ASGI layer in the middle is probably degrading the other async frameworks' performance too. The benchmark has other major issues as well, such as using HTTP/1.0 without keep-alive in its Nginx proxy_pass config (keep-alive, again, has a huge effect on performance, and would be enabled on any real performance-critical server). https://sanic.readthedocs.io/en/latest/sanic/nginx.html
Ignore all character support in the standard library and handle UTF-8 as opaque binary buffers. If you need complex string algorithms, decode into UCS-4 (UTF-32). You'll find short encoding and decoding functions on StackOverflow. For case-insensitive comparisons and sorting, use an external library that knows the latest Unicode standard.
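A minimal sketch of that "decode only when needed" step, assuming a hypothetical helper name (utf8_to_ucs4 is illustrative, not a standard function). It walks opaque UTF-8 bytes and emits UCS-4 code points, rejecting bad lead and continuation bytes; for brevity it does not check for overlong forms or surrogates, which a real implementation should.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode opaque UTF-8 bytes into UCS-4 (UTF-32) code points.
 * Returns the number of code points written, or (size_t)-1 on malformed
 * input or a full output buffer.  Simplified: rejects invalid lead and
 * continuation bytes, but does not detect overlong forms or surrogates. */
size_t utf8_to_ucs4(const unsigned char *in, size_t len,
                    uint32_t *out, size_t outcap)
{
    size_t n = 0, i = 0;
    while (i < len) {
        unsigned char b = in[i++];
        uint32_t cp;
        int extra;
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;                        /* invalid lead byte */
        if (len - i < (size_t)extra) return (size_t)-1; /* truncated sequence */
        for (int k = 0; k < extra; k++) {
            if ((in[i] & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (in[i++] & 0x3F);
        }
        if (n == outcap) return (size_t)-1;            /* output too small */
        out[n++] = cp;
    }
    return n;
}
```

Everything else (storage, concatenation, I/O) can stay on the opaque byte side and never touch this function.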
The decoding phase will do that, if needed. Also note that in many cases you must process text as opaque binary even though it should be valid UTF-8. This applies in particular to filenames on POSIX systems: otherwise you could not access files that happen to have invalid UTF-8 in their names.
Syntax of pointers. Easy-to-use high-level languages make extensive use of pointers (i.e. all their variables are actually pointers), but beginners cope with them because, with the help of GC, no stars or ampersands are required. Of course they'll get bitten soon and often, because it is too easy to create copies of pointers rather than copies of full data structures, and without understanding pointers it's hard to grasp why that happens.
char effectively behaves as a signed type on most common platforms, making it unsuitable for binary operations (e.g. UTF-8 manipulation). I/O functions deal in char pointers, so using an unsigned type like uint8_t requires casting back and forth. Is there any way out of this problem, and am I already breaking the aliasing rules with that cast?
Casting between the three character types is safe and doesn't violate aliasing rules. In addition, objects of all types can be accessed by lvalues of any of the three character types (though unsigned char is recommended), so there's no problem there either.
I/O functions that take a plain char* are designed to interoperate with char arrays and strings, so passing in unsigned char* or signed char* is a sign that they aren't being used as intended. (Functions that traffic in binary data, like fread/fwrite, should take void*.)
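A short sketch of that split, under the rules just described (write_blob and byte_at are illustrative names, not standard functions): binary data stays in uint8_t buffers and goes through the void*-based I/O functions with no cast at all, while inspecting a string's raw bytes goes through an unsigned char lvalue, which the aliasing rules explicitly permit.

```c
#include <stdio.h>
#include <stdint.h>

/* Binary data: uint8_t buffers go straight to fwrite/fread, whose buffer
 * parameter is void*, so no character-type cast is needed at all. */
size_t write_blob(FILE *f, const uint8_t *data, size_t len)
{
    return fwrite(data, 1, len, f);
}

/* Text data: any object may be read through an unsigned char lvalue, so
 * casting char* to unsigned char* is the usual way to inspect a string's
 * raw bytes without sign-extension surprises; the cast does not violate
 * the aliasing rules. */
unsigned byte_at(const char *s, size_t i)
{
    const unsigned char *bytes = (const unsigned char *)s;
    return bytes[i];
}
```

With this split the casts cluster at a few well-understood boundaries instead of being scattered through the code.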
At least from what I've heard, that's because the exact-width stdint types like uint8_t are optional.
6.2.5p17:

The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.
and
5.2.4.2.1 says that char, signed char, and unsigned char all have the same width (CHAR_BIT, at least 8 bits).
A conforming implementation could extend the language with an 8-bit type __nonaliasingbyte which has no special aliasing privileges, and define uint8_t as being synonymous with that type.
On the other hand, the Standard should never have given character types special aliasing rules to begin with. Such rules would have been unnecessary if the Standard had noted that an access to an lvalue which is freshly visibly derived from another is an access to the lvalue from which it is derived. The question of whether a compiler recognizes a particular lvalue as "freshly visibly derived" from another is a Quality of Implementation issue outside the Standard's jurisdiction.
Why is the struct tm* returned by localtime() not thread-local, like errno and other similar variables are (at least in most implementations)? Do you have any plans to improve calendar support for practical uses?
Both questions would get better answers if they were asked of a panel of experts on POSIX (which could include members of the POSIX standardization committee).
For the first one, I can attempt a guess: maybe it was feared that making the result of localtime thread-local would break some programs? You could build such a program on purpose, although I am not sure how often one would write one by accident.
Anyway, localtime_r is the function one should use if one is concerned about thread-safety. A more likely answer is that no Unix implementation bothered to fix localtime because the proper fix was for programs to call localtime_r.
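The pattern looks roughly like this (format_local_time is an illustrative wrapper name, not a standard function): localtime_r fills a caller-supplied struct tm instead of a shared static buffer, so concurrent threads cannot clobber each other's result.

```c
#define _POSIX_C_SOURCE 200809L  /* localtime_r is POSIX, not ISO C */
#include <time.h>

/* Thread-safe formatting of a time_t in the local time zone.
 * Returns 0 on success, -1 on failure. */
int format_local_time(time_t t, char *buf, size_t bufsize)
{
    struct tm tm_local;                         /* caller-owned, not shared */
    if (localtime_r(&t, &tm_local) == NULL)
        return -1;
    return strftime(buf, bufsize, "%Y-%m-%d %H:%M:%S", &tm_local) ? 0 : -1;
}
```

The only real cost over plain localtime is declaring the struct tm yourself, which is why "use localtime_r and leave localtime alone" was a workable fix.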