A 36-bit system with (it sounds like) 9-bit bytes stores bit 8 of an int in bit 8 of a char, and bit 9 of the int in bit 0 of the next char; memcpy won't change that. They're asking for something like:
unsigned int x = in[0] + 512*in[1] + 512*512*in[2] + 512*512*512*in[3];
/* aka x = *(int*)in */
out[0] = x & 255; x>>=8;
out[1] = x & 255; x>>=8;
out[2] = x & 255; x>>=8;
out[3] = x & 255;
/* *not* aka *(int*)out = x */
The effort for a compiler to optimally process all 72 variations of "read/write a signed/unsigned 2/4/8-byte big/little-endian value from an address that is aligned on a 1/2/4/8-byte boundary" would be less than the effort required to generate efficient machine code for all the ways user code might attempt to perform such an operation portably. Such operations would have a platform-independent meaning, and all implementations could conform by simply including a portable library, but on many platforms performance could be enormously improved by exploiting knowledge of the target architecture. Having such functions/intrinsics in the Standard would eliminate the need for programmers to choose between portability and performance, by making it easy for a compiler to process portable code efficiently.
I'm not disagreeing, just showing code to illustrate why memcpy doesn't work for this. Although I do disagree that writing a signed value is useful: you can eliminate 18 of those variations with a single intmax_t-to-two's-complement-uintmax_t function (since the Standard defines signed-to-unsigned conversion as reduction modulo 2^N, on a two's-complement target this is a bit-level no-op). A set of sext_uintN functions would likewise eliminate the 18 read-signed versions. Any optimizing compiler can trivially fuse sext_uint32(read_uint32le2(buf)), and minimal implementations would have less boilerplate to chew through.
> Although I do disagree that writing a signed value is useful
Although the Standard defines the behavior of signed-to-unsigned conversion in a way that would yield the same bit pattern as a two's-complement signed number, some compilers will issue warnings if a signed value is implicitly coerced to unsigned. Adding the extra 18 forms would generally require nothing more than defining an extra 24 macros, which seems like a reasonable way to prevent such issues.
72 static inline functions. If a compiler does a good job of handling such things efficiently, most of them could be accommodated by chaining to another function once or twice (e.g. to read a 64-bit value that's known to be at least 16-bit aligned, on a platform that doesn't support unaligned reads, read and combine two 32-bit values that are likewise known to be 16-bit aligned).
Far less bloat than would be needed for a compiler to recognize and optimize any meaningful fraction of the ways people might write code to work around the lack of portably-specified library functions.