https://dl.acm.org/doi/book/10.5555/923366
https://dl.acm.org/doi/10.1145/349299.349318
and many, many others (it produced so many PhDs in the 90s). And, needless to say, HP and Intel hired many excellent researchers during the heyday of Itanium. So I don't know on what basis you think there wasn't enough investment; I can only assume you're unfamiliar with the actual history here, in both academia and industry.
It turns out static instruction scheduling cannot overcome variable memory and cache latency or branch misprediction, because all of those are dynamic and unpredictable for "integer" applications (i.e., the bulk of the code running on the CPUs in your laptop and cell phone). And predication, which was one of the "solutions" for branch misprediction penalties, turned out to be not very efficient and limited in its applicability.
For integer applications, it turns out instruction-level parallelism isn't really the issue. It's about how to generate and sustain as many outstanding cache misses at a time as possible. VLIW turns out to be insufficient and inefficient for that, and the various attempts at addressing it through prefetches and more elaborate markings around loads/stores all failed to give good results.
For HPC-type workloads, it turns out data parallelism and thread-level parallelism are much more efficient ways to improve performance, and they leave ILP on a single instruction stream playing only a very minor role. GPUs and ML accelerators demonstrate this very clearly.
As for security and speculative execution: speculative execution is not going anywhere. Naturally, there is plenty of research in this area, like:
https://ieeexplore.ieee.org/abstract/document/9138997
https://dl.acm.org/doi/abs/10.1145/3352460.3358306
and while it will take a while before real pipelines implement ideas like the above, so we may keep seeing smaller and smaller vulnerabilities as the industry collectively plays whack-a-mole, I don't see a world where top-of-the-line general-purpose microprocessors give up on speculative execution; the performance gain is simply too big.
I have yet to meet any academic, industry processor architect, or compiler engineer who thinks VLIW / Itanium is the way forward.
This is not to say that pushing as much work as possible onto the compiler is a bad idea, as nVidia has demonstrated. But what they are doing is not VLIW.