I immediately made the connection, too. Itanium pushed a lot of complexity up into the compiler. The performance gains were supposed to come from Itanium's wide-issue design, in which several instructions are bundled together and issued at once. But that requires the compiler to do the analysis needed to know which instructions can safely execute at the same time, whereas typical out-of-order processors figure this out at runtime. My understanding is that, in practice, Itanium compilers were not able to do this well enough to justify the architectural decisions.
That, though, is still far less radical than the design presented in this article.
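
To make the "compiler has to figure out what can run together" part concrete, here is a toy sketch in Python. It is not Itanium's actual bundle encoding or any real compiler's scheduler: the `Instr` type, the register names, and the `pack_bundles` helper are all made up for illustration, and the hazard check is deliberately conservative. It just walks a program in order and starts a new bundle whenever the next instruction depends on something already in the current one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: one destination register, some sources."""
    name: str
    dest: str
    srcs: tuple = ()


def conflicts(earlier, later):
    """True if the two instructions cannot safely share a bundle.

    Conservative toy rule: any register hazard counts as a conflict.
    """
    return (
        earlier.dest in later.srcs      # read-after-write
        or later.dest in earlier.srcs   # write-after-read
        or earlier.dest == later.dest   # write-after-write
    )


def pack_bundles(program, width=3):
    """Greedy in-order packing into bundles of at most `width` instructions."""
    bundles = []
    current = []
    for instr in program:
        # Close the current bundle if it is full or the new instruction
        # depends on something already placed in it.
        if len(current) == width or any(conflicts(e, instr) for e in current):
            bundles.append(current)
            current = []
        current.append(instr)
    if current:
        bundles.append(current)
    return bundles


program = [
    Instr("load_a", "r1", ("r10",)),
    Instr("load_b", "r2", ("r11",)),
    Instr("add", "r3", ("r1", "r2")),
    Instr("load_c", "r4", ("r12",)),
    Instr("mul", "r5", ("r3", "r4")),
]

bundle_names = [[i.name for i in b] for b in pack_bundles(program)]
print(bundle_names)
```

Even in this tiny example most bundles end up partly empty, because the dependencies force early cuts. A smarter scheduler would reorder instructions (for instance, hoisting `load_c` into the first bundle), but it has to prove that is safe at compile time, without the runtime information an out-of-order core gets for free.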