0 points | skywhopper 8y ago | 0 comments

There was an article on Hacker News recently that covered some of the reasons for Itanium's failure to realize its theoretical benefits. I'm not finding it now, but IIRC, the argument made was that predicting likely-parallelizable code is actually a lot harder to do at compile time, and that, like so many ultra-optimized systems, the real world works much differently and a messier, more random approach ultimately yields far better performance.

gh02t 8y ago

Itanium suffered performance-wise initially because they had trouble with compilers, but that's not the whole story. You also have to consider that AMD launched AMD64, which was backwards compatible, at about the same time. Later on the Itanium compilers got better, but on release it became a choice of "sluggish, incompatible and expensive Itanium with potential to perform well in the future" versus "backwards compatible, currently faster and cheaper x86_64." It never gained any real momentum at the start because of this, which ultimately doomed it even when a lot of the issues were resolved later on.

mattnewport 8y ago

> which ultimately doomed it even when a lot of the issues were resolved later on.

Was there ever a point in the Itanium's history where there were Itaniums that ran mainstream software with better performance than equivalently priced x64 processors?

zlynx 8y ago

There were hand-coded assembly loops that were 3-4 times faster than x86, using Itanium's predicates and rolling register windows.

But I guess you said mainstream. So unless you count database engines, I suppose the answer is "No."

Today you can get the same vector performance using SSE4 and AVX. Almost all of Itanium's good stuff has been rolled into Xeon.
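
For readers who haven't met predication: below is a rough C sketch of the general idea (if-conversion), not Itanium assembly and not anything from the hand-coded loops mentioned above. The function names are made up for illustration. The first loop contains a data-dependent branch; the second computes the condition as a 0/1 value and uses it as data, so the loop body has no branch to mispredict. Whether a given compiler already performs this transformation on its own is not guaranteed either way.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: the `if` depends on the data, so the CPU has to
 * predict it on every iteration. (Hypothetical example function.) */
int64_t sum_positive_branchy(const int32_t *v, size_t n) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > 0) {
            sum += v[i];
        }
    }
    return sum;
}

/* If-converted version: the comparison result (0 or 1) acts as a
 * "predicate" that decides whether the add has any effect, so the
 * loop body contains no conditional jump in the source. */
int64_t sum_positive_predicated(const int32_t *v, size_t n) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t keep = (v[i] > 0);       /* predicate: 1 or 0 */
        sum += keep * (int64_t)v[i];     /* adds v[i] or 0 */
    }
    return sum;
}
```

Both functions return the same result for any input; only the control flow differs.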

gh02t 8y ago

As far as I know (which isn't very far, admittedly) they only really managed to reach parity with some performance gains over x86 in a few niches, but it's also a bit chicken-and-egg. It never had enough attention to really get the optimization and porting efforts it would have seen if it had been successful.

dnautics 8y ago

> the argument made was that predicting likely-parallelizable code is actually a lot harder to do at compile time, and that, like so many ultra-optimized systems, the real world works much differently and a messier, more random approach ultimately yields far better performance.

I am not an expert on computer history, but my feelings on the matter are as follows:

It's hard for certain domains, like handling millions of web requests. For most computational stuff where you're just blowing through regularly-shaped numerical computation (like for example ML, or signal processing), it's not that hard, but arguably the compilers of the time were still not quite up to it (there's a lot of neat stuff that's getting worked into the LLVM pluggable architecture these days). Of course ML wasn't really a thing back then, and Intel didn't seem interested in putting Itaniums into cell towers.

One way to think of the OOO and branch-prediction processing that current x86 (and ARM) chips do is that they are doing on-the-fly re-JITing of the code. There is a lot of silicon dedicated to doing the right thing and avoiding highly costly branch mispredicts, etc. During Itanium's heyday, performance was at a premium over efficiency. Now everyone wants power efficiency (since that is now often a cost bottleneck). Besides which, for other reasons, Itanium wasn't as power-efficient as its architecture could (ideally) have been.

drbawb 8y ago

>the argument made was that predicting likely-parallelizable code is actually a lot harder to do at compile time

So don't do it at compile time? That's really a very weak argument against the Itanium ISA, and honestly more of an argument against the AOT compilation model. Take a runtime with a great JIT, like the JVM or V8, and teach it to emit instructions for the Itanium ISA. (As an added advantage these runtimes are extremely portable and can be run, with fewer optimizations, on other ISAs without issue.)

The problem, as always, is that nobody with money to spend ever wants to part with their existing software. (Likely written in C.) In 2001 Clang/LLVM didn't even exist, and I'm not familiar with any C compilers of the era that had so much as a rudimentary JIT.

mattnewport 8y ago

There's not that much overlap between the kind of optimizations that JITs do and the optimizations that modern CPUs do. The promise of JITs outperforming AoT compiled code has never really materialized. The performance advantages of OoO execution, speculative execution, etc. are very real and all modern high performance CPUs do them. Attempts to shift some of that work onto the compiler, like Itanium and Cell, have largely been failures.

dnautics 8y ago

Arguably the "sufficiently advanced compiler" (cue joke) has arrived (sadly, post Itanium, Cell) in the form of a popularized LLVM[0], so it's improper to claim failure based on two aged data points.

The flaws of OOO and SpecEx are evident in the overhead required to secure a system (Spectre, Meltdown) in a nondeterministic computational environment, and there is certainly a power cost to effectively JITting your code on every clock cycle.

As the definition of performance is changing due to the topping out of Moore's law and the shift in parallelism from Amdahl to Gustafson, I think there is a real opportunity for non-OOO, non-SpecEx designs in the future.
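
For reference, the two laws named above in their standard textbook forms; the symbols $N$ (processor count) and $p$ (parallelizable fraction) are notation chosen here, not taken from the comment:

```latex
% Amdahl: fixed problem size; p is the parallelizable fraction of the serial run time
S_{\text{Amdahl}}(N) = \frac{1}{(1 - p) + \frac{p}{N}} \le \frac{1}{1 - p}

% Gustafson: problem size grows with N; p is the parallel fraction of the run time on N processors
S_{\text{Gustafson}}(N) = (1 - p) + pN
```

With $p = 0.95$ and $N = 64$, Amdahl's law caps the speedup at about 15.4, while Gustafson's scaled speedup is 60.85. That gap is one way to read the "shift": if the workload grows with the machine, wide and simple hardware looks far more attractive than it does for a fixed-size job.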

admax88q 8y ago

> The promise of JITs outperforming AoT compiled code has never really materialized.

Well, JITs do actually outperform AoT compiled code today. Java is faster than C in many workloads, especially large-scale server workloads with huge heaps.

Java can allocate/deallocate memory faster than C, and it can compact the heap in the process which improves locality.
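
The usual reasoning behind a claim like this is bump-pointer allocation: with a compacting collector, a new object can be carved off the end of a contiguous region by advancing one pointer, where a general-purpose `malloc` has to search or maintain free lists. Below is a minimal C sketch of that technique. `Arena`, `arena_init`, `arena_alloc`, and `arena_reset` are hypothetical names, the caller-supplied buffer is assumed to be suitably aligned, and nothing here is a measurement or a description of any particular JVM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator over a caller-supplied buffer. */
typedef struct {
    uint8_t *base;     /* start of the region */
    size_t capacity;   /* total bytes available */
    size_t used;       /* bytes handed out so far */
} Arena;

void arena_init(Arena *a, void *buf, size_t capacity) {
    a->base = (uint8_t *)buf;
    a->capacity = capacity;
    a->used = 0;
}

/* Allocation is a size round-up, a bounds check, and a pointer bump.
 * Returns NULL when the region is exhausted. */
void *arena_alloc(Arena *a, size_t size) {
    size_t aligned = (size + 15u) & ~(size_t)15u;   /* round up to 16 bytes */
    if (aligned < size || aligned > a->capacity - a->used) {
        return NULL;                                /* overflow or out of space */
    }
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* "Freeing" everything at once is a single store, loosely analogous
 * to reusing a region after its live objects have been moved out. */
void arena_reset(Arena *a) {
    a->used = 0;
}
```

Objects allocated back to back also end up adjacent in memory, which is the locality point made above. The sketch says nothing about the cost of the collector that makes the region reusable.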

marshray 8y ago

This argument has been made since the introduction of the JVM in the mid-90s.

Seems to me like if, in practice, JIT provided better performance then by now people would be rewriting their C/C++ code in Java and C# for speed.

gpderetta 8y ago

Most importantly people would write JITers for C and C++.

Tobba_ 8y ago

It might still be possible. The JVM and .NET both have their speed annihilated by their awful choice of memory model.

lmm 8y ago

> Seems to me like if, in practice, JIT provided better performance then by now people would be rewriting their C/C++ code in Java and C# for speed.

It's a little bit faster, not faster by enough to matter. If you're going to rewrite C/C++ code for speed you'd go to Fortran or assembler, and even then you're unlikely to get enough of a speedup to be worth a rewrite.

New projects do use Java or C# rather than C/C++ though.

Avshalom 8y ago

One of the big problems with predicting what can be MIMDed is that almost all the languages we use except for Haskell allow for dependency on who knows what. Without very strict refusal of state it's hard as fuck to figure out what is independent of what at compile time.

It's not so much that it can't be done as that getting programmers to accept it can't be done.
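
A small C illustration of the "dependency on who knows what" problem, using pointer aliasing as one concrete instance; the function names are invented for the example. In the first function the compiler must assume that writing `out[i]` might change `*factor`, so it cannot treat the iterations as independent. In the second, `restrict` is the programmer's promise that the pointers do not overlap, which is exactly the kind of constraint the comment says programmers are reluctant to accept.

```c
#include <stddef.h>

/* `out` and `factor` have compatible types and may point into the
 * same memory, so each store to out[i] may change the value read
 * through `factor` on the next iteration. */
void scale_may_alias(float *out, const float *in,
                     const float *factor, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * *factor;
    }
}

/* With `restrict`, the caller promises the three pointers refer to
 * non-overlapping memory, so the iterations are independent and
 * `*factor` is loop-invariant. Breaking the promise is undefined
 * behavior. */
void scale_no_alias(float *restrict out, const float *restrict in,
                    const float *restrict factor, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * *factor;
    }
}
```

Calling `scale_may_alias(out, in, &out[0], n)` is legal, and every iteration after the first then sees the updated factor. Nothing in the function's signature rules that out, so the compiler has to keep the iterations in order.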
