undefined | Better HN

0 pointssitkack5y ago0 comments

Very little user code generates binaries that can _tell_ it is running on non-x86 hardware. Rust is Arm Memory Model safe, existing C/C++ code that targets the x86 memory model is slowly getting ported over, but unless you are writing multithreaded C++ code that cuts corners it isn't an issue.

Running on the JVM, Ruby, Python, Go, Dlang, Swift, Julia or Rust and you won't notice a difference. It will be sooner than you think.

0 comments

mhh__5y ago

It's not the memory model I'm thinking of but the cache design, ROB size etc.

Obviously this is fairly niche but the friction to making something fast is hugely easier locally.

sitkackOP5y ago

The vast majority of developers never profile their code. I think this is much less of an issue than anyone on HN would rank it. Only when the platform itself provides traces do they take it into consideration. And even then, I think most perf optimization is in a category of don't do the obviously slow thing, or the accidentally n^2 thing.

I partially agree with you though, as the penetration of Arm goes deeper into the programmer ecosystem, any mental roadblocks about deploying to Arm will disappear. It is a mindset issue, not a technical one.

In the 80s and 90s there were lots of alternative architectures and it wasn't a big deal, granted the software stacks were much much smaller and more metal. Now they are huge, but more abstract and farther away from machine issues.

jerf5y ago

"The vast majority of developers never profile their code."

Protip: New on the job and want to establish a reputation quickly? Find the most common path and fire a profiler at it as early as you can. The odds that there's some trivial win that will accelerate the code by huge amounts is fairly decent.

Another bit of evidence developers rarely profile their code is that I can tell my mental model of how expensive some server process will be to run and most other developer's mental models tend to differ by at least an order of magnitude. I've had multiple conversations about the services I provide and people asking me what my hardware is, expecting it to be run on some monster boxes or something when I tell them it's really just two t3.mediums, which mostly do nothing, and I only have two for redundancy. And it's not like I go profile crazy... I really just do some spot checks on hot-path code. By no means am I doing anything amazing. It's just... as you write more code, the odds that you accidentally write something that performs stupidly badly goes up steadily, even if you're trying not to.

1 more reply

mhh__5y ago

This isn't really about you or me but the libraries that work behind the spaghetti people fling into the cloud.

1 more reply

lumost5y ago

given a well designed chip which achieves competitive performance across most benchmarks, Most code will run sufficiently well for most use cases regardless of the nuance of specific cache design and sizes.

There is certainly an exception to this for chips with radically different designs and layouts, as well as folks writing very low-level performance sensitive code which can benefit from specific platform optimization ( graphics comes to mind ).

However even in the latter case, I'd imagine the platform specific and fallback platform agnostic code will be within 10-50% performance of each other. Meaning a particularly well designed chip could make the platform agnostic code cheaper on either a raw performance basis or cost/performance basis.

scythe5y ago

If you use a VM language like Java, Ruby, etc, that work is largely abstracted.

tyingq5y ago

True, though the work/fixes sometimes take a while to flow down. One example: https://bugs.openjdk.java.net/browse/JDK-8255351

foobiekr5y ago

I honestly don’t know why you put Go or the JVM in this list. It isn’t that the language used properly has sane semantics in multithreaded code, it’s that generations of improper multithreaded code have appeared to work because the x86 memory semantics have covered up an unexpressed dependency that should have been considered incorrect.

j / k navigate · click thread line to collapse