To put that another way, in the current marketplace, what kinds of programs are so worthy of optimization that it's economically sensible to have a human spend several days hand-tuning machine language to squeeze out every CPU cycle?
In addition, look at how popular netbooks are becoming. The Intel Atom is an in-order CPU. Imagine a hyperthreaded, 1.6 GHz 486...
On the iPhone it's even worse. It's got a decent vector unit, but the CPU is very slow. You'll see great wins by doing your 3D math yourself.
As we continue to become multicore, I could imagine somebody shaving a couple cycles out of the core message passing routines, though you're almost certainly bus bound in those situations...
Computers are getting smaller and people want more out of them; assembly language is back in style!
I'm guessing that most languages with built-in foreign function interfaces (like Python's ctypes) have similar thunking layers.
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanc...
When I worked on it, our simulator was an order of magnitude faster than commercially available simulators (Synopsys VCS and Cadence NC-Verilog), which cost between $1k and $10k per license per year. I worked for a tiny hardware startup; established hardware companies use a few orders of magnitude more compute power than we did, so the equation is probably at least four orders of magnitude further in favor of doing assembly optimization in a commercial simulator.
You can get 5x, 10x, 20x, or more performance just by using the vector instructions the CPU gives you. Until a magic compiler appears that can make proper use of them (read: never), hand-coded assembly will remain essential for almost any performance-critical application, especially multimedia processing.
And even then you'll often end up significantly worse off than if you wrote the assembly by hand.
A run of Intel's compiler on the C versions of our DSP functions resulted in a grand total of one vectorization, which was done terribly, too.
I am told that Fortran does better than C here (its stricter aliasing rules give the compiler more freedom to vectorize); there is a reason it is still widely used in the scientific computing community, after all.
This is also part of the reasoning behind C99's new restrict keyword.
If you don't write the compiler, you have to make assumptions about when and how it can/will use those instructions, and often you assume wrong, particularly across compiler upgrades.
The first reason is the whole category of optimizations that the compiler is worse than a human at (like register allocation) or cannot do effectively at all (messing with calling conventions, computed jumps).
The second reason is more subtle: whenever you abstract yourself away from some part of a problem, you inherently create a less efficient solution.
For example, intrinsics mean that you don't have to manually allocate registers. But this also means that if your algorithm uses too many registers and it would be more efficient to modify it to require fewer (and thus not need spills), you will have no way of knowing such a thing. By insulating yourself from that layer of complexity, you've also limited your ability to make higher-level optimizations that improve lower-level performance.
This applies on practically every level possible: any method of abstraction, no matter how well designed, will always in some fashion reduce the maximum performance you can achieve. Of course, this doesn't mean abstraction is bad--it provides an often-useful tradeoff between developer time and performance.
"But for the absolute core of the system—the inner loops of the index servers, for instance—very small gains in performance are worth an awful lot. When you have that many machines running the same piece of code, if you can make it even a few percent faster, then you’ve done something that has real benefits, financially and environmentally. So there is some code that you want to write in assembly language."
When publishers/developers don't give a bleep, the fans take up the task of fixing the bugs themselves. I happen to run one such project in my spare time (for C&C: Red Alert 2), and it's amazing how much stuff is broken. It's not as "serious" as other projects mentioned here, but still a reason to know ASM. (And a good way to see bad programming practices in action :) )
I'm sure people still play Master of Magic, a strategy game from 1994. I've been playing it on and off since it came out, and I began to think - is there something wrong with me that I like this old game so much? I mean, surely there must be newer games that are better. I showed it to my teenage brother in the mid 2000's and he loved it too. I had my non-computer-gaming friends blown away by the original Heroes of Might and Magic (1995).
I think the world needs better means for preservation of old computer games.
(I love what I do, but my twelve year old self would be disgusted that I'm not writing games.)
Since a lot of the bugs therein may depend on a certain sequence of instructions, doing it in a high-level language doesn't make any sense.
For that matter, CUDA (and ATI's Bare-Metal Interface, which is similar) is more assembly-like than C-like in many ways. So even using the higher-level available language is still pretty much like assembly.
You tend to only write these things when you're going to be running a lot of elements through, so almost everything you do in these platforms is inner-loop, or you'd be using a different tool. So even small speed-ups tend to matter.
In all of xnu, not counting AES, there are ~17kloc in x86 assembly, most of it in osfmk/i386 --- where no normal developer is ever going to go. There are over 730kloc in C.
Others have covered the optimization side of things well so I won't repeat it, but there are tiny fragments of assembly all over the place -- they hold your system together.
One place I did this was various RSA Challenge attack clients.
I am now assistant-teaching a college course in low-level computer programming. It's an excellent course: the students reprogram a children's toy robot that uses the ARM processor. http://www.amazon.com/Little-Tikes-Giggles-Remote-Control/dp... They're getting up to speed very quickly on how to get hardware to actually do stuff.
Yes, I actually left Silicon Valley to do grad school. I haven't given up the principle of "do real stuff, see real results", though. I'm looking to design a couple fairly small homework assignments consisting of optimizing some ARM code. I want the examples to be real. Now mulling over which to do...
Additionally, low-level hardware interfacing is often done with hand-coded assembly, because it is easier to "get right" than C on some of the crappy compiler toolchains you face.
Tangentially related but not quite the same, I work at a company that makes barcode recognition software and some of our most performance-sensitive areas use assembly. It is mostly C, though.
If you want simplicity, you look at lisps; homoiconicity is perhaps the most elegant, simple concept known in computing. It may be more complex in practice (many more layers above the bare metal), but in concept it's simply beautiful.