Creating a Fibonacci Generator in Assembly (opens in new tab)

(blog.seso.io)

50 pointswillvk7y ago35 comments

35 comments

In Forth language would be:

  : fib 2dup + ;

2dup word duplicates n and n-1 elements and put it on the top of the stack.

+ word takes n and n-1 elements from the stack and sum them. The result goes on top of the stack.

Just start with 1 and 1 at the top of stack.

willvkOP7y ago

That's some very concise code golf! It would be interesting in seeing how Forth compiles this under the hood.

pjc507y ago

Normally Forth is interpreted. Compiled Forth code is unlikely to look particularly clean or elegant.

(If you like languages with a very high density of functionality to symbol count, you should look at APL and its successor J)

userbinator7y ago

As someone with considerable experience in using x86 Asm (over a decade), it's always nice to see more articles on it, but there are a few things I found odd about this one.

The basis of the algorithm to use was decided to be a space-optimised method using a dynamic programming approach. This effectively means using an array on the stack to hold our values while giving O(n) time and space complexity.

That isn't "space-optimised" at all --- when I think of a Fibonacci example I think of O(1) space. Unless it was chosen specifically to demonstrate array indexing (in which case it would be better to use an algorithm that actually requires an array, like sorting), the choice of algorithm is unusual. The same applies to finding the length of argv[1] before doing the equivalent of an atoi() on it --- if you're going to read the bytes of the string through, might as well accumulate and convert, and oddly enough the length appears to be unused too.

The unexplained and useless nop is also puzzling. Why is it there? What is it for? The same goes for the many other instructions which aren't actually contributing to the calculation, and initially made me think it was compiler output when I first glanced over the code. Writing a program that performs the given task is relatively straightforward even in Asm, but your solution is unnecessarily complex.

Making things easier to understand using function calls

I consider this a premature abstraction. A comment or even label will do just fine to delimit the blocks of functionality, as your "functions" are called only once. Your task has 3 parts [basically atoi(), fib(), itoa()+output] so you will have 3 blocks of code. It cannot be any simpler than that. Better is "making things easier to understand by avoiding unnecessary code".

Here is how I would write the first 2 blocks; I wrote this without testing at all so it could have errors, but it is really quite simple:

        mov esi, [esp+8]  ; get argv[1]
        xor eax, eax      ; clear the upper bits
        xor ecx, ecx      ; will hold converted value
    atoiloop:
        lodsb             ; get a character
        cmp al, '0'
        jb fibinit
        cmp al, '9'
        ja fibinit
        imul ecx, ecx, 10
        lea ecx, [ecx+eax-'0']
        jmps atoiloop
    fibinit:
        jecxz invalid       ; can't do fib(0)
        xor eax, eax
        inc eax             ; eax = 1
        cdq                 ; edx = 0
    fibloop:
        xadd eax, edx
        loop fibloop
        ; now eax and edx will contain fib(n) and fib(n-1)

Observe that I chose the specific registers for a reason: esi/al will be used by lodsb, ecx by the looping instructions, and edx for xadd. This is one of the things that compilers are not very good at, and also why carefully written Asm can easily beat them.

If you want to learn more Asm I recommend any assembler besides GAS, and looking at some examples from advanced users: http://www.hugi.scene.org/compo/compoold.htm The demoscene in its smaller-size competitions has lots of particularly knowledgeable users too.

willvkOP7y ago

That's some awesome feedback. Thanks so much. As I'm teaching myself asm I'm learning as I go. I'll incorporate it all into a revised version soon.

pkilgore7y ago

Quick suggestion to massively improve your writing for general consumption: strip out the passive voice where not absolutely necessary, e.g., "was decided". Not only is the active voice "I/we decided" generally much more interesting to read, it usually forces you to adopt a sentence structure with fewer words (easier to read). Your writing later in the article is much more engaging as you start to use the active voice.

And thanks for the article, I love reading interesting excursions in low level programming!

willvkOP7y ago

Thanks for reading, and thanks for the feedback! That's a good tip on the passive/active voice. I did actually think about standardising that but let it (unfortunately) slide as I thought I would go from my initial thinking to more instructional but that's probably not as effective and engaging to the reader the whole way through. Will remember that tip for my next post! Thanks again!

exikyut7y ago

I mentioned this elsewhere, but just to make sure you see it, I also stumbled on https://codegolf.stackexchange.com/a/135618/15675, which may also be useful.

willvkOP7y ago

Thanks for that. Yep, I've seen that too.

pjc507y ago

Your article is actually pretty handy in terms of setting up the "workflow" for compilation and debugging from scratch. It's the sort of thing genuine beginners need to avoid getting tripped up.

willvkOP7y ago

Thank-you for your kind words about the post. That was what I was trying to do with it - to break down some of the nebulous fear that beginners have with assembly. Cheers!

saagarjha7y ago

> The unexplained and useless nop is also puzzling. Why is it there? What is it for? The same goes for the many other instructions which aren't actually contributing to the calculation, and initially made me think it was compiler output when I first glanced over the code. Writing a program that performs the given task is relatively straightforward even in Asm, but your solution is unnecessarily complex.

It is explained:

> Line 5 is a no-operation (or noop), which means that it is a filler instruction and doesn’t actually do anything but take up a CPU clock cycle. Older versions of GDB require this proceeding _start, and so I have left it in for these examples.

userbinator7y ago

So it is. It would've been much clearer to mention that at the beginning when it's first introduced, since it's not commented like the other instructions and I first thought it was a left-over placeholder.

That said, it must be a very old GDB since I've otherwise not ever seen a NOP as the very first instruction of any nontrivial program in all these years... maybe two NOPs in the early days for patching, but that's about it.

willvkOP7y ago

I agree the structure of some of my explanations could have been a bit better. The book I was using as a basis was from the mid-90s so it would be a version of GDB from 20 years ago that requires the NOP.

exikyut7y ago

As someone interested in understanding/getting into assembly language (actually for years), I have a small torrent of naive and hopefully not too annoying questions. :)

- Why and how is argv[1] at [esp+8]? I realize this is a Linux-specific implementation detail but would like to understand.

- argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

- Continuing along the implicit linked-ness train, how is accessing AL reading the value set in SI?

- How does the LEA usage here work? You're putting the address of "[ecx+eax-'0']" into ecx. First question, how is ecx not clobbered? Secondly, how does that dereference work? I see similar semantics used in the 2nd and 3rd instructions, it seems AX and CX are linked in some way (in this situation).

- So the jecxz is bailing out (to an assumed label, that's okay) if ecx is 0. Cool. But why do you now zero and then increment eax?

- At the end you have a loop around a single xadd instruction, (which, if I understand what's happening, is the equivalent of "edx += eax" here?). The only way I can interpret this is that the atoiloop routine was actually building up a stack (in the LEA bit) and this is ticker-tape-ing its way through that?

I appreciate any insight and your patience :) if I'm learning something unfamiliar enough to be completely disorientating, reading through insights others have written down (ie, the questions others asked and documented the answers to) doesn't seem to help. I have to ask a thousand questions of my own in order to get my bearings.

This could be the fault of the assembly "tutorials" out there though. I can't remember the number of articles I've eye-glazedly read through; I understand binary, the stack, the basics, etc, incredibly incredibly well, but (as you can see) have tons of blocking issues and holes in my understanding. Some I'm aware of - for example, I know I have no mental model of how x86 memory management is done, and I've never been able to find anything concise that just deals with that - but unfortunately some of the holes are so big all I know is that something's missing and that my eyes glaze over and I don't know why. In any case, learning difficulties FTL. Been wanting to understand asm since 2006.

NB. While responding to another answer I stumbled on https://codegolf.stackexchange.com/a/135618/15675, a(n exhaustingly) exhaustive exploration of Fibonacci in x86, also for Linux.

saagarjha7y ago

> Why and how is argv[1] at [esp+8]? I realize this is a Linux-specific implementation detail but would like to understand.

The kernel puts them there when the program starts; I believe this is part of the System V calling convention so it's not Linux-specific.

> argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

They are not linked, the lodsb instruction loads a byte from the address pointed at by esi into ax.

> How does the LEA usage here work? You're putting the address of "[ecx+eax-'0']" into ecx. First question, how is ecx not clobbered? Secondly, how does that dereference work? I see similar semantics used in the 2nd and 3rd instructions, it seems AX and CX are linked in some way (in this situation).

lea just does math, it doesn't dereference anything. ecx is being clobbered, but that's what we want: the lea is doing the equivalent of ecx += eax - '0'.

> So the jecxz is bailing out (to an assumed label, that's okay) if ecx is 0. Cool. But why do you now zero and then increment eax?

This will be the return value of main and the exit code of the program. Since you errored, you want to return a nonzero value.

exikyut7y ago

>> argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

> Where do you see this?

Here:

  mov esi, [esp+8]  ; get argv[1]                value written to esi; okay
  xor eax, eax      ; clear the upper bits       \ how is writing to eax
  xor ecx, ecx      ; will hold converted value  / and ecx now relevant?!

> lea just does math, it doesn't dereference anything.

Right; I said "LEA usage"; the instruction itself I understand, but "[ecx+eax-'0']" is throwing me for a loop.

2 more replies

userbinator7y ago

They are not linked, the lodsb instruction loads a byte from the address pointed at by esi into ax.

lodsb (b for Byte) loads into al, not ax. It's lodsw (Word) which uses ax, and of course there's lodsd for eax and even lodsq for rax (64-bit only).

This will be the return value of main and the exit code of the program. Since you errored, you want to return a nonzero value.

Actually that's the non-error path; eax is set to 1 and edx to 0 as initial values for computing fib(n) in the fibloop.

1 more reply

exikyut7y ago

NB Addendum: https://i.imgur.com/u6jtIAZ.png

exikyut7y ago

FWIW, the article mentions a possible followup "using ... MMX, XMM or SIMD". (To clarify, XMM is SSE.) SIMD is noted last as though it's the newest development, when that would arguably be is at least AVX, and AVX-512 to be specific.

willvkOP7y ago

Thanks for that. I didn't intend for it to be a chronological list but will add AVX to it. Cheers.

exikyut7y ago

Oh, okay, my bad!

And wow, fibonacci using hand-rolled AVX :) that would definitely be a sight to see.

...although, I should explicitly point out, AVX is a set of vector streaming/processing instructions. Fibonacci is inherently a single-step operation, with no lookahead.

Okay, so I googled "fibonacci avx". The results were quite a mess but some careful wading found me http://www.insideloop.io/blog/2014/10/14/arrays-part-iii-vec..., which states: "A a consequence, the following code that computes the sequence of Fibonacci can’t get vectorized."

So, forgive my own (bullish!) naivete. AVX, and _possibly_ SIMD, SSE, etc _may not_ be applicable to/usable for Fibonacci computation.

userbinator7y ago

Vector instructions are certainly usable for scalar computation, although it doesn't make much sense --- you basically just use one piece of one vector and throw away all the other results.

Fibonacci is a toy example but illustrates where the vector instructions do become useful: if you want to compute the n'th term of various Fibonacci-like sequences with different starting conditions but the same recurrence, then you could do them in parallel using those instructions.

leghifla7y ago

Unrolling the computation of the series, it is possible to compute f(n) using only f(n-4) and f(n-8):

f(n) = 7*f(n-4)-f(n-8)

So, using 4-vector operations, you can compute 4 numbers in one step.

pq0ak2nnd7y ago

every programmer needs to do this at some point in their life. it is too easy to take take modern languages for granted. but i'm a little disappointed that you dogged `make`. i think the programming landscape would be a lot more effective if people took the time to learn about what already exists (not just make but autoconf/automake/configure) instead of reinventing the wheel every three years.

willvkOP7y ago

Thanks for reading and thanks for the feedback. It wasn't my intention to 'dog' make, I just felt that for an entry level post that introducing the make syntax may detract from the focus on assembly (although make is an integral part of gas asm) and add extra complexity and length to the (already very long) article. Anything more complex, such as multiple .s files would definitely warrant a description of make. Thanks again.

emilfihlman7y ago

  movl $1, %eax	movl	$1	%eax	copy 0 into register eax
  movl $0, %ebx	movl	$0	%ebx	copy 1 into register ebx

Surely the explanations are flipped?

willvkOP7y ago

Yep. It's a typo. Thanks for the feedback.

saagarjha7y ago

I think you might be missing the NULL that terminates argv in your stack diagram.

willvkOP7y ago

Thanks for that. I've amended the diagram accordingly.

sdfsdfsdff7y ago

It's also a problem in 7 Billion Humans or Human Resource Machine (I forgot which - they are both great anyway), fun programming games that also use a kind of assembly language.

j / k navigate · click thread line to collapse

35 comments

emersonrsantos7y ago

In Forth language would be:

  : fib 2dup + ;

2dup word duplicates n and n-1 elements and put it on the top of the stack.

+ word takes n and n-1 elements from the stack and sum them. The result goes on top of the stack.

Just start with 1 and 1 at the top of stack.

willvkOP7y ago

That's some very concise code golf! It would be interesting in seeing how Forth compiles this under the hood.

pjc507y ago

Normally Forth is interpreted. Compiled Forth code is unlikely to look particularly clean or elegant.

(If you like languages with a very high density of functionality to symbol count, you should look at APL and its successor J)

userbinator7y ago

As someone with considerable experience in using x86 Asm (over a decade), it's always nice to see more articles on it, but there are a few things I found odd about this one.

Making things easier to understand using function calls

Here is how I would write the first 2 blocks; I wrote this without testing at all so it could have errors, but it is really quite simple:

        mov esi, [esp+8]  ; get argv[1]
        xor eax, eax      ; clear the upper bits
        xor ecx, ecx      ; will hold converted value
    atoiloop:
        lodsb             ; get a character
        cmp al, '0'
        jb fibinit
        cmp al, '9'
        ja fibinit
        imul ecx, ecx, 10
        lea ecx, [ecx+eax-'0']
        jmps atoiloop
    fibinit:
        jecxz invalid       ; can't do fib(0)
        xor eax, eax
        inc eax             ; eax = 1
        cdq                 ; edx = 0
    fibloop:
        xadd eax, edx
        loop fibloop
        ; now eax and edx will contain fib(n) and fib(n-1)

willvkOP7y ago

That's some awesome feedback. Thanks so much. As I'm teaching myself asm I'm learning as I go. I'll incorporate it all into a revised version soon.

pkilgore7y ago

And thanks for the article, I love reading interesting excursions in low level programming!

willvkOP7y ago

exikyut7y ago

I mentioned this elsewhere, but just to make sure you see it, I also stumbled on https://codegolf.stackexchange.com/a/135618/15675, which may also be useful.

willvkOP7y ago

Thanks for that. Yep, I've seen that too.

pjc507y ago

Your article is actually pretty handy in terms of setting up the "workflow" for compilation and debugging from scratch. It's the sort of thing genuine beginners need to avoid getting tripped up.

willvkOP7y ago

Thank-you for your kind words about the post. That was what I was trying to do with it - to break down some of the nebulous fear that beginners have with assembly. Cheers!

saagarjha7y ago

It is explained:

userbinator7y ago

willvkOP7y ago

exikyut7y ago

As someone interested in understanding/getting into assembly language (actually for years), I have a small torrent of naive and hopefully not too annoying questions. :)

- Why and how is argv[1] at [esp+8]? I realize this is a Linux-specific implementation detail but would like to understand.

- argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

- Continuing along the implicit linked-ness train, how is accessing AL reading the value set in SI?

- So the jecxz is bailing out (to an assumed label, that's okay) if ecx is 0. Cool. But why do you now zero and then increment eax?

NB. While responding to another answer I stumbled on https://codegolf.stackexchange.com/a/135618/15675, a(n exhaustingly) exhaustive exploration of Fibonacci in x86, also for Linux.

saagarjha7y ago

> Why and how is argv[1] at [esp+8]? I realize this is a Linux-specific implementation detail but would like to understand.

The kernel puts them there when the program starts; I believe this is part of the System V calling convention so it's not Linux-specific.

> argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

They are not linked, the lodsb instruction loads a byte from the address pointed at by esi into ax.

lea just does math, it doesn't dereference anything. ecx is being clobbered, but that's what we want: the lea is doing the equivalent of ecx += eax - '0'.

> So the jecxz is bailing out (to an assumed label, that's okay) if ecx is 0. Cool. But why do you now zero and then increment eax?

This will be the return value of main and the exit code of the program. Since you errored, you want to return a nonzero value.

exikyut7y ago

>> argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?

> Where do you see this?

Here:

  mov esi, [esp+8]  ; get argv[1]                value written to esi; okay
  xor eax, eax      ; clear the upper bits       \ how is writing to eax
  xor ecx, ecx      ; will hold converted value  / and ecx now relevant?!

> lea just does math, it doesn't dereference anything.

Right; I said "LEA usage"; the instruction itself I understand, but "[ecx+eax-'0']" is throwing me for a loop.

2 more replies

userbinator7y ago

They are not linked, the lodsb instruction loads a byte from the address pointed at by esi into ax.

lodsb (b for Byte) loads into al, not ax. It's lodsw (Word) which uses ax, and of course there's lodsd for eax and even lodsq for rax (64-bit only).

This will be the return value of main and the exit code of the program. Since you errored, you want to return a nonzero value.

Actually that's the non-error path; eax is set to 1 and edx to 0 as initial values for computing fib(n) in the fibloop.

1 more reply

exikyut7y ago

NB Addendum: https://i.imgur.com/u6jtIAZ.png

exikyut7y ago

willvkOP7y ago

Thanks for that. I didn't intend for it to be a chronological list but will add AVX to it. Cheers.

exikyut7y ago

Oh, okay, my bad!

And wow, fibonacci using hand-rolled AVX :) that would definitely be a sight to see.

...although, I should explicitly point out, AVX is a set of vector streaming/processing instructions. Fibonacci is inherently a single-step operation, with no lookahead.

So, forgive my own (bullish!) naivete. AVX, and _possibly_ SIMD, SSE, etc _may not_ be applicable to/usable for Fibonacci computation.

userbinator7y ago

Vector instructions are certainly usable for scalar computation, although it doesn't make much sense --- you basically just use one piece of one vector and throw away all the other results.

leghifla7y ago

Unrolling the computation of the series, it is possible to compute f(n) using only f(n-4) and f(n-8):

f(n) = 7*f(n-4)-f(n-8)

So, using 4-vector operations, you can compute 4 numbers in one step.

pq0ak2nnd7y ago

willvkOP7y ago

emilfihlman7y ago

  movl $1, %eax	movl	$1	%eax	copy 0 into register eax
  movl $0, %ebx	movl	$0	%ebx	copy 1 into register ebx

Surely the explanations are flipped?

willvkOP7y ago

Yep. It's a typo. Thanks for the feedback.

saagarjha7y ago

I think you might be missing the NULL that terminates argv in your stack diagram.

willvkOP7y ago

Thanks for that. I've amended the diagram accordingly.

sdfsdfsdff7y ago

It's also a problem in 7 Billion Humans or Human Resource Machine (I forgot which - they are both great anyway), fun programming games that also use a kind of assembly language.

j / k navigate · click thread line to collapse