: fib 2dup + ;
2dup word duplicates n and n-1 elements and put it on the top of the stack.+ word takes n and n-1 elements from the stack and sum them. The result goes on top of the stack.
Just start with 1 and 1 at the top of stack.
(If you like languages with a very high density of functionality to symbol count, you should look at APL and its successor J)
The basis of the algorithm to use was decided to be a space-optimised method using a dynamic programming approach. This effectively means using an array on the stack to hold our values while giving O(n) time and space complexity.
That isn't "space-optimised" at all --- when I think of a Fibonacci example I think of O(1) space. Unless it was chosen specifically to demonstrate array indexing (in which case it would be better to use an algorithm that actually requires an array, like sorting), the choice of algorithm is unusual. The same applies to finding the length of argv[1] before doing the equivalent of an atoi() on it --- if you're going to read the bytes of the string through, might as well accumulate and convert, and oddly enough the length appears to be unused too.
The unexplained and useless nop is also puzzling. Why is it there? What is it for? The same goes for the many other instructions which aren't actually contributing to the calculation, and initially made me think it was compiler output when I first glanced over the code. Writing a program that performs the given task is relatively straightforward even in Asm, but your solution is unnecessarily complex.
Making things easier to understand using function calls
I consider this a premature abstraction. A comment or even label will do just fine to delimit the blocks of functionality, as your "functions" are called only once. Your task has 3 parts [basically atoi(), fib(), itoa()+output] so you will have 3 blocks of code. It cannot be any simpler than that. Better is "making things easier to understand by avoiding unnecessary code".
Here is how I would write the first 2 blocks; I wrote this without testing at all so it could have errors, but it is really quite simple:
mov esi, [esp+8] ; get argv[1]
xor eax, eax ; clear the upper bits
xor ecx, ecx ; will hold converted value
atoiloop:
lodsb ; get a character
cmp al, '0'
jb fibinit
cmp al, '9'
ja fibinit
imul ecx, ecx, 10
lea ecx, [ecx+eax-'0']
jmps atoiloop
fibinit:
jecxz invalid ; can't do fib(0)
xor eax, eax
inc eax ; eax = 1
cdq ; edx = 0
fibloop:
xadd eax, edx
loop fibloop
; now eax and edx will contain fib(n) and fib(n-1)
Observe that I chose the specific registers for a reason: esi/al will be used by lodsb, ecx by the looping instructions, and edx for xadd. This is one of the things that compilers are not very good at, and also why carefully written Asm can easily beat them.If you want to learn more Asm I recommend any assembler besides GAS, and looking at some examples from advanced users: http://www.hugi.scene.org/compo/compoold.htm The demoscene in its smaller-size competitions has lots of particularly knowledgeable users too.
And thanks for the article, I love reading interesting excursions in low level programming!
It is explained:
> Line 5 is a no-operation (or noop), which means that it is a filler instruction and doesn’t actually do anything but take up a CPU clock cycle. Older versions of GDB require this proceeding _start, and so I have left it in for these examples.
That said, it must be a very old GDB since I've otherwise not ever seen a NOP as the very first instruction of any nontrivial program in all these years... maybe two NOPs in the early days for patching, but that's about it.
- Why and how is argv[1] at [esp+8]? I realize this is a Linux-specific implementation detail but would like to understand.
- argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?
- Continuing along the implicit linked-ness train, how is accessing AL reading the value set in SI?
- How does the LEA usage here work? You're putting the address of "[ecx+eax-'0']" into ecx. First question, how is ecx not clobbered? Secondly, how does that dereference work? I see similar semantics used in the 2nd and 3rd instructions, it seems AX and CX are linked in some way (in this situation).
- So the jecxz is bailing out (to an assumed label, that's okay) if ecx is 0. Cool. But why do you now zero and then increment eax?
- At the end you have a loop around a single xadd instruction, (which, if I understand what's happening, is the equivalent of "edx += eax" here?). The only way I can interpret this is that the atoiloop routine was actually building up a stack (in the LEA bit) and this is ticker-tape-ing its way through that?
I appreciate any insight and your patience :) if I'm learning something unfamiliar enough to be completely disorientating, reading through insights others have written down (ie, the questions others asked and documented the answers to) doesn't seem to help. I have to ask a thousand questions of my own in order to get my bearings.
This could be the fault of the assembly "tutorials" out there though. I can't remember the number of articles I've eye-glazedly read through; I understand binary, the stack, the basics, etc, incredibly incredibly well, but (as you can see) have tons of blocking issues and holes in my understanding. Some I'm aware of - for example, I know I have no mental model of how x86 memory management is done, and I've never been able to find anything concise that just deals with that - but unfortunately some of the holes are so big all I know is that something's missing and that my eyes glaze over and I don't know why. In any case, learning difficulties FTL. Been wanting to understand asm since 2006.
NB. While responding to another answer I stumbled on https://codegolf.stackexchange.com/a/135618/15675, a(n exhaustingly) exhaustive exploration of Fibonacci in x86, also for Linux.
The kernel puts them there when the program starts; I believe this is part of the System V calling convention so it's not Linux-specific.
> argv[1] is "gotten" by reading from esi. You then proceed to poke eax and ecx. TIL that SI, AX and CX were implicitly linked. What specifically is going on here?
They are not linked, the lodsb instruction loads a byte from the address pointed at by esi into ax.
> How does the LEA usage here work? You're putting the address of "[ecx+eax-'0']" into ecx. First question, how is ecx not clobbered? Secondly, how does that dereference work? I see similar semantics used in the 2nd and 3rd instructions, it seems AX and CX are linked in some way (in this situation).
lea just does math, it doesn't dereference anything. ecx is being clobbered, but that's what we want: the lea is doing the equivalent of ecx += eax - '0'.
> So the jecxz is bailing out (to an assumed label, that's okay) if ecx is 0. Cool. But why do you now zero and then increment eax?
This will be the return value of main and the exit code of the program. Since you errored, you want to return a nonzero value.
And wow, fibonacci using hand-rolled AVX :) that would definitely be a sight to see.
...although, I should explicitly point out, AVX is a set of vector streaming/processing instructions. Fibonacci is inherently a single-step operation, with no lookahead.
Okay, so I googled "fibonacci avx". The results were quite a mess but some careful wading found me http://www.insideloop.io/blog/2014/10/14/arrays-part-iii-vec..., which states: "A a consequence, the following code that computes the sequence of Fibonacci can’t get vectorized."
So, forgive my own (bullish!) naivete. AVX, and _possibly_ SIMD, SSE, etc _may not_ be applicable to/usable for Fibonacci computation.
movl $1, %eax movl $1 %eax copy 0 into register eax
movl $0, %ebx movl $0 %ebx copy 1 into register ebx
Surely the explanations are flipped?