Good question. Why aren't processors optimized for functional programming? I think one reason is the way hardware memory works. So, to answer what else you need: you also need physical memory and a way to organize the data in it. Forgive me if what follows misunderstands some aspect of functional programming; I'm primarily a digital hardware person, and programming in general is not my forte. I also apologize if I explain things you already know.
In hardware, CPU instructions are read sequentially from memory. These instructions are pretty basic: add two numbers, load a data word from a certain memory address, jump to a new address if two numbers are equal, etc. Modern processors do have some pretty fancy instructions, but what I said is still basically true. So those instructions are our primitives. The only way to build abstract procedures from those primitives, from the perspective of assembly programming, is to place a sequence of primitive instructions at a known address, then branch to that address and execute the instructions in order.
When you stack many layers of abstraction, as is common in functional programming, there starts to be a lot to keep track of. Evaluating a very abstract function looks, in hardware, like a whole lot of branching to and returning from different memory addresses. That's not necessarily bad, but the code and what the hardware actually does are starting to look pretty different. And if you branch to uncached ("unexpected") locations, you add latency, because the instructions have to be fetched from RAM. You also have to keep track of any data still needed at a higher level of the function. Usually a call stack handles that, and data that has to outlive the call that created it (a closure, for example) generally ends up under automatic memory management such as garbage collection. These things can introduce a lot of overhead in the program, especially when you have things like deep recursion.
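
To make that bookkeeping a bit more concrete, here is a rough Python sketch of my own (the names `total_recursive`, `total_explicit_stack`, and `pending` are made up for illustration, and this is not what any particular compiler literally emits). The first function is written recursively; the second does the same sum but spells out the state that has to be saved on the way down and restored on the way back up, which is roughly the job the call stack does for you.

```python
def total_recursive(values):
    # Each call has to remember its own values[0] until the inner call returns.
    if not values:
        return 0
    return values[0] + total_recursive(values[1:])


def total_explicit_stack(values):
    # Same computation, with the hidden bookkeeping made visible.
    pending = []      # stands in for the call stack (an assumption of this sketch)
    peak_depth = 0    # how much saved state existed at the worst moment
    i = 0
    while i < len(values):            # "call" phase: save state, go one level deeper
        pending.append(values[i])
        peak_depth = max(peak_depth, len(pending))
        i += 1
    result = 0
    while pending:                    # "return" phase: restore state, combine results
        result = pending.pop() + result
    return result, peak_depth


if __name__ == "__main__":
    print(total_recursive([1, 2, 3, 4]))       # 10
    print(total_explicit_stack([1, 2, 3, 4]))  # (10, 4)
```

Notice that the saved state grows with the depth of the recursion: four elements, four saved values. That growth is the kind of overhead I mean when I talk about deep recursion.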
tl;dr: There's no such thing as stateless assembly programming.