Finding Turing-completeness in unlikely places has long been a
pastime of bored computer scientists.
Removing all but the mov instruction from future iterations of the
x86 architecture would have many advantages: the instruction
format would be greatly simplified, the expensive decode unit
would become much cheaper, and silicon currently used for complex
functional units could be repurposed as even more cache. As long
as someone else implements the compiler.
Well, I guess someone has implemented it now!

> Thus, while it has been known for quite some time that x86 has far too many instructions, we can now contribute the novel result that it also has far too many registers.
x86 `mov` is a bit of a cheat, though: it's a single mnemonic for what are really dozens of quite different instructions. ARM, for example, has `mov` for register-register operations but `ldr`/`str` for register-memory, while x86 just uses `mov` for everything. Even ARM's `mov` optionally puts a barrel shifter into the path, so it's not that pure either.
Of course, modern processors don't really care about register-register moves, because they have hundreds of hidden physical registers and register renaming, which make those moves effectively "free." (E.g. Haswell has 168 GP registers, only 16 of which are exposed through the AMD64 ISA.)
mov 0x1000(%rax,%rax), %eax
mov %eax, 0x1000(%rax,%rax)
mov $7, %eax
You get:

0000000000000000 8b840000100000 movl 0x1000(%rax,%rax), %eax
0000000000000007 89840000100000 movl %eax, 0x1000(%rax,%rax)
000000000000000e b807000000     movl $0x7, %eax
Which is basically 0x8b -> load, 0x89 -> store, 0xb8 -> constant.

Based on this POC alone, it's pretty much impossible to guess how much slower a real `mov`-only compiler would be vs. a regular compiler. It depends a lot on how much of this can be optimized. Directly compiling C to this via clang or gcc would be a good start. I have no idea how much work that would take, though, and compiling anything more complex than BF is a real challenge in itself because of the branching problem.
That's not true.
Compilers would need to produce code that consists of standardized mov-only code patterns. This would help keep the CPU architectures relatively simple.
Coming very close to native x86 performance seems possible, at least in theory.
Just imagine the applications of this!
The next step would be to make this into an eventual Quine.
It's turtles all the way down.
Oh the world we live in: playing minesweeper in a graphical OS running in a javascript x86 emulator.
> The single instruction C compiler
I was seriously impressed until I got further down.
Maybe you have to reorganize if(a) { b; } else { c; } as a kind of permuted b; followed by c; such that either the permuted b; or the permuted c; are effectively no-ops depending on the value of a.
They cheat to get a jump without using anything but `mov`s: a `mov` that illegally sets `cs` raises a SIGILL, and the signal handler re-enters the program at the same spot the code starts at. IMO it doesn't really count, so a `mov`-only program still requires at least one `jmp` instruction to work.
I think loops are achieved by simply running the code to the end without saving any of the results, then hitting the signal and going back to the beginning, ignoring all the code until you reach the loop you're executing. But the `mov` code is fairly hard to understand, so I could be wrong about that.
Two values a and b can be compared by storing 0 into *a and 1 into *b. Then a == b iff *a is 1, and that 0/1 value can be used to index into an array to provide different behavior depending on whether a == b.
Edit: apparently HN doesn't allow me to escape * with \, which the markdown spec says I should be able to do.
So? What does the markdown spec have to do with... anything? The XML spec says you have to begin your comment with a version number declaration, but you're not doing that.
It's based on this paper: http://www.cl.cam.ac.uk/~sd601/papers/mov.pdf
If you were comparing the performance of mov-only programs, a specialized CPU would likely be faster. Otherwise, there is a good reason we do not actually use Turing machines.
http://laughtonelectronics.com/Arcana/One-bit%20computer/One...
A bit less extreme: http://www.sccs.swarthmore.edu/users/06/adem/engin/e25/final...
The laws of physics get in the way. It's essentially the same reason why clock frequencies have stopped going up.
I believe such a CPU wouldn't end up simple due to competition for better performance. It would end up being more complex than x86 CPUs.
Or is it a Ridiculous Instruction Set?
Which is quite amazing (and probably dog slow).