What's really wrong here is that they're apparently spawning processes like crazy. Do they spawn a new process for each API call? That's like running CGI programs under Apache, like it's 1995.
It's like terrible Chinese router firmware, without the C. Bonus points: every straightforward way of running a throwaway command on Linux involves fork().
I guess it's a good thing because it sets us up for another blog post once they learn of the latency gains to be had when you are not creating new processes on API requests. Hell, when someone starts looking into how this Git thing works, we might be in for a whole series.
Putting arbitrary input into a shell is dangerous, as missed escaping can result in control of the shell.
When you call exec yourself, however, you pass the individual arguments as a NULL-terminated list of strings (char*). There is no shell to abuse. Calling a process this way is about as safe as calling a function that takes strings as arguments. The function can still have vulnerabilities, but the act of calling it is safe.
Parsing text data in an ad-hoc format that is non-standardized, undocumented, and undefined is really bad for security.
Just spawning a process creates as many security problems as it solves.
If it were done right, it would look like Chrome's architecture, where untrusted, isolated processes do the dangerous work but communicate with a trusted process via a well-defined IPC protocol.
... and for RAM usage. Java applications all have a tendency to bloat the longer you keep them running.
As zegerjan wrote, Gitaly is a Go/Ruby hybrid.
The main Go process doesn't use libgit2 (for now) because we didn't want to have to deal with cgo. We already know how to deal with C extensions in Ruby, and we have a lot of existing Ruby application code that uses libgit2, so we still use it there. And that code works fine so I don't see us removing it.
In practice, spawning a Git process is sometimes faster than using libgit2, so why not do that? Also, for parts of our workload (handling Git push/pull operations), spawning a one-off process (git-upload-pack) is the most boring, tried-and-true approach.
The Go component doesn't have libgit2 bindings yet, although we're looking into adding them later. That, or maybe go-git[3]. For now, though, Gitaly is mainly focused on migrating all Git calls out of the Rails monolith; not introducing a new component now reduces the project's risk.
[1]: https://github.com/libgit2/rugged/
[2]: https://gitlab.slack.com/archives/C027N716H/p151695430400026...
[3]: https://github.com/src-d/go-git/
$ time seq 1000 | while read; do sleep 0 & done
real 0m0.185s
user 0m0.546s
sys 0m0.265s
That's less than 0.2 ms to start a process. Processes give you operational control (CPU, memory, permissions, isolation, monitoring) that other constructs simply cannot. Decades ago, when we had far slower computers, people did process-oriented development and forked as if it was fine (CGI, make, git).
Somehow, separate processes came to be avoided like the plague, when in reality, they are probably the smallest resource "waste" in 99% of systems.
First of all, you're only benchmarking the time it takes for fork(2) to return in the parent subshell, nothing else. The new processes don't exist yet at that point, and certainly haven't exec'd (which tends to be why you fork in the first place).
Second, you're not measuring the cost at all. The forked children will, at some point, start executing on other CPUs, which includes finishing configuration and running exec, which takes time. The cost is the total cycles it takes before the child is executing the intended code.
Forks are damn expensive, but whether they're too expensive depends on the use case and on the cost of expanding hardware.
Fork time scales with the virtual memory of the forking process, and you're forking from a fresh subshell that hardly has anything allocated. It's even mentioned in the linked post that their issue stemmed from this (specifically fork lock contention spiking as fork time increased).
Calling exec() or spawn() in Node is therefore not asynchronous and can block your event loop for hundreds of milliseconds or even seconds as RSS increases.
I never understood why so many people use fork() instead of POSIX spawn(). For example, OpenJDK (Java) also uses it as the default for starting a process, which leads to interesting results on an OS that does not overcommit memory, like Solaris: since the process briefly doubles its memory commitment with fork(), your process can die with an out-of-memory error.
The low-level syscall ABI is architecture-dependent.
Then, shock horror, they realize that running a throwaway command fork()s the main process. But now everyone is too angsty to change it, because someone out there might rely on the environment-copying behavior, even though they shouldn't.
for example, here's the caveats section from the macOS fork man page:
There are limits to what you can do in the child process. To be totally safe you should restrict yourself to only executing async-signal safe operations until such time as one of the exec functions is called. All APIs, including global data symbols, in any framework or library should be assumed to be unsafe after a fork() unless explicitly documented to be safe or async-signal safe. If you need to use these frameworks in the child process, you must exec. In this situation it is reasonable to exec yourself.
That spells defeat :)

Earlier in the game, copy-on-write had to be created for the same reasons.
Threads throw a wrench in things, but fork() existed for decades before threads. O_CLOEXEC etc. help, and lots of command-line utilities don't use threads.
fork() isn't the fastest way - but in many situations it's not a problem, it's just convenient. In that respect it's somewhat like using python when you could have used go.
That means the child and parent processes share memory (until exec() is performed).
Especially if the parent process is multi-threaded, this avoids a whole lot of copy-on-write page faults that fork() would otherwise incur when another thread touches memory in the window between fork() and the child's exec().
Code: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/uni...
> A bug in Go <1.9 was causing a 30x slowdown in our Gitaly service.
Fork is not my favorite syscall:
Is the migration path that tough ?
We are working on moving the git layer to Gitaly[0] which is written in Go (and is what this blog post is about). It was one of our major bottlenecks and we've seen a lot of benefit from having made the switch. It's not done yet, but a lot of the calls to git that the application makes are now done through Gitaly.
https://gitlab.com/gitlab-org/gitaly/blob/master/internal/se...
Yet apparently nobody either caught or investigated the latency spike after the previous deployment.
First of all, I was somewhat confused by that due to the availability of copy-on-write; I wouldn't have expected fork/exec time to scale up that way.
Second, I was surprised that there wasn't an attempt to explain the behavior difference between the two systems. Can someone familiar with either or both point towards an explanation for why that's the case? It seems very odd.