Russ's example was just the "gofmt" program.
> Perhaps the reason you're seeing increases in Cockroach DB is because you keep writing more code for Cockroach DB?
If that was the only reason, then the % overhead would remain constant-ish. But it is increasing. So there is a non-linear factor for _some_ go programs (like cockroachdb) and it's still unclear what that factor is.
If there is superlinear growth in binary sizes as the project grows – for example, if some part is O(n^2) in the number of interfaces – then that's certainly interesting. If you demonstrated that such superlinear growth is happening, and wrote an article based on that, people wouldn't be so critical.
If Go binaries are getting bigger because Go produces bigger binaries for the same source code over time, then that's also interesting. If you demonstrated that Go binaries are getting more and more bloated over time for the same source code, and wrote an article based on that, people wouldn't be critical.
But as it is, you kind of just complained that CockroachDB is getting bigger, tried to blame it partly on the Go compiler producing more bloated code over time, partly on a mystical "dark area" which you don't understand, you mentioned superlinear growth only in the comment section, and you didn't actually gather data or do experiments to prove or disprove any of the things you're claiming as a cause. That's why people are complaining.
Where? The argument is _precisely_ that the growth is occurring in non-code areas.
> partly on a mystical "dark area" which you don't understand
The _observation_ is that the growth is happening in an area of the file that's not accounted for in the symtable. That's what makes it "dark". It's not mythical: it's _there_ and you can observe it just as well as anyone else.
> you mentioned superlinear growth only in the comment section
it's in the reported measurements.
> and you didn't actually gather data or do experiments to prove or disprove any of the things you're claiming as a cause
The analysis is stating observations and reporting that the size is increasingly due to non-accounted data. That observation is substantiated by measurements. There's no claim of cause in the text!
But how is this important? If the thing you're optimizing for is "total Go binary size", then all that matters is the total size of binary! How bytes are organized internally is irrelevant to this metric.
You should redo the analysis where you compile an old version of Cockroach DB (say v1.0.0) with Go versions 1.8 through 1.16, and then see what the numbers say. Your current analysis, which doesn't account for growth in the code base at all, or tries to account for it by deep-diving into the internal organization of the binary, is not sound.
Not quite so if the task is to work on reducing the metric.
When the size is attributed to data/code that's linked to the source code, then we know how to reduce the final file size (by removing data/code from the source code, or reducing them).
When the size is non-attributed and/or non-explained (“dark”) we are lacking a control to make the size smaller over time.
I'm not going to respond point by point because those points are kind of moot if my accusations were based on an incorrect reading.
Everything has trade offs. Go is not the easiest language in which to write highly efficient code. Cockroach generates some code to help here. Certainly at this point there’s pain around tracking memory usage so as to not OOM and there’s pain around just controlling scheduling priorities. But then again, had it been C++ or Rust perhaps getting to table stakes correctness would have taken so long it wouldn’t have mattered.
Some cost just comes from running distributed replication and concurrency control. That’s unavoidable. Some also comes from lack of optimization. Postgres has been around and has some very optimized things in its execution engineer.
Also, if you run Postgres in SERIALIZABLE, it actually does quite badly, largely because that isolation level was bolted on and the concurrency control isn’t optimized for it. Crdb was core-for-core competitive in serializable on some workloads last time I checked.