It really boils down to two rules:
1. Don't declare anything in header files that is only used in one compilation unit. Internal structs and functions should be declared and defined in source files, and internal linkage used wherever possible. gcc and clang's -fvisibility=hidden is useful here.
2. The more frequently a header file is included (whether transitively or directly), the more it should be split up. If a "common" or "utility" header file is included in 10000 source files, then any struct, function, etc. that you add to that file will have to be parsed 10000 times by the compiler every time you build from scratch, even if only 10 source files actually use the struct/function that you added. gcc and clang's -H flag is useful here.
[1] https://lore.kernel.org/lkml/YdIfz+LMewetSaEB@gmail.com/
That's what it's always been about (improving build times); better optimization is just a welcome side effect. But header hygiene is hard because the problem creeps back into the code base over time.
> The Linux kernel project was able to net a ~40% reduction in compilation CPU-time
Linux is a C codebase. Header hygiene is much easier in C, because C headers usually only contain interface declarations (usually at most a few hundred lines of function prototypes and struct declarations), while C++ headers often need to include implementation code inside template functions, or are plastered with inline functions (which in turn means more dependencies to include in the header). And even if the user headers are reasonably 'clean', they still often need to include C++ stdlib headers which then indirectly introduce the same problem.
For instance your point (2) only makes sense if this header doesn't need to include any of the C++ stdlib headers, which will add tens of thousands of lines of code to each compilation unit. For such cases you might actually make the problem worse by splitting big headers into many smaller ones.
PS: the most effective, but also most radical and controversial solution is also a very simple one: don't include headers in headers.
But this also reduces the opportunity to parallelize compilation across multiple files because they have been concatenated into fewer build units, and each unit now requires more memory to deal with the non-header parts. For some build systems and repositories, this actually increases build time.
Irrelevant. There is always significant overhead in handling multiple translation units, and unity builds simply eliminate that overhead.
> and each unit now requires more memory to deal with the non-header parts.
And that's perfectly ok. You can control how large unity builds are at the component level.
> For some build systems and repositories, this actually increases build time.
You're creating hypothetical problems where there are none.
In the meantime, you're completely missing the main risk of unity builds: increasing the risk of introducing problems associated with internal linkage.
As in the article, it's best to support both.
For instance, just including <vector> in a C++ source file adds nearly 20kloc of code to the compilation unit:
https://www.godbolt.org/z/56ncqEqYs
If your project has 100 source files, each with 100 lines of code but each file includes the <vector> header (assuming this resolves to 20kloc), you will compile around 2mloc overall (100 * 20100 = 2010000).
If the same project code is in a single 10kloc source file which includes <vector>, you're only compiling 30kloc overall (100 * 100 + 20000 = 30000).
In such a case (which isn't even all that theoretical), you are just wasting a lot of energy keeping all your CPU cores busy compiling <vector> a hundred times over, versus compiling <vector> once on a single core ;)
It is a simple thing to do, and the gains are substantial: faster builds, simpler setup, and less maintenance, especially across different platforms.
For big projects I simply cut them into several libraries.
I've seen some incredulous reactions, mostly from young coders, and I know that makefiles should be faster, but in practice I never found that to be true.
You can lurk but surely you can't lurk into something?
Edit: They appear to be French: http://serge.liyun.free.fr/serge/
About 20 years ago, on UNIX workloads, we used to speed up compilation via ClearMake, a kind of distributed code cache that would plug into the compilers; however, it was part of the ClearCase SCM product.
On Windows, with Microsoft and Borland (nowadays Embarcadero), they work quite alright.
Also, modules will fix that: as per VC++ reports, importing the whole standard library (import std, as of C++23) takes a fraction of the time of merely including <iostream>.
Precompiled headers don't play nicely with distributed compilation or shared build caches (which are perhaps the fastest way to build large C++ codebases). So while they can work well for local builds, they exclude the use of (IMO) better build-time optimisations.
They also require maintenance over time: if you precompile a bad set of headers, it can make your compile times worse.
Some build systems like cmake already support unity builds, as this is a popular strategy to speed up builds.
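For reference, CMake (3.16 and later) can turn on unity builds without any source changes; a minimal sketch, where `mylib` stands in for your own target name:

```cmake
# Enable unity builds for the whole project; group 8 sources per unit.
set(CMAKE_UNITY_BUILD ON)
set(CMAKE_UNITY_BUILD_BATCH_SIZE 8)

# Or opt in per target, leaving the rest of the project unaffected:
set_target_properties(mylib PROPERTIES UNITY_BUILD ON)
```

The batch size is the knob that trades parallelism against per-unit overhead.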
Nevertheless, if speed is the main concern then it's preferable to just use a build cache like ccache, and modularize the project appropriately.
So many hacks in compilers to try to work around this. A shame there is no language level fix for this nonsense.
Really wish there could be a C++-- that would improve on C in areas like this, and avoid all the incredible nonsense of C++. And no, not Rust or Go.
Headers only became a massive problem in C++ because of templates and the unfortunate introduction of the inline keyword (which then unfortunately also slipped into C99, truly the biggest blunder of the C committee next to VLAs).
Typical C headers (including the C stdlib headers) are at most a few hundred lines of function prototypes and struct declarations.
Typical C++ headers, on the other hand (including the C++ stdlib headers), contain a mix of declarations and implementation code in template and inline functions, and will pull tens of thousands of lines of code into each compilation unit.
This is also the reason why typical C projects compile orders of magnitude faster than typical C++ projects with a comparable line count and number of source files.
C Macros are pretty much considered code smell in C++, right?
But we keep a separate CI job that checks the non-unity build, so developers have to add the right #include statements and can't accidentally reference file-scoped functions from other files. While working on a given library or project, developers often disable the unity build for just that project to reduce incremental build times. It seems to offer the benefits of both approaches.
Precompiled headers don't give nearly the same speedup. We're excited for C++ modules of course, but we're trying to temper any expectations that modules will improve build speed.
All of those things combined make C programming more enjoyable.
What exactly leads you to have multiple declarations, and thus creates the need to "keep [multiple] declarations synced"?
And what were the resulting effects on build times?
(HTTP 301 on the old URL would have been appreciated)
Are you working on large compiled software? Any game, rendering engine or large application benefits from compilation units in my experience.
Some of my libraries that I work with take upwards of an hour for a fresh compile. Having sane compilation units cuts down subsequent iteration to minutes instead.
And that was with unified builds enabled.
A brand-new 64-core AMD Epyc machine will take over an hour for a fresh compile. Good times.
I suspect there are compilation optimizations to be made, but I don't think they would save more than 30% here and there.