/*
* The point of this file is to contain all the LLVM C++ API interaction so that:
* 1. The compile time of other files is kept under control.
* 2. Provide a C interface to the LLVM functions we need for self-hosting purposes.
* 3. Prevent C++ from infecting the rest of the project.
*/
// copied from include/llvm/ADT/Triple.h
enum ZigLLVM_ArchType {
ZigLLVM_UnknownArch,
ZigLLVM_arm, // ARM (little endian): arm, armv.*, xscale
ZigLLVM_armeb, // ARM (big endian): armeb
ZigLLVM_aarch64, // AArch64 (little endian): aarch64
...
and then in the .cpp file: static_assert((Triple::ArchType)ZigLLVM_UnknownArch == Triple::UnknownArch, "");
static_assert((Triple::ArchType)ZigLLVM_arm == Triple::arm, "");
static_assert((Triple::ArchType)ZigLLVM_armeb == Triple::armeb, "");
static_assert((Triple::ArchType)ZigLLVM_aarch64 == Triple::aarch64, "");
static_assert((Triple::ArchType)ZigLLVM_aarch64_be == Triple::aarch64_be, "");
static_assert((Triple::ArchType)ZigLLVM_arc == Triple::arc, "");
...
I found it more convenient to redefine the enum and then static assert that all the values are the same, which has to be updated with every LLVM upgrade, than to use the actual enum, which would pull in a bunch of other C++ headers. The file that has to use C++ headers takes about 3x as long to compile as Zig's ir.cpp file, which is nearing 30,000 lines of code but only depends on C-style header files.
The compiler firewall strategy works fairly well in C++11 and even better in C++14. Create a public interface with minimal dependencies, and encapsulate the details for this interface in a pImpl (pointer to implementation). The latter can be defined in implementation source files, and it can use unique_ptr for simple resource management. C++14 added the missing make_unique, which eases the pImpl pattern.
That being said, compile times in C++ are going to typically be terrible if you are used to compiling in C, Go, and other languages known for fast compilation times. A build system with accurate dependency tracking and on-demand compilation (e.g. a directory watcher or, if you prefer IDEs, continuous compilation in the background) will eliminate a lot of this pain.
Does a client of the framework you are writing -- which is probably using the STL internally -- need a single-instruction operation for adding a value, on a call it makes less than 0.001% of the time?
Optimization is about end results. Apply the Pareto Principle, and don't forget that your users also need to compile your code in a reasonable amount of time.
It's a tradeoff between compile time and complexity.
Large-Scale C++ Software Design[0]
The techniques set forth therein are founded in real-world experience and can significantly address large-scale system build times. Granted, the book is dated and likely not entirely applicable to modern C++, yet remains the best resource regarding insulating modules/subsystems and optimizing compilation times IMHO.
0 - https://www.pearson.com/us/higher-education/program/Lakos-La...
Recently, after ten years of not using ccache, I was playing with it again.
The speed-up you get from ccache today is quite a bit more than it was a decade ago; I was amazed.
ccache does not cache the result of preprocessing. Each time you build an object, ccache passes it through the preprocessor to obtain the token-level translation unit which is then hashed to see if there is a hit (ready made .o file can be retrieved) or miss (preprocessed translation unit can be compiled).
There is now more than a 10 fold difference between preprocessing, hashing and retrieving a .o file from the cache, versus doing the compile job. I just did a timing on one program: 750 milliseconds to rebuild with ccache (so everything is preprocessed and ready-made .o files are pulled out and linked). Without ccache 18.2 seconds. 24X difference! So approximately speaking, preprocessing is less than 1/24th of the cost.
Ancient wisdom about C used to be that more than 50% of the compilation time is spent on preprocessing. That's the environment from which came the motivations for devices like precompiled headers, #pragma once and having compilers recognize the #ifndef HEADER_H trick to avoid reading files.
Nowadays, those things hardly matter.
Nowadays when you're building code, the rate at which .o's "pop out" of the build subjectively appears no faster than two decades ago, even though memory, L1 and L2 cache sizes, CPU clock speeds, and disk space are vastly greater. Since not a lot of development has gone into preprocessing, it has more or less sped up with the hardware, but overall compilation hasn't.
Some of that compilation laggardness is probably due to the fact that some of the algorithms have tough asymptotic complexity. Just extending the scope of some of the algorithms to do a slightly better job causes the time to rise dramatically. However, even compiling with -O0 (optimization off), though faster, is still shockingly slow, given the hardware. If I build that 18.2 second program with -O0, it still takes 6 seconds: an 8X difference compared to preprocessing and linking cached .o files in 750 ms. A far cry from the ancient wisdom that character and token level processing of the source dominates the compile time.
Ancient wisdom was that more than 50% of the time is spent compiling the headers, after they become a part of your translation unit after preprocessing. I don't see why preprocessing itself would ever be singled out, given that it's comparatively much simpler than actual compilation.
Only if you explicitly disable 'direct mode'.
In my opinion, this makes any conclusion dubious. If you really care about compile times in C++, step 0 is to make sure you have an adequate machine (at least a quad-core CPU, lots of RAM, and an SSD). If the choice is between spending programmer time trying to optimize compile times versus spending a couple hundred dollars on an SSD, 99% of the time spending the money on an SSD will be the correct solution.
Presumably, 127/128 runs have both the test file and the single header file in memory cache, so the distinction is moot.
Also, I find the conclusion that we should all just buy top end machines and ignore performance problems that don't manifest there fairly unconvincing. I think that kind of thinking is responsible for a good chunk of the reason the web is so bloated today. :-)
For any kind of even vaguely profitable software, your developers should all have kick arse machines.
But they should test on a $200 laptop :)
You don't need to develop on a $200 notebook to care about performance.
I've seen this problem a few times: someone looks at their N-core machine with M GB and says, oh look, I'm only using 3/4 of M, so when I buy the 4xN-core machine I'm going to put M GB of RAM in it again. Then everything runs poorly because the disks are getting hammered now that there are another 32 jobs (or whatever) each consuming a GB. Keep adding RAM until there is still free RAM during the build. It's going to run faster from RAM than waiting for a super speedy disk to read the next .c/.o/etc file.
Note that Visual Studio, for example, does a poor job of this because it only spawns one compilation per CPU thread. This results in individual threads being idle more than they ought to be.
I've just tested one of my ~300 KLOC C++ projects, broken into 479 .cpp files and 583 .h files.
Using Linux (GCC) after dropping the disk cache, on a 5400 RPM HD, the full build on 14 threads took: 78 seconds.
On a fast SSD (same machine, after dropping caches again) it took 61 seconds.
Linking was ~7 seconds faster on the SSD, so arguably actual compilation didn't speed up by the same ratio, but overall build time is most definitely faster.
Source was on the same drive as the build target directory.
At a previous company I worked at, we got SSDs to speed up compilation (and it did).
If it doesn't make a difference, all that means is that your project is small, or doesn't have too many dependencies. Good for you. But that's not the reality for all projects.
We'll have them Soon™.
It probably partially depends on whether old-style headers can be used simultaneously with new-style modules.
The drawback is that sources within a jumbo file cannot be compiled in parallel. So if one has access to an extremely parallel compilation farm, like developers at Google, it will slow things down.
Generally the way this works is that rather than compiling into one jumbo file, you combine into multiple files, which you can then compile in parallel. UE4 supports it (disclosure: I work for them), and it works by including 20 files at a time and compiling the larger files normally.
There is also a productivity slowdown where a change to any of those files causes all the other files in the same jumbo unit to be recompiled, so you can pull the files you're actively editing out of the jumbo and compile them individually.
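A sketch of what one generated jumbo/unity translation unit might look like under this scheme (file names are illustrative, not from UE4; this is a build-generated fragment, not standalone code):

```cpp
// Module.jumbo_0.cpp -- generated by the build system, never edited by hand.
// Roughly 20 small sources are batched into one TU; several such jumbo
// files are emitted per module so they can still be compiled in parallel.
#include "MeshUtils.cpp"
#include "MaterialCache.cpp"
#include "LightBaker.cpp"
// ... up to the batch limit. Unusually large sources, and files the
// developer is actively iterating on, are left out and compiled as
// normal individual translation units.
```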
> The compilation time saving can be up to factor of 2 or more on a laptop.
The compilation time savings are orders of magnitude in my experience, even on a high end desktop. That's for a full build. For an incremental, there is a penalty (see above for workarounds)
[1] https://api.unrealengine.com/INT/Programming/UnrealBuildSyst...
Compilation at the time took over 2 hours.
At some point I wrote a macro that replaced all those automatically generated const uints with #defines, and that cut compilation time to half an hour. It was quickly declared the biggest productivity boost by the project lead.
Precompiled headers are a pretty ugly solution and the way they've been implemented in the past could be really nasty. (IIRC in old GCC versions it would copy some internal state to disk, then later load it from disk and manually adjust pointers!)
Basically, instead of defining a real serialization format (and thus having to write serializer/deserializer code), it's way easier to just `fwrite` out your internal structs to disk, one after another, and write some much simpler walker code to walk through any pointed fields appropriately. At some point though this becomes technical debt which needs to be repaid in the form of a total serialization rewrite.
Blender, the popular open-source 3D modelling tool, uses a format like this for their .blend files, and it is really gross. IIRC a few releases back they started working to improve the format to be a little less dependent on low-level internal details, but now they have the nightmare of backwards compatibility to deal with.
The basic problem is that C/C++ have no mechanism for native serialization, unlike e.g. Java, Python, or any number of other languages, so you're either stuck `fwrite`ing structs or reinventing the wheel.
Yes, I was measuring time to rebuild everything (including the PCH) from scratch. So it's probable that incremental compilation is slightly faster using PCH, it's just not nearly as much as I was hoping for.
They force programmers to tell the compiler what intermediate result to cache. Finding the best intermediate result to cache is a black art, and that set will change when your source code grows, forcing you to either accept that your precompiled headers may not help that much, or to spend large amounts of time optimizing build times.
Simply put - if it's #include <...>, it goes into the precompiled header. Otherwise, it goes directly into the source.
The downside of this is that every time you add a new dependency, the entire project is rebuilt, since the change in your precompiled header affects all translation units. But adding dependencies is rare, and changing code and rebuilding is far more common.