Looks like it's similar in some ways. But they also don't tell too much and even the self-hosting variant is "Talk to us" pricing :/
I think it was made by Microsoft; https://github.com/microsoft/p4vfs
The VFS gets you a few main benefits.
1. You can lazily download and check out data as you need it. Assuming your build system is sufficiently parallel to hide the latency, this will save a lot of time as long as the data actually accessed is significantly less than the total data (say <80%).
2. You don't need to scan the filesystem to see what has changed. So things like commits, status checks and even builds checking for changes can target just what has actually changed.
Neither of these is particularly complex. The hardest part is integrating with the VCS and build system to take advantage of the change tracking. Git has some support for this (see fsmonitor) but I'm not aware of any build systems that do. (But you still get a lot of benefits without that.)
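Both benefits can be shown in a tiny in-process sketch (all names here are made up; a real implementation would sit behind FUSE/ProjFS/NFS, not a Python class):

```python
# Minimal sketch of the two VFS benefits: lazy download on first read,
# and tracking changes as they happen instead of scanning the tree.
# RemoteStore and LazyTree are hypothetical illustration-only names.

class RemoteStore:
    """Stands in for the server holding the full repo."""
    def __init__(self, files):
        self.files = files
        self.fetches = 0

    def fetch(self, path):
        self.fetches += 1
        return self.files[path]

class LazyTree:
    def __init__(self, store, paths):
        self.store = store
        self.paths = set(paths)  # listing the tree is cheap; contents are not
        self.cache = {}          # materialized on first read
        self.dirty = set()       # benefit 2: writes are recorded, no scan needed

    def read(self, path):
        if path not in self.cache:           # benefit 1: lazy download
            self.cache[path] = self.store.fetch(path)
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        self.dirty.add(path)

store = RemoteStore({f"src/{i}.c": b"int x;" for i in range(1000)})
tree = LazyTree(store, store.files)
tree.read("src/1.c")
tree.write("src/1.c", b"int x = 1;")
print(store.fetches, sorted(tree.dirty))  # 1 ['src/1.c']
```

A `git status` equivalent here is O(changed files) rather than O(repo), which is the whole point.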
And as for pricing... are there really that many people working on O(billion) lines of code who can't afford $TalkToUs? I'd reckon that Linux is the biggest source of hobbyist commits, and it checks out on my laptop OK (though I'll admit I don't really do much beyond ./configure && make there...)
I.e. this isn't something battle-tested by hundreds of thousands of developers 24/7 over the last few years, but a simple commercial product sold by people who liked what they used.
Well, since Android is their flagship example: anyone who wants to build custom Android releases for some reason. The way things are, you don't need billions of lines of your own code to maybe benefit from tools that handle billions of lines of code.
> As the build runs, any step that exactly matches a prior record is skipped and the results are automatically reused
> SourceFS delivers the performance gains of modern build systems like Bazel or Buck2 – while also accelerating checkouts – all without requiring any migration.
Which sounds way too good to be true.
At the start snapshot the filesystem. Record all files read & written during the step.
Then when this step runs again with the same inputs you can apply the diff from last time.
Some magic to hook into processes and do this automatically seems possible.
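The record-and-reuse loop sketched above, with explicit input lists standing in for the syscall-level tracing a real tool would do (names are made up for illustration):

```python
import hashlib

# Sketch of skipping a build step when its recorded inputs are unchanged.
# In a real system the input set would come from tracing the step's file
# reads; here it is passed in explicitly.

def digest(inputs):
    h = hashlib.sha256()
    for name in sorted(inputs):
        h.update(name.encode())
        h.update(inputs[name])  # file contents
    return h.hexdigest()

cache = {}  # input digest -> recorded outputs ("the diff from last time")

def run_step(inputs, action):
    key = digest(inputs)
    if key in cache:
        return cache[key], True   # hit: replay recorded outputs, skip the work
    outputs = action(inputs)      # miss: actually run the step
    cache[key] = outputs
    return outputs, False

compiled, skipped = run_step({"a.c": b"int f();"}, lambda i: {"a.o": b"OBJ"})
again, skipped2 = run_step({"a.c": b"int f();"}, lambda i: {"a.o": b"OBJ"})
print(skipped, skipped2)  # False True
```

This is essentially what ccache, Bazel's action cache, etc. do; the hard part the quote glosses over is capturing the input set exactly.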
I’m actually disappointed this type of thing never caught on; it’s fairly easy on Linux to track every file a program accesses, so why do I need to write dependency lists?
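A real build tracer would intercept open(2) across arbitrary child processes via strace/ptrace or LD_PRELOAD; as a toy in-process analogue, Python's audit hooks expose every file the interpreter opens:

```python
import os
import sys
import tempfile

# In-process analogue of "track every file a program accesses".
# sys.addaudithook only sees this interpreter's opens; tracing
# arbitrary tools needs ptrace/LD_PRELOAD-style interception.

accessed = set()

def hook(event, args):
    if event == "open":          # fired for io.open and os.open
        accessed.add(args[0])    # args[0] is the path

sys.addaudithook(hook)

with tempfile.NamedTemporaryFile(mode="w", delete=False) as f:
    f.write("hello")
    path = f.name

open(path).read()   # this open is observed by the hook
os.unlink(path)
print(path in accessed)  # True
```

Collect that set during a compile and you have the dependency list for free.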
Builds were audited by somehow intercepting things like open(2) and getenv(3) invoked by a compiler or similar tool, and each produced object had an associated record listing the full path to the tool that produced it, its accurate dependencies (exact versions), and environment variables that were actually used. Anything that could affect the reproducibility was captured.
If an object was about to be built with the exact same circumstances as those in an existing record, the old object was reused, or "winked-in", as they called it.
It also provided versioning at filesystem level, so one could write something like file.c@@/trunk/branch/subbranch/3 and use it with any program without having to run a VCS client. The version part of the "filename" was seen as regular subdirectories, so you could autocomplete it even with ancient shells (I used it on Solaris).
I was a build engineer in a previous life. Not for Android apps, but some of the low-effort, high-value tricks I used involved:
* Do your building in a tmpfs if you have the spare RAM and your build (or parts of it) can fit there.
* Don't copy around large files if you can use symlinks, hardlinks, or reflinks instead.
* If you don't care about crash resiliency during the build phase (and you normally should not, each build should be done in a brand-new pristine reproducible environment that can be thrown away), save useless I/O via libeatmydata and similar tools.
* Cross-compilers are much faster than emulation for a native compiler, but there is a greater chance of missing some crucial piece of configuration and silently ending up with a broken artifact. Choose wisely.
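As an illustration of the "don't copy large files" bullet, a small sketch of link-instead-of-copy with a fallback for when the link can't be made (e.g. across filesystems):

```python
import os
import shutil
import tempfile

# Sketch of "don't copy large files": try a hardlink first, fall back
# to a real copy only when linking fails (cross-device, unsupported fs).
# place() is a hypothetical helper name.

def place(src, dst):
    try:
        os.link(src, dst)        # instant, no data copied, shares the inode
        return "hardlink"
    except OSError:
        shutil.copy2(src, dst)   # cross-device or unsupported: real copy
        return "copy"

d = tempfile.mkdtemp()
src = os.path.join(d, "big.bin")
with open(src, "wb") as f:
    f.write(b"x" * 1024)

dst = os.path.join(d, "build", "big.bin")
os.makedirs(os.path.dirname(dst), exist_ok=True)
how = place(src, dst)
print(how)  # likely "hardlink" on a single filesystem
```

The hardlink caveat is that a build step mutating the file in place corrupts the "original", which is why reflinks (copy-on-write clones, where the filesystem supports them) are the safer variant of the same trick.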
The high-value high-effort parts are ruthlessly optimizing your build system and caching intermediate build artifacts that rarely change.
I was brought into a team of 70-ish engineers working across 4-5 products. Big enterprise products written by very bright programmers. But build systems and infrastructure were not their core competency. Their flagship application took 6 hours to build when I was hired. I got it down to 30-45 minutes using a combination of the techniques above and a revamped build infrastructure. When I finally left that position, the build was much more modular, so you could rebuild a small part of it and glue it into a bunch of existing artifacts and have a final product in just a few minutes.
Objection! Long build times are better for sword-fighting time. The longer it takes, the more sword-fighting we have time for!
For example, Mercedes’ MB.OS: “is powered by more than 650 million lines of code” - see: https://www.linkedin.com/pulse/behind-scenes-mbos-developmen...
I’d be pretty happy if Git died and was replaced with a full Sapling implementation. Git is awful, so that’d be great. Sigh.
We're going to 1 billion LoC codebases and there's nothing stopping us!
Though from what I gather from the story, part of the speedup comes from how Android composes its build stages.
I.e. speeding up by not downloading everything only helps if you don't need everything you download. And adds up when you download multiple times.
I'm not sure they can actually provide a speedup in a tight developer cycle with a local git checkout and a good build system.
Incremental builds and diff-only pulls are not enough in a modern workflow. You either need to keep a fleet of warm builders, or you need to store and sync the previous build state to fresh machines.
Games and I'm sure many other types of apps fall into this category of long builds, large assets, and lots of intermediate build files. You don't even need multiple apps in a repo to hit this problem. There's no simple off the shelf solution.
For example, the League of Legends source repo is millions of files and hundreds of GB in size, because it includes things like game assets, vendored compiler toolchains for all of our target platforms, etc. But to compile the game code, only about 15,000 files and 600MB of data are needed from the repo.
That means 99% of the repo is not needed at all for building the code, and that is why we are seeing a lot of success using VFS-based tech like the one described in this blog. In this case, we built our own virtual filesystem for source code based on our existing content-defined patching tech (which we wrote about a while ago [1]). It's similar to Meta's EdenFS in that we built it on top of the ProjFS API on Windows and NFSv3 on macOS and Linux. We can mount a view into the multimillion-file repo in 3 seconds, and file data (which is compressed and deduplicated and served through a CDN) is downloaded transparently when a process requests it. We use a normal caching build system to actually run the build, in our case FASTBuild.
I recently timed it, and I can go from having nothing at all on disk to having locally built versions of the League of Legends game client and server in 20 seconds on a 32-core machine. This is with 100% cache hits, similar to the build timings mentioned in the article.
[1] https://technology.riotgames.com/news/supercharging-data-del...
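Content-defined patching of the kind mentioned above generally rests on content-defined chunking. This toy chunker (parameters and hash are arbitrary, and it has nothing to do with Riot's actual implementation) shows why an insertion near the start of a file leaves most chunks deduplicable:

```python
import random

# Toy content-defined chunker: boundaries are chosen from the bytes
# themselves (trailing-window hash hits a mask), so an insertion early
# in a file only shifts nearby chunks; later boundaries re-align and
# the remaining chunks still deduplicate.

MASK = (1 << 12) - 1  # ~4 KiB average chunk size (arbitrary choice)

def chunks(data, window=16):
    out, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF  # crude shift-and-add rolling hash
        if i - start >= window and (h & MASK) == 0:
            out.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        out.append(data[start:])
    return out

random.seed(0)
v1 = bytes(random.randrange(256) for _ in range(100_000))
v2 = b"PATCH" + v1  # new "version": 5 bytes inserted at the front
shared = set(chunks(v1)) & set(chunks(v2))
print(len(shared) / len(chunks(v1)))  # high ratio: most chunks survive
```

With fixed-size chunks the same 5-byte insertion would shift every boundary and nothing downstream would deduplicate.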
And a fleet of warm builders seems pretty reasonable at that scale.
SourceFS sounds useful for extra smart caching but some of these problems do sound like they're just bad fixable configuration.