The convergence of compilers, build systems and package managers (opens in new tab)

(blog.ezyang.com)

100 pointsvimes65610y ago40 comments

40 comments

This is a good analysis. A potential option for his proposed build output format is http://nixos.org/nix/ expressions.

Nix is already kind of a fusion of a package manager and a build system. It's rather mature, has a decent ecosystem and does a lot of what he is looking for:

- Complete determinism, handling of multiple versions, total dependency graphs - Parallel builds (using that dependency graph) - Distribution

One major benefit of a solution like generating Nix files over compiler integration is that it works for cross-language dependencies. A lot of the time integrated solutions break down is for things like if a C extension of a Ruby gem relies on the presence of imagemagick. Nix has no problem handling that kind of dependency.

Also of course it is a lot less work to generate Nix expressions than it is to write a package manager. There are already scripts like https://github.com/NixOS/cabal2nix which already solve problems of the packaging system they replace.

nwmcsween10y ago

Language specific tools (compiler in this case) could allow for things such as checking if library foo's api which library bar 'emulates' is sufficient for program c which depends on foo's API. With a lot more effort you could automate all dependency tracking and alternative library compatibility.

jimrandomh10y ago

I'd prefer to see package managers be more separate. Downloading dependencies involves interacting with third parties and has security implications. That's fine when you're downloading software intentionally from a source you trust, but I've had the experience of downloading source code and noticing that the build system is downloading from sources I don't recognize or without https, and that's not so good.

glenjamin10y ago

I think this is a good argument for having the location of sources be configurable separately from the place where sources are included - but not necessarily in a different tool.

fizixer10y ago

There is a problem but you're looking at it the wrong way.

What's in the compiler's machine code generation phase that the build system needs to know about? If nothing, then making a monolithic system is only going to make your life miserable.

Well-designed compilers are already split into (at least) two subsystems: frontend and backend. Frontend takes the program and spits an AST (very roughly speakign, although there is also the semantic-analysis phase, and intermediate-representation-generation phase). Backend is concerned with code generation. What your build systems needs is just the frontend part. Not only your build system, but also an IDE can benefit greatly from a frontend (which as one of the commenter pointed out, results in wasteful duplication of effort when and IDE writer decides to roll his/her own language parser embedded in the tool).

I think AST and semantic-analyzer are going to play an increasing role in a variety of software development activities and it's a folly to keep them hidden inside the compiler like a forbidden fruit.

And that's the exact opposite of monolith. It's more fragmentation, and splitting the compiler into useful and resusable pieces.

(At this point I have a tendency to gravitate towards recommending LLVM. Unfortunately I think it's a needlessly complicated project, not the least because of being written in a needlessly complicated language. But if you're okay with that it might be of assistance to your problems)

ezyang10y ago

I think it is uncontroversial that many tools would benefit from access to the frontend. But what is the interface to the frontend? There is tremendous diversity among frontends; it's the key distinguishing feature of most languages. What exactly are you going to expose? How should the build system interact with this?

LLVM is a great example of a modular compiler which is a pain to program against, because it is being constantly being refactored with BC-breaking changes. As silvas has said, "LLVM is loosely coupled from a software architecture standpoint, but very tightly coupled from a development standpoint". <https://lwn.net/Articles/583271/> In contrast, I can generally expect a command mode which dumps out Makefile-formatted dependencies to be stable across versions. Modularity is not useful for external developers without stability!

fizixer10y ago

> What exactly are you going to expose? How should the build system interact with this?

The case of AST is relatively clear. You create a library module, or a standalone program, that takes source code and generates an AST. The AST could be in JSON format, s-expression (that would be my choice), heck even XML. It's better if the schema adheres to some standard (I think LLVM AST can be studied for inspiration, or even taken as standard) but even if it's not, that's a problem orthogonal to that of monolithic vs modular. Once you have an source-to-AST converter, it can be used inside a compiler, text-editor, build system, or something else. These are all "clients" of the source-to-AST module.

I'm not too sure about semantic-analysis since I'm studying it myself at the moment. All I can say at the moment (without expert knowledge of formal semantics) that once AST is used in more and more tools, semantic analysis would follow, and hopefully conventions would emerge for treating it in a canonical fashion. Short of that, every "AST client" can roll their own ad-hoc semantic-analyzer built-into the tool itself. Note that it would still be way more modular than a monolithic design.

henrikschroder10y ago

> I think AST and semantic-analyzer are going to play an increasing role in a variety of software development activities

It would be fantastic if source control systems would work on the AST instead of the plain text files, so many annoying problems could be solved there.

Ace1710y ago

There are simpler and less intrusive ways to solve outside-AST issues (like an automatic reformatting pass before each build or each diff)

What you're suggesting raises a bunch of new (non-trivial) issues:

- What would you do with code comments? Things like "f(/+old_value+/new_value)".

- How to store code before preprocessing (C and C++) ?

- How to store files mixing several languages (PHP, HTML, Javascript) ?

- How do you store code for a DSL?

1 more reply

sitkack10y ago

If version control stored refactorings (tree operations) against the AST it would open up a whole new world of possibilities.

GFK_of_xmaspast10y ago

What kinds of problems you talking about.

1 more reply

johncolanduoni10y ago

I think a good way to handle this convergence would be to move away from repeated invocations of the compiler. Instead, your build system/IDE could keep an active instance of the compiler running that keeps the last generated ASG/intermediate representation/machine code in memory, along with bookkeeping information that remembers the relations between these. When a file changes/a rebuild is requested, this lets the build system to not only correctly identify dependencies, but also allows far more granular incremental compilation than is possible now. For example, if you really only change one function, then only that function's code (and possibly any code which relies on optimizations based on that function's form) need to be changed. You could even add more to this by tracking dependencies in unit tests and automatically re-running them when the result may have changed, or using it to facilitate some sort of live-coding.

This sounds like it would need a huge amount of memory, but IDEs already do this to the ASG level, and much memory and computation is wasted on the compiler re-generating the ASG in parallel when the IDE has a similar one already analyzed. The main disadvantage is it would restrict how build systems could be structured, as to pull this off the build system would need to have much more ability to reason about the build (call command X if Y has changed after Z won't cut it). Macro systems would also need to be more predictable.

As far as keeping things non-monolithic, you could still have plenty of separation between each phase of the compilation process, the only extra interface you would need between passes is the more granular dependency tracking.

edit: grammar

spc47610y ago

At work I do multi-language development. One program has a mixture of C, C++ and Lua as a (more or less) static executable (Lua modules written in C and Lua are included in the executable). It was easy enough to add the steps to make to ensure the Lua files were recompiled (into an object file) if they changed. I'm bullish (skeptical) that this will be easy with an all-singing-all-dancing compiler/build/package program.

anewhnaccount10y ago

Bullish means optimistic.

1 more reply

taylanub10y ago

I was thinking "wow that's a great idea" while reading through this and then noticed it's basically what a Lisp environment does. :-)

(And probably the environments of many other languages, but Lisp implementations (including Scheme implementations) tend to do this exceptionally well. A very good example might be Emacs, which has just an amazing help system and all.)

qznc10y ago

There is certainly a trend to provide the frontend separately. C# with "Compiler as a Service", libclang, etc.

The reason is not just IDE integration. You want a separate lexer (and sometimes parser) for tools like go-fmt or good syntax highlighting. Static analyzers are getting more important and there are many.

iainmerrick10y ago

This is a great writeup of what I think is an unfortunate trend. When your compiler, build system and IDE are all tightly coupled, you end up locked into a single language.

It's hard to develop pieces of your code in multiple languages and have everything play well together. But for many projects that's a good way to do things. For example, in games programming, you might want to use an offline texture compression tool. Ideally that should be integrated into the overall build; but you shouldn't have to write your texture compressor in Haskell or C++ or whatever just because that's what the game is written in.

I think Xcode is what a lot of other IDEs and build systems are moving towards. Xcode is nice as long as you're working with normal code in permitted languages (C++, Obj-C, Swift) and permitted resource formats (NIBs). But if you need to do something slightly unusual, like calling a shell script to generate resources, it's horrible.

Oh, and I didn't even mention package managers! Having those tightly coupled to the other tools is horrible too.

bluetomcat10y ago

> But if you need to do something slightly unusual, like calling a shell script to generate resources, it's horrible.

Not quite true. Xcode provides a "Run Script" build phase that lets you enter your shell script right into the IDE. A lot of handy environment variables are also there. You can easily reach your project via $SRCROOT, or modify the resources of the output bundle via "${CONFIGURATION_BUILD_DIR}/${PRODUCT_NAME}".

iainmerrick10y ago

That's the sort of stuff I mean when I say "horrible". :)

It'll just run the script every time, rather than doing anything smart with dependencies. Output from the script might or might not be picked up and tracked properly by the IDE. If you accidentally mess something up nothing will detect or prevent that.

(Edit: should add that I haven't given it a proper try in recent Xcode versions. I probably should.)

cat-dev-null10y ago

Configuration management systems (chef, puppet, cfe2/3) is a superset of package management#, which is, in turn, depends on frozen static artifacts of each project's build system.

# This is because configuration management installs files, packages, templates files and runs commands pre/post, similar to how most package managers work, but at a fine-grain level of user customized as opposed to maintainer customized.

The meta is that one could consider a "system" or approach by where the project build and configuration management systems were seamless. One main challenge in doing so would be that the staticness of artifacts allows for reproducible compatibility, whereas end-to-end configurability can easily become Gentoo.

chrispeel10y ago

Julia (http://julialang.org/) is an example of a system that combines the three. The JIT compiler is available at a REPL, pre-compiled packages available in v0.4 may be considered as coming from a build system, and the package manager comes with every Julia distribution.

jacquesm10y ago

In the days of Turbo Pascal even the editor was part of the whole. Compiler, build system, editor all tied together as a single executable. Package management wasn't on the horizon yet for that environment so it wasn't included. But there is definitely a precedent for this and this kind of convergence is hardly a new thing.

Personally I don't like it much. I prefer my tools to be separate and composable.

toolslive10y ago

This is yet another problem caused by the fact compilers have a 'file in, file out' interface. (the other problem is performance of the compilation/linking/packaging process). There's simply no reason why input to a compiler should be a file, and no reason why the result should be one.

jmnicolas10y ago

Can you give some examples of what we could pass to a compiler that is not a file ?

dgellow10y ago

Could you be more explicit? What do you want instead of a file?

toolslive10y ago

Suppose the IDE has the code in memory, it wants it compiled (or syntax checked, or ...). Currently, it needs to write out a file, then call the compiler, and pick up the result from the file system, and parse the result. Ideal, would be a programmer's API to do this. But a compilation service, with RPC calls would already be a major improvement.

Also, if you store code, intermediate results, dependencies, ..., in a database (could even be in memory), you can reuse these intermediate results. The dependency graph can guide you to decide what needs to be changed etc.

1 more reply

doug100110y ago

is sbt an early prototype of what the OP has in mind? I like sbt because it seems to me that by combining (at least parts) of each of compiling, building, and packaging, the entire workflow is streamlined. So for instance, sbt has an incremental re-compiler which applies a suite of heuristics to minimize the code required to be recompiled and is triggered automatically by any change to the source. In practice this is a huge time saver, but it wouldn't work without relying on sbt's obviously detailed knowledge of the dependency graph. Another example: sbt can also handle package management and deployment, largely via plugins (eg, "native-packager", "sbt-ghpages", "assembly" (uber jar), "sbt-elasticbeanstalk"

haberman10y ago

This analysis misses one major part of the equation: configuring the build. Almost every non-trivial piece of software can be built in multiple configurations. Debug vs. Release, with/without feature X, using/not using library Y.

The configuration of the build can affect almost every aspect of the build. Which tool/compiler is called, whether certain source files are included in the build or not, compiler flags (including what symbols are predefined), linker flags, etc. One tricky part about configuration is that it often needs a powerful (if not Turing-complete) language to fully express. For example, "feature X can only be enabled if feature Y is also enabled." If you use the autotools, you write these predicates in Bourne Shell. Linux started with Bourne Shell, then Eric Raymond tried to replace it with CML2 (http://www.catb.org/~esr/cml2/), until a different alternative called LinuxKernelConf won out in the end (http://zippel.home.xs4all.nl/lc/).

Another thing missing from the analysis are build-time abstractions over native OS facilities. The most notable example of this is libtool. The fundamental problem libtool solves is: building shared libraries is so far from standardized that it is not reasonable for individual projects that want to be widely portable to attempt to call native OS tools directly. They call libtool, which invokes the OS tools.

In the status quo, the separation between configuration and build system is somewhat delineated: ./configure spits out Makefile. But this interface isn't ideal. "make" has way too much smarts in it for this to be a clean separation. Make allows predicates, complex substitutions, implicit rules, it inherits the environment, etc. If "make" was dead simple and Makefiles were not allowed any logic, then you could feasibly write an interface between "make" and IDEs. The input to make would be the configured build, and it could vend information about specific inputs/outputs over a socket to an IDE. It could also do much more sophisticated change detection, like based on file fingerprints instead of timestamps.

But to do that, you have to decide what format "simple make" consumes, and get build configuration systems to output their configured builds in this format.

I've been toying around with this problem for a while and this is what I came up with for this configuration->builder interface. I specified it as a protobuf schema: https://github.com/haberman/taskforce/blob/master/taskforce....

Tyr4210y ago

"simple make" is ninja, right?

https://ninja-build.org/

Already works with cmake and so on.

GFK_of_xmaspast10y ago

Doesn't work with fortran, which I found out to my chagrin just this afternoon.

1 more reply

GFK_of_xmaspast10y ago

I hadn't heard of CML2, but it looks like even several years later it was remembered as a "no don't do that": https://lkml.org/lkml/2007/7/28/145 (also: http://www.linuxtoday.com/developer/2002021700120NWKNDV " What ESR should do is try create less controvery. Do not change megabyte to mebibyte in the help files. Find out why Linus doesn't include the updated configure.help files, then fix them and get them included right away. Do not change the UI, then old UI is documented and people are familiar with it." )

Ace1710y ago

Thanks for this overview and analysis. You inspired me a new blogpost, which might be seen as an answer to your post: http://code.alaiwan.org/wp/?p=84

j / k navigate · click thread line to collapse

40 comments

trishume10y ago

This is a good analysis. A potential option for his proposed build output format is http://nixos.org/nix/ expressions.

Nix is already kind of a fusion of a package manager and a build system. It's rather mature, has a decent ecosystem and does a lot of what he is looking for:

- Complete determinism, handling of multiple versions, total dependency graphs - Parallel builds (using that dependency graph) - Distribution

nwmcsween10y ago

jimrandomh10y ago

glenjamin10y ago

I think this is a good argument for having the location of sources be configurable separately from the place where sources are included - but not necessarily in a different tool.

fizixer10y ago

There is a problem but you're looking at it the wrong way.

What's in the compiler's machine code generation phase that the build system needs to know about? If nothing, then making a monolithic system is only going to make your life miserable.

I think AST and semantic-analyzer are going to play an increasing role in a variety of software development activities and it's a folly to keep them hidden inside the compiler like a forbidden fruit.

And that's the exact opposite of monolith. It's more fragmentation, and splitting the compiler into useful and resusable pieces.

ezyang10y ago

fizixer10y ago

> What exactly are you going to expose? How should the build system interact with this?

henrikschroder10y ago

> I think AST and semantic-analyzer are going to play an increasing role in a variety of software development activities

It would be fantastic if source control systems would work on the AST instead of the plain text files, so many annoying problems could be solved there.

Ace1710y ago

There are simpler and less intrusive ways to solve outside-AST issues (like an automatic reformatting pass before each build or each diff)

What you're suggesting raises a bunch of new (non-trivial) issues:

- What would you do with code comments? Things like "f(/+old_value+/new_value)".

- How to store code before preprocessing (C and C++) ?

- How to store files mixing several languages (PHP, HTML, Javascript) ?

- How do you store code for a DSL?

1 more reply

sitkack10y ago

If version control stored refactorings (tree operations) against the AST it would open up a whole new world of possibilities.

GFK_of_xmaspast10y ago

What kinds of problems you talking about.

1 more reply

johncolanduoni10y ago

edit: grammar

spc47610y ago

anewhnaccount10y ago

Bullish means optimistic.

1 more reply

taylanub10y ago

I was thinking "wow that's a great idea" while reading through this and then noticed it's basically what a Lisp environment does. :-)

qznc10y ago

There is certainly a trend to provide the frontend separately. C# with "Compiler as a Service", libclang, etc.

iainmerrick10y ago

This is a great writeup of what I think is an unfortunate trend. When your compiler, build system and IDE are all tightly coupled, you end up locked into a single language.

Oh, and I didn't even mention package managers! Having those tightly coupled to the other tools is horrible too.

bluetomcat10y ago

> But if you need to do something slightly unusual, like calling a shell script to generate resources, it's horrible.

iainmerrick10y ago

That's the sort of stuff I mean when I say "horrible". :)

(Edit: should add that I haven't given it a proper try in recent Xcode versions. I probably should.)

cat-dev-null10y ago

Configuration management systems (chef, puppet, cfe2/3) is a superset of package management#, which is, in turn, depends on frozen static artifacts of each project's build system.

chrispeel10y ago

jacquesm10y ago

Personally I don't like it much. I prefer my tools to be separate and composable.

toolslive10y ago

jmnicolas10y ago

Can you give some examples of what we could pass to a compiler that is not a file ?

dgellow10y ago

Could you be more explicit? What do you want instead of a file?

toolslive10y ago

1 more reply

doug100110y ago

haberman10y ago

But to do that, you have to decide what format "simple make" consumes, and get build configuration systems to output their configured builds in this format.

Tyr4210y ago

"simple make" is ninja, right?

https://ninja-build.org/

Already works with cmake and so on.

GFK_of_xmaspast10y ago

Doesn't work with fortran, which I found out to my chagrin just this afternoon.

1 more reply

GFK_of_xmaspast10y ago

Ace1710y ago

Thanks for this overview and analysis. You inspired me a new blogpost, which might be seen as an answer to your post: http://code.alaiwan.org/wp/?p=84

j / k navigate · click thread line to collapse