It’s currently difficult to integrate models that use different technologies (e.g. TensorRT, Ludwig, TorchScript, JAX, GGML, etc.) into your application, especially if you’re not using Python. Even if you learn the details of integrating each of these frameworks, running multiple frameworks in one process can cause hard-to-debug crashes.
Ideally, the ML framework a model was developed in should just be an implementation detail. Carton lets you decouple your application from specific ML frameworks so you can focus on the problem you actually want to solve.
At a high level, Carton works by running models in their own processes and using an IPC system to communicate back and forth with low overhead. Carton is primarily implemented in Rust, with bindings to other languages. Many more details are in the architecture doc linked below.
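To make the process-isolation idea concrete, here is a minimal Python sketch of the general pattern, not Carton's implementation: a stand-in "model" runs in a child process and the parent talks to it over pipes. The runner program, the `IsolatedModel` class, and the JSON-lines message format are hypothetical choices made for readability; a system aiming for low IPC overhead would avoid serializing tensors as text.

```python
import json
import subprocess
import sys

# Hypothetical runner program. It stands in for a model running in its own
# process: it reads one JSON request per line and writes one JSON reply per line.
RUNNER_SRC = """
import json
import sys

for line in sys.stdin:
    request = json.loads(line)
    # Stand-in "model": double every input value.
    print(json.dumps({"y": [2 * v for v in request["x"]]}), flush=True)
"""


class IsolatedModel:
    """Runs a model in a child process and talks to it over stdin/stdout pipes."""

    def __init__(self) -> None:
        self._proc = subprocess.Popen(
            [sys.executable, "-c", RUNNER_SRC],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def infer(self, inputs: dict) -> dict:
        # One request line out, one reply line back.
        self._proc.stdin.write(json.dumps(inputs) + "\n")
        self._proc.stdin.flush()
        return json.loads(self._proc.stdout.readline())

    def close(self) -> None:
        self._proc.stdin.close()  # EOF ends the runner's read loop
        self._proc.wait()
```

With this shape, a crash inside the child (for example, two frameworks conflicting) shows up in the parent as a failed read or write on the pipe rather than taking down the application's own process.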
Importantly, Carton uses your model’s original underlying framework (e.g. PyTorch) under the hood to actually execute the model. This matters because it makes Carton composable with other technologies. For example, it’s easy to use custom ops, TensorRT, etc. without changes. This lets you keep up with cutting-edge advances while keeping them decoupled from your application.
I’ve been working on Carton for almost a year now and I’m excited to open source it today!
Some useful links:
* Website, docs, quickstart - https://carton.run
* Explore existing models - https://carton.pub
* Repo - https://github.com/VivekPanyam/carton
* Architecture - https://github.com/VivekPanyam/carton/blob/main/ARCHITECTURE...
Please let me know what you think!
Perhaps we can (should?) have some universal package hub, where you can package and push a "thing" from any language, and then pull and use it from any other language. With some metadata describing the input/output schema. The underlying engine can use WASM or containers or something like that.
I guess the missing pieces are a schema for the CMD part and a way to easily generate APIs for various languages from that schema.
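The "metadata describing the input/output schema" idea can be sketched as a small data structure that a host checks inputs against before dispatching to whatever engine runs the model. Everything here (`TensorSpec`, `ModelSchema`, the dtype strings, `None` as a wildcard dimension) is a hypothetical format, not Carton's or any hub's actual schema; generating per-language APIs from it is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class TensorSpec:
    name: str
    dtype: str                        # e.g. "float32"
    shape: Tuple[Optional[int], ...]  # None means "any size" for that dimension

    def accepts(self, dtype: str, shape: Sequence[int]) -> bool:
        if dtype != self.dtype or len(shape) != len(self.shape):
            return False
        return all(want is None or want == got for want, got in zip(self.shape, shape))


@dataclass(frozen=True)
class ModelSchema:
    inputs: Tuple[TensorSpec, ...]
    outputs: Tuple[TensorSpec, ...]

    def check_inputs(self, provided: Dict[str, Tuple[str, Sequence[int]]]) -> List[str]:
        """Return human-readable problems; an empty list means the inputs match."""
        problems = []
        for spec in self.inputs:
            if spec.name not in provided:
                problems.append(f"missing input {spec.name!r}")
                continue
            dtype, shape = provided[spec.name]
            if not spec.accepts(dtype, shape):
                problems.append(
                    f"input {spec.name!r}: got {dtype} {tuple(shape)}, "
                    f"expected {spec.dtype} {spec.shape}"
                )
        return problems
```

For example, a schema whose `image` input has shape `(None, 3, 224, 224)` accepts a batch of shape `(8, 3, 224, 224)` and reports a problem for `(8, 224, 224, 3)`.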
For example, if your model contains arbitrary Python code, you'd pack it using [1] and then you could load it from another language using [2]. In this case, Carton transparently spins up an isolated Python interpreter under the hood to run your model (even if the rest of your application is in another language).
You can take it one step further if you're using certain DL frameworks. For example, you can create a TorchScript model in Python [3] and then use it from any programming language Carton supports without requiring Python at runtime (i.e. your model runs completely in native code).
[1] https://carton.run/docs/packing/python
It uses the NVIDIA drivers on your system, but it should be possible to make the rest of CUDA somewhat portable. I have a few thoughts on how to do this, but haven't gotten around to it yet.
The current GPU-enabled torch runners use a version of libtorch that's statically linked against the CUDA runtime libraries. So in theory, they just depend on your GPU drivers and not your CUDA installation. I haven't yet tested on a machine that has just the GPU drivers installed (i.e. without CUDA), but if it doesn't already work, it should be very possible to make it work.
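The driver/toolkit distinction can be checked directly. A minimal sketch, assuming Linux: `libcuda` is installed by the NVIDIA driver, while the CUDA runtime (`libcudart`) comes from the CUDA toolkit, so a runner that statically links the runtime should in theory only need the former. `has_nvidia_driver` is a hypothetical helper and only a best-effort heuristic, since `ctypes.util.find_library` depends on the system's library search setup.

```python
import ctypes.util
import platform


def has_nvidia_driver() -> bool:
    """Best-effort check for the driver-provided libcuda on Linux.

    This only looks for the driver library; it does not check for the CUDA
    toolkit, which a runner with a statically linked CUDA runtime should not need.
    """
    if platform.system() != "Linux":
        return False
    return ctypes.util.find_library("cuda") is not None
```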
Potentially this could be a lot better, but I’d be curious how much overhead the IPC layer adds. At least with ONNX you get a nice speed bump :)
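One way to get a rough baseline for that question is to time round trips through a plain OS pipe between two processes. This sketch measures a naive line-based echo between two Python processes, not Carton's IPC layer, so treat the result only as a ballpark for small messages on your machine; `mean_round_trip_seconds` and its parameters are hypothetical.

```python
import subprocess
import sys
import time

# Child program that echoes each line back unchanged. It does no work, so the
# timing reflects the pipe round trip plus minimal Python overhead.
ECHO_SRC = (
    "import sys\n"
    "for line in sys.stdin.buffer:\n"
    "    sys.stdout.buffer.write(line)\n"
    "    sys.stdout.buffer.flush()\n"
)


def mean_round_trip_seconds(n: int = 1000, payload_bytes: int = 64) -> float:
    """Average time for one request/reply over stdin/stdout pipes."""
    proc = subprocess.Popen(
        [sys.executable, "-c", ECHO_SRC],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    message = b"x" * payload_bytes + b"\n"

    def round_trip() -> None:
        proc.stdin.write(message)
        proc.stdin.flush()
        if proc.stdout.readline() != message:
            raise RuntimeError("echo mismatch")

    try:
        round_trip()  # warm-up, so interpreter start-up is not timed
        start = time.perf_counter()
        for _ in range(n):
            round_trip()
        return (time.perf_counter() - start) / n
    finally:
        proc.stdin.close()  # EOF lets the child exit
        proc.wait()
```

Varying `payload_bytes` gives a rough idea of how per-message cost grows with size; large tensors are usually moved through shared memory rather than copied through a pipe.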
The currently supported platforms [1] were mostly driven by environments I've seen at various tech companies.
I do have active plans to support inference from WASM/WebGPU so maybe that could be a good entrypoint to Windows support.
--
[1] Currently, the supported platforms are:
* `x86_64` Linux and macOS
* `aarch64` Linux (e.g. Linux on AWS Graviton)
* `aarch64` macOS (e.g. M1 and M2 Apple Silicon chips)
* WebAssembly (metadata access only for now, but WebGPU runners are coming soon)
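If you wanted to check a host against the native platforms in that list, a small sketch could look like the following. It assumes the names reported by Python's `platform` module (`Darwin` for macOS, and `arm64`/`AMD64` as alternate spellings of `aarch64`/`x86_64`); the set and the helper name are hypothetical transcriptions of the list, and the WebAssembly target is deliberately not covered.

```python
import platform
from typing import Optional

# Transcribed from the supported-platform list: (platform.system(), CPU arch).
SUPPORTED = {
    ("Linux", "x86_64"),
    ("Darwin", "x86_64"),   # Intel macOS
    ("Linux", "aarch64"),   # e.g. AWS Graviton
    ("Darwin", "aarch64"),  # Apple Silicon
}

# platform.machine() spells the same architectures differently across OSes.
ARCH_ALIASES = {"amd64": "x86_64", "arm64": "aarch64"}


def is_native_platform_supported(
    system: Optional[str] = None, machine: Optional[str] = None
) -> bool:
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    machine = ARCH_ALIASES.get(machine, machine)
    return (system, machine) in SUPPORTED
```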
> ONNX converts models while Carton wraps them. Carton uses the underlying framework (e.g. PyTorch) to actually execute a model under the hood. This is important because it makes it easy to use custom ops, TensorRT, etc. without changes. For some sophisticated models, "conversion" steps (e.g. to ONNX) can be problematic and require additional validation. By removing these conversion steps, Carton enables faster experimentation, deployment, and iteration.
> With that said, we plan to support ONNX models within Carton. This lets you use ONNX if you choose and it enables some interesting use cases (like running models in-browser with WASM).
More broadly, Carton can compose with other interesting technologies in ways ONNX isn't able to because ONNX is an inference engine while Carton is an abstraction layer.
If someone already has an ONNX model, there's already an in-browser capable ONNX runtime: https://onnxruntime.ai/docs/get-started/with-javascript.html...
(It does use some parts compiled to WASM under the hood, presumably for performance.)
If Carton took a TF/PyTorch model and just handled the conversion into a real runtime, somehow using custom ops for the bits that don't convert, that would be amazing, though.
[*] If "any programming language" is Python or Javascript.
It basically rests on the promise that you can package CUDA / PyTorch / a Python interpreter into the host language in some way, and use it.
This is true for Android, not true for iOS, true for almost all desktop systems, and somewhat true for the web (packaging PyTorch and a Python interpreter in WASM: the latter is easy, the former I am unsure about), but probably not true for FaaS environments (such as Cloudflare Workers or AWS Lambda).
This is the actual hard problem in this domain, not packaging a model file in a zipfile.
In general, writing pure Go inference libraries sucks. It's not easy to do array/vector manipulation, not easy to do SIMD/CUDA acceleration, cgo is not Go, etc. I wrote a fast XGBoost library at least (https://github.com/stillmatic/arboreal): it's on par with C implementations, but doing anything more complex is going to be tricky.
This was TensorFlow btw, which has Go bindings.
It is a smart and worthwhile move; we also needed to drop Python for performance and cost gains.
Why a zip file?
This lets us do things like fetch model metadata [1] for a large remote model, by only fetching a few tiny byte ranges instead of the whole model archive.
It also means you can include sample data (images, etc) with your model and they're only fetched when necessary (for example with stable diffusion: https://carton.pub/stabilityai/sdxl)
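This works because a zip archive ends with a central directory listing every member's offset, so a reader can jump straight to one member. The sketch below simulates that locally with Python's `zipfile`: a `BytesIO` subclass counts the bytes actually read, standing in for byte-range requests against a remote archive. The member names (`model.bin`, `carton.toml`) and sizes are hypothetical, and this is not Carton's own loader.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """In-memory archive that records how many bytes readers actually pull.

    It stands in for a remote file accessed through HTTP range requests.
    """

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_archive() -> bytes:
    """Build a toy archive: a large 'weights' file plus a tiny metadata file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("model.bin", bytes(5_000_000))            # hypothetical weights
        zf.writestr("carton.toml", 'runner_name = "demo"\n')  # hypothetical metadata
    return buf.getvalue()


def read_member(reader: CountingReader, name: str) -> bytes:
    # ZipFile reads the end-of-archive record and central directory, then
    # seeks straight to the requested member without reading the others' data.
    with zipfile.ZipFile(reader) as zf:
        return zf.read(name)
```

Reading the small metadata file touches only the end-of-archive records, the central directory, and that one member, a tiny fraction of the roughly 5 MB archive; over HTTP, each of those reads could be a `Range` request.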
To give two examples of prior art, it worked for Quake 3 data files (.pk3) & geospatial data files (.kmz)
Maybe it's not the best choice but it doesn't seem like a bad one.
It's hardly revolutionary to do this, here are some common examples of things that are zip files but don't label themselves as such:
- .jar
- .odt, .ods, .odp, .docx, .xlsx, .pptx
- .epub
- .apk
- .crx, .xpi
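Because these formats are ordinary zip archives, a zip reader recognizes them from their contents rather than their extension. A minimal sketch using Python's `zipfile.is_zipfile`, which also accepts file-like objects; the helper names are hypothetical.

```python
import io
import zipfile


def is_zip_container(data: bytes) -> bool:
    """True if `data` is a zip archive, whatever extension the file was given."""
    return zipfile.is_zipfile(io.BytesIO(data))


def make_zip(entries: dict) -> bytes:
    """Build an in-memory zip archive from {member name: text content}."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in entries.items():
            zf.writestr(name, content)
    return buf.getvalue()
```

For example, bytes built from a single `mimetype` entry containing `application/epub+zip` (shaped like the start of an .epub) are recognized as a zip, while arbitrary text is not.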
*As long as that language is Python or Rust.
What I think is that this is nothing more than a resume-bolstering effort that doesn't really need to exist and probably won't once OP lands a role at whatever FAANG company they're trying to impress.
We all think this. My initial thought was that this is probably a startup selling PyTorch-as-a-Service, and I did not bother to read the article. It turns out that I was wrong, and this might even be useful -- if not for the implementation, then perhaps for the idea.
However, Hacker News turns out to be a nicer place when we follow these guidelines:
> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.
The selling point of this thing is cross-language interoperability, and while they advertise it, they don't deliver.
Sorry, but if your "any language" is "Python or Javascript" your project hasn't even reached the proof of concept stage, it's just a vague idea at this point.
Supporting C++ and C will be 90% of the work and the real challenge.
> What I think is that this is nothing more than a resume-bolstering effort that doesn't really need to exist and probably won't once OP lands a role at whatever FAANG company they're trying to impress.
The title of the post might be clickbait, but there is an obvious asterisk on the homepage of Carton, and even at a quick glance it is clear that only a few languages are supported. The claim is so obviously false that I don't mind. I would not expect support for INTERCAL or Awk.
Yes, it does not deliver, but that does not warrant the personal attack. The author of Carton actually already had internships at Google and Facebook, and currently works at Uber.
Additionally, I would have some serious performance concerns about marshaling data across language boundaries.
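One common way to limit marshaling cost between processes is to put large buffers in shared memory and pass only a name and shape across the boundary. This is a minimal sketch of that general technique using Python's `multiprocessing.shared_memory`, not a description of what Carton does; the helper names are hypothetical, and for simplicity `struct` still copies values in and out, where a zero-copy reader would wrap `shm.buf` directly.

```python
import struct
from multiprocessing import shared_memory
from typing import List


def share_float32(values: List[float]) -> shared_memory.SharedMemory:
    """Write float32 values into a new named shared-memory block (non-empty input)."""
    shm = shared_memory.SharedMemory(create=True, size=4 * len(values))
    struct.pack_into(f"<{len(values)}f", shm.buf, 0, *values)
    return shm


def read_float32(name: str, count: int) -> List[float]:
    """Attach to an existing block by name, as a second process would, and decode it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return list(struct.unpack_from(f"<{count}f", shm.buf, 0))
    finally:
        shm.close()  # detach; the creator is responsible for unlink()
```

Only `shm.name` and the element count need to cross the process boundary; the creator calls `close()` and `unlink()` once the consumer is done.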