Show HN: Carton – Run any ML model from any programming language

(carton.run)

196 points · vpanyam · 2y ago · 53 comments

The goal of Carton is to let you use a single interface to run any machine learning model from any programming language.

It’s currently difficult to integrate models that use different technologies (e.g. TensorRT, Ludwig, TorchScript, JAX, GGML, etc) into your application, especially if you’re not using Python. Even if you learn the details of integrating each of these frameworks, running multiple frameworks in one process can cause hard-to-debug crashes.

Ideally, the ML framework a model was developed in should just be an implementation detail. Carton lets you decouple your application from specific ML frameworks so you can focus on the problem you actually want to solve.

At a high level, the way Carton works is by running models in their own processes and using an IPC system to communicate back and forth with low overhead. Carton is primarily implemented in Rust, with bindings to other languages. There are lots more details linked in the architecture doc below.
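
To make the process-isolation idea concrete, here is a minimal sketch of the general pattern, not Carton's actual implementation. It assumes a child Python process as the "runner" and newline-delimited JSON over pipes as the IPC channel, which is much slower than the low-overhead IPC described above. `WORKER_SRC`, `IsolatedModel`, and the toy doubling "model" are hypothetical names made up for illustration.

```python
import json
import subprocess
import sys

# Source for the child process. It stands in for a framework-specific runner.
WORKER_SRC = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    # Hypothetical "model": double every input value.
    response = {"outputs": [2 * x for x in request["inputs"]]}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
"""


class IsolatedModel:
    """Owns a child process and exchanges one JSON message per line with it."""

    def __init__(self):
        self._proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SRC],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def infer(self, inputs):
        # Send one request, then block until the matching response line arrives.
        self._proc.stdin.write(json.dumps({"inputs": inputs}) + "\n")
        self._proc.stdin.flush()
        return json.loads(self._proc.stdout.readline())["outputs"]

    def close(self):
        self._proc.stdin.close()  # EOF ends the worker's read loop
        self._proc.wait()


if __name__ == "__main__":
    model = IsolatedModel()
    print(model.infer([1, 2, 3]))
    model.close()
```

Because the caller only ever sees `infer`, a crash or dependency conflict inside the worker stays in the worker's process, which is the property the paragraph above is after.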

Importantly, Carton uses your model’s original underlying framework (e.g. PyTorch) under the hood to actually execute the model. This is meaningful because it makes Carton composable with other technologies. For example, it’s easy to use custom ops, TensorRT, etc without changes. This lets you keep up with cutting-edge advances, but decouples them from your application.

I’ve been working on Carton for almost a year now and I’m excited to open source it today!

Some useful links:

* Website, docs, quickstart - https://carton.run

* Explore existing models - https://carton.pub

* Repo - https://github.com/VivekPanyam/carton

* Architecture - https://github.com/VivekPanyam/carton/blob/main/ARCHITECTURE...

Please let me know what you think!

53 comments

brap 2y ago

Just some random brain dump: Why limit to ML models?

Perhaps we can (should?) have some universal package hub, where you can package and push a "thing" from any language, and then pull and use it from any other language. With some metadata describing the input/output schema. The underlying engine can use WASM or containers or something like that.

mikeravkine 2y ago

..isn't this just Docker?

brap 2y ago

Well... yeah, kind of.

I guess some parts that are missing are having a schema for the CMD part, and being able to easily generate APIs for various languages from that schema

armchairhacker 2y ago

Dynamic libraries, command-line executables, …

jcrash 2y ago

So this means if I want to use a ML model I made in python, but don't want to code the rest of the application in python I can do that?

vpanyam OP 2y ago

Yes, that's a use case Carton supports.

For example, if your model contains arbitrary Python code, you'd pack it using [1] and then you could load it from another language using [2]. In this case, Carton transparently spins up an isolated Python interpreter under the hood to run your model (even if the rest of your application is in another language).

You can take it one step further if you're using certain DL frameworks. For example, you can create a TorchScript model in Python [3] and then use it from any programming language Carton supports without requiring python at runtime (i.e. your model runs completely in native code).

[1] https://carton.run/docs/packing/python

[2] https://carton.run/docs/loading

[3] https://carton.run/docs/packing/torchscript

ZeroCool2u 2y ago

Seems almost too good to be true, but I really hope it's not. How does it handle things like CUDA dependencies? Can it somehow make those portable too? Or is GPU acceleration not quite there yet?

vpanyam OP 2y ago

Thanks :)

It uses the NVIDIA drivers on your system, but it should be possible to make the rest of CUDA somewhat portable. I have a few thoughts on how to do this, but haven't gotten around to it yet.

The current GPU enabled torch runners use a version of libtorch that's statically linked against the CUDA runtime libraries. So in theory, they just depend on your GPU drivers and not your CUDA installation. I haven't yet tested on a machine that has just the GPU drivers installed (i.e without CUDA), but if it doesn't already work, it should be very possible to make it work.

jcrash 2y ago

That’s awesome! Thanks for making this

jarym 2y ago

This looks interesting - I use ONNX to call my PyTorch models from .NET, but so far that has meant I haven't been able to test out JAX-based libraries, since they don't have ONNX export, and it has also meant I had to write C# boilerplate code to preprocess my input data into the form required by the model.

Potentially this could be a lot better but I’d be curious what speed overhead the IPC layer adds. At least with ONNX you get a nice speed bump :)

civilitty 2y ago

Any plans to support Windows? That would make Carton the ultimate library to embed LLMs into desktop applications

vpanyam OP 2y ago

I'm definitely open to it if there's interest (or if someone wants to help), but I don't have plans to implement Windows support myself at the moment.

The currently supported platforms [1] were mostly driven by environments I've seen at various tech companies.

I do have active plans to support inference from WASM/WebGPU so maybe that could be a good entrypoint to Windows support.

[1] Currently, the supported platforms are:

* `x86_64` Linux and macOS

* `aarch64` Linux (e.g. Linux on AWS Graviton)

* `aarch64` macOS (e.g. M1 and M2 Apple Silicon chips)

* WebAssembly (metadata access only for now, but WebGPU runners are coming soon)

Nischalj10 2y ago

is this ancillary to what [these guys](https://github.com/unifyai/ivy) are trying to do?

carbocation 2y ago

That seems different to me. OP is talking about using ML models outside of python (well, in python, too). That link seems to be talking about using ML models across frameworks (pytorch, tensorflow, jax, etc) in python.

Nischalj10 2y ago

got it. went through both of the codebases. what you say is the case. thanks!

gorenb 2y ago

This HN post looks really weird on mobile (no, not the website, HN itself)

astronautas 2y ago

Is this the same as Nvidia's Triton?

capableweb 2y ago

I think this Carton project is on a lower level than Triton. With Triton you'd start the Triton server then make requests against it, while Carton is more like a library that you include in your application/library and code it with the same language you'd write your application/library.

astronautas 2y ago

True!

gemaif1li 2y ago

When will you release a Java client?

carbocation 2y ago

I'd love to see this for golang (even without GPU support).

Areibman 2y ago

Maybe I'm missing something here, isn't this largely achieved by ONNX already?

[0] https://onnx.ai

vpanyam OP 2y ago

That's a good question! There's an FAQ entry on the homepage that touches on this, but let me know if I can improve it:

> ONNX converts models while Carton wraps them. Carton uses the underlying framework (e.g. PyTorch) to actually execute a model under the hood. This is important because it makes it easy to use custom ops, TensorRT, etc without changes. For some sophisticated models, "conversion" steps (e.g. to ONNX) can be problematic and require additional validation. By removing these conversion steps, Carton enables faster experimentation, deployment, and iteration.

> With that said, we plan to support ONNX models within Carton. This lets you use ONNX if you choose and it enables some interesting use cases (like running models in-browser with WASM).

More broadly, Carton can compose with other interesting technologies in ways ONNX isn't able to because ONNX is an inference engine while Carton is an abstraction layer.

WorldMaker 2y ago

> This lets you use ONNX if you choose and it enables some interesting use cases (like running models in-browser with WASM)

If someone already has an ONNX model, there's already an in-browser capable ONNX runtime: https://onnxruntime.ai/docs/get-started/with-javascript.html...

(It does use some parts compiled to WASM under the hood, presumably for performance.)

Dayshine 2y ago

ONNX runtime doesn't convert models, it runs them, and it has bindings in several languages. And most importantly it's tiny compared to the whole python package mess you get with TF or pytorch.

If carton took a TF/pytorch model and just dealt with the conversion into a real runtime, somehow using custom ops for the bits that don't convert, that would be amazing though.

ZeroCool2u 2y ago

There's an ONNX runtime, but to use the runtime you do need to convert your model into ONNX format first. You can't just run a TF or PyTorch model using the ONNX runtime directly. (At least last time I checked.) Unfortunately this conversion process can be a pain and there needs to be an equivalent operator in ONNX for each op in your TF/Torch execution graph.

otabdeveloper4 2y ago

> From any [*] programming language.

[*] If "any programming language" is Python or Javascript.

liuliu 2y ago

This is a reasonable approach for systems that are allowed to load binaries (either the running artifact is a binary or semi-binary (WASM executable), or the system allows loading .so / .dll files from user-provided places).

It basically runs with the promise that you can package CUDA / PyTorch / Python interpreter into the host language in some way, and use it.

This is true for Android, not true for iOS, true for almost all desktop systems, somewhat true for web (packaging PyTorch + Python interpreter in WASM, the latter is easy, the former, I am unsure), probably not true for FAAS environments (such as Cloudflare worker, or AWS Lambda).

otabdeveloper4 2y ago

It's gonna fall apart in a spectacular way when they try to marshal data across compiled language boundaries.

This is the actual hard problem in this domain, not packaging a model file in a zipfile.
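
To illustrate what marshaling a tensor across a language boundary involves, here is a minimal sketch of a language-neutral byte layout. This is an assumed format invented for illustration, not Carton's wire format: a little-endian `u32` dimension count, one `u64` per dimension, then raw `float32` values. `pack_tensor` and `unpack_tensor` are hypothetical helper names.

```python
import struct


def pack_tensor(shape, values):
    """Serialize float32 data as: ndim (u32), dims (u64 each), raw values."""
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError("shape does not match number of values")
    # "<" fixes little-endian byte order and disables padding, so any
    # language can decode the buffer with the same field offsets.
    header = struct.pack(f"<I{len(shape)}Q", len(shape), *shape)
    return header + struct.pack(f"<{count}f", *values)


def unpack_tensor(buf):
    """Inverse of pack_tensor; returns (shape, values) as Python lists."""
    (ndim,) = struct.unpack_from("<I", buf, 0)
    shape = struct.unpack_from(f"<{ndim}Q", buf, 4)
    count = 1
    for dim in shape:
        count *= dim
    values = struct.unpack_from(f"<{count}f", buf, 4 + 8 * ndim)
    return list(shape), list(values)


if __name__ == "__main__":
    payload = pack_tensor([2, 3], [1.0, 2.5, -3.0, 0.0, 4.0, 0.5])
    print(len(payload), unpack_tensor(payload))
```

Even this toy format has to pin down byte order, integer widths, and element type explicitly; strings, nested tensors, and zero-copy sharing are where the real difficulty the comment points at begins.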

astronautas 2y ago

Make it for Go, and I am sold. Running ML models in Go services is still an unsolved problem.

r0l1 2y ago

We have a similar high-performance AI stack written in Go that can load many different models from different frameworks. It is the work of several years. I just saw your comment and it reminded me of our internal company discussion about releasing everything under an open source license. Thanks for reminding me :) What are your use cases?

astronautas 2y ago

Wow, make it open source quickly!!! :hype:. It's a classic Python REST API for model serving. But we have very low latency constraints. As such, rewriting in more high performant backend languages e.g. Go or Rust would substantially reduce resource usage (by reducing horizontal scaling need). Pre-baked model serving frameworks e.g. Nvidia's Triton aren't an option, since we have to query a feature store, and do some input feature tracking in between. Go seemed like an efficient, developer friendly choice, but there aren't any well maintained model inference libraries in Go up to this day...

huac 2y ago

We used Triton Inference Server (with a Golang sidecar to translate requests) for model serving and a separate Go app that handled receiving the request, fetching features, sending to Triton, doing other stuff with the response, serving. This scaled to 100k QPS with pretty good performance but does require some hops.

In general writing pure Go inference libraries sucks. Not easy to do array/vector manipulation, not easy to do SIMD/CUDA acceleration, cgo is not go, etc. I wrote a fast XGBoost library at least (https://github.com/stillmatic/arboreal) - it's on par with C implementations, but doing anything more complex is going to be tricky.


ramoz 2y ago

I’ve also run models in Go, transformers, even T5. There wasn’t that much overhead, maybe some annoying compilation stuff, but nothing crazy.

This was tensorflow btw which has Go bindings support.

It is a smart & worthwhile move, we also needed to drop python for performance/cost gains.


liuliu 2y ago

This seems to be a reasonable approach for Go, but you do need to carry a lot in your containerized environment (Go tends to have very lean containers, and this approach requires a fat container with CUDA, PyTorch, Python, etc.).

softg 2y ago

Slightly related dumb question, I saw on GitHub that TensorFlow has Java support. Does anyone actually use TensorFlow with Java?

genewitch 2y ago

Aruba networks does

conradev 2y ago

> Carton wraps your model with some metadata and puts it in a zip file

Why a zip file?

vpanyam OP 2y ago

In addition to the benefits mentioned in the sibling comment, zip files let you seek to and access individual files in the archive without extracting all files (vs tar files for example).

This lets us do things like fetch model metadata [1] for a large remote model, by only fetching a few tiny byte ranges instead of the whole model archive.

It also means you can include sample data (images, etc) with your model and they're only fetched when necessary (for example with stable diffusion: https://carton.pub/stabilityai/sdxl)

[1] https://carton.run/docs/metadata
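
The random-access property described above can be sketched with Python's standard `zipfile` module. This is a minimal illustration, not Carton's code: the member names `metadata.json` and `weights.bin` are made up, and the in-memory `CountingReader` is a hypothetical stand-in for a remote file served through byte-range requests.

```python
import io
import json
import zipfile


def build_archive():
    """Create an in-memory zip: a tiny metadata file plus a large payload."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("metadata.json", json.dumps({"name": "demo-model"}))
        zf.writestr("weights.bin", bytes(1_000_000))  # stand-in for weights
    return buf.getvalue()


class CountingReader(io.BytesIO):
    """Seekable file object that records how many bytes were actually read."""

    def __init__(self, data):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        data = super().read(size)
        self.bytes_read += len(data)
        return data


def read_metadata(fileobj):
    """Read a single member; zipfile seeks to it via the central directory."""
    with zipfile.ZipFile(fileobj) as zf:
        return json.loads(zf.read("metadata.json"))


if __name__ == "__main__":
    archive = build_archive()
    reader = CountingReader(archive)
    print(read_metadata(reader))
    print(f"read {reader.bytes_read} of {len(archive)} bytes")
```

Only the directory at the end of the archive and the requested member are read, so the large payload is never touched. A tar archive has no such index, so finding a member means scanning through the entries before it.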

shoo 2y ago

zip-file-as-a-container-format seems pragmatic: it's a way to bundle multiple files into one file (easier to manage than scattering multiple files), it avoids introducing a new proprietary format, it can optionally be compressed, support for reading and writing the container format is already widespread.

To give two examples of prior art, it worked for Quake 3 data files (.pk3) & geospatial data files (.kmz)

Maybe it's not the best choice but it doesn't seem like a bad one.

janalsncm 2y ago

Also docx as well I believe.

capableweb 2y ago

It's a fairly common way of bundling multiple files into one that has large support and usually "good enough" compression.

It's hardly revolutionary to do this, here are some common examples of things that are zip files but don't label themselves as such:

- .jar

- .odt, .ods, .odp, .docx, .xlsx, .pptx

- .epub

- .apk

- .crx, .xpi

otteromkram 2y ago

"...run any machine learning model from any programming language*."

*As long as that language is python or rust.

What I think is that this is nothing more than a resume-bolstering effort that doesn't really need to exist and probably won't once OP lands a role at whatever FAANG company they're trying to impress.

smokel 2y ago

Replying to this to explain the downvotes.

We all think this. My initial thought was that this is probably a startup selling PyTorch-as-a-Service, and I did not bother to read the article. It turns out that I was wrong, and this might even be useful -- if not for the implementation, then perhaps for the idea.

However, it turns out to make Hacker News a nicer space if we follow these guidelines:

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

otabdeveloper4 2y ago

It's not a shallow dismissal.

The selling point of this thing is cross-language interoperability, and while they advertise it, they don't deliver.

Sorry, but if your "any language" is "Python or Javascript" your project hasn't even reached the proof of concept stage, it's just a vague idea at this point.

Supporting C++ and C will be 90% of the work and the real challenge.

smokel 2y ago

The shallow dismissal that I was referring to is:

> What I think is that this is nothing more than a resume-bolstering effort that doesn't really need to exist and probably won't once OP lands a role at whatever FAANG company they're trying to impress.

The title of the post might be click-bait, but there is an obvious asterisk on the homepage of Carton, and even at a quick glance it is obvious that only very few languages are supported. The claim is so obviously false, that I don't mind. I would not expect support for INTERCAL or Awk.

Yes, it does not deliver, but that does not warrant the personal attack. The author of Carton actually already had internships at Google and Facebook, and currently works at Uber.

capableweb 2y ago

Maybe you and I have different understandings about what "Proof of Concept" means, but if you're supposed to deliver cross-language interoperability and have successfully delivered it to three different languages with wildly different runtimes, I'd consider that a successful proof of concept. And since you've demonstrated that the bindings work for at least two other languages, it's more or less trivial to get them to work for N other languages. So this is clearly beyond the proof of concept stage at this point and is trying to reach a maturity stage instead.

3vidence 2y ago

I gotta agree here, I don't think the process of porting this to a wide array of languages is trivial.

Additionally I would have some serious performance concerns when it comes to marshaling the data across languages boundaries.

chriscosma 2y ago

OP has already worked at both Facebook and Google, it's doubtful they need any more resume-bolstering.
