Using gRPC for (local) inter-process communication (2021) (opens in new tab)

(mpi-hd.mpg.de)

121 pointszardinality1y ago93 comments

93 comments

pjmlp1y ago

> Using a full-featured RPC framework for IPC seems like overkill when the processes run on the same machine.

That is exactly what COM/WinRT, XPC, Android Binder, D-BUS are.

Naturally they have several optimisations for local execution.

charleslmunger1y ago

You can run grpc over binder:

https://github.com/grpc/grpc-java/blob/master/binder/src/mai...

The overhead is low, and you get best practices like oneway calls and avoiding the transaction limit for free. It also comes with built in security policies for servers and clients.

jeffbee1y ago

Binder seriously underappreciated, IMHO. But I think it makes sense to use gRPC or something like it if there is any possibility that in the future an "IPC" will become an "RPC" to a foreign host. You don't want to be stuck trying to change an IPC into an RPC if it was foreseeable that it would eventually become remote due to scale.

pjmlp1y ago

Kind of, as anyone with CORBA, DCOM, RMI, .NET Remoting experience has plenty of war stories regarding distributed computing with the expectations of local calls.

dietr1ch1y ago

In my mind the abstraction should allow for RPCs and being on the same machine should allow to optimise things a bit, this way you simply build for the general case and lose little to no performance.

Think of the loopback, my programs don't know (or at least shouldn't know) that IPs like 127.0.0.5 are special, but then the kernel knows that messages there are not going to go on any wire and handles that differently.

mgsouth1y ago

One does not simply walk into RPC country. Communication modes are architectual decisions, and they flavor everything. There's as much difference between IPC and RPC as there is between popping open a chat window to ask a question, and writing a letter on paper and mailing it. In both cases you can pretend they're equivalent, and it will work after a fashion, but your local communication will be vastly more inefficient and bogged down in minutia, and your remote comms will be plagued with odd and hard-to-diagnose bottlenecks and failures.

Some generalities:

Function call: The developer just calls it. Blocks until completion, errors are due to bad parameters or a resource availability problem. They are handled with exceptions or return-code checks. Tests are also simple function calls. Operationally everything is, to borrow a phrase from aviation regarding non-retractable landing gear, "down and welded".

IPC: Architectually, and as a developer, you start worrying about your function as a resource. Is the IPC recipient running? It's possible it's not; that's probably treated as fatal and your code just returns an error to its caller. You're more likely to have a m:n pairing between caller and callee instances, so requests will go into a queue. Your code may still block, but with a timeout, which will be a fatal error. Or you might treat it as a co-routine, with the extra headaches of deferred errors. You probably won't do retries. Testing has some more headaches, with IPC resource initialization and tear-down. You'll have to test queue failures. Operations is also a bit more involved, with an additional resource that needs to be baby-sat, and co-ordinated with multiple consumers.

RPC: IPC headaches, but now you need to worry about lost messages, and messages processed but the acknowledgements were lost. Temporary failures need to be faced and re-tried. You will need to think in terms of "best effort", and continually make decisions about how that is managed. You'll be dealing with issues such as at-least-once delivery vs. at-most-once. Consistency issues will need to be addressed much more than with IPC, and they will be thornier problems. Resource availability awareness will seep into everything; application-level back-pressure measures _should_ be built-in. Treating RPC as simple blocking calls will be a continual temptation; if you or less-enlightened team members subcumb then you'll have all kinds of flaky issues. Emergent, system-wide behavior will rear its ugly head, and it will involve counter-intuitive interactions (such as bigger buffers reducing throughput). Testing now involves three non-trivial parts--your code, the called code, and the communications mechanisms. Operations gets to play with all kinds of fun toys to deploy, monitor, and balance usage.

bunderbunder1y ago

This could possibly even be dead simple to accomplish if application-level semantics aren't being communicated by co-opting parts of the communication channel's spec.

I think that this factor might be the ultimate source of my discomfort with standards like REST. Things like using HTTP verbs and status codes, and encoding parameters into the request's URL, mean that there's almost not even an option to choose a communication channel that's lighter-weight than HTTP.

imtringued1y ago

Nobody does REST, because nobody needs the whole hateoas shtick. When people say REST, they mean "HTTP API" and I'm not being pendantic here. The difference is very real because REST doesn't really have a reason to exist.

1 more reply

loeg1y ago

Binder is totally unavailable outside of Android, right? IIRC it's pretty closely coupled to Android's security model and isn't a great fit outside of that ecosystem.

p_l1y ago

It's usable outside Android, it's more that there's "some assembly required" involved - you need essentially a nameserver process which also handles permissions and the main one is pretty much the one in android. Also Binder itself doesn't really specify what data it exchanges around, the userland on Android uses one based on BeOS (which is where Binder comes from) but also has at least one other variant used for drivers.

merb1y ago

Btw. Modern windows also superports Unix domain sockets, so if you have an app that has another service that will run on the same machine or on a different one it is not so bad to use grpc over uds.

pjmlp1y ago

Nice idea, although it is still slower than COM.

COM can run over the network (DCOM), inside the same computer on its own process (out-proc), inside the client (in-proc), designed for in-proc but running as out-proc (COM host).

So for max performance, with the caveat of possibly damaging the host, in-proc will do it, and be faster than any kind of sockets.

tgma1y ago

> COM can run over the network (DCOM)

Ah the good ol' Blaster worm...

https://en.wikipedia.org/wiki/Blaster_(computer_worm)

merb1y ago

Well if you need ipc to connect different languages, you will stay away from COM. Heck once you use anything but rust,c or c++ you should drop com. Even dotnet support for com is aweful. And if you ever written a outlook addin, than you will start hating com by yourself, thanks to god that Microsoft is going away from that

2 more replies

CharlieDigital1y ago

ZeroMQ is kinda nice for this; just the right level of thin abstraction.

jauntywundrkind1y ago

I super dug the talk Building SpiceDB: A gRPC-First Database - Jimmy Zelinskie, authzed which is about a high-performance auth system, which talks to this. https://youtu.be/1PiknT36218

It's a 4-tier arhcitecture (clients - front end service - query service - database) auth system, and all communication is over grpc (except to the database). Jimmy talks about the advantages of having a very clear contract between systems.

There's a ton of really great nitty gritty detail about being super fast with gRPC. https://github.com/planetscale/vtprotobuf for statical-size allocating protobuf rather than slow reflection-based dynamic size. Upcoming memory pooling work to avoid allocations at all. Tons of advantages for observability right out of the box. It's subtle but I also get the impression most gRPC stubs are miserably bad, that Authzed had to go long and far to get away from a lot of gRPC tarpits.

This is one of my favorite talks from 2024, and strongly sold me.on how viable gRPC is for internal services. Even if I were doing local multi-process stuff, I would definitely consider gRPC after this talk. The structure & clarity & observability are huge wins, and the performance can be really good if you need it.

https://youtu.be/1PiknT36218#t=12m 12min is the internal cluster details.

jzelinskie1y ago

Thank you! That really means a lot!

>It's subtle but I also get the impression most gRPC stubs are miserably bad, that Authzed had to go long and far to get away from a lot of gRPC tarpits.

They aren't terrible, but they also aren't a user experience you want to deliver directly to your customers.

palata1y ago

I have been in a similar situation, and gRPC feels heavy. It comes with quite a few dependencies (nothing compared to npm or cargo systems routinely bringing hundreds of course, but enough to be annoying when you have to cross-compile them). Also at first it sounds like you will benefit from all the languages that protobuf supports, but in practice it's not that perfect: some python package may rely on the C++ implementation, and therefore you need to compile it for your specific platform. Some language implementations are just maintain by one person in their free time (a great person, but still), etc.

On the other hand, I really like the design of Cap'n Proto, and the library is more lightweight (and hence easier) to compile. But there, it is not clear on which language implementation you can rely other than C++. Also it feels like there are maintainers paid by Google for gRPC, and for Cap'n Proto it's not so clear: it feels like it's essentially Cloudflare employees improving Cap'n Proto for Cloudflare. So if it works perfectly for your use-case, that's great, but I wouldn't expect much support.

All that to say: my preferred choice for that would technically be Cap'n Proto, but I wouldn't dare making my company depend on it. Whereas nobody can fire me for depending on Google, I suppose.

kentonv1y ago

> it feels like it's essentially Cloudflare employees improving Cap'n Proto for Cloudflare.

That's correct. At present, it is not anyone's objective to make Cap'n Proto appeal to a mass market. Instead, we maintain it for our specific use cases in Cloudflare. Hopefully it's useful to others too, but if you choose to use it, you should expect that if any changes are needed for your use case, you will have to make those changes yourself. I certainly understand why most people would shy away from that.

With that said, gRPC is arguably weird in its own way. I think most people assume that gRPC is what Google is built on, therefore it must be good. But it actually isn't -- internally, Google uses Stubby. gRPC is inspired by Stubby, but very different in implementation. So, who exactly is gRPC's target audience? What makes Google feel it's worthwhile to have 40ish(?) people working on an open source project that they don't actually use much themselves? Honest questions -- I don't know the answer, but I'd like to.

(FWIW, the story is a bit different with Protobuf. The Protobuf code is the same code Google uses internally.)

(I am the author of Cap'n Proto and also was the one who open sourced Protobuf originally at Google.)

mgsouth1y ago

My most vivid gRPC experience is from 10 years or so ago, so things have probably changed. We were heavily Go and micro-services. Switched from, IIRC, protobuf over HTTP, to gRPC "as it was meant to be used." Ran into a weird, flaky bug--after a while we'd start getting transaction timeouts. Most stuff would get through, but errors would build and eventually choke everything.

I finally figured out it was a problem with specific pairs of servers. Server A could talk to C, and D, but would timeout talking to B. The gRPC call just... wouldn't.

One good thing is you do have the source to everything. After much digging through amazingly opaque code, it became clear there was a problem with a feature we didn't even need. If there are multiple sub-channels between servers A and B. gRPC will bundle them into one connection. It also provides protocol-level in-flight flow limits, both for individual sub-channels and the combined A-B bundle. It does it by using "credits". Every time a message is sent from A to B it decrements the available credit limit for the sub-channel, and decrements another limit for the bundle as a whole. When the message is processed by the recipient process then the credit is added back to the sub-channel and bundle limits. Out of credits? Then you'll have to wait.

The problem was that failed transactions were not credited back. Failures included processing time-outs. With time-outs the sub-channel would be terminated, so that wasn't a problem. The issue was with the bundle. The protocol spec was (is?) silent as to who owned the credits for the bundle, and who was responsible for crediting them back in failure cases. The gRPC code for Go, at the time, didn't seem to have been written or maintained by Google's most-experienced team (an intern, maybe?), and this was simply dropped. The result was the bundle got clogged, and A and B couldn't talk. Comm-level backpressure wasn't doing us any good (we needed full app-level), so for several years we'd just patch new Go libraries and disable it.

tgma1y ago

While you are correct the majority of Google services internally are Stubby-based, it isn't correct to say gRPC is not utilized inside Google. Cloud and external APIs are an obvious use case but also internal stuff that are also used in open source world use gRPC even when deployed internally. TensorFlow etc comes to mind...

So "not used much" at Google scale probably justifies 40 people.

tgma1y ago

> I think most people assume that gRPC is what Google is built on, therefore it must be good.

Dropbox[1], Netflix[2], Apple[3], Microsoft[4], Slack[5], and lots more are all either built on, or heavily use, and/or contribute to various pieces of gRPC and the ecosystem around it.

[1]: https://dropbox.tech/infrastructure/courier-dropbox-migratio...

[2]: https://netflixtechblog.com/practical-api-design-at-netflix-...

[3]: https://github.com/grpc/grpc-swift

[4]: https://learn.microsoft.com/en-us/aspnet/core/grpc/?view=asp...

[5]: https://slack.engineering/how-big-technical-changes-happen-a...

jsnell1y ago

> What makes Google feel it's worthwhile to have 40ish(?) people working on an open source project that they don't actually use much themselves? Honest questions -- I don't know the answer, but I'd like to.

It's at least used for the public Google Cloud APIs. That by itself guarantees a rather large scale, whether they use gRPC in prod or not.

nicksnyder1y ago

If you are looking for a lightweight Protobuf based RPC framework, check out https://connectrpc.com/. It is part of the CNCF and is used and supported by multiple companies: https://buf.build/blog/connect-rpc-joins-cncf

gRPC ships with its own networking stack, which is one reason why those libs are heavy. Connect libraries leverage each ecosystem's native networking stack (e.g. net/http in Go, NSURLSession in Swift, etc.), which means any other libraries that work with the standard networking stack interop well with Connect.

1 more reply

zackangelo1y ago

Had to reach for a new IPC mechanism recently to implement a multi-GPU LLM inference server.

My original implementation just pinned one GPU to its own thread then used message passing between them in the same process but Nvidia's NCCL library hates this for reasons I haven't fully figured out yet.

I considered gRPC for IPC since I was already using it for the server's API but dismissed it because it was an order of magnitude slower and I didn't want to drag async into the child PIDs.

Serializing the tensors between processes and using the Servo team's ipc-channel crate[0] has worked surprisingly well. If you're using Rust and need a drop-in (ish) replacement for the standard library's channels, give it a shot.

[0] https://github.com/servo/ipc-channel

DashAnimal1y ago

What I loved about Fuchsia was its IPC interface, using FIDL which is like a more optimized version of protobufs.

https://fuchsia.dev/fuchsia-src/get-started/learn/fidl/fidl

DabbyDabberson1y ago

loved - in the past tense?

solarpunk1y ago

fuchsia was pretty deeply impacted by google layoffs iirc

mdhb1y ago

I was just looking at the repo this week. It’s under heavy active development. It’s just intentionally not talked about much at the moment.

HumblyTossed1y ago

> Using a full-featured RPC framework for IPC seems like overkill when the processes run on the same machine. However, if your project anyway exposes RPCs for public APIs or would benefit from a schema-based serialisation layer it makes sense to use only one tool that combines these—also for IPC.

It might make sense. Usually, if you're using IPC, you need it to be as fast as possible and there are several solutions that are much faster.

dmoy1y ago

I tend to agree. Usually you want as fast as possible. Sometimes you don't though.

E.g. Kythe (kythe.io) was designed so that its individual language indexers run with a main driver binary written in Go, and then a subprocess binary written in.... whatever. There's a requirement to talk between the two, but it's not really a lot of traffic (relative to e.g. the CPU cost of the subprocess doing compilation).

So what happens in practice is that we used Stubby (like gRPC, except not public), because it was low overhead* to write the handler code for it on both ends, and got us some free other bits as well.

* Except when it wasn't lol. It worked great for the first N languages written in langs with good stubby support. But then weird shit (for Google) crawled out of the weeds that didn't have stubby support, so there's some handwaving going on for the long tail.

monocasa1y ago

I'm not even sure that I'd say usually. Most of the time you're just saying "hey daemon, do this thing that you're already preconfigured for".

ks20481y ago

What are the other solutions that are much faster? (besides rolling your own mini format).

HumblyTossed1y ago

> What are the other solutions that are much faster?

Why ask the question, then

> (besides rolling your own mini format)

box it in?

Do you think I'm saying using an RPC is bad? I'm not. I simply took issue with the way the article was worded.

The thing about engineering is you don't want to do what some blogger says is best. You want to analyze YOUR particular needs and design your system from that. So, with every properly engineered solution there are trade-offs. You want easy? It may run slower. You want faster? You may have to roll your own.

lyu072821y ago

Cool that they mention buf, it's such a massive improvement over Google's own half abandoned crappy protobuf implementation

https://github.com/bufbuild/buf

anilakar1y ago

In addition to cloud connectivity, we've been using MQTT for IPC in our Linux IIoT gateways and touchscreen terminals and honestly it's been one of the better architectural decisions we've made. Implementing new components for specific customer use cases could not be easier and the component can be easily placed on the hardware or on cloud servers wherever it fits best.

I don't see how gRPC could be any worse than that.

(The previous iteration before MQTT used HTTP polling and callbacks worked on top of an SSH reverse tunnel abomination. Using MQTT for IPC was kind of an afterthought. The SSH Cthulhu is still in use for everyday remote management because you cannot do Ansible over MQTT, but we're slowly replacing it with Wireguard. I gotta admit that out of all VPN technologies we've experimented with, SSH transport has been the most reliable one in various hostile firewalled environments.)

goalieca1y ago

MQTT also lends itself to async very well. The event based approach is a real winner.

DanielHB1y ago

Trying to understand an existing IoT codebase is scarier than reading Lovecraft.

Out of curiosity, why were you using SSH reverse tunnel for IPC? Were you using virtualization inside your iot device and for some reason need a tunnel between the guests?

anilakar1y ago

Reverse tunnel SSH was used for remote management. Once we replaced it with MQTT, we realized we could use it for IPC, too. I was not too clear about this in the post above.

Before MQTT we used whatever we had at hand; the touchscreen frontend (that ran Qt, not an embedded browser) talked to the local backend with HTTP. SMS messages were first sent to another local HTTP gateway and then via UNIX shm to a master process that templated the message to the smstools3 spool directory. A customer-facing remote management GUI ran on the device's built-in HTTP server but I no longer remember how it interfaced with the rest of the system. The touchscreen GUI might actually have had its own HTTP server for that, because it also handled Modbus communication to external I/O.

This was back in the day when everyone was building their own IoT platforms and obviously we were required to dogfood our own product, trying to make it do things it was never meant for. The gateway was originally designed to be an easy solution for AV integrators to remotely manage whole campuses from a single GUI. I'm pretty sure no customer ever wrote a line of code on the platform.

DanielHB1y ago

Ah okay, that makes more sense. We used wireguard for remote management and we never got into MQTT (I wish we had).

DanielHB1y ago

At a project I worked at we were considering using protobuf for IPC between our desktop app and our network framework code which used different languages.

The performance was part of the reason (compared to serializing using JSON) but the main reason was just tooling support for automatic type checking. gRPC can generate types from a schema for all popular languages out there.

We ended up taking another route but I feel it is important to consider the existing tools ahead of any performance concerns for most cases

up2isomorphism1y ago

This the typical situation where : “you can do it” doesn’t mean you should do it, and there is very little sense to advertise it, if at all.

justinsaccount1y ago

> In our scenario of local IPC, some obvious tuning options exist: data is exchanged via a Unix domain socket (unix:// address) instead of a TCP socket

AFAIK at least on linux there is no difference between using a UDS and a tcp socket connected to localhost.

sgtnoodle1y ago

There's definitely differences, whether or not it matters for most usages. I've worked on several IPC mechanisms that specifically benefited from one vs. the other.

pengaru1y ago

There's a mountain of grpc-centric python code at $dayjob and it's been miserable to live with. Maybe it's less awful in c/c++, or at least confers some decent performance there. In python it's hot garbage.

andy_ppp1y ago

Strongly agree, it’s has loads of problems, my least favourite being the schema is not checked in the way you might think, there’s not even a checksum to say this message and this version of the schema match. So when there’s old services/clients around and people haven’t versioned their schema’s safely (there was no mechanism for this apart from manually checking in PRs) you can get gibberish back for fields that should contain data. It’s basically just a binary blob with whatever schema the client has overlaid so debugging is an absolute pain. Unless you are Google scale use a text based format like JSON and save yourself a lot of hassle.

jeffbee1y ago

There is an art to having forwards and backwards compatible RPC schemas. It is easy, but it is surprisingly difficult to get people to follow easy rules. The rules are as follows:

  1) Never change the type of a field
  2) Never change the semantic meaning of a field
  3) If you need a different type or semantics, add a new field

Pretty simple if you ask me.

andy_ppp1y ago

If I got to choose my colleagues this would be fine, unfortunately I had people who couldn’t understand eventual consistency. One of the guys writing Go admitted he didn’t understand what a pointer was etc. etc.

1 more reply

jayd161y ago

You can trivially make breaking changes in a JSON blob too. GRPC has well documented ways to make non-breaking changes. If you're working somewhere where breaking schema changes go in with little fanfare and much debugging then I'm not sure JSON will save you.

The only way to know is to dig through CLs? Write a test.

There's also automated tooling to compare protobuff schemas for breaking changes.

andy_ppp1y ago

JSON contains a description of the structure of the data that is readable by both machines and humans. JSON can certainly go wrong but it’s much simpler to see when it has because of this. GRPC is usually a binary black box that adds loads of developer time to upskill, debug, figure out error cases and introduces whole new classes of potential bugs.

If you are building something that needs binary performance that GRPC provides, go for it, but pretending there is no extra cost over doing the obvious thing is not true.

1 more reply

discreteevent1y ago

- JSON doesn't have any schema checking either.

- You can encode the protocol buffers as JSON if you want a text based format.

seanw4441y ago

I'm using it for a small-to-medium sized project, and the generated files aren't too bad to work with at that scale. The actual generation of the files is very awful for Python specifically, though, and I've had to write a script to bandaid fix them after they're generated. An issue has been open for this for years on the protobuf compiler repo, and it's basically a "wontfix" as Google doesn't need it fixed for their internal use. Which is... fine I guess.

The Go part I'm building has been much more solid in contrast.

lima1y ago

I guess you're talking about the relative vs. absolute import paths?

This solves it: https://github.com/cpcloud/protoletariat

seanw4441y ago

Yeah, my script works perfectly fine though, without pulling in another dependency. The point is that this shouldn't be necessary. It feels wrong.

eqvinox1y ago

It's equally painful in C, you have to wrap the C++ library :(

ryanisnan1y ago

Can you say more about what the pain points are?

cherryteastain1y ago

C++ generated code from protobuf/grpc is pretty awful in my experience.

bluGill1y ago

Do you need to look at that generated code though? I haven't used gRPC yet (some poor historical decisions mean I can't use it in my production code so I'm not in a hurry - architecture is rethinking those decisions in hopes that we can start using it so ask me in 5 years what I think). My experience with other generated code is that it is not readable but you never read it so who cares - instead you just trust the interface which is easy enough (or is terrible and not fixable)

cherryteastain1y ago

I meant the interfaces are horrible. As you said, as long as it has a good interface and good performance, I wouldn't mind.

For example, here's the official tutorial for using the async callback interfaces in gRPC: https://grpc.io/docs/languages/cpp/callback/

It encourages you to write code with practices that are quite universally considered bad in modern C++ due to a very high chance of introducing memory bugs, such as allocating objects with new and expecting them to clean themselves up via delete this;. Idiomatic modern C++ would be using smart pointers, or go a completely different route with co-routines and no heap-allocated objects.

1 more reply

jeffbee1y ago

Interesting that it is taken on faith that unix sockets are faster than inet sockets.

eqvinox1y ago

That's because it's logical that implementing network capable segmentation and flow control is more costly than just moving data with internal, native structures. And looking up random benchmarks yields anything from equal performance to 10x faster for Unix domain.

bluGill1y ago

It wouldn't surprise me if inet sockets were more optimized though and so unix sockets ended up slower anyway just because nobody has bothered to make them good (which is probably why some of your benchmarks show equal performance). Benchmarks are important.

sgtnoodle1y ago

I've spent several years optimizing a specialized IPC mechanism for a work project. I've spent time reviewing the Linux Kernel's unix socket source code to understand obscure edge cases. There isn't really much to optimize - it's just copying bytes between buffers. Most of the complexity of the code has to do with permissions and implementing the ability to send file descriptors. All my benchmarks have unambiguously showed unix sockets to be more performant than loopback TCP for my particular use case.

eqvinox1y ago

I agree, but practically speaking they're used en masse all across the field and people did bother to make them good [enough]. I suspect the benchmarks where they come up equal are cases where things are limited by other factors (e.g. syscall overhead), though I don't want to make unfounded accusations :)

yetanotherdood1y ago

Unix Domain Sockets are the standard mechanism for app->sidecar communication at Google (ex: Talking to the TI envelope for logging etc.)

jeffbee1y ago

Search around on Google Docs for my 2018 treatise/rant about how the TI Envelope was the least-efficient program anyone had ever deployed at Google.

eqvinox1y ago

Ok, now it sounds like you're blaming unix sockets for someone's shitty code...

No idea what "TI Envelope" is, and a Google search doesn't come up with usable results (oh the irony...) - if it's a logging/metric thing, those are hard to get to perform well regardless of socket type. We ended up using batching with mmap'd buffers for crash analysis. (I.e. the mmap part only comes in if the process terminates abnormally, so we can recover batched unwritten bits.)

1 more reply

yetanotherdood1y ago

I'm a xoogler so I don't have access. Do you have a TL;DR that you can share here (for non-Googlers)?

ithkuil1y ago

servo's Ipc-channel doesn't use Unix domain sockets to move data. It uses it to share a memfd file descriptor effectively creating a memory buffer shared between two processes

dangoodmanUT1y ago

Are there resources suggesting otherwise?

pjmlp1y ago

As often in computing, profiling is a foreign word.

aoeusnth11y ago

Tell me more, I know nothing about IPC

kats1y ago

Congrats and welcome to the 80s!

j / k navigate · click thread line to collapse

93 comments

pjmlp1y ago

> Using a full-featured RPC framework for IPC seems like overkill when the processes run on the same machine.

That is exactly what COM/WinRT, XPC, Android Binder, D-BUS are.

Naturally they have several optimisations for local execution.

charleslmunger1y ago

You can run grpc over binder:

https://github.com/grpc/grpc-java/blob/master/binder/src/mai...

The overhead is low, and you get best practices like oneway calls and avoiding the transaction limit for free. It also comes with built in security policies for servers and clients.

jeffbee1y ago

pjmlp1y ago

Kind of, as anyone with CORBA, DCOM, RMI, .NET Remoting experience has plenty of war stories regarding distributed computing with the expectations of local calls.

dietr1ch1y ago

In my mind the abstraction should allow for RPCs and being on the same machine should allow to optimise things a bit, this way you simply build for the general case and lose little to no performance.

mgsouth1y ago

Some generalities:

bunderbunder1y ago

This could possibly even be dead simple to accomplish if application-level semantics aren't being communicated by co-opting parts of the communication channel's spec.

imtringued1y ago

1 more reply

loeg1y ago

Binder is totally unavailable outside of Android, right? IIRC it's pretty closely coupled to Android's security model and isn't a great fit outside of that ecosystem.

p_l1y ago

merb1y ago

Btw. Modern windows also superports Unix domain sockets, so if you have an app that has another service that will run on the same machine or on a different one it is not so bad to use grpc over uds.

pjmlp1y ago

Nice idea, although it is still slower than COM.

COM can run over the network (DCOM), inside the same computer on its own process (out-proc), inside the client (in-proc), designed for in-proc but running as out-proc (COM host).

So for max performance, with the caveat of possibly damaging the host, in-proc will do it, and be faster than any kind of sockets.

tgma1y ago

> COM can run over the network (DCOM)

Ah the good ol' Blaster worm...

https://en.wikipedia.org/wiki/Blaster_(computer_worm)

merb1y ago

2 more replies

CharlieDigital1y ago

ZeroMQ is kinda nice for this; just the right level of thin abstraction.

jauntywundrkind1y ago

I super dug the talk Building SpiceDB: A gRPC-First Database - Jimmy Zelinskie, authzed which is about a high-performance auth system, which talks to this. https://youtu.be/1PiknT36218

https://youtu.be/1PiknT36218#t=12m 12min is the internal cluster details.

jzelinskie1y ago

Thank you! That really means a lot!

>It's subtle but I also get the impression most gRPC stubs are miserably bad, that Authzed had to go long and far to get away from a lot of gRPC tarpits.

They aren't terrible, but they also aren't a user experience you want to deliver directly to your customers.

palata1y ago

All that to say: my preferred choice for that would technically be Cap'n Proto, but I wouldn't dare making my company depend on it. Whereas nobody can fire me for depending on Google, I suppose.

kentonv1y ago

> it feels like it's essentially Cloudflare employees improving Cap'n Proto for Cloudflare.

(FWIW, the story is a bit different with Protobuf. The Protobuf code is the same code Google uses internally.)

(I am the author of Cap'n Proto and also was the one who open sourced Protobuf originally at Google.)

mgsouth1y ago

I finally figured out it was a problem with specific pairs of servers. Server A could talk to C, and D, but would timeout talking to B. The gRPC call just... wouldn't.

tgma1y ago

So "not used much" at Google scale probably justifies 40 people.

tgma1y ago

> I think most people assume that gRPC is what Google is built on, therefore it must be good.

Dropbox[1], Netflix[2], Apple[3], Microsoft[4], Slack[5], and lots more are all either built on, or heavily use, and/or contribute to various pieces of gRPC and the ecosystem around it.

[1]: https://dropbox.tech/infrastructure/courier-dropbox-migratio...

[2]: https://netflixtechblog.com/practical-api-design-at-netflix-...

[3]: https://github.com/grpc/grpc-swift

[4]: https://learn.microsoft.com/en-us/aspnet/core/grpc/?view=asp...

[5]: https://slack.engineering/how-big-technical-changes-happen-a...

jsnell1y ago

It's at least used for the public Google Cloud APIs. That by itself guarantees a rather large scale, whether they use gRPC in prod or not.

nicksnyder1y ago

1 more reply

zackangelo1y ago

Had to reach for a new IPC mechanism recently to implement a multi-GPU LLM inference server.

I considered gRPC for IPC since I was already using it for the server's API but dismissed it because it was an order of magnitude slower and I didn't want to drag async into the child PIDs.

[0] https://github.com/servo/ipc-channel

DashAnimal1y ago

What I loved about Fuchsia was its IPC interface, using FIDL which is like a more optimized version of protobufs.

https://fuchsia.dev/fuchsia-src/get-started/learn/fidl/fidl

DabbyDabberson1y ago

loved - in the past tense?

solarpunk1y ago

fuchsia was pretty deeply impacted by google layoffs iirc

mdhb1y ago

I was just looking at the repo this week. It’s under heavy active development. It’s just intentionally not talked about much at the moment.

HumblyTossed1y ago

It might make sense. Usually, if you're using IPC, you need it to be as fast as possible and there are several solutions that are much faster.

dmoy1y ago

I tend to agree. Usually you want as fast as possible. Sometimes you don't though.

So what happens in practice is that we used Stubby (like gRPC, except not public), because it was low overhead* to write the handler code for it on both ends, and got us some free other bits as well.

monocasa1y ago

I'm not even sure that I'd say usually. Most of the time you're just saying "hey daemon, do this thing that you're already preconfigured for".

ks20481y ago

What are the other solutions that are much faster? (besides rolling your own mini format).

HumblyTossed1y ago

> What are the other solutions that are much faster?

Why ask the question, then

> (besides rolling your own mini format)

box it in?

Do you think I'm saying using an RPC is bad? I'm not. I simply took issue with the way the article was worded.

lyu072821y ago

Cool that they mention buf, it's such a massive improvement over Google's own half abandoned crappy protobuf implementation

https://github.com/bufbuild/buf

anilakar1y ago

I don't see how gRPC could be any worse than that.

goalieca1y ago

MQTT also lends itself to async very well. The event based approach is a real winner.

DanielHB1y ago

Trying to understand an existing IoT codebase is scarier than reading Lovecraft.

Out of curiosity, why were you using SSH reverse tunnel for IPC? Were you using virtualization inside your iot device and for some reason need a tunnel between the guests?

anilakar1y ago

Reverse tunnel SSH was used for remote management. Once we replaced it with MQTT, we realized we could use it for IPC, too. I was not too clear about this in the post above.

DanielHB1y ago

Ah okay, that makes more sense. We used wireguard for remote management and we never got into MQTT (I wish we had).

DanielHB1y ago

At a project I worked at we were considering using protobuf for IPC between our desktop app and our network framework code which used different languages.

We ended up taking another route but I feel it is important to consider the existing tools ahead of any performance concerns for most cases

up2isomorphism1y ago

This the typical situation where : “you can do it” doesn’t mean you should do it, and there is very little sense to advertise it, if at all.

justinsaccount1y ago

> In our scenario of local IPC, some obvious tuning options exist: data is exchanged via a Unix domain socket (unix:// address) instead of a TCP socket

AFAIK at least on linux there is no difference between using a UDS and a tcp socket connected to localhost.

sgtnoodle1y ago

There's definitely differences, whether or not it matters for most usages. I've worked on several IPC mechanisms that specifically benefited from one vs. the other.

pengaru1y ago

andy_ppp1y ago

jeffbee1y ago

There is an art to having forwards and backwards compatible RPC schemas. It is easy, but it is surprisingly difficult to get people to follow easy rules. The rules are as follows:

  1) Never change the type of a field
  2) Never change the semantic meaning of a field
  3) If you need a different type or semantics, add a new field

Pretty simple if you ask me.

andy_ppp1y ago

1 more reply

jayd161y ago

The only way to know is to dig through CLs? Write a test.

There's also automated tooling to compare protobuff schemas for breaking changes.

andy_ppp1y ago

If you are building something that needs binary performance that GRPC provides, go for it, but pretending there is no extra cost over doing the obvious thing is not true.

1 more reply

discreteevent1y ago

- JSON doesn't have any schema checking either.

- You can encode the protocol buffers as JSON if you want a text based format.

seanw4441y ago

The Go part I'm building has been much more solid in contrast.

lima1y ago

I guess you're talking about the relative vs. absolute import paths?

This solves it: https://github.com/cpcloud/protoletariat

seanw4441y ago

Yeah, my script works perfectly fine though, without pulling in another dependency. The point is that this shouldn't be necessary. It feels wrong.

eqvinox1y ago

It's equally painful in C, you have to wrap the C++ library :(

ryanisnan1y ago

Can you say more about what the pain points are?

cherryteastain1y ago

C++ generated code from protobuf/grpc is pretty awful in my experience.

bluGill1y ago

cherryteastain1y ago

I meant the interfaces are horrible. As you said, as long as it has a good interface and good performance, I wouldn't mind.

For example, here's the official tutorial for using the async callback interfaces in gRPC: https://grpc.io/docs/languages/cpp/callback/

1 more reply

jeffbee1y ago

Interesting that it is taken on faith that unix sockets are faster than inet sockets.

eqvinox1y ago

bluGill1y ago

sgtnoodle1y ago

eqvinox1y ago

yetanotherdood1y ago

Unix Domain Sockets are the standard mechanism for app->sidecar communication at Google (ex: Talking to the TI envelope for logging etc.)

jeffbee1y ago

Search around on Google Docs for my 2018 treatise/rant about how the TI Envelope was the least-efficient program anyone had ever deployed at Google.

eqvinox1y ago

Ok, now it sounds like you're blaming unix sockets for someone's shitty code...

1 more reply

yetanotherdood1y ago

I'm a xoogler so I don't have access. Do you have a TL;DR that you can share here (for non-Googlers)?

ithkuil1y ago

servo's Ipc-channel doesn't use Unix domain sockets to move data. It uses it to share a memfd file descriptor effectively creating a memory buffer shared between two processes

dangoodmanUT1y ago

Are there resources suggesting otherwise?

pjmlp1y ago

As often in computing, profiling is a foreign word.

aoeusnth11y ago

Tell me more, I know nothing about IPC

kats1y ago

Congrats and welcome to the 80s!

j / k navigate · click thread line to collapse