Distributed tracing needs some common token, shared across hops, that identifies all RPCs associated with a specific incoming request.
Yes, we have implemented distributed tracing using eBPF. In simple terms, we use thread-id, coroutine-id, and tcp-seq to automatically correlate all spans. Most importantly, we use eBPF to compute a syscall-trace-id (without needing to propagate it between upstream and downstream services), enabling automatic correlation of a service's ingress and egress requests. For more details, see our paper presented at SIGCOMM'23: https://dl.acm.org/doi/10.1145/3603269.3604823.
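To make the idea concrete, here is a toy sketch (my own illustration, not DeepFlow's actual implementation) of how a syscall-trace-id can tie a service's ingress and egress together without any ID crossing the wire: an ingress syscall on a thread opens a trace, and subsequent egress syscalls on the same thread inherit it.

```python
# Hypothetical sketch: correlate ingress/egress syscalls on the same thread
# into one trace, without propagating any ID between services.
from dataclasses import dataclass
from itertools import count

@dataclass
class SyscallEvent:
    thread_id: int
    direction: str  # "ingress" (read on server socket) or "egress" (write to upstream)

_next_trace_id = count(1)          # monotonically increasing trace IDs
_active: dict[int, int] = {}       # thread_id -> current syscall-trace-id

def assign_trace_id(ev: SyscallEvent) -> int:
    """An ingress event on a thread opens a new trace; egress events on the
    same thread inherit it. Nothing is injected into the request itself."""
    if ev.direction == "ingress":
        _active[ev.thread_id] = next(_next_trace_id)
    return _active[ev.thread_id]

# One request handled on thread 7: ingress read, then two upstream writes.
ids = [assign_trace_id(SyscallEvent(7, d)) for d in ("ingress", "egress", "egress")]
# All three events end up sharing the same syscall-trace-id.
```

This blocking, thread-per-request model is the easy case; cross-thread handoffs and async runtimes are exactly where the limitations mentioned below come in.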
Of course, this kind of Zero Code distributed tracing currently has some limitations. For details, see: https://deepflow.io/docs/features/distributed-tracing/auto-t...
These limitations are not insurmountable; we are actively working on them and making steady progress.
> When collecting invocation logs through eBPF and cBPF, DeepFlow calculates information such as syscall_trace_id, thread_id, goroutine_id, cap_seq, tcp_seq based on the system call context. This allows for distributed tracing without modifying application code or injecting TraceID and SpanID. Currently, DeepFlow can achieve Zero Code distributed tracing for all cases except for cross-thread communication (through memory queues or channels) and asynchronous invocations.
It looks like it's using tcp flow tuple + tcp_seq to join things.
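Roughly, yes. A minimal sketch of that join (my own illustration, with made-up data, not DeepFlow's code): captures taken on the client side and the server side of the same connection share the TCP 5-tuple and sequence number, so matching on `(tuple, tcp_seq)` pairs up the two views of one request.

```python
# Hypothetical sketch: join client-side and server-side captures of the same
# request on (flow 5-tuple, tcp_seq), which is identical at both capture points.
client_spans = [
    {"tuple": ("10.0.0.1", 40001, "10.0.0.2", 80, "tcp"), "tcp_seq": 1001, "span": "client-send"},
]
server_spans = [
    {"tuple": ("10.0.0.1", 40001, "10.0.0.2", 80, "tcp"), "tcp_seq": 1001, "span": "server-recv"},
    {"tuple": ("10.0.0.3", 40002, "10.0.0.2", 80, "tcp"), "tcp_seq": 2002, "span": "server-recv"},
]

def join_spans(a, b):
    """Pair spans from two capture points that saw the same bytes on the wire."""
    index = {(s["tuple"], s["tcp_seq"]): s for s in b}
    return [(s["span"], index[(s["tuple"], s["tcp_seq"])]["span"])
            for s in a if (s["tuple"], s["tcp_seq"]) in index]

pairs = join_spans(client_spans, server_spans)
# Only the matching capture pair is joined; the unrelated server span is left out.
```

Real captures would also have to handle sequence-number wraparound, retransmits, and NAT rewriting the tuple, which is where it gets harder than this sketch.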
This is an interesting call-out; the last release of 2.6 is from 2011. I wonder who is still running that in production.
In addition, DeepFlow combines eBPF and cBPF to achieve full-stack tracing across syscalls and network forwarding. You can take a look at our documentation: https://deepflow.io/docs/about/features/
heh, GitHub also has "symbol navigation" turned on for that license file, but I didn't dig in to find out what source language it thinks the file is