Distributed tracing needs some common token, shared across hops, that identifies all RPCs associated with a specific incoming request.
Yes, we have implemented distributed tracing using eBPF. In simple terms, we use thread-id, coroutine-id, and tcp-seq to automatically correlate all spans. Most importantly, we use eBPF to compute a syscall-trace-id (without needing to propagate it between upstream and downstream services), enabling automatic correlation of a service's ingress and egress requests. For more details, see our paper presented at SIGCOMM'23: https://dl.acm.org/doi/10.1145/3603269.3604823.
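To make the idea concrete, here is a toy sketch (my own illustration, not DeepFlow's actual implementation) of how a syscall-trace-id can tie a service's ingress and egress together without any ID crossing the wire: an ingress syscall on a thread opens a trace, and subsequent egress syscalls on the same thread inherit it.

```python
# Hypothetical sketch: correlate ingress/egress syscalls on the same thread
# into one trace, without propagating any ID between services.
from dataclasses import dataclass
from itertools import count

@dataclass
class SyscallEvent:
    thread_id: int
    direction: str  # "ingress" (read on server socket) or "egress" (write to upstream)

_next_trace_id = count(1)          # monotonically increasing trace IDs
_active: dict[int, int] = {}       # thread_id -> current syscall-trace-id

def assign_trace_id(ev: SyscallEvent) -> int:
    """An ingress event on a thread opens a new trace; egress events on the
    same thread inherit it. Nothing is injected into the request itself."""
    if ev.direction == "ingress":
        _active[ev.thread_id] = next(_next_trace_id)
    return _active[ev.thread_id]

# One request handled on thread 7: ingress read, then two upstream writes.
ids = [assign_trace_id(SyscallEvent(7, d)) for d in ("ingress", "egress", "egress")]
# All three events end up sharing the same syscall-trace-id.
```

This blocking, thread-per-request model is the easy case; cross-thread handoffs and async runtimes are exactly where the limitations mentioned below come in.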
Of course, this kind of Zero Code distributed tracing currently has some limitations. For details, see: https://deepflow.io/docs/features/distributed-tracing/auto-t...
These limitations are not insurmountable; we are actively working on them and making steady progress.
> When collecting invocation logs through eBPF and cBPF, DeepFlow calculates information such as syscall_trace_id, thread_id, goroutine_id, cap_seq, tcp_seq based on the system call context. This allows for distributed tracing without modifying application code or injecting TraceID and SpanID. Currently, DeepFlow can achieve Zero Code distributed tracing for all cases except for cross-thread communication (through memory queues or channels) and asynchronous invocations.
It looks like it's using tcp flow tuple + tcp_seq to join things.
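Roughly, yes. A minimal sketch of that join (my own illustration, with made-up data, not DeepFlow's code): captures taken on the client side and the server side of the same connection share the TCP 5-tuple and sequence number, so matching on `(tuple, tcp_seq)` pairs up the two views of one request.

```python
# Hypothetical sketch: join client-side and server-side captures of the same
# request on (flow 5-tuple, tcp_seq), which is identical at both capture points.
client_spans = [
    {"tuple": ("10.0.0.1", 40001, "10.0.0.2", 80, "tcp"), "tcp_seq": 1001, "span": "client-send"},
]
server_spans = [
    {"tuple": ("10.0.0.1", 40001, "10.0.0.2", 80, "tcp"), "tcp_seq": 1001, "span": "server-recv"},
    {"tuple": ("10.0.0.3", 40002, "10.0.0.2", 80, "tcp"), "tcp_seq": 2002, "span": "server-recv"},
]

def join_spans(a, b):
    """Pair spans from two capture points that saw the same bytes on the wire."""
    index = {(s["tuple"], s["tcp_seq"]): s for s in b}
    return [(s["span"], index[(s["tuple"], s["tcp_seq"])]["span"])
            for s in a if (s["tuple"], s["tcp_seq"]) in index]

pairs = join_spans(client_spans, server_spans)
# Only the matching capture pair is joined; the unrelated server span is left out.
```

Real captures would also have to handle sequence-number wraparound, retransmits, and NAT rewriting the tuple, which is where it gets harder than this sketch.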
This is an interesting call-out; the last release of 2.6 is from 2011. I wonder who is still running that in production.
In addition, DeepFlow combines eBPF and cBPF to achieve full-stack tracing across syscalls and network forwarding. You can take a look at our documentation: https://deepflow.io/docs/about/features/
heh, GitHub also has "symbol navigation" turned on for that license file, but I didn't dig in to find out what source language it thinks the file is