I love observability probably more than most. And my initial reaction to this article is the obvious: why not both?
In fact, I tend to think more in terms of "events" when writing both logs and tracing code. How that event is notified, stored, transmitted, etc. is in some ways divorced from the activity. I don't care if it is going to stdout, or over udp to an aggregator, or turning into trace statements, or ending up in Kafka, etc.
But inevitably I bump up against cost. For even medium sized systems, the amount of data I would like to track gets quite expensive. For example, many tracing services charge for the tags you add to traces. So doing `trace.String("key", value)` becomes something I think about from a cost perspective. I worked at a place that had a $250k/year New Relic bill and we were avoiding any kind of custom attributes. Just getting APM metrics for servers and databases was enough to get to that cost.
Logs are cheap, easy, reliable and don't lock me in to an expensive service to start. I mean, maybe you end up integrating splunk or perhaps self-hosting kibana, but you can get 90% of the benefits just by dumping the logs into Cloudwatch or even S3 for a much cheaper price.
Luckily, some of the larger incumbents are also moving away from this model, especially as OpenTelemetry is making tracing more widespread as a baseline of sorts for data. And you can definitely bet they're hearing about it from their customers right now, and they want to keep their customers.
Cost is still a concern but it's getting addressed as well. Right now every vendor has different approaches (e.g., the one I work for has a robust sampling proxy you can use), but that too is going the way of standardization. OTel is defining how to propagate sampling metadata in signals so that downstream tools can use the metadata about population representativeness to show accurate counts for things and so on.
What newer tools/companies are in this category? Any that you recommend?
Cost is a real issue, and not just in terms of what the vendor charges you. When tracing becomes a noticeable fraction of CPU or memory usage relative to the application, it's time to rethink 100% sampling. In practice, if you are sampling thousands of requests per second, you're very unlikely to actually look through each one of those thousands (thousands of req/s may not be a lot for some sites, but it already exceeds human scale without tooling). To keep accurate, useful statistics under sampling, you end up recording metrics from traces before the sampling decision is made.
They are events[1]. For my text editor, KeenWrite, events can be logged either to the console when run from the command-line or displayed in a dialog when running in GUI mode. By changing "logger.log()" statements to "event.publish()" statements, a number of practical benefits are realized, including:
* Decoupled logging implementation from the system (swap one line of code to change loggers).
* Publish events on a message bus (e.g., D-Bus) to allow extending system functionality without modifying the existing code base.
* Standard logging format, which can be machine parsed, to help trace in-field production problems.
* Ability to assign unique identifiers to each event, allowing for publication of problem/solution documentation based on those IDs (possibly even seeding LLMs these days).
[1]: https://dave.autonoma.ca/blog/2022/01/08/logging-code-smell/
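A minimal sketch of that decoupling in Python (names are illustrative, not KeenWrite's actual API): call sites only publish events, and swapping the sink is a one-line registration.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Event:
    id: str           # stable identifier, usable in problem/solution docs
    message: str
    data: dict = field(default_factory=dict)

class EventBus:
    """Call sites only know publish(); sinks are registered elsewhere."""
    def __init__(self) -> None:
        self._subscribers: list[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers:
            handler(event)

bus = EventBus()
# Console sink for CLI mode; a GUI dialog or a D-Bus bridge would
# register here instead, with no change to the publishing code.
bus.subscribe(lambda e: print(f"{e.id} {e.message} {e.data}"))
bus.publish(Event(id="KW-0042", message="document saved", data={"path": "a.md"}))
```

Because each event carries a stable `id`, the same stream can feed a machine-parsed log format or an index of problem/solution documentation.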
Honeycomb at least charges per event, which in this case means per span - however they don't charge per span attribute, and each span can be pretty large (100kb / 2000 attributes).
I run all my personal services in their free tier, which has plenty of capacity, and that's before I do any sampling.
Logs for the long term, traces for short-term debugging and analysis, is a fine compromise.
I think this is the issue. Both Splunk and OpenSearch (even self-hosted OpenSearch) get really pricey as well, especially with large volumes of log data. CloudWatch can also get ludicrously expensive: they charge something like $0.50 per GB (!) and another $0.03 per GB to store. I've seen situations at a previous employer where someone accidentally deployed a Lambda function with debug logging and ran up a few thousand dollars in CloudWatch bills overnight.
You should look at Coralogix (disclaimer: I work there). We've built a platform that allows you to store your observability data in S3 and query it through our infrastructure. It can be dramatically more cost-effective than other providers in this space.
I partly agree and disagree. In terms of severity, there are only three levels:
– info: not a problem
– warning: potential problem
– error: actual problem (operational failure)
Other levels like “debug” are not about severity, but about level of detail.
In addition, something that is an error in a subcomponent may only be a warning or even just an info on the level of the superordinate component. Thus the severity has to be interpreted relative to the source component.
The latter can be an issue if the severity is only interpreted globally. Either it will be wrong for the global level, or subcomponents have to know the global context they are running in to use the severity appropriate for that context. That causes undesirable dependencies on a global context: the developer of a lower-level subcomponent would have to know the exact context in which that component is used in order to choose the appropriate log level. And what if the component is used in different contexts entailing different severities?
So one might conclude that the severity indication is useless after all, but IMO one should rather conclude that severity needs to be interpreted relative to the component. This also means that a lower-level error may have to be logged again in the higher-level context if it’s still an error there, so that it doesn’t get ignored if e.g. monitoring only looks at errors on the higher-level context.
Differences between “fatal” and “error” are really nesting differences between components/contexts. An error is always fatal on the level where it originates.
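A contrived sketch of that relativity (hypothetical component and exception names): the cache treats a failed lookup as an error, while the caller, which can recompute the value, re-logs the same event as a mere warning in its own context.

```python
import logging

cache_log = logging.getLogger("app.cache")
app_log = logging.getLogger("app")

class CacheMissError(Exception):
    """Operational failure from the cache's point of view."""

def read_cache(key: str) -> str:
    # At this level the failed lookup is an error: the component
    # could not do the one thing it was asked to do.
    cache_log.error("lookup failed for %r", key)
    raise CacheMissError(key)

def compute_value(key: str) -> str:
    return f"value-for-{key}"

def get_value(key: str) -> str:
    try:
        return read_cache(key)
    except CacheMissError:
        # One level up, the same event is only a warning: the caller
        # can recover by recomputing from the source of truth.
        app_log.warning("cache miss for %r, recomputing", key)
        return compute_value(key)
```

Monitoring that only watches `app.*` errors then sees exactly what is an actual problem at that level, while the subcomponent's error remains available for analysis.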
Here's a classic problem as an illustration: the storage cost of your logs is prohibitive. You would like to cut some of your logs from storage but cannot lower retention below some threshold (say, two weeks). For this example, assume that tracing is also enabled and every log has a traceId.
A good answer is to run a compaction job that inspects each trace. If it contains an error, preserve it. Remove X% of all other traces.
Log levels make the ergonomics for this excellent and it can save millions of dollars a year at sufficient scale.
Or, keep it simple.
- error means someone is alerted urgently to look at the problem
- warning means someone should be looking into it eventually, with a view to reclassifying as info/debug or resolving it.
IMO many people don't care much about their logs, until the shit hits the fan. Only then, in production, do they realise just how much harder their overly verbose (or inadequate) logging is making things.
The simple filter of "all errors send an alert" can go a long way to encouraging a bit of ownership and correctness on logging.
The issue is that the code that encounters the problem may not have the knowledge/context to decide whether it warrants alerting. The code higher up that does have the knowledge, on the other hand, often doesn’t have the lower-level information that is useful to have in the log for analyzing the failure. So how do you link the two? When you write modular code that minimizes assumptions about its context, that situation is a common occurrence.
Info is things like “processing X”
Debug is things like “variable is Y” or “made it to this point”
And then "error" as - "things are not okay, a developer is going to need to intervene"
And errors then split roughly between "must be fixed sometime", and "must be fixed now/ASAP"
It was handled safely at the level where it occurred, but because it was unusual/unexpected, the underlying cause may cause issues later on or higher up.
If one were sure it would 100% not indicate any issue, one wouldn’t need to warn about it.
That said, for the spaces where tracing works well, it works unreasonably well.
[1] https://opentelemetry.io/docs/concepts/signals/traces/#span-...
The really hard things, which we had reasonable answers for, but never quite perfect:
* Rails websockets (ActionCable)
* very long running background jobs (we stopped collecting at some limit, to prevent unbounded memory)
* trying to profile code; we used a modified version of Stackprof to do sampling instead of exact profiling. That worked surprisingly well at finding hotspots, with low overhead.
All sorts of other tricks came along too. I should go look at that codebase again to remind me. That'd be good for my resume.... :)
Suppose you have a long data pipeline that you want to trace jobs across. There are not an enormous number of jobs, but each one takes 12 hours across many phases. In theory tracing works great here, but in practice most tracing platforms can't handle it. This is especially true with tail-based sampling, since traces can be unbounded and the platform has to assume that at some point they time out. You can certainly build your own, but most of the value of tracing solutions is the user experience, which is also the hardest part.
On stream processing I’ve generally found it too expensive to instrument stream processors with tracing. Also there’s generally not enough variability to make it interesting. Context stitching and span management as well as sweeping and shipping of traces can be expensive in a lot of implementations and stream processing is often cpu bound.
A simple transaction id annotated log makes a lot more sense in both, queried in a log analytic platform.
That’s when you want a log and that’s what the big traditional log frameworks were designed to handle.
A web backend/service is basically the opposite. End users don’t have access to the log, those who analyze it can cross reference with system internals like source code or db state and the log is basically infinite. In that situation a structured log and querying obviously wins.
It’s honestly not even clear that these systems are that closely related.
For a web app, serving lots of concurrent users, they are essentially unreadable without tools, so you may as well optimise the logs for tool based consumption.
I too use this bait statement.
Then I follow it up with (the short version):
1) Rewrite your log statements so that they're machine readable
2) Prove they're machine-readable by having the downstream services read them instead of the REST call you would otherwise have sent.
3) Switch out log4j for Kafka, which will handle the persistence & multiplexing for you.
Voila, you got yourself a reactive, event-driven system with accurate "logs".
If you're like me and you read the article thinking "I like the result but I hate polluting my business code with all that tracing code", well now you can create an independent reader of your kafka events which just focuses on turning events into traces.
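A toy stand-in for steps 1 and 2 (a `StringIO` plays the role of the Kafka topic here; in step 3 you'd swap in a real producer/consumer client): the "log" line is a machine-readable event record, and a downstream reader dispatches on it instead of receiving a REST call.

```python
import io
import json

def emit(stream, event_type: str, **payload) -> None:
    # Step 1: the log line is a machine-readable event record.
    stream.write(json.dumps({"type": event_type, **payload}) + "\n")

def consume(stream, handlers: dict) -> None:
    # Step 2: a downstream service reads the events instead of
    # being called over REST. In step 3 this stream would be a
    # Kafka topic rather than a file-like object.
    for line in stream:
        event = json.loads(line)
        handler = handlers.get(event["type"])
        if handler:
            handler(event)

buf = io.StringIO()
emit(buf, "order_created", order_id=42, total=9.99)
buf.seek(0)
consume(buf, {"order_created": lambda e: print("fulfilling", e["order_id"])})
```

The event names and fields are made up for illustration; the point is that once the lines are structured, the same records serve as logs, as integration events, and as raw material for traces.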
I don't think this is a reasonable statement. There are already a few logging agents that support structured logging without dragging in heavyweight dependencies such as Kafka. Bringing up Kafka sounds like a case of a solution looking for a problem.
If it's data you care about then you put it in Kafka, unless you're big enough to use something like Cassandra or rich enough to pay a cloud provider to make redundant data storage their problem. Logs are something that you need to write durably and reliably when shit is hitting the fan and your networks are flaking and machines are crashing - so ephemeral disks are out, NFS is out, ad-hoc log collector gossip protocols are out, and anything that relies on single master -> read replica and "promoting" that replica is definitely out.
Kafka is about as lightweight as it gets for anything that can't be single-machine/SPOF. It's a lot simpler and more consistent than any RDBMS. What else would you use? HDFS (or maybe OpenAFS if your ops team is really good) is the only half-reasonable alternative I can think of.
What are they? Because admittedly I've lost a little love for the operational side of Kafka, and I wish the client-side were a little "dumber", so I could match it better to my uses cases.
That said, I've had conflicts with a previous team-mate about this. He couldn't wrap his head around Kafka being a source of truth. But when I asked him whether he'd trust our Kafka or our Postgres if they disagreed, he conceded that he'd believe Kafka's side of things.
Who on Earth does that? Logs are almost always written to stderr... In part to prevent other problems author is talking about (eg. mixing with the output generated by the application).
I don't understand why this has to be either or... If you store the trace output somewhere you get a log... (let's call it "un-annotated" log, since trace won't have the human-readable message part). Trace is great when examining the application interactively, but if you use the same exact tool and save the results for later you get logs, with all the same problems the author ascribes to logs.
Like, I develop cli apps, so like, what else would go to stdout that you suppose will interfere?
But why would you write your own logs instead of using something built into your language's library? I believe Python's logging module writes to stderr by default. Go's log package always goes to stderr.
But... today I've learned that console.log() in NodeJS writes to stdout... well, I've lost another tiny bit of faith in humanity.
And while my historical gripes are largely still the status quo: stack traces in multi-threaded, evented/async code that actually show real line numbers? Span-based tracing that makes concurrent introspection possible by default?
I’m in. I apologize for everything bad I ever said and don’t care whatever other annoying thing.
That’s the whole show. Unless it deletes my hard drive I don’t really care about anything else by comparison.
- we collectively realized that logs, events, traces, metrics, and errors are actually all just logs
- we agreed on a single format that encapsulated all that information in a structured manner
- we built firehose/stream processing tooling to provide modern o11y creature comforts
I can't tell if that universe is better than this one, or worse.
Honestly it sounds like you're pitching opentelemetry/otlp but where you only trace and leave all the other bits for later inside your opentelemetry collector, which can turn traces into metrics or traces into logs.
So I'm imagining something more like:
{"level":"info", "otlp": { "trace": { ... }}}
{"level":"info", "otlp": { "error": { ... }}}
{"level":"info", "otlp": { "log": { ... }}}
{"level":"info", "otlp": { "metric": { ... }}}
(standardizing this format would be non-trivial of course, but I could imagine a really minimal standard)
Your downstream collector only needs one API endpoint/ingestion mechanism -- unpacking the actual type of telemetry that came in (and persisting where necessary) can be left to other systems.
Basically I think the systems could have been massively simpler in most UNIX-y environments -- just hook up STDOUT (or scrape it, or syslog or whatever), and you're done -- no allowing ports out for jaeger, dealing with complicated buffering, etc -- just log and forget.
Yeah I think the worst case you basically just exfiltrate metrics out to other subsystems (honestly, you could kind of exfiltrate all of this), but the default is pipe heavily compressed stuff to short and long term storage, and some processors for real time... blah blah blah.
Obviously Honeycomb is actually doing the thing and it's not as easy as it sounds, but it feels like if we had all thought like this earlier we might have skipped making a few protocols (zipkin, jaeger, etc), and focused on just data layout (JSON vs protobuf vs GELF, etc) and figuring out what shapes to expect across tools.
There's a big gap between what it takes for the engineering to work and what all these companies charge.
My point is really more about the engineering time wasted on different protocols and stuff when we could have stuffed everything into minimally structured log lines (and figured out the rest of the insight machinery later). Concretely, that zipkin/jaeger/prometheus protocols and stuff may not have needed to exist, etc.
Off-the-shelf tracing libraries on the other hand are pretty expensive. You have one additional mandatory read of the system clock, to establish the span duration, plus you are still paying for a clock read on every span event, if you use span events. Every span has a PRNG call, too. Distributed tracing is worthless if you don't send the spans somewhere, so you have to budget for encoding your span into json, msgpack, protobuf, or whatever. It's a completely different ball game in terms of efficiency.
Adding timestamps and UUIDs and an encoding is par for the course in logging these days, I don't think that is the right angle to criticize efficiency.
Tracing can be very cheap if you "simply" (and I'm glossing over a lot here) search for all messages in a liberal window matching each "span start" message and index the result sets. Offering a way to view results as a tree is just a bonus.
Of course, in practice this ends up meaning something completely different, and far costlier. Why that is I cannot fathom.
Structured logging has existed for years, e.g. in Java: https://github.com/logfellow/logstash-logback-encoder
1. application logs, emitted multiple times per request and serve as breadcrumbs
2. request logs emitted once per request and include latencies, counters and metadata about the request and response
The application logs were useless to me except during development. However the request logs I could run aggregations on which made them far more useful for answering questions. What the author explains very well is that the problem with application logs is they aren't very human-readable which is where visualizing a request with tracing shines. If you don't have tracing, creating request logs will get you most of the way there, it's certainly better than application logs. https://speedrun.nobackspacecrew.com/blog/2023/09/08/logging...
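The request-log idea is sometimes called a "canonical log line": one wide record per request that you can run aggregations over. A minimal sketch (field names are hypothetical):

```python
import json
import logging
import time
import uuid

log = logging.getLogger("request")

def handle_request(path: str, handler) -> dict:
    """Run a handler and emit exactly one wide record for the request,
    regardless of how many breadcrumb logs the handler itself writes."""
    record = {"requestId": str(uuid.uuid4()), "path": path}
    start = time.monotonic()
    try:
        record["status"] = handler()
    except Exception as exc:
        record["status"] = 500
        record["error"] = repr(exc)
    record["latency_ms"] = round((time.monotonic() - start) * 1000, 2)
    log.info(json.dumps(record))  # one aggregatable line per request
    return record
```

Because every request produces exactly one record with the same fields, questions like "p99 latency by path" or "error rate last hour" become simple group-bys rather than log archaeology.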
In any case, the post itself (which is not long) illustrates and marks out many of the differences.
On my side I have opted to mixed structured/text, a generic message that can be easily understood while glancing over logs, and a data object attached for more details.
And OpenTelemetry has a very questionable implementation. For a nested trace, spans are emitted when they close, meaning that a parent's ID is referenced in the stream before the parent span itself appears. That can't be good for processing. It would be better to emit a leading-edge event when a span opens (which also helps when an error is thrown and the parent is never reported).
Kind of a bummer. Needs work.
The nice thing about OpenTelemetry is that it's a standard. The questionable implementation you're referencing isn't a source of truth. There isn't some canonical "questionable" implementation.
There are many, slightly different, questionable implementations.
Here's this log of every frame of compute going on, plus data or metadata about the frame... but AFAIK we have yet to start using the same stream of computation for business processes as we do for its excellent observability.
1. At the start of a request, generate a globally unique traceId
2. Pass this traceId through the whole call stack.
3. Whenever logging, log the traceId as a parameter
Now you have a log with many of the plusses of a trace. The only additional cost to the log is the storage of the traceId on every message.
If you want to read a trace, search through your logs for "traceId: xyz123". If you use plain text storage you can grep. If you use some indexed storage, search for the key-value pair.
This way, you can retrieve something that looks like a trace from a log.
This does not solve all the issues named in the article. However, it is a decent tradeoff that I've used successfully in the past. Call it "poor man's tracing".
A migration path I could see might be:
- replace current logging lib with otel logging (sending to same output)
- set up tracing
- replace logging with tracing over time (I prefer moving the most painful areas of code first)
Context: the last thing I wrote used Deno and Deno Deploy.
opentelemetry has a service you can run that will collect the telemetry data and you can export it to something like prometheus which can store it and let you query it. Example here https://github.com/open-telemetry/opentelemetry-collector-co...
Typically in dev environments trace spans are just emitted to stdout just like logs. I sometimes turn that off too though because it gets noisy.
While there are better tools for alerting, metrics, or aggregations, it helps a lot in debugging and troubleshooting.
Traces can be aggregated or sampled to provide all of the information available from logs, but in a more flexible way.
* Certain traces can be retained at 100%. This is equivalent to logs.
* Certain trace attributes can be converted to timeseries data. This is equivalent to metrics.
* Certain traces can be sampled and/or queried with streaming infrastructure. This is a way to observe high-cardinality data without hitting the high cost.
Probably the biggest tradeoff with traces is that, in practice, you are not retaining 100% of all traces. In order to keep accurate statistics, data generally gets ingested as metrics before sampling. The other is that traces are not stored in a way that makes it easy to see what was happening at a point in time, which is what logging does well. If I want to ensure I have execution context for logging, I make the effort to add trace and span ids so that traces and logging can be correlated.
To be fair, I live in the devops world more often than not, and my colleagues on the dev teams rarely have to venture outside of traces.
I don't mind the points this author is making. My main criticism is that it is scoped to the world of applications -- which is fine -- but then taken as universal for all of software engineering.
The cool thing about logs is that they're just a text file and don't need to be sent over the internet to someone else. But yes, I've encountered some problems just using text logs and I'd like to solve them.
Is there an OpenTelemetry solution that is capable of being self-hosted (and preferably OS) that anyone recommends?
Note to author: all but the last code block have a very odd mixture of rather large font sizes (at least on mobile) which vary line to line that make them pretty difficult to read.
Also the link to "Observability Driven Development." was a blank slide deck AFAICT
It's all statically rendered html, and I don't see anything weird in the html either.
Do you have a screenshot and some device info so I can look a bit more? Thanks
This should not require code at the application level, but it should be implemented at the tooling level.
Unless you are talking about profilers, that measure execution time and memory only, but traces are a lot more than only that.
Annotating the code with logs and traces is a UX activity, not for the end users, but for the ops-team. They don't have knowledge of the internals of the code. Logs should be written in the context of levers that ops have control over.
Take the example from the OP: nr of cache hits. It's something ops can control by configuring the cache size, it is something ops can observe and correlate with request-time and network bandwidth. It would require an immensely sophisticated debugger to make all these correlations automatically.
Perhaps really performance critical stuff could have a "notrace" annotation.
How accurate and useful these are vs. doing this manually will depend on the use case, but I reckon the automatic approach gets you most of the way there, and you can add the missing traces yourself, so if nothing else it saves a lot of work.
For Java: https://opentelemetry.io/docs/instrumentation/java/automatic...
However, tracing literally every method call would probably be prohibitively expensive so typically you have either:
1. Instrumentation which "understands" common frameworks/libraries and knows what to instrument (e.g. request handlers in web frameworks)
2. Full opt-in. They make it easy to add a trace for a method invocation with a simple annotation but nothing gets instrumented by default
However, no automatic instrumentation can do everything for you; it can't know which properties are interesting to add as attributes. But adding tracing automatically to SQL clients, web frameworks, etc. is very valuable.
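The opt-in pattern (option 2) boils down to an annotation or decorator that instruments only what you mark. A dependency-free Python sketch of the idea (a real agent, such as OpenTelemetry's annotation-based instrumentation, would emit proper spans instead of appending to the hypothetical `SPANS` list):

```python
import functools
import time

# Collected span records; a real agent would export these instead.
SPANS: list[dict] = []

def traced(fn):
    """Opt-in, annotation-style instrumentation: only decorated
    functions produce spans, nothing is traced by default."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            SPANS.append({
                "name": fn.__qualname__,
                "duration_ms": (time.monotonic() - start) * 1000,
            })
    return wrapper

@traced
def fetch_user(user_id: int) -> dict:
    return {"id": user_id}
```

This keeps the overhead proportional to what you explicitly chose to instrument, which is exactly why full automatic tracing of every method call is usually avoided.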