Stack Traces Are Underrated

(karl.berlin)

96 points · zoidb · 1y ago · 56 comments

bob1029 1y ago

Stack traces are your #1 ally when supporting someone else's legacy production pile.

Once you get comfortable with how they work and what information they contain, you can hit the ground running anywhere. Stack traces will teach you about the product architecture faster than anyone on the team can.

As you embrace them, you take the little bit of extra time to make sure they go well. For example, re-throwing exceptions correctly, properly awaiting results, etc. Very minor details that make all the difference.

A broader outcome of this enlightenment is preference for monolithic products. Stack traces fare poorly across web service and API boundaries. If you've only ever worked with microservice architectures, the notion of a stack trace may seem distracting.
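To illustrate the "re-throwing exceptions correctly" point in Python terms, here is a minimal sketch. The names `ConfigError`, `parse_port`, `load_port`, and `describe_failure` are hypothetical and exist only for this example. The idea is that `raise ... from err` chains the low-level error as `__cause__`, so the printed trace still reaches the frame that actually failed.

```python
import traceback


class ConfigError(Exception):
    """Hypothetical application-level error used for illustration."""


def parse_port(text):
    # The real failure happens here: int() raises ValueError on bad input.
    return int(text)


def load_port(text):
    try:
        return parse_port(text)
    except ValueError as err:
        # "from err" chains the original exception instead of discarding it,
        # so the final traceback shows both the cause and this frame.
        raise ConfigError(f"invalid port: {text!r}") from err


def describe_failure(text):
    """Return the formatted traceback produced by a failing load_port call."""
    try:
        load_port(text)
    except ConfigError as err:
        return "".join(
            traceback.format_exception(type(err), err, err.__traceback__)
        )
    return ""


if __name__ == "__main__":
    print(describe_failure("eighty"))
```

Raising a fresh exception without `from err` inside the `except` block would still show the original as implicit context, but the explicit form makes the intent clear to whoever reads the trace later.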

pjc50 1y ago

> A broader outcome of this enlightenment is preference for monolithic products. Stack traces fare poorly across web service and API boundaries. If you've only ever worked with microservice architectures, the notion of a stack trace may seem distracting.

Yes. People forget that the original concept of microservices, the AWS "everything must have an API", was to put in an accountability boundary across teams. Either the API behaves per its contract or it does not; you're neither expected nor really allowed to cross that boundary into the API to find out why it's doing that.

In an environment which is correctly doing "each microservice is a different small team", that helps. In an environment which is doing "one team maintains lots of microservices", this is nearly always an anti-pattern.

mike_hearn 1y ago

Accountability and transparency go hand in hand, though. Teams would be able to debug together much more easily if stack traces were propagated across RPCs, and good RPC frameworks can do that. Unfortunately, when (ab)using HTTP+JSON for RPCs, good cross-service debugging is one of the first casualties.

the_mitsuhiko 1y ago

> But Rust has a better workaround to create stack traces: the backtrace module, which allows capturing stack traces that you can then add to the errors you return. The main problem with this approach is that you still have to add the stack trace to each error and also trust library authors to do so.

That's technically true, but the situation is not as dire. Many errors do not need stack traces. That so few carry a backtrace in Rust is mostly a result of the functionality still not being stable [1].

I think the bigger issue is that people have largely given up on stack traces, in part because of async programming. There are more and more programming patterns and libraries where backtraces are completely useless. For instance, in JavaScript I keep working with dependencies that just come minified or transpiled straight out of npm. In theory node has async stack traces now, but I have yet to see this work through `setTimeout` and friends. It's very common to lose parts of the stack.

Because there are now so many situations where stack traces are unreliable, more and more programmers seem to lose trust in them and no longer see the value they once provided.

I also see it in part at Sentry, where a shocking number of customers are completely willing to work with just minified stack traces and not set up source maps to make them readable.

[1]: https://github.com/rust-lang/rust/issues/99301

reseasonable 1y ago

Not sure about node (and I don’t recall it ever being a problem), but chrome supports stack traces through setTimeout just fine.

I’m not sure there are many reputable modules on npm that minify without source maps, and if people aren’t using them I’d consider them to be making a poor technical choice, one that I would correct before contributing to the project.

Diffing two lengthy stack traces to find a divergence is perhaps the fastest way to debug a slew of bug types. Let alone just the ability to instantly click into a file/line even from console prints as you follow the execution path.

And my favorite part is being able to ignore / hide external modules and specific files in chrome’s debugger which allows for stepping through only your code, and evaluating much shorter traces. Something java needed decades ago.

When I do use print debugging I always use console.error to include the expandable stack trace as needed, I can’t imagine how slow it would be to not have that always, and have to resort to stepping and breakpoints to get around.
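The "diff two stack traces to find the divergence" idea can be sketched in Python as well. This is only an illustration under assumptions: `capture_stack`, `leaf`, `path_a`, `path_b`, and `diff_stacks` are made-up names, and only function names are compared so that file paths and line numbers do not add noise.

```python
import difflib
import traceback


def capture_stack():
    # Record only function names so two call paths can be compared
    # without file paths and line numbers adding noise.
    # The last entry (this helper itself) is dropped.
    return [frame.name for frame in traceback.extract_stack()[:-1]]


def leaf():
    return capture_stack()


def path_a():
    return leaf()


def path_b():
    return leaf()


def diff_stacks(first, second):
    """Return unified-diff lines showing where two call stacks diverge."""
    return list(
        difflib.unified_diff(first, second, "first", "second", lineterm="")
    )


if __name__ == "__main__":
    print("\n".join(diff_stacks(path_a(), path_b())))
```

The lines prefixed with `-` and `+` mark the frames where the two executions took different routes to the same function.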

the_mitsuhiko 1y ago

> I’m not sure there are many reputable modules on npm that minify without source maps, and if people aren’t using them I’d consider them to be making a poor technical choice, one that I would correct before contributing to the project.

React is a good example of a library that is a transpiled mess when installed from npm. Sadly not the only one, there are many more popular libraries that look like this.

badmintonbaseba 1y ago

Python asyncio supports meaningful stack traces through async functions just fine.

  import asyncio
  
  async def baz():
      await asyncio.sleep(.1)
      raise RuntimeError()
  
  async def bar():
      await asyncio.sleep(.1)
      await baz()
  
  async def foo():
      await asyncio.sleep(.1)
      await bar()
  
  async def main():
      await asyncio.sleep(.1)
      await foo()
  
  if __name__ == "__main__":
      loop = asyncio.new_event_loop()
      asyncio.set_event_loop(loop)
      main_task = loop.create_task(main())
      try:
          loop.run_until_complete(main_task)
      except KeyboardInterrupt:
          main_task.cancel()
          loop.run_until_complete(asyncio.wait([main_task]))
          pass

And then run:

  $ python3 test_stacktrace.py
  Traceback (most recent call last):
    File "/home/user/tmp/test_stacktrace.py", line 24, in <module>
      loop.run_until_complete(main_task)
    File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
      return future.result()
    File "/home/user/tmp/test_stacktrace.py", line 17, in main
      await foo()
    File "/home/user/tmp/test_stacktrace.py", line 13, in foo
      await bar()
    File "/home/user/tmp/test_stacktrace.py", line 9, in bar
      await baz()
    File "/home/user/tmp/test_stacktrace.py", line 5, in baz
      raise RuntimeError()
  RuntimeError

the_mitsuhiko 1y ago

Note how in this stack trace, too, you lose the information about where the main task was scheduled [1]. While you can stitch together the await points, it's much harder to find where tasks originate. This is also true for `TaskGroup`, where the actual call that schedules a task is lost. You will just find the eventual await, which might be the task group (which is good, since that would be structured concurrency), but often you find nothing, since the task is not properly awaited or is awaited in a completely different place (e.g. pending shutdown).

[1]: the important line is "main_task = loop.create_task(main())"
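One possible workaround for the lost scheduling site, assuming you control the code that creates the task, is to record the stack at scheduling time and attach it to any failure. The sketch below is not an asyncio feature: `spawn_with_origin` and `TaskFailed` are hypothetical names introduced for illustration.

```python
import asyncio
import traceback


class TaskFailed(Exception):
    """Hypothetical wrapper that carries the stack of the scheduling call."""

    def __init__(self, origin):
        super().__init__("task failed; scheduled at:\n" + "".join(origin))
        self.origin = origin


def spawn_with_origin(coro):
    # Capture the stack now, while the scheduling call is still on it.
    # The last entry (this helper) is dropped.
    origin = traceback.format_stack()[:-1]

    async def runner():
        try:
            return await coro
        except Exception as err:
            # Chain the real error so its own traceback is kept as well.
            raise TaskFailed(origin) from err

    return asyncio.get_running_loop().create_task(runner())


async def worker():
    await asyncio.sleep(0)
    raise RuntimeError("boom")


async def schedule_worker():
    return spawn_with_origin(worker())


async def main():
    task = await schedule_worker()
    try:
        await task
    except TaskFailed as err:
        return err
    return None


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Capturing a stack for every task has a cost, so a wrapper like this is probably best limited to debug builds or to the tasks that are hardest to trace.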

Yoric 1y ago

Maybe in your example, but I long ago gave up on having stack traces with any meaningful async Python code. Is it the frameworks' fault? Presumably. But the end result is the same for me. Which is a shame, because Python stack traces are really good, when they work.

vlovich123 1y ago

I wish there was a mode to force Errors to automatically capture traces & print them as part of the chain on panic. Would save a lot of time when debugging & let you force libraries into supporting it.

bbatha 1y ago

> In theory node has async stack traces now, but I have yet to see this work through `setTimeout` and friends. It's very common to lose parts of the stack.

You need to use the actual `await` syntax to get an async stack trace in node. Callbacks and raw promise work can't be seen by the async stack trace implementation which hooks into `await` points.

johncolanduoni 1y ago

One nice side effect of how Rust’s Futures work is that in many cases “normal” stack traces actually reflect the async/await flow accurately. You should see a series of “poll” methods called on each future in the async call chain.

the_mitsuhiko 1y ago

Only until you spawn it into an executor :(

CMDBob 1y ago

A stack trace (or even better, a minidump with the call stack!) is one of the most useful debugging things for me. Hell, the call stack in general is super useful to me!

I can look at a stack trace, go "oh, function X is misbehaving after being called by function Y, from function Z", and work out what's gone wrong from the context clues and other debugger info. As a game developer working with big, semi-monolithic codebases, it's essential, especially when code crosses the gameplay/engine and engine/kernel barriers.

montebicyclelo 1y ago

Great point. I've found it's very often possible to understand and fix problems "one shot" from stack traces alone, and we're talking production builds here... So I wouldn't turn them off (an idea mentioned in the article) unless profiling shows that they are one of the last things preventing the code from reaching the target performance.

windward 1y ago

> Are they just not used to having them so that they don't miss them?

The languages that I work in that don't print useful traces are typically strongishly-typed system languages. So I miss them - sometimes having to step through offending lines of code in a debugger - but I also completely avoid a whole class of bugs that are responsible for most of my stack traces in Python.

TFA's example isn't one of these, but is a function that would have a return code checked and logged if erroneous. This class of bug also can't be inlined and makes an easy breakpoint-ee.

TinkersW 1y ago

They are useful sure, and I print a stacktrace on any type of error/exception, but often breaking into the debugger is even more useful and faster as you can see local variables, program state, and what other threads happen to be doing.

piva00 1y ago

Hard to break into the debugger for a production application running on hundreds of servers.

Cthulhu_ 1y ago

One can argue whether stack traces should be enabled for production (at least on all servers) given they're relatively expensive to create. Which isn't a problem if they're exceptional, but in a lot of cases they aren't.

TinkersW 1y ago

Perhaps, but remote debugging is a thing, though triggering an auto break into debugger would be more complex.

anonzzzies 1y ago

I am a big fan of Lisp SBCL stack traces; even in complex projects I never saw before, I'm almost always able to read, interpret and fix the issue just from that.

rootnod3 1y ago

Oh yes. I haven't found any equivalent yet.

zokier 1y ago

Kinda related, but I feel it would be useful for log entries to include file/lineno and/or some unique identifier. Helps both pinpointing where some weird message comes from, and for searching for specific entries in the logs.

Sure, you can grep the log message but it can be difficult if it has some templating/formatting going on, and it can be pretty easy to end up with non-unique messages.
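In Python, the standard `logging` module can already put the file and line number into every entry, which gives a rough version of what is described above. A minimal sketch, where the logger name `trace_demo` and the helper names `make_logger` and `emit_sample` are arbitrary choices for this example:

```python
import io
import logging


def make_logger(stream):
    """Build a logger whose records include the source file and line number."""
    logger = logging.getLogger("trace_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    # %(filename)s and %(lineno)d are standard LogRecord attributes.
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(filename)s:%(lineno)d %(message)s")
    )
    logger.addHandler(handler)
    return logger


def emit_sample():
    stream = io.StringIO()
    log = make_logger(stream)
    # The template and its arguments stay separate until formatting,
    # but the file:line prefix identifies the call site either way.
    log.info("user %s not found", "alice")
    return stream.getvalue()


if __name__ == "__main__":
    print(emit_sample(), end="")
```

With the call site in the prefix, a templated message no longer has to be unique to be traceable back to its source.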

XorNot 1y ago

What's weird is how expensive this can be - i.e. to do it in Go requires invoking runtime reflection, whereas technically the compiler should be able to update the final numbers into the messages at build time.

GuB-42 1y ago

I don't know if it is part of the reason but stack traces can be considered a vulnerability in some situations.

Also, for "normal" errors, you shouldn't need a stack trace. For example, "file not found" is, from the point of view of the developer, an expected situation and should be handled with the same amount of care as if the file were present. You don't dump the internals to the user when you have successfully opened the file, so don't dump them when you haven't.

For unexpected errors (i.e. bugs), there is the crash, abort, panic, or whatever it is called in your language. These will usually give you a stack trace, or a core dump from which you can extract the stack trace and more.

What I would wish for however would be a standard feature in languages to display a stack trace on command. Many languages have it, but even when they do, they could be more prominent. This way, if you encounter an unexpected situation you want to debug without crashing and without a debugger attached, you can call it.
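Python is one of the languages that has this: `traceback.format_stack()` (or `traceback.print_stack()`) returns the current stack without raising anything. A minimal sketch, where `current_stack_text` and `handle_request` are hypothetical names used for illustration:

```python
import traceback


def current_stack_text():
    """Return the caller's stack as text without raising or crashing."""
    # Drop the last entry, which is this helper itself.
    return "".join(traceback.format_stack()[:-1])


def handle_request(payload):
    if payload is None:
        # Unexpected but survivable: record how we got here and carry on.
        return current_stack_text()
    return ""


if __name__ == "__main__":
    print(handle_request(None))
```

The program keeps running after the call, so this fits the "unexpected situation, no crash, no debugger attached" case.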

albertzeyer 1y ago

Stack traces are very valuable. Sometimes it can even help to attach them to some object creation, when you later wonder why/how/where this object was created. E.g. in TensorFlow, every single Tensor had a traceback attached to it, so when there was any error later on, it would show you where it was created. This is maybe less needed now with eager mode, but you might have other similar situations.
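The general pattern of attaching a traceback at object creation can be sketched as follows. This is not TensorFlow's implementation; `TrackedBuffer` and `build_buffer` are hypothetical names, and the sketch only shows the idea of remembering the construction site and reporting it when a later error occurs.

```python
import traceback


class TrackedBuffer:
    """Hypothetical object that remembers where it was constructed."""

    def __init__(self, size):
        self.size = size
        # Skip the last frame (this __init__) so the summary ends at the caller.
        self.created_at = traceback.extract_stack()[:-1]

    def read(self, index):
        if not 0 <= index < self.size:
            origin = "".join(traceback.format_list(self.created_at))
            raise IndexError(
                f"index {index} out of range; buffer was created at:\n{origin}"
            )
        return 0


def build_buffer():
    return TrackedBuffer(4)


if __name__ == "__main__":
    try:
        build_buffer().read(10)
    except IndexError as err:
        print(err)
```

Walking the stack on every construction is not free, so this is probably best reserved for objects that are created rarely or for debug configurations.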

One problem with stack traces is maybe that they can be too verbose. E.g. if you print them for any warning you print to log (or stdout). Sometimes they will be extremely helpful for debugging some problem, but in many cases, you maybe don't need them (you know why you get the warning and/or you don't care about it).

You could also add more information to the stack trace such as local variables. That can be even more helpful for debugging then, but again adds more verbosity.

For example, we often use this to add information about relevant local variables: https://github.com/albertz/py_better_exchook
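Independently of the library linked above, the Python standard library offers a basic form of this through `traceback.TracebackException` with `capture_locals=True`. A minimal sketch, where `divide`, `average`, and `format_with_locals` are made-up names for the example:

```python
import traceback


def divide(total, count):
    return total / count


def average(values):
    total = sum(values)
    return divide(total, len(values))


def format_with_locals(func, *args):
    """Run func and return its traceback text including local variables."""
    try:
        func(*args)
    except Exception as err:
        # capture_locals=True stores repr() of each frame's local variables.
        detailed = traceback.TracebackException.from_exception(
            err, capture_locals=True
        )
        return "".join(detailed.format())
    return ""


if __name__ == "__main__":
    print(format_with_locals(average, []))
```

Every frame's locals are included, which is where the extra verbosity mentioned above comes from, and the values may contain data you would not want in production logs.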

One solution to the problem with verbosity is when you have foldable text output. Then the stack trace is folded away (not shown in all details) and you can unfold it to see the details. See the DomTerm demo here: https://github.com/albertz/py_better_exchook#domterm

Some more on text folding:

https://github.com/PerBothner/DomTerm/issues/54

https://gitlab.com/gnachman/iterm2/-/issues/4950

https://github.com/xtermjs/xterm.js/issues/1875

https://gitlab.freedesktop.org/terminal-wg/specifications/-/...

https://github.com/vercel/hyper/issues/1093

harperlee 1y ago

My main problem with (jvm) stack traces is that they generally don't include information about the values that are passed to the function calls, so you get the structure of the code, but not the actual value that could help you reproduce the error. I know that once you deal with relatively complex objects that are not trivially serializable you get an address, which is not super useful, but in some codebases / problem areas you still could be getting a lot of information that gets lost due to that design decision.

Chilinot 1y ago

I have been an avid proponent of the way errors are managed in Rust and Go for a long time. However, this article raises a very good point. Before I started developing in Rust and Go, I did Java and Python for several years. And damn, do I miss those stack traces every now and then when something bad happens that isn't properly handled by the code.

Still, I do think returning the error as a return value is better than having a completely separate flow when dealing with exceptions. I like that it forces me to properly deal with an error and not just ignore it and think something like "meh, I'll get to this later". Because I will never "get to it later".

karl42 1y ago

You could combine both by adding a stack frame each time the error is returned one level up. This could be done explicitly (cumbersome and not everyone will do it) or automatically by the language (weird magic, but useful).
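The explicit variant could look roughly like this. It is sketched in Python for consistency with the rest of the thread, although the idea targets error-as-value languages; `Failure`, `passed_up`, `read_header`, and `load_data` are hypothetical names.

```python
import inspect


class Failure:
    """Hypothetical error value that collects a frame at every return."""

    def __init__(self, message):
        self.message = message
        self.trail = []

    def passed_up(self):
        # Record the function that is handing the error to its caller.
        caller = inspect.stack()[1]
        self.trail.append(f"{caller.function}:{caller.lineno}")
        return self


def read_header(raw):
    if len(raw) != 3:
        return Failure("wrong number of elements").passed_up()
    return raw


def load_data(raw):
    result = read_header(raw)
    if isinstance(result, Failure):
        # Each level that forwards the error adds its own frame.
        return result.passed_up()
    return result


if __name__ == "__main__":
    outcome = load_data([1, 2])
    print(outcome.message, outcome.trail)
```

This shows the cumbersome part directly: every function that forwards the error has to remember to call `passed_up()`, which is what an automatic language-level version would remove.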

cm2187 1y ago

In C# the quasi mandatory async/await for everything has many downsides, particularly for debugging. It breaks all stack traces. It also makes it impossible to pause the code.

HdS84 1y ago

Ha? Can you give an example? I've seen lots of perfectly good stack traces in async code - no problems at all. Pausing code also works, at least using VS or Rider.

fedeb95 1y ago

Hiding stack traces is a bad practice and should be avoided unless you're in the last layer of the application (i.e. presentation to the user).

creshal 1y ago

Stack traces are underrated, unless you're developing EnterpriseJavaSingletonFactoryAbstractionFactoryFactories, in which case they're buffer overflows on your poor log analyzer

someothherguyy 1y ago

Stupid clickbait headline for a famished article

berkes 1y ago

Way before I consistently used step debuggers and would just "print-debug" println("why are you here?") or "raise-debug" raise new Error("huh?"), I tinkered with a step debugger, but found it too complex and hard. But I remember that it also allowed me to move backwards in the stack.

It allowed me to go some frames back - lines up, up in the stack. I don't recall the name of this debugger, nor what language it was. But I've never since seen this, yet very often wished I had it (for rust, javascript, python, mostly).

Did I misremember? Can such a thing exist? Does it exist?

hinkley 1y ago

Time travel debugging is the category, but I can’t help you much more than that with the tool names.

Veserv 1y ago

While time-travel debugging would provide such capability, it is highly unlikely they are describing that due to timeframe and unfamiliarity with debuggers.

They are most likely describing the much simpler feature of having the debugger drop stack frames (by force nuking the stack) and then starting over (with all the other side effects that occurred in the “prior” execution still present). This is a fairly common feature for exploratory debugging, but has the obvious downsides of leaving lingering side effects so is only fit for use in non-production environments at best like other “edit-and-continue” features.

iggldiggl 1y ago

Time-travel debugging is something different – time-travel debugging means you can actually step backwards through the execution to try and see how you ended up with the bug.

Merely being able to inspect the state of (local) variables further up the stack frame is a much more limited proposition, even if it can still be useful.

> yet very often wished I had it (for […] javascript […] mostly)

Both Firefox's and Chrome/Edge's devtools allow you to do that, don't they? Click on an entry in the stack frame and it takes you to the corresponding code line and shows you the state of the variables relevant at that point.

reseasonable 1y ago

VB6 had it, and iirc you could even edit the code after stepping back to step forward on a different path. While trying to confirm that with a quick Google search, I see that Visual Studio added stepping back in 2017, though I'm not sure it supports editing inline.

piva00 1y ago

In Java-world it's very common.

cyberax 1y ago

Don't worry. Just wait until you start doing async stuff, especially with React.

You'll be dreaming of good old times of linear stacktraces.

miohtama 1y ago

Windows 95 introduced threads as a revolution for developer productivity: you no longer had to write async Windows event loops, which were hard to debug. Linear stack traces were one of the main selling points, among others.

hinkley 1y ago

Node 16 was supposed to make this situation much better but you sure could have fooled me. Is there less salt in my wounds? Sure.

rollulus 1y ago

Depends on the audience. As a user I'd rather see "can't load data: failed to parse header: wrong number of elements" than a stack trace with WrongNumbersOfElementsException at the tail.
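Both audiences can be served from the same error if each layer chains its exception: developers keep the full trace, and the user-facing text is built from the messages along the chain. A minimal Python sketch, where `HeaderError`, `LoadError`, `parse_header`, `load`, and `user_message` are hypothetical names:

```python
class HeaderError(Exception):
    """Hypothetical error types used only for this illustration."""


class LoadError(Exception):
    pass


def parse_header(fields):
    if len(fields) != 3:
        raise ValueError("wrong number of elements")
    return fields


def load(fields):
    try:
        try:
            return parse_header(fields)
        except ValueError as err:
            raise HeaderError("failed to parse header") from err
    except HeaderError as err:
        raise LoadError("can't load data") from err


def user_message(error):
    """Join the messages along the __cause__ chain, outermost first."""
    parts = []
    while error is not None:
        parts.append(str(error))
        error = error.__cause__
    return ": ".join(parts)


if __name__ == "__main__":
    try:
        load(["only", "two"])
    except LoadError as err:
        print(user_message(err))
```

The traceback of the same `LoadError` still contains every frame, so it can go to the log while only the joined message is shown to the user.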

miohtama 1y ago

Stack traces are a feature for developers to locate and fix bugs easily, and should not be a feature for end users.

logicallee 1y ago

While you're debugging using AI (specifically, ChatGPT o1), you can benefit from copying stack traces. It debugs better than if you just describe what's wrong.

Another tip: I have found that it is helpful to ask AI to "deeply analyze" (use those words) and think about the problem without providing a solution (say "don't reply with any code"). If you don't do that, it will take its first guess and then eagerly start outputting code that is still wrong and doesn't really identify or fix the issue. When you ask it to deeply analyze what's wrong and not reply with any code, it frequently finds the true underlying problem, and then you can ask for how to solve it in the next step.
