How Does a C Debugger Work? (2014)

(blog.0x972.info)

131 points, posted by btashton 5y ago, 29 comments

userbinator 5y ago

> It writes an invalid instruction at this location. What ever this instruction, it just has to be invalid.

On x86 at least, it is a valid instruction: INT3, or CC in hex. There are also the debug registers, which implement breakpoints without modifying any code, although they are limited to four at a time.

Characterising gdb as a "C debugger" is quite appropriate: trying to debug Asm directly with it is an excruciating experience.
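
To make the four-at-a-time limit of the debug registers concrete, here is a minimal Python sketch. It is a toy model, not real register access: the class name `DebugRegisterFile` and its methods are made up for illustration, and only the slot count (DR0 through DR3) comes from the hardware.

```python
class DebugRegisterFile:
    """Toy model of the four x86 address debug registers (DR0-DR3).

    Hypothetical illustration only: a real debugger programs these
    registers through the operating system, not through a Python list.
    """

    SLOTS = 4  # x86 provides four address registers, DR0 through DR3

    def __init__(self):
        self.slots = [None] * self.SLOTS

    def set_breakpoint(self, address):
        """Claim the first free slot for `address` and return its index."""
        for index, current in enumerate(self.slots):
            if current is None:
                self.slots[index] = address
                return index
        raise RuntimeError("all hardware breakpoint slots are in use")

    def clear_breakpoint(self, index):
        """Free a slot so it can be reused."""
        self.slots[index] = None

    def hits(self, pc):
        """True if executing at `pc` would trigger a hardware breakpoint."""
        return pc in self.slots
```

A fifth `set_breakpoint` call raises until a slot is cleared, which is roughly the point where a debugger has to fall back to software breakpoints.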

wallnuss 5y ago

Having recently had the pleasure of having to debug JIT-compiled code with an ABI mismatch, I can't overstate how useful `rr` (https://github.com/mozilla/rr) can be to debug assembly. The ability to `rsi` (i.e. reverse-step one instruction) is very powerful.

One tool that I started exploring is https://pernos.co/; the ability to do dataflow analysis is super cool. It lets you easily answer the question "How did this value get into this register?".

jlokier 5y ago

`rr` is indeed very excellent.

It's really striking that it can't work on ARM though, due to an ARM architectural issue which x86 doesn't have.

pm215 5y ago

Yeah, it's useful to have a special-purpose 'software breakpoint' instruction. Arm actually didn't have one until v5 of the architecture, so some older systems or those wanting to maintain compatibility will still use an arbitrary undefined-instruction pattern. The advantage of an architecturally defined instruction rather than picking something invalid at random is (a) software can rely on future CPUs not deciding to use it for some actual feature (b) software can easily distinguish it from a random attempt to execute an illegal instruction and (c) it acts as a "strong convention" that pushes all software (OSes, debuggers, etc) towards using the same mechanism for setting a software breakpoint, which improves interoperability. In the Unix world the distinction is usually surfaced to the debugger as SIGTRAP vs SIGILL.
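
A rough Python sketch of point (b), assuming a Unix-like system where the `signal` module defines both `SIGTRAP` and `SIGILL`. The function name `classify_stop` and the `undefined_insn_breakpoints` set are hypothetical and only illustrate the extra bookkeeping needed when a breakpoint is an arbitrary undefined instruction.

```python
import signal


def classify_stop(signum, pc, undefined_insn_breakpoints=frozenset()):
    """Guess why the debuggee stopped (illustrative only).

    `undefined_insn_breakpoints` is a hypothetical set of addresses where
    the debugger planted an undefined-instruction pattern because no
    dedicated breakpoint instruction was available.
    """
    if signum == signal.SIGTRAP:
        # A dedicated breakpoint instruction (or a single-step) was reported.
        return "breakpoint or single-step"
    if signum == signal.SIGILL:
        # Without a dedicated instruction the debugger has to check whether
        # this address is one of its own breakpoints or a genuine fault.
        if pc in undefined_insn_breakpoints:
            return "breakpoint"
        return "illegal instruction"
    return "other signal"
```

With a dedicated instruction the first branch is enough; the lookup in the second branch is the work an architecturally defined breakpoint saves.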

weinzierl 5y ago

(d) it facilitates code obfuscation because you don't have to guess which instruction you have to use to confuse the debugger

saagarjha 5y ago

Really? I use GDB all the time to debug assembly, it's quite good at it. Perhaps there's something much better that I'm not aware of?

userbinator 5y ago

Having come from a background of WinDbg (Windows) and DEBUG (DOS) before it, GDB feels very "unergonomic" in comparison. The general verbosity (why do you need an asterisk in front of an address when setting a breakpoint, as the title of this site so prominently reminds us?), the lack of a regular hexdump command (16-byte hex+ASCII format) and the rather perplexing behaviour of the "disassemble" command (https://stackoverflow.com/questions/1237489/how-to-disassemb...) are what come to mind immediately.

WinDbg also lets you set the default base to 16, a feature whose usefulness is greatly appreciated when working with Asm: https://docs.microsoft.com/en-us/windows-hardware/drivers/de... GDB... makes an attempt: https://sourceware.org/bugzilla/show_bug.cgi?id=23390

PaulDavisThe1st 5y ago

(gdb) watch my_ptr

What does this mean? Watch for when the value at the address given by my_ptr changes, or watch for when the value of my_ptr itself changes (i.e. it is modified to point to a different location)?

(gdb) watch * my_ptr

Ah ... now it's clear what is meant.

(gdb) watch 0xfeedface

Hmm, now 0xfeedface is not a variable but literally an address. But wait, is it? What does this mean? Watch the value at memory location 0xfeedface? But that's totally inconsistent with the semantics of "watch my_ptr". So,

(gdb) watch * 0xfeedface

and once again, no ambiguity over what is going on and consistent syntax.

As for your other complaints, I've been programming for about 33 years in C and C++, and I don't recall ever needing to use a hexdump inside the debugger or the disassemble command. Which is not to say that they're not important for some work, but they are also not important for all work.
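
The difference between the two watch expressions can be sketched with a toy memory model in Python. Everything here is hypothetical (the `ToyMemory` class, the address `0x1000` for `my_ptr`, the helper names); it only illustrates which write makes which expression change, not how GDB implements watchpoints.

```python
class ToyMemory:
    """Word-addressed memory, just enough to illustrate watch expressions."""

    def __init__(self):
        self.words = {}

    def read(self, address):
        return self.words.get(address, 0)

    def write(self, address, value):
        self.words[address] = value


MY_PTR_ADDR = 0x1000  # hypothetical address where the variable my_ptr lives


def watch_my_ptr(mem):
    """Expression `my_ptr`: the pointer value itself."""
    return mem.read(MY_PTR_ADDR)


def watch_deref_my_ptr(mem):
    """Expression `*my_ptr`: the value the pointer currently refers to."""
    return mem.read(mem.read(MY_PTR_ADDR))


def changed(expr, mem, action):
    """Run `action` and report whether the watched expression changed."""
    before = expr(mem)
    action()
    return expr(mem) != before
```

Writing to the pointee changes only `*my_ptr`, while reassigning the pointer changes `my_ptr`, which is the distinction the asterisk makes explicit.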

trott 5y ago

If you can use DDD (which is an interface to GDB), it shows the disassembly.

wazari972 5y ago

Author here.

I will rewrite this sentence; "invalid instruction" is too limited. It should be something like "an instruction that traps, and which isn't already used for another purpose". Syscalls trap, but they are already used, and INT3/CC is a valid instruction.

drfuchs 5y ago

Your description still has problems on architectures like x86 with instructions of different lengths. In particular, you pretty much have to use a single-byte instruction to do the trap; a multi-byte one runs the risk of clobbering other code that could well be jumped to. That’s why INT3 exists as a single-byte instruction to begin with!
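
The clobbering risk can be shown with a small Python simulation over a byte buffer. This is a sketch under stated assumptions: the class and function names are invented for illustration, and the "code" is just three one-byte x86 instructions (`nop`, `ret`, `nop`) held in a `bytearray`, not a real process image.

```python
INT3 = 0xCC  # x86 single-byte breakpoint opcode


class SoftwareBreakpoints:
    """Toy model: patch INT3 into a code buffer and restore it later."""

    def __init__(self, code):
        self.code = bytearray(code)
        self.saved = {}  # address -> original byte

    def insert(self, address):
        if address in self.saved:
            return  # already patched; keep the original byte we saved
        self.saved[address] = self.code[address]
        self.code[address] = INT3

    def remove(self, address):
        self.code[address] = self.saved.pop(address)


def clobbered(patch_addr, patch_len, insn_starts):
    """Starts of *other* instructions overwritten by a patch of patch_len bytes."""
    patched = range(patch_addr, patch_addr + patch_len)
    return [s for s in insn_starts if s in patched and s != patch_addr]


# nop (0x90), ret (0xC3), nop (0x90): each instruction is one byte long.
code = bytes([0x90, 0xC3, 0x90])
insn_starts = [0, 1, 2]

bp = SoftwareBreakpoints(code)
bp.insert(0)
patched_copy = bytes(bp.code)   # first byte is now 0xCC
bp.remove(0)
restored_copy = bytes(bp.code)  # identical to the original code again
```

Here `clobbered(0, 1, insn_starts)` is empty, while a two-byte trap such as the long form `CD 03` gives `clobbered(0, 2, insn_starts) == [1]`: the `ret` at offset 1 would be overwritten even though some other path may jump straight to it.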

dima55 5y ago

What issues did you have debugging assembly? I've done it somewhat, and it SEEMS to work ok.

elvis70 5y ago

See also the "Writing a Linux Debugger" series of posts in which a source-level debugger is implemented: https://blog.tartanllama.xyz/writing-a-linux-debugger-setup/

woodruffw 5y ago

This is a fantastic summary of debugger implementation!

Another great one that actually walks through writing a basic debugger is Eli Bendersky's series[1].

One nitpick:

> It could, and that would work (that the way valgrind memory debugger works), but that would be too slow. Valgrind slows the application 1000x down, GDB doesn't. That's also the way virtual machines like Qemu work.

This is use-case dependent: running a program until you hit a breakpoint will be significantly faster with `int 3`, but running a piece of instrumentation on every instruction (or branch, or basic block, or ...) will be significantly faster with Valgrind (or another dynamic binary instrumentation framework). This is because Valgrind and other DBI tools can rewrite the instruction stream to run the instrumentation inside the same process, whereas a debugger turns every instruction (or other program feature) into a sequence of expensive system calls.

[1]: https://eli.thegreenplace.net/tag/debuggers
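
The structural difference behind that nitpick can be sketched as a toy Python model. It counts only how often control leaves the traced program; the function names are hypothetical, the "program" is a list of strings, and no real costs are measured.

```python
def run_with_single_step(program, on_instruction):
    """Debugger-style: stop after every instruction and wake the tracer."""
    round_trips = 0
    for insn in program:
        # The instruction runs in the tracee, then the tracee is stopped
        # and the tracer is scheduled to inspect it: one round trip each.
        round_trips += 1
        on_instruction(insn)
    return round_trips


def run_rewritten(program, on_instruction):
    """DBI-style: instrumentation is spliced into the instruction stream."""
    rewritten = []
    for insn in program:
        rewritten.append(("instrument", insn))
        rewritten.append(("execute", insn))
    round_trips = 0  # nothing leaves the process while rewritten code runs
    for kind, insn in rewritten:
        if kind == "instrument":
            on_instruction(insn)
    return round_trips


program = ["mov", "add", "cmp", "jne"]
seen_by_tracer, seen_inline = [], []
stops = run_with_single_step(program, seen_by_tracer.append)
inline_stops = run_rewritten(program, seen_inline.append)
```

Both runs observe the same four instructions, but the single-stepping version pays one tracer round trip per instruction while the rewritten stream pays none, which is the effect described above.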

jbn 5y ago

One useful reference for this is https://www.cs.tufts.edu/~nr/pubs/retargetable-abstract.html

qlk1123 5y ago

> It writes an invalid instruction at this location. What ever this instruction, it just has to be invalid.

RISC-V actually does this with a dedicated instruction called ebreak. It can switch the CPU into Debug Mode.
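
For reference, a small Python sketch of the two `ebreak` encodings: the 32-bit `ebreak` and the 16-bit `c.ebreak` from the compressed (C) extension. The helper `breakpoint_bytes` is hypothetical; choosing the trap that matches the length of the replaced instruction is an assumption about how a debugger would avoid overwriting the following instruction.

```python
EBREAK = 0x00100073  # 32-bit ebreak encoding
C_EBREAK = 0x9002    # 16-bit c.ebreak encoding (C extension)


def breakpoint_bytes(insn_length):
    """Little-endian trap bytes matching the length of the replaced instruction."""
    if insn_length == 2:
        return C_EBREAK.to_bytes(2, "little")
    if insn_length == 4:
        return EBREAK.to_bytes(4, "little")
    raise ValueError("expected a 2- or 4-byte instruction")
```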

wazari972 5y ago

Author here.

As I mentioned above, I will rewrite this sentence to include dedicated special instructions. It was a wording mistake not to mention them!

Gunax 5y ago

What about optimizations?

Isn't it possible the compiler will re-order or combine statements differently than how they are written in source?

flohofwoe 5y ago

That's why step-debugging usually is done on unoptimized builds where there's a clear relationship between the original source code lines and variables and the generated machine code.

But there's a wide "grey area" depending on optimization level where the debug information is more or less off yet the mapping is still "good enough". Debugging on optimized builds can work surprisingly well if you know what to expect (e.g. the debugging cursor might jump to unexpected places in the source code because the line mapping is off, or you can't step into a function because it has been inlined).
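
The "cursor jumps around" effect can be illustrated with two made-up line tables in Python. Both tables are hypothetical and do not come from any real compiler output; they only show how the address-to-line mapping that a debugger reads from the debug information stops being monotonic once code is reordered.

```python
def step_lines(line_table, addresses):
    """Source lines a debugger would display while stepping through
    `addresses`, with consecutive repeats of the same line collapsed."""
    shown = []
    for addr in addresses:
        line = line_table[addr]
        if not shown or shown[-1] != line:
            shown.append(line)
    return shown


# Hypothetical address -> source line tables for the same three statements.
unoptimized = {0x00: 10, 0x04: 10, 0x08: 11, 0x0C: 12}
optimized = {0x00: 11, 0x04: 10, 0x08: 12, 0x0C: 10}

unoptimized_steps = step_lines(unoptimized, sorted(unoptimized))
optimized_steps = step_lines(optimized, sorted(optimized))
```

In the unoptimized table the lines are visited in source order (10, 11, 12); in the reordered one the cursor goes 11, 10, 12 and then back to 10, which matches the behaviour described above.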

saagarjha 5y ago

Yes, it is. Do you have a specific question?

MaxBarraclough 5y ago

I think their question was this: how is it that I can step through and see the expected changes to my variables, given that the optimiser is permitted to re-order/elide/restructure my code?

Asooka 5y ago

You will not see the expected changes and stepping through will jump around erratically. Often you will be unable to print the value of a variable because it has been optimised out.

jonny383 5y ago

And here I was thinking this would be an article on writing printf() statements.
