Show HN: I wrote a tool in Rust for tracking all allocations in a Linux process

(github.com)

113 points | mkimball | 3y ago | 30 comments

mitchs 3y ago

Neat. I made something similar for work a while back, but as an LD_PRELOAD library that intercepted calls to malloc and friends. It would add extra space to every allocation so it could add a pointer at the end that pointed into a leaf node of a call graph backtrace tree it maintained. Each node in the tree had lifetime allocated/freed block counts and bytes by code site. The cool part about it was that it barely affected the performance of the application.
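A minimal sketch of the trailer-pointer idea, not the commenter's actual code. The names `site_stats`, `traced_alloc`, and `traced_free` are hypothetical, and the size header in front of the block is an added assumption so that the free path can locate the trailer; a real LD_PRELOAD library would do this inside its own `malloc` and `free`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-call-site counters: one leaf of the backtrace tree. */
struct site_stats {
    size_t blocks_allocated, blocks_freed;
    size_t bytes_allocated, bytes_freed;
};

/* Assumption: a size header in front of the user block lets the free path
 * find the trailer. It is padded to max_align_t so the user pointer stays
 * suitably aligned. */
#define HEADER_SIZE sizeof(max_align_t)

/* Layout: [size header][user bytes][pointer to site_stats] */
void *traced_alloc(size_t size, struct site_stats *site)
{
    if (size > SIZE_MAX - HEADER_SIZE - sizeof site)
        return NULL;                       /* request would overflow */
    unsigned char *raw = malloc(HEADER_SIZE + size + sizeof site);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);                       /* header */
    memcpy(raw + HEADER_SIZE + size, &site, sizeof site);  /* trailer */
    site->blocks_allocated++;
    site->bytes_allocated += size;
    return raw + HEADER_SIZE;
}

void traced_free(void *user)
{
    if (user == NULL)
        return;
    unsigned char *raw = (unsigned char *)user - HEADER_SIZE;
    size_t size;
    struct site_stats *site;
    memcpy(&size, raw, sizeof size);
    /* The trailer may be unaligned, so read it back with memcpy. */
    memcpy(&site, raw + HEADER_SIZE + size, sizeof site);
    site->blocks_freed++;
    site->bytes_freed += size;
    free(raw);
}
```

Because each block carries a pointer to its own counters, the free path updates the right call site without a lookup, which fits the low overhead described above.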

It made its own socket and a thread to listen on it. It would just dump a snapshot of the tree to anything that connected. I also had some tooling that would let you diff two snapshots, since it was helpful to see whether particular stimuli caused persistent extra allocations. While finding the largest outstanding delta between allocated and freed bytes was great for finding leaks, sorting by lifetime count of blocks allocated was also fun. I remember some little puzzle game I enjoyed playing at the time would allocate and free tens of thousands of blocks as you dragged a line around for a second.
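The snapshot diff can be sketched roughly as follows. The record layout and the names `site_snapshot`, `outstanding_bytes`, and `outstanding_growth` are assumptions for illustration; the original tool's snapshot format is not described in the thread.

```c
#include <string.h>

/* Hypothetical flattened snapshot record: one entry per call site. */
struct site_snapshot {
    const char *site;            /* label for the allocation site */
    long long bytes_allocated;   /* lifetime bytes allocated */
    long long bytes_freed;       /* lifetime bytes freed */
};

/* Bytes still outstanding at this site when the snapshot was taken. */
long long outstanding_bytes(const struct site_snapshot *s)
{
    return s->bytes_allocated - s->bytes_freed;
}

/* Growth in outstanding bytes for one site between two snapshots.
 * A site missing from the earlier snapshot has a baseline of zero. */
long long outstanding_growth(const struct site_snapshot *before,
                             size_t n_before,
                             const struct site_snapshot *after_site)
{
    long long baseline = 0;
    for (size_t i = 0; i < n_before; i++) {
        if (strcmp(before[i].site, after_site->site) == 0) {
            baseline = outstanding_bytes(&before[i]);
            break;
        }
    }
    return outstanding_bytes(after_site) - baseline;
}
```

Sorting sites by this growth value surfaces likely leaks. Sorting by lifetime allocation counts instead surfaces allocation churn, like the puzzle game mentioned above.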

There was a tricky chicken-and-egg problem with LD_PRELOAD wrapping one of the allocation functions, because it was used internally by dlsym, which I was using to retrieve pointers to the proper function implementations (calloc, if I recall correctly). I hacked around it by making my library allocate bytes out of a static char array for the calloc call that would happen while dlsym-ing for calloc. Debugging this was a nightmare, since it would break so early in the process's lifetime that GDB breakpoints weren't functioning. Tracking in a second process seems like a way simpler idea, and probably doesn't have too much of an impact on performance.
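A sketch of that workaround, under stated assumptions: `wrapped_calloc`, `bootstrap_calloc`, and the other names are hypothetical, and `simulated_dlsym` stands in for the real lookup so the re-entrant call can be exercised in an ordinary program. In an actual preload library the wrapper would be named `calloc` and the lookup would be `dlsym(RTLD_NEXT, "calloc")`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*calloc_fn)(size_t, size_t);

/* Static arena that serves calloc requests made while the real calloc is
 * still being looked up. Static storage starts out zeroed. */
static _Alignas(max_align_t) unsigned char bootstrap_arena[4096];
static size_t bootstrap_used;
static calloc_fn real_calloc;
static int resolving;   /* set while the lookup is in progress */

void *bootstrap_calloc(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;                               /* n * size overflows */
    size_t total = n * size;
    if (total > sizeof bootstrap_arena)
        return NULL;
    total = (total + 15) & ~(size_t)15;            /* keep 16-byte alignment */
    if (total > sizeof bootstrap_arena - bootstrap_used)
        return NULL;                               /* arena exhausted */
    void *p = bootstrap_arena + bootstrap_used;
    bootstrap_used += total;
    return memset(p, 0, total);
}

int is_bootstrap_ptr(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)bootstrap_arena;
    return a >= lo && a < lo + sizeof bootstrap_arena;
}

size_t bootstrap_bytes_used(void) { return bootstrap_used; }

void *wrapped_calloc(size_t n, size_t size);

/* Stand-in for dlsym(RTLD_NEXT, "calloc"): like the real lookup described
 * above, it calls calloc itself before returning the real function. */
static calloc_fn simulated_dlsym(void)
{
    (void)wrapped_calloc(1, 32);   /* the re-entrant call */
    return calloc;
}

void *wrapped_calloc(size_t n, size_t size)
{
    if (real_calloc == NULL) {
        if (resolving)
            return bootstrap_calloc(n, size);  /* called from the lookup */
        resolving = 1;
        real_calloc = simulated_dlsym();
        resolving = 0;
    }
    return real_calloc(n, size);
}

void wrapped_free(void *p)
{
    /* Arena blocks were never handed out by the real allocator. */
    if (p == NULL || is_bootstrap_ptr(p))
        return;
    free(p);
}
```

The sketch is not thread-safe, and the arena is never reclaimed. That is tolerable only if the lookup path makes very few small requests, which is an assumption here.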

Scramblejams 3y ago

If $JOB will let you throw it on GitHub, I'll try it!

arsome 3y ago

Sounds interesting, but I'd very much appreciate knowing what the output and exploration capabilities look like in allocscope-view before jumping into installation. Maybe add some screenshots to the README. Poking around the code, it looks like a curses-based interface.

mkimball OP 3y ago

Yeah, it's a curses-based interface, but with an option to output a text report for offline use.

Good idea to add screenshots.

wongarsu 3y ago

That looks quite neat.

Though I'm currently not on x64 Linux, and since the main selling point seems to be the TUI, it would be great to have a couple of screenshots, or even better a GIF of an asciinema recording (or whatever people use now).

yohannesk 3y ago

This might help https://twitter.com/KimballCode/status/1614276163005726720?c...

catskul2 3y ago

Could you compare/contrast its functionality to https://github.com/KDE/heaptrack ?

alschwalm 3y ago

Interesting approach. How is performance compared to something like https://github.com/koute/bytehound

kouteiheika 3y ago

Bytehound author here.

Just from a cursory look at the README:

> allocscope-trace attaches to another process as a debugger. By using breakpoints on memory allocation functions such as malloc it tracks allocations made by that process.

Looks like it's using breakpoints so I'd expect it to be orders of magnitude slower. And looking at the source code it's also using `libunwind`, so even if it wasn't using breakpoints it'd still be at least another order of magnitude slower since Bytehound has a custom unwinder that's specially optimized for this purpose.

One advantage it has is that it can be attached to an already running process; Bytehound can't do that. (I have ideas how I could do that, and it should be technically doable by dynamically injecting Bytehound's .so into the target process' address space, but so far I haven't needed it so I did not implement it)

kouteiheika 3y ago

Out of curiosity I ran a quick test on my private benchmark.

libbytehound.so (with extra debug assertions, because I'm too lazy to recompile in release mode): 4s

allocscope: did not finish after 4 minutes (I got bored waiting and CTRL+C'd it)

alschwalm 3y ago

Yeah, that was my assumption as well, good to have it confirmed though. Thanks for your excellent work on bytehound!

dmos62 3y ago

Why is this being downvoted?

Edit: now this comment is being downvoted.

JoshMcguigan 3y ago

Thanks for sharing! I built a similar tool (also in Rust) which allows tracing system and library calls, and could be used for this purpose. I wanted to expose the functionality both as a library and CLI, but for now I’ve only published documentation on using the CLI.

https://github.com/JoshMcguigan/backlight

stevefan1999 3y ago

If this project can trace memory allocations and deallocations along with their call stacks in real time, it would be super useful: we could statistically profile which functions keep allocating without a matching free within a given time frame (when the memory is supposed to have been freed). Valgrind only tells you that there are memory leaks, not exactly where they are.

mkimball OP 3y ago

Depends on what you mean by "real time". My method of crawling the stack impacts execution speed of the app you are tracing. I intended to do future work to minimize that impact.

With allocscope, you do get a callstack for the allocations which leak, though.

stevefan1999 3y ago

By real time I mean it will not severely slow down a game, say from 120fps to 40fps. That kind of real time.

Too 3y ago

Valgrind with --leak-check does include the stack trace of where unfreed memory was originally allocated.

wyldfire 3y ago

I'd be curious to see how this ptrace tool performs compared with one that relies on ELF symbol interposition (a la LD_PRELOAD). Other heap profilers (heaptrack, libtcmalloc, etc.) use this method. Presumably the loader resolves the symbols once at load time and there's little overhead in switching to the profiler code.

However, as a practical matter those solutions might omit mmap which some applications might use for anonymous allocations.

kouteiheika 3y ago

> I'd be curious to see how this ptrace tool performs compared with one that relies on ELF symbol interposition (a la LD_PRELOAD).

I've posted some very quick numbers in my comment here comparing it to Bytehound: https://news.ycombinator.com/item?id=34806401

> However, as a practical matter those solutions might omit mmap which some applications might use for anonymous allocations.

Bytehound also gathers mmaps. (:

linuxftw 3y ago

Assuming you're running Linux, there are some eBPF programs that can accomplish this already, no breakpoints needed.

nequo 3y ago

Do you have any that you would recommend specifically?

linuxftw 3y ago

This blog post talks about some of the existing tools and their tradeoffs: http://mysqlentomologist.blogspot.com/2021/05/dynamic-tracin...

catskul2 3y ago

I really like the picture at the top. Was that the work of Stable Diffusion?

mkimball OP 3y ago

Midjourney, actually. :)

behnamoh 3y ago

Let me guess: it made the HN front page because Rust.

Thaxll 3y ago

So it's like strace looking for brk()?

wyldfire 3y ago

strace is limited to system calls but this particular tool uses ptrace to trap symbolic references to mmap, malloc, calloc, etc. This provides better resolution because your allocator probably asks for large chunks of memory from the system and allocates from those instead of making each request one-for-one.

weinzierl 3y ago

Sorry if this is a dumb question, but can't strace trace brk() calls?

And as a kind of follow-up: what is the easiest way to trace all allocations (brk() and mmap) but nothing else?

matheusmoreira 3y ago

> can't strace trace brk() calls?

Absolutely.

> what is the easiest way to trace all allocations (brk() and mmap) but nothing else?

  strace -e mmap "$command"

I don't think anything modern still uses the program break, but one should know brk and sbrk exist. To see deallocations, add munmap to the filter. Note that these represent operating system allocations: programs usually request huge chunks and then manage that memory in user space in order to avoid system call overhead. On many systems, this memory won't actually count as used until the process touches it and causes a page fault.
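The last point can be observed on Linux with a small sketch like the one below. The helper names `reserve_pages`, `touch_pages`, and `resident_pages` are hypothetical. It relies on mmap(2) and mincore(2), and what mincore reports for untouched pages can differ in sandboxes or under memory pressure, so treat it as an illustration and not a measurement tool.

```c
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS and mincore on glibc */
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map npages of anonymous memory. No physical pages are needed yet. */
void *reserve_pages(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Write one byte per page so that each page is faulted in. */
void touch_pages(void *addr, size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile unsigned char *bytes = addr;
    for (size_t i = 0; i < npages; i++)
        bytes[i * page] = 1;
}

/* Count how many of the npages starting at addr are resident in RAM.
 * Returns -1 if the query fails. */
long resident_pages(void *addr, size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *vec = malloc(npages);
    long count = -1;
    if (vec == NULL)
        return -1;
    if (mincore(addr, npages * page, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;   /* low bit set means resident */
    }
    free(vec);
    return count;
}
```

Calling `resident_pages` right after `reserve_pages` and again after `touch_pages` typically shows the count rising from zero to the number of touched pages, even though strace reports only the single mmap call.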

zokier 3y ago

FYI, there is an -e %memory alias in strace for all memory-related syscalls.
