It made its own socket and a thread to listen on it, and would just dump a snapshot of the tree to anything that connected. I also had some tooling that would let you diff two snapshots, since it was helpful to see whether particular stimuli caused persistent extra allocations. While finding the largest outstanding delta between allocated and freed bytes was great for finding leaks, sorting by lifetime count of blocks allocated was also fun. I remember some little puzzle game I enjoyed playing at the time would allocate and free tens of thousands of blocks as you dragged a line around for a second.
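To make the two sort orders concrete, here is a minimal sketch in C. The `site_stats` layout and all the names are hypothetical (I'm not reproducing the original tool's data structures); it only shows "largest outstanding bytes first" versus "most blocks ever allocated first", plus the per-site delta you would look at when diffing two snapshots.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-call-site counters; the real tool's layout is unknown. */
struct site_stats {
    const char *site;          /* label for the allocating call site */
    uint64_t bytes_allocated;  /* lifetime bytes allocated */
    uint64_t bytes_freed;      /* lifetime bytes freed */
    uint64_t blocks_allocated; /* lifetime count of blocks allocated */
};

/* Bytes still outstanding (allocated but not yet freed) at this site. */
static int64_t outstanding(const struct site_stats *s) {
    return (int64_t)(s->bytes_allocated - s->bytes_freed);
}

/* Growth in outstanding bytes for the same site between two snapshots. */
static int64_t outstanding_growth(const struct site_stats *before,
                                  const struct site_stats *after) {
    return outstanding(after) - outstanding(before);
}

/* qsort comparator: largest outstanding bytes first (leak hunting). */
static int by_outstanding_desc(const void *a, const void *b) {
    int64_t x = outstanding(a), y = outstanding(b);
    return (x < y) - (x > y);
}

/* qsort comparator: most blocks ever allocated first (churn hunting). */
static int by_lifetime_blocks_desc(const void *a, const void *b) {
    const struct site_stats *x = a, *y = b;
    return (x->blocks_allocated < y->blocks_allocated) -
           (x->blocks_allocated > y->blocks_allocated);
}
```

Usage would be something like `qsort(sites, n, sizeof sites[0], by_outstanding_desc)`. A site that allocates and frees tens of thousands of blocks per second has almost nothing outstanding, so it only shows up under the second ordering.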
There was a tricky chicken-and-egg problem with LD_PRELOAD wrapping one of the allocation functions, because it was used internally by dlsym, which I was using to retrieve pointers to the proper function implementations. (calloc, if I recall correctly.) I hacked around it by making my library allocate bytes out of a static char array for the calloc call that would happen while dlsym-ing for calloc. Debugging this was a nightmare, since it would break so early in the process's lifetime that GDB breakpoints weren't functioning. Tracking in a second process seems like a way simpler idea, and probably doesn't have too much of an impact on performance.
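For anyone who hasn't run into it, the static-buffer hack looks roughly like the sketch below. This is not my original code: the buffer size, the `resolving` flag and the helper names are made up for illustration, it is not thread-safe, and it assumes a glibc-style `dlsym(RTLD_NEXT, ...)`. Note that `free` has to be wrapped as well, so that a pointer handed out from the static buffer never reaches the real `free`.

```c
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bootstrap arena; 4096 bytes is a guess, not a measured need. */
static _Alignas(16) unsigned char bootstrap_buf[4096];
static size_t bootstrap_used;

static void *(*real_calloc)(size_t, size_t);
static void (*real_free)(void *);
static int resolving; /* set while dlsym is running; NOT thread-safe */

static int from_bootstrap(const void *p) {
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)bootstrap_buf;
    return a >= lo && a < lo + sizeof bootstrap_buf;
}

/* Bump allocator over the static buffer. Static storage starts zeroed and
 * is never reused, so calloc's zero-fill guarantee holds without a memset. */
static void *bootstrap_calloc(size_t nmemb, size_t size) {
    if (size != 0 && nmemb > SIZE_MAX / size) return NULL;
    size_t bytes = nmemb * size;
    if (bytes > sizeof bootstrap_buf - bootstrap_used) return NULL;
    void *p = bootstrap_buf + bootstrap_used;
    bootstrap_used += (bytes + 15) & ~(size_t)15; /* keep 16-byte alignment */
    return p;
}

void *calloc(size_t nmemb, size_t size) {
    if (real_calloc == NULL) {
        /* dlsym may call calloc itself: serve that from the static buffer. */
        if (resolving) return bootstrap_calloc(nmemb, size);
        resolving = 1;
        real_calloc = (void *(*)(size_t, size_t))dlsym(RTLD_NEXT, "calloc");
        resolving = 0;
        if (real_calloc == NULL) return bootstrap_calloc(nmemb, size);
    }
    /* Allocation tracking would be recorded here. */
    return real_calloc(nmemb, size);
}

void free(void *p) {
    if (p == NULL || from_bootstrap(p)) return; /* never hand these to libc */
    if (real_free == NULL) {
        if (resolving) return; /* leak rather than recurse during lookup */
        resolving = 1;
        real_free = (void (*)(void *))dlsym(RTLD_NEXT, "free");
        resolving = 0;
    }
    if (real_free != NULL) real_free(p);
}
```

Built as a shared object (for example `gcc -shared -fPIC wrap.c -o libwrap.so -ldl`, file names hypothetical) and loaded with `LD_PRELOAD=./libwrap.so`, the first `calloc` call triggers the lookup, and any nested `calloc` made during that lookup is satisfied from `bootstrap_buf`.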
Good idea to add screenshots.
Though I'm currently not on an x64 Linux machine, and since the main selling point seems to be the TUI, it would be great to have a couple of screenshots, or even better a GIF of an asciinema recording (or whatever people use now).
Just from a cursory look at the README:
> allocscope-trace attaches to another process as a debugger. By using breakpoints on memory allocation functions such as malloc it tracks allocations made by that process.
Looks like it's using breakpoints so I'd expect it to be orders of magnitude slower. And looking at the source code it's also using `libunwind`, so even if it wasn't using breakpoints it'd still be at least another order of magnitude slower since Bytehound has a custom unwinder that's specially optimized for this purpose.
One advantage it has is that it can be attached to an already running process; Bytehound can't do that. (I have ideas for how I could do that, and it should be technically doable by dynamically injecting Bytehound's .so into the target process's address space, but so far I haven't needed it, so I haven't implemented it.)
libbytehound.so (with extra debug assertions, because I'm too lazy to recompile in release mode): 4s
allocscope: did not finish after 4 minutes (I got bored waiting and CTRL+C'd it)
Edit: now this comment is being downvoted.
With allocscope, you do get a callstack for the allocations which leak, though.
However, as a practical matter those solutions might omit mmap which some applications might use for anonymous allocations.
I've posted some very quick numbers in my comment here comparing it to Bytehound: https://news.ycombinator.com/item?id=34806401
> However, as a practical matter those solutions might omit mmap which some applications might use for anonymous allocations.
Bytehound also gathers mmaps. (:
And as kind of a follow up what is the easiest way to trace all allocations (brk() and mmap) but nothing else?
Absolutely.
> what is the easiest way to trace all allocations (brk() and mmap) but nothing else?
strace -e mmap "$command"
Most large allocations go through mmap these days, but one should know brk and sbrk exist: glibc's malloc still grows its main heap with the program break, so add brk to the filter to catch those. To see deallocations, add munmap to the filter. Note that these represent operating system allocations: programs usually request huge chunks and then manage that memory in user space in order to avoid system call overhead. On many systems, this memory won't actually count as used unless the process touches it and causes a page fault.
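The last point is easy to see for yourself. Below is a minimal, Linux-specific sketch that maps anonymous memory and uses mincore to count how many of its pages are resident before and after writing to them. The function names are made up for illustration, and the "before" number depends on the kernel (or sandbox) you run it under, so I'm not promising a specific value for it.

```c
#define _DEFAULT_SOURCE /* for mincore and MAP_ANONYMOUS */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages of [addr, addr + len) currently resident in RAM.
 * Returns -1 on error. addr must be page-aligned (mmap results are). */
static long resident_pages(void *addr, size_t len) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (len + page - 1) / page;
    unsigned char *vec = malloc(n);
    if (vec == NULL) return -1;
    long count = -1;
    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < n; i++)
            count += vec[i] & 1; /* low bit set means the page is resident */
    }
    free(vec);
    return count;
}

/* Map len bytes of anonymous memory, then report residency before and
 * after writing to every page. Returns 0 on success, -1 on failure. */
static int touch_demo(size_t len, long *before, long *after) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    *before = resident_pages(p, len); /* the mapping exists, but is untouched */
    memset(p, 1, len);                /* each first write causes a page fault */
    *after = resident_pages(p, len);
    munmap(p, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

strace shows the mmap call either way; whether those pages ever cost real memory depends on what the program does with them afterwards.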