Show HN: Py-spy – A new sampling profiler for Python programs

(github.com)

286 points | benfrederickson | 7y ago | 43 comments

43 comments

hathawsh7y ago

Many thanks for building and releasing this. It's ridiculously easy to install (especially in virtualenvs) and very powerful. When I push '3' or '4', I get informative, stable output.

Minor feature request: an explicit 'pause' button would make it easier to copy file paths from the output. Ctrl-S is a reasonable alternative, but it's a little hacky.

Also, it would be nice to somehow eliminate time spent in poll() from the results. I'm profiling a server process and 99.9% of the time is spent in a poll function. Perhaps there could be an option to disregard time spent in system calls rather than user code. Most of the time I'm interested in profiling only user code.

(Actually, I've been looking for a reason to play with Rust code. Maybe I'll try to add these features myself!)

benfredericksonOP7y ago

Thanks! Both of your suggestions totally make sense. I've created an issue to track the poll() issue here https://github.com/benfred/py-spy/issues/13 - I think that should be an easy fix.

toxik7y ago

I think a more robust solution might be to have counters for in-Python samples, outside-Python, and in-syscall.
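A minimal sketch of that idea, assuming each sample has already been classified into one of three kinds. The `SampleKind` enum and `summarize` helper are hypothetical names used only for illustration; they are not part of py-spy.

```python
from collections import Counter
from enum import Enum


class SampleKind(Enum):
    # Hypothetical classification of where a sample landed.
    PYTHON = "in-python"
    NATIVE = "outside-python"
    SYSCALL = "in-syscall"


def summarize(samples):
    """Return the fraction of samples that fell into each kind."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {
        kind: (counts[kind] / total if total else 0.0)
        for kind in SampleKind
    }
```

With separate counters, a server that sits in poll() most of the time would show a large in-syscall share without hiding where the remaining in-Python samples went.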

gnufx7y ago

Adaptations of HPC-type performance tools to Python, and to the non-Python (specifically parallel) libraries it calls, might be of interest:

TAU: https://www.cs.uoregon.edu/research/tau/docs/newguide/ch03s0...
Extrae/Paraver: https://www.researchgate.net/publication/317485375_Performan... llel_Python_Applications
Score-p/Scalasca: http://score-p.org https://github.com/score-p/scorep_binding_python

maxmcd7y ago

The localhost talk for rbspy (the inspiration for this project) is awesome: https://www.recurse.com/events/localhost-julia-evans

I imagine it also provides relevant insight into the structure of this tool.

TTPrograms7y ago

This is really fantastic. I just managed to find a 3x speedup on a compute heavy job I run using Py-spy - there was an unneeded hotspot in a library I'm using that I didn't previously suspect. It would have taken a long time using kernprof to dig in through the call stack to find the issue.

jpeanuts7y ago

This is a really great tool - the kind I didn't even know that I needed until I was given it!

One question: does anyone here know how to interpret the GIL (Global Interpreter Lock) percentage display in the top-left? In my code, "Active" sticks nicely at 100%, but the GIL jumps around from 1% to 100%, changing on every sample.

edit: Now that I think about it, my code spends a lot of time in C API calls - maybe the GIL is released there?

toxik7y ago

Wow, it tracks GIL contention? Major feature for me. I use a lot of Numba, and it releases the GIL if you want — so threading is actually useful.

lathiat7y ago

I've been enjoying pyflame from Uber - which the author quotes in their info

> The only other Python profiler that runs totally in a separate process is pyflame, which profiles remote python processes by using the ptrace system call. While pyflame is a great project, it doesn't support Python 3.7 yet and doesn't work on OSX or Windows.

> Py-spy works by directly reading the memory of the python program using the process_vm_readv system call on Linux, the vm_read call on OSX or the ReadProcessMemory call on Windows.

I think ptrace fundamentally lets you do the same thing, given how pyflame uses it, and the same ptrace access permission governs whether you can use process_vm_readv.

The real win for this project is a real-time "top" or "perf top" style UI instead of only generating flamegraph output. I love that feature; it will be particularly good for quick "what is this process doing" checks, as opposed to profiling a specific timeframe and analysing the resulting flamegraph (which is all pyflame lets you do).
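To make "a sample" concrete, here is a rough sketch of taking one stack snapshot per thread. Note the swap in technique: this runs inside the profiled process using `sys._current_frames()`, whereas py-spy, per the quote above, reads the target's memory from a separate process. The `sample_once` helper is a hypothetical name for illustration.

```python
import sys
import traceback


def sample_once():
    """Return {thread_id: [function names, outermost first]} for every thread.

    In-process illustration only; this is not how py-spy collects samples.
    """
    stacks = {}
    for thread_id, frame in sys._current_frames().items():
        stacks[thread_id] = [f.name for f in traceback.extract_stack(frame)]
    return stacks
```

Calling something like this on a timer and counting the innermost function names would give a crude "top"-style table; collecting the full stacks instead gives the raw material for a flamegraph.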

Nice work!

samstave7y ago

"top" for python programs. That's pretty awesome - not sure if this has existed in other traces, but the output is great.

actuallyalys7y ago

Rbspy, the inspiration for this tool, has a similar default output, which you can see in the documentation, https://rbspy.github.io/.

_verandaguy7y ago

If you're talking about the flame graphs, they're a fairly common feature of modern profilers. The oldest implementation I know of is at https://github.com/brendangregg/FlameGraph.

kevin_thibedeau7y ago

This is the same script. The Rust code is just invoking Perl to generate it.

gnufx7y ago

I've never understood why flame graphs are better than the normal presentation of inclusive and exclusive timings in performance tools, even if they're not "modern", but embody some decades' experience. Anyone care to explain?

andreareina7y ago

Flame graphs have extra information in the form of seeing the caller and callees. That, and I find the graphical presentation to be easier to scan.

detaro7y ago

I'm far from a performance expert, but my impression is:

It shows the call paths to the functions and what share of the time each path took, which is not so obvious from the typical table. On the other hand, finding functions that are called quite a lot all over the place and add up is easier in the table, so the table hasn't become useless.
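The trade-off can be sketched by deriving the classic inclusive/exclusive table from the same "folded stack" lines (`frame;frame;frame count`) that the FlameGraph script takes as input. The `inclusive_exclusive` helper and the sample data in the usage note are assumptions made up for illustration.

```python
from collections import Counter


def inclusive_exclusive(folded_lines):
    """Compute per-function inclusive and exclusive sample counts.

    Each input line looks like "main;render;read 2": a semicolon-separated
    call path (outermost first), a space, then a sample count.
    """
    inclusive, exclusive = Counter(), Counter()
    for line in folded_lines:
        stack, _, count = line.rpartition(" ")
        frames = stack.split(";")
        n = int(count)
        # Exclusive time belongs to the innermost frame only.
        exclusive[frames[-1]] += n
        # Inclusive time goes to every function on the path (once per sample,
        # even if the function appears several times through recursion).
        for name in set(frames):
            inclusive[name] += n
    return inclusive, exclusive
```

For the hypothetical input `["main;parse;read 3", "main;render;read 2", "main;render 5"]`, the table shows `read` with 5 exclusive samples in total, which is easy to spot there. That those samples split 3/2 between the `parse` and `render` call paths is only visible in the stacks themselves, which is the part a flame graph draws.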

SketchySeaBeast7y ago

Oh that's great - I had it up and running in 30 seconds.

fake-name7y ago

Does it support python multiprocessing?

Basically nothing out there does that I've found, and it's a really major pain-point for me.

marmaduke7y ago

yappi?

https://bitbucket.org/sumerc/yappi/

fake-name7y ago

> If you want to profile a multi-threaded application, you must give an entry point to these profilers and then maybe merge the outputs.

It basically boils down to this: doing multiprocessing profiling is (currently) a giant pain in the ass. You have to manually attach the profiler yourself whenever you launch another process, and every profiled process produces its own output file.

It's not impossible, it's just very annoying. I've been vaguely meaning to write a thing which attaches to the fork() call and automatically starts the profiler in the child-process, and handles aggregating all the results back to a single output when all children exit.

wrmsr7y ago

As a heads-up, in case you hadn't already seen it: 3.7 added fork callbacks for stuff like this - https://docs.python.org/3/library/os.html#os.register_at_for... - much nicer than patching-and-praying.
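A small sketch of what that could look like, assuming a Unix system and Python 3.7+. `start_profiler_in_child` is a hypothetical placeholder that only records the child's PID; a real hook would start whatever profiler is in use. `fork_and_check` exists only to demonstrate that the hook fires.

```python
import os

hook_ran_in = []  # PIDs of processes in which the hook has run


def start_profiler_in_child():
    # Placeholder for "start the profiler here"; runs in each forked child.
    hook_ran_in.append(os.getpid())


# Register once; the callback then runs in every child created via os.fork().
os.register_at_fork(after_in_child=start_profiler_in_child)


def fork_and_check():
    """Fork once and report whether the hook ran in the child."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: tell the parent whether our PID was recorded by the hook.
        os.close(read_fd)
        os.write(write_fd, b"1" if os.getpid() in hook_ran_in else b"0")
        os._exit(0)
    # Parent: read the child's answer and reap it.
    os.close(write_fd)
    answer = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return answer == b"1"
```

This covers the "start in the child automatically" half of the problem; aggregating the per-process results into a single output would still need to be handled separately.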

gnufx7y ago

Multi-process (<~1M) profiling is obviously bread and butter in the HPC world. That's what the tools I referenced are for primarily. The more recent Python targeting may not be so solid, especially if there's no good launch framework to hook into, which would be a good reason for using MPI.

zzzeek7y ago

It says you pass it a pid, so yes

craftyguy7y ago

This is great!

I also took the liberty of adding this (and setuptools_rust) to Arch Linux's AUR: https://aur.archlinux.org/packages/python-py-spy

kawsper7y ago

I would love something like that for Ruby :)

I am envious, well done!

benfredericksonOP7y ago

Check out rbspy https://github.com/rbspy/rbspy (rbspy was the inspiration for this project =)

TeMPOraL7y ago

Hah, and I'd love something like this for Common Lisp.

SBCL's sampling profiler is pretty finicky, and I haven't figured out yet how to use Linux-level profiling tools to get something useful out of a CL image.

So I second the OP, well done!

pvg7y ago

Wonderful. Can the data it produces be munged into something KCachegrind can show?

benfredericksonOP7y ago

Not yet - but I'm hoping to have a version that supports this next week. Will update this issue when it's done: https://github.com/benfred/py-spy/issues/3

pvg7y ago

Great, thanks!

And as a slightly different take than that of the person posting the issue - interfaces like kcachegrind are pretty clunky (if powerful, in their clunky way) - the profiler coming with some built-in presentation and reporting of its own, like the flamegraph and the realtime display, is a big win, and the lack of it is a serious deficiency in most Python profilers.

nevon7y ago

Does anyone know if there is something like this for Nodejs? Of course you can enable profiling, but it would be nice to be able to look at running processes.

bobwaycott7y ago

Gave it a try, and wasn’t expecting to have to execute via sudo on macOS. I almost never use sudo, so this stands out as quite unexpected for a developer tool. Is there something off with my system, or are sudo privileges always going to be required on macOS?

Edit: I pip installed it into a virtualenv, if that matters.

MetricMike7y ago

https://github.com/benfred/py-spy#when-do-you-need-to-run-as...

tl;dr - yes, it's a limitation(?) of macOS syscalls.

AdamM127y ago

Dangit. I was thinking of writing an AST inspector named pyspy.

ant6n7y ago

why the dash, py-spy vs rbspy?

benfredericksonOP7y ago

the name pyspy was taken on pypi already : https://github.com/tdfischer/pyspy =)

A2017U17y ago

Just a guess but remove the dash and read it out loud.

ant6n7y ago

yesenadam7y ago

It would rhyme with crispy.
