A little tip, you don't have to compare actual distances, you can compare squared distances just as well. Then in `norm < max_dist`, you don't have to do a `sqrt()` for every `norm`. Saves a few CPU ticks as well.
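A quick numpy sketch of the squared-distance trick (array names here are made up):

```python
import numpy as np

points = np.random.rand(1000, 2)
target = np.array([0.5, 0.5])
max_dist = 0.3

# Compare squared distances against max_dist**2: no sqrt per point.
diff = points - target
sq_dist = (diff * diff).sum(axis=1)
mask = sq_dist < max_dist ** 2

# Same boolean result as the sqrt version, just cheaper.
assert (mask == (np.linalg.norm(diff, axis=1) < max_dist)).all()
```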
I once rewrote a GDI+ point transformation routine in pure C# and got a 200x speedup, just because the routine was riddled with needless virtual constructors, copying type conversions, and something called CreateInstanceSlow. Ten years later, I gathered a few of these anecdotes and wrote the Geometry for Programmers book (https://www.manning.com/books/geometry-for-programmers) with its main message: when you know the geometry behind your tools, you can either use them efficiently or rewrite them completely.
He uses the same data structure in both the Python and Rust code, so I imagine that he can get an extra 4x speedup at least if he rewrites his code with memory layout in mind.
I couldn't get into this in the article (it would be too long), but this is a great point and the original library does this in a lot of places.
One problem in our use case is that the actual struct members are pretty big and we need to group/regroup them a lot.
The fastest approach for us was to do something like in the article for the initial filtering, then build a hashmap of SoAs with the needed data, and do the heavier math on that.
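A rough Python sketch of that filter-then-regroup-into-SoA pattern (all names hypothetical; the real code is presumably Rust):

```python
import numpy as np
from collections import defaultdict

# Stand-in for "big structs": only group/pos/weight matter for the math,
# the real objects carry much more.
class Item:
    def __init__(self, group, pos, weight):
        self.group, self.pos, self.weight = group, pos, weight

items = [Item(i % 3, np.random.rand(2), float(i)) for i in range(100)]

# 1) cheap initial filter, 2) regroup only the needed fields per key
soa = defaultdict(lambda: {"pos": [], "weight": []})
for it in items:
    if it.weight > 10.0:  # stand-in for the initial filtering step
        soa[it.group]["pos"].append(it.pos)
        soa[it.group]["weight"].append(it.weight)

# densify so the heavy math runs on contiguous arrays
dense = {k: {f: np.asarray(v) for f, v in d.items()} for k, d in soa.items()}
totals = {k: d["weight"].sum() for k, d in dense.items()}
```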
It's easy to make intuitive guesses about how things are working that seem completely reasonable. But you have to benchmark because modern CPUs are so complex that reasoning and intuition mostly don't work.
Programs used for testing are below. I ran everything twice because my system wasn't always idle, so take the lower of the 2 runs.
[jim@mbp ~]$ sh -x x
+ cat x1.c
#include <stdio.h>
#define NUM 1000000000
struct {
    int x;
    int y;
} p[NUM];
int main() {
    int i, s;
    for (i = 0; i < NUM; i++) {
        p[i].x = i;
        p[i].y = i;
    }
    s = 0;
    for (i = 0; i < NUM; i++) {
        s += p[i].x + p[i].y;
    }
    printf("s=%d\n", s);
}
+ cc -o x1 x1.c
+ ./x1
s=1808348672
real 0m12.078s
user 0m7.319s
sys 0m4.363s
+ ./x1
s=1808348672
real 0m9.415s
user 0m6.677s
sys 0m2.685s
+ cat x2.c
#include <stdio.h>
#define NUM 1000000000
int x[NUM];
int y[NUM];
int main() {
    int i, s;
    for (i = 0; i < NUM; i++) {
        x[i] = i;
        y[i] = i;
    }
    s = 0;
    for (i = 0; i < NUM; i++) {
        s += x[i] + y[i];
    }
    printf("s=%d\n", s);
}
+ cc -o x2 x2.c
+ ./x2
s=1808348672
real 0m9.753s
user 0m6.713s
sys 0m2.967s
+ ./x2
s=1808348672
real 0m9.642s
user 0m6.674s
sys 0m2.902s
+ cat x3.c
#include <stdio.h>
#define NUM 1000000000
struct {
    int x;
    int y;
} p[NUM];
int main() {
    int i, s;
    for (i = 0; i < NUM; i++) {
        p[i].x = i;
    }
    for (i = 0; i < NUM; i++) {
        p[i].y = i;
    }
    s = 0;
    for (i = 0; i < NUM; i++) {
        s += p[i].x;
    }
    for (i = 0; i < NUM; i++) {
        s += p[i].y;
    }
    printf("s=%d\n", s);
}
+ cc -o x3 x3.c
+ ./x3
s=1808348672
real 0m13.844s
user 0m11.095s
sys 0m2.700s
+ ./x3
s=1808348672
real 0m13.686s
user 0m11.038s
sys 0m2.611s
+ cat x4.c
#include <stdio.h>
#define NUM 1000000000
int x[NUM];
int y[NUM];
int main() {
    int i, s;
    for (i = 0; i < NUM; i++)
        x[i] = i;
    for (i = 0; i < NUM; i++)
        y[i] = i;
    s = 0;
    for (i = 0; i < NUM; i++)
        s += x[i];
    for (i = 0; i < NUM; i++)
        s += y[i];
    printf("s=%d\n", s);
}
+ cc -o x4 x4.c
+ ./x4
s=1808348672
real 0m13.530s
user 0m10.851s
sys 0m2.633s
+ ./x4
s=1808348672
real 0m13.489s
user 0m10.856s
sys 0m2.603s

So much scientific computing code suffers from having its core packages split away from the core language - at what point do we stop and abandon Python for languages that actually make sense? Obviously Julia is the big example here, but its interest, development and ecosystem don't seem to be growing at a serious pace. Given that the syntax is moderately similar and the performance benefits are often 10x, what's stopping people from switching???
Also, few scientific programmers have any notion of the C or Fortran under the hood. Most are happy to stand on the shoulders of giants and do work with their specialized datasets, which for the vast majority of researchers are not big data. A one-time calculation that takes 12 seconds instead of 0.1 seconds is not a problem worth optimizing.
The same could be said about CPAN and NPM. Yet Perl is basically dead, and JavaScript isn't used for any machine learning tasks as far as I'm aware. WebAssembly did help bring a niche array of audio and video codecs to the ecosystem[1][2], something I've yet to see from Python.
I don't use Python, but with what little exposure I've had to it at work, its overall sluggish performance and need to set up a dozen virtualenvs -- only to dockerize everything in cursed ways when deploying -- makes me wonder how or why people bother with it at all beyond some 5-line script. Then again, Perl used to be THE glue language in the past and mod_perl was as big as FastAPI, and Perl users would also point out how CPAN was unparalleled in breadth and depth. I wonder if Python will follow a similar fate as Perl. One can hope :-)
During my PhD I was running some simulations using poorly written Python code. Initially it would take several hours. In that time I could go to the lab, run some wetlab experiments, and the results of my simulations would be there when I got back to the office. It was only by taking Python "home" and building some of my own projects that I learned how to 1. write more Pythonic code and 2. write more performant code. Now I work for a software company.

If I'd stayed in academia I would probably still be writing quick and dirty code and not worrying about the runtime, because as a researcher there is always something else you can be doing.
* PythonCall.jl - https://github.com/cjdoris/PythonCall.jl
* NodeCall.jl - https://github.com/sunoru/NodeCall.jl
* RCall.jl - https://github.com/JuliaInterop/RCall.jl
I tend to use Julia for most things and then just dip into another language’s ecosystem if I can’t find something to do the job and it’s too complex to build myself
The use case btw. is often also very different. In most of academia, writing code is basically just a fancy mode of documentation for what is basically a glorified calculator. Readability trumps efficiency by a large margin every time.
If my script takes 3s to run and 5m to write in Python, vs 0.1s to run and 3h to write in C, I finish first with Python. I can try more ideas with Python.
Everything is just a proof of concept and no one expects anything more than that.
centers = np.array([p.center for p in ps])
norm(centers - point, axis=1)
They were just using numpy wrong. You can be slow in any language if you use the tools wrong.

Though I do feel like I see this a lot with these kinds of "we rewrote it in Rust and everything is fast" posts; comparing to a language with GC is often the scenario.

On one hand, I feel like you should just learn how to use your stuff properly. On the other hand, it is interesting to see that people who can't write fast code or use libraries properly are now writing fast code - fast code for the masses, almost, hah. Though maybe they'll just run into the same issue when they misuse a library in Rust.

But also, I don't see any problem there; I think the Python + C++/Rust idiom is actually pretty nice. I have a billion libs to choose from on either side: great usability on the Python side, and unbeatable performance on the C++ side.
The immature AoT capabilities are a huge pain to deal with when writing large code packages or even when trying to make command line applications. Things have to be recompiled each time the Julia runtime is shut down. The current strategy in the community to get around this seems to be "keep the REPL alive as long as possible" [3][4][5], but this isn't a viable option for all use cases.
Until Julia has better AoT compilation support, it's going to be very difficult to develop large scale programs with it. Version 1.9 has better support for caching compiled code, but I really wish there were better options for AoT compiling small, static, standalone executables and libraries.
[1]: https://julialang.github.io/PackageCompiler.jl/dev/
[2]: https://github.com/tshort/StaticCompiler.jl
[3]: https://discourse.julialang.org/t/ann-the-ion-command-line-f...
[4]: https://discourse.julialang.org/t/extremely-slow-execution-t...
[5]: https://discourse.julialang.org/t/extremely-slow-execution-t...
[6]: https://www.reddit.com/r/Julia/comments/ytegfk/size_of_a_hel...
- the development experience is hampered by the slow start time;
- the ecosystem is quite brittle;
- the promised performance is quite hard to actually reach; profiling only gets you so far;
- the ecosystem is pretty young, and it shows (lack of docs, small community, ...)
> what's stopping people from switching???
All of the above, plus inertia; perfect is the enemy of good enough; the alternatives are far away from Python's ecosystem & community; and performance is often not a showstopper.
It's all machine code under the hood. Everything else on top is essentially a description of more and more complex patterns of that code. So it's a no-brainer that a language that lets you describe those complex but repeating patterns in the most direct way is the most popular. When you use Python, you are effectively using a framework on top of C to describe what you need, and then if you want to do something specialized for performance, you go back to the core fundamentals and write it in C.
Also read about SIMD instructions like AVX2. They are often used under the hood when possible, but just knowing what they require can help "trigger" them, depending on which language you use. In C++, for example, the compiler really looks for opportunities to use those instructions. You can tell whether the compiler did it by checking the assembly for XMM or YMM registers (these are the names of the SIMD registers).
Numpy's tutorial for broadcasting is also a good starting point.
You can look at various tutorials to see how it works. For example: https://jakevdp.github.io/PythonDataScienceHandbook/02.05-co...
> It’s worth noting that converting parts of / everything to vectorized numpy might be possible for this toy library, but will be nearly impossible for the real library while making the code much less readable and modifiable, and the gains are going to be limited (here’s a partially vectorized version, which is faster but far from the results we are going to achieve).
def find_close_polygons(
    polygon_subset: List[Polygon], point: np.array, max_dist: float
) -> List[Polygon]:
    close_polygons = []
    for poly in polygon_subset:
        if np.linalg.norm(poly.center - point) < max_dist:
            close_polygons.append(poly)
    return close_polygons
And after replacing it with this Rust code, it is taking "an avg of 23.44ms per iteration":

use pyo3::prelude::*;
use ndarray_linalg::Norm;
use numpy::PyReadonlyArray1;

#[pyfunction]
fn find_close_polygons(
    py: Python<'_>,
    polygons: Vec<PyObject>,
    point: PyReadonlyArray1<f64>,
    max_dist: f64,
) -> PyResult<Vec<PyObject>> {
    let mut close_polygons = vec![];
    let point = point.as_array();
    for poly in polygons {
        let center = poly
            .getattr(py, "center")?
            .extract::<PyReadonlyArray1<f64>>(py)?
            .as_array()
            .to_owned();
        if (center - point).norm() < max_dist {
            close_polygons.push(poly)
        }
    }
    Ok(close_polygons)
}
Why is the Rust version 13x faster than the Python version?

from typing import List
import numpy as np

n_sides = 30
n_polygons = 10000

class Polygon:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.center = np.array([self.x, self.y]).mean(axis=1)

def find_close_polygons(
    polygon_subset: List[Polygon], point: np.array, max_dist: float
) -> List[Polygon]:
    close_polygons = []
    for poly in polygon_subset:
        if np.linalg.norm(poly.center - point) < max_dist:
            close_polygons.append(poly)
    return close_polygons

polygons = [Polygon(*np.random.rand(2, n_sides)) for _ in range(n_polygons)]
point = np.array([0, 0])
max_dist = 0.5

%timeit find_close_polygons(polygons, point, max_dist)
(I've made up the number of sides and number of polygons to get to the same order of magnitude of runtime; I've also pre-computed centers, as they are cached anyway in their code.) This takes about 40ms to run on my machine. If we just change the function to:

def find_close_polygons(
    polygon_subset: List[Polygon], point: np.array, max_dist: float
) -> List[Polygon]:
    centers = np.array([polygon.center for polygon in polygon_subset])
    mask = np.linalg.norm(centers - point[None], axis=1) < max_dist
    return [
        polygon
        for polygon, is_pass in zip(polygon_subset, mask)
        if is_pass
    ]

then the same computation takes 4ms on my machine. Doing a Python loop of numpy operations is a _bad_ idea... The new code hardly even takes more space than the original one.
(as someone else mentioned in the comments, you can also directly use the sum of the squares rather than `np.linalg.norm` to avoid taking square roots and save a few microseconds more, but well, we're not in that level of optimization here)
https://levelup.gitconnected.com/python-performance-showdown...
The benchmarking methodology in the link is not good. The author should use timeit() or cProfile or similar. A 0.01s difference is mostly due to fluctuation. The order of execution also matters: say you want to test functions A and B; you actually need to run A, B, B, A to see if the ordering makes a difference.
for poly in polygon_subset:
    if np.linalg.norm(poly.center - point) < max_dist:
        close_polygons.append(poly)

I don't think the entire feature set of the Python runtime is involved in this.

No need to use Golang or Rust from the start - no need for those resources until you absolutely need the speed improvement. Sounds like a dream to a lot of people who find it much easier to develop in Python.
- Some code doesn’t have obvious optimization hotspots, and is instead just generally slow everywhere.
- Most FFI boundaries incur their own performance cost. I’m not sure about Python, but I wouldn’t be surprised if FFI to rust in a hot loop is often slower than just writing the same code in Python directly. And it’s not always easy to refactor to avoid this.
- A lot of programs in languages like Python are slow because the working set size contains a lot of small objects, and the GC struggles. You can optimize code like this by moving large parts of the object graph into rust. But it can become a mess if the objects rust retains then need references to Python objects, in turn.
The optimization described in this blog post is the best case scenario for this sort of thing - the performance hotspot was clear, small, and CPU bound. When you can make optimizations like this you absolutely should. But your mileage may vary when you try this out on your own software.
The more ML I do, the more disappointed I get.
IMO/IME much better to go with a language where you don't have this dichotomy in the first place - e.g. Java or C#.
You don't have to move all the computation; just the `for` loops will help a lot.
My first attempt in Python was both prohibitively slow and more complicated than necessary, because I tried to use vectorized numpy, where possible.
Since this was only a small standalone script, I rewrote it in Julia in a day. The end result was ca. 100x faster and the code a lot cleaner, because I could just implement the core logic for one tetrahedron and then use Julia's broadcast to apply it to the array of tetrahedrons.
Anyway, Julia's long startup time often prohibits it from being used inside other languages (even though the Python/Julia interoperability is good). On the contrary the Rust/Python interop presented here seems to be pretty great. Another reason I should finally invest the time to learn Rust.
Instead of:
for p in ps: norm(p.center - point)
You should do:
centers = np.array([p.center for p in ps])
norm(centers - point, axis=1)
You’ll get your same speed up in 2 lines without introducing a new dependency
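A self-contained version of that two-line fix (the `Polygon` stand-in here is hypothetical, not the article's class):

```python
import numpy as np
from numpy.linalg import norm

class Polygon:
    def __init__(self, center):
        self.center = np.asarray(center, dtype=float)

rng = np.random.default_rng(0)
ps = [Polygon(c) for c in rng.random((1000, 2))]
point = np.array([0.5, 0.5])
max_dist = 0.25

# One vectorized norm over all centers instead of a norm() call per polygon.
centers = np.array([p.center for p in ps])
mask = norm(centers - point, axis=1) < max_dist
close = [p for p, keep in zip(ps, mask) if keep]
```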
There's also a "v1.5" version which is 6x faster, and uses "vectorizing" (doing more of the work directly in numpy). This version is much harder to optimize further.
[0] https://github.com/ohadravid/poly-match

On Google Colab:
import numpy as np
import time

vals = np.random.randn(1000000, 2)
point = np.array([.2, .3])

s = time.time()
for x in vals:
    np.linalg.norm(x - point) < 3
a = time.time() - s

s = time.time()
np.linalg.norm(vals - point, axis=1) < 3
b = time.time() - s

print(a / b)
~296x faster - significantly faster than the solution in the article.

Having to write in a subset of a language in order for it to perform decently is a big deal. Having no feedback given to the programmer when you deviate from the fast path makes it even harder to learn what the fast path is. The result is not that you get the ease of Python and the speed of C without having to understand much; the result is that you have to be a fairly deep expert in Python, understand the C bindings intimately, and learn how to avoid doing the natural thing in Python - the Python covered in all the tutorials - or you end up writing your code to run at native Python speeds without even realizing it.
It's a feasible amount of knowledge to have, it's not like it's completely insane, but it's still rather a lot.
My career just brushed this world and I'm glad I bounced off of it. It would drive me insane to walk through this landmine field every day, and then worse, have to try to guide others through it all the while they are pointing at all the "common practices" that are also written by people utterly oblivious to all this.
https://github.com/pyutils/line_profiler
You can literally see the hot spot of your code, then you can grind different algorithms or change the whole architecture to make it faster.
For example, replace short for loops with list comprehensions, vectorize all numpy operations (vectorizing only partially doesn't help the issue), use 'not any()' instead of 'all()' for booleans, etc.
Doing this for like 2 weeks, basically you can automatically recognize most bad code patterns at a glance.
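One of those patterns as a trivial before/after:

```python
from timeit import timeit

def squares_loop(n):
    out = []
    for i in range(n):
        out.append(i * i)   # attribute lookup + method call per iteration
    return out

def squares_comp(n):
    return [i * i for i in range(n)]  # same result, tighter bytecode

# identical results, so this is a pure-performance refactor
assert squares_loop(1000) == squares_comp(1000)

# timeit makes the difference visible (the comprehension is usually faster)
t_loop = timeit(lambda: squares_loop(1000), number=1000)
t_comp = timeit(lambda: squares_comp(1000), number=1000)
```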
Python is much, much easier to learn, and Rust is notoriously difficult. This obviously feeds into the inferiority complex. But -- while I do know there's a number of applications where perf is crucial -- I think it's well worth doing an ego check before moving from what's a lower friction path and gives you access to myriad developers, including ones trained in very hard disciplines.
Also: does it ever make sense to write something like a CRUD+ backend in Rust? Maybe 100X Rustaceans can do it with one hand tied behind their backs; but imagine what these ubermenschen could be achieving in Python?
For now, we're getting away with Python-only optimizations and relying on cPython improvements. Still, some processes take a few minutes and a lot of that is spent just because Python is quite slow. In that sense, yes, it happens for CRUD+ applications (which I consider our CRM+ERP to be).
Obviously the answer is "because I'm faster with Python" but then part of my brain is "well get good idk".
Well... given 2 frameworks equally pleasant to work with, why not using the one with the best performance ? (Whatever performance means... less cpu ? less memory ? better i/o ?).
As a Django user, working on problems where Django shines, I have yet to see such a solution in Rust, but that doesn't mean it won't happen one day.
What Amazon charges for, say, a t2.nano must factor in the cost of the hardware, power, cooling, and system administration, so it is technically possible to evaluate the running cost of a piece of software.
If I, as the ruthless capitalist that I am pretending to be, made the running cost of the solution an integral part of annual remuneration, through bonus calculation say, there will be a tipping point when that cost has an impact on techology strategy.
I know it sounds remote but a similar scheme has/was implemented in some investment banks, where bonus calculations introduced a vesting schedule and factored in the back office cost of maintaining the trade over its lifetime rather than being based entirely on revenue.
Let's start with a totally reasonable calculation made by a non-technical user, with absolutely no understanding of the problem at hand - and by that I mean completely unfair and arbitrary, so pretty much the real world.
The TechEmpower multiple request benchmarks tell me that Rust is the performance king. Go offers about 2/3 that performance, and python 1/3, that bit doesn't sound too unreasonable, perhaps a little unkind to Go, but I digress; given these relative performance numbers, if the running cost bonus pool is $x, the maximum amount you will receive for a python based solution is 1/3x.
At what point does the financial impact influence our technology strategy in a way that perhaps our otherwise virtuous proselytising about the need to reduce our carbon footprint doesn't?
FWIW I am continually surprised that Jon Blow, Casey Muratori, Mike Acton et al. aren't beating exactly this drum. It's not that their arguments about writing high performance software aren't compelling in their own right, but appealing to people's otherwise orthogonal concerns / passions seems to be a logical next step.
Sorry to ramble on!!
Is the problem the Oracle involvement? Or is it not that fast as advertised or problems with the ecosystem (C libraries)?
“At this point, the Python runtime is made available for experimentation and curious end-users. “
Take from Rust Algebraic types, ahead of time compilation and strong types, functional features, a borrow / escape checker that automatically turns shared data into Rc or Arc, as necessary, instead of tormenting me to rewrite performance irrelevant code;
Take from Python the simple syntax, default pass by reference of all non-numeric types, simplified string handling, unified slice and array syntax - and any other simplifying feature possible.
... and give me a fast, safe and powerful language that gets out of my way while maximizing the power of the compiler to prevent bugs. Golang was a commendable attempt, but they made '70s design decisions that condemned the language: mandatory garbage collection, nullables, (void*) masquerading as interface{} casts, under-powered compiler etc.
On a side note, python is strictly pass by value. For non-primitives, their references are passed by value.
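A quick demo of that model (sometimes called call-by-object-sharing):

```python
def mutate(lst):
    lst.append(1)   # the copied reference still points at the caller's list

def rebind(lst):
    lst = [99]      # rebinding the local name doesn't affect the caller

a = []
mutate(a)
rebind(a)
assert a == [1]   # mutation was visible, rebinding was not
```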
Thanks!
centers = get_centers(polygons)  # M x 3 array
close_idx = np.where(np.linalg.norm(centers - point, axis=1) < max_dist)[0]
close_polygons = polygons[close_idx, :]
That's one reason I prefer to use arrays for polygons, rather than abstracting them into a Python object. Fundamentally geometries are sequences of points, and with some zero-padding to account for irregular point counts, you can still keep them in a nice, efficient array representation.
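A sketch of that zero-padded layout (shapes and names made up):

```python
import numpy as np

# Polygons with irregular vertex counts, packed into one zero-padded array.
polys = [np.random.rand(n, 2) for n in (3, 5, 4)]
max_n = max(len(p) for p in polys)
packed = np.zeros((len(polys), max_n, 2))
counts = np.array([len(p) for p in polys])
for i, p in enumerate(polys):
    packed[i, :len(p)] = p

# Zero padding doesn't change the sum, so centers come from one vectorized op.
centers = packed.sum(axis=1) / counts[:, None]
assert np.allclose(centers[0], polys[0].mean(axis=0))
```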
poly = Polygon(vertices)
I would bet you can achieve just as much of a speedup with numba or Cython using this form.

Where does this nonsense come from? No. It isn't. It's just a stupid fashion. Something that should be discouraged, not encouraged by trying to make it work when it's obviously broken.
As someone who does work with researchers who do use Python a lot, I see the everyday painful experiences of people who use it. And this pain doesn't need to be there. It's just masochism. And the only real reason is that they don't know any better. The only other thing they know is Matlab, and that's even worse.
Python is just a bad language. Popular, but awful. Ironically, while researchers are supposed to be at the forefront of discovery and technology... well, they aren't. Industry outpaced research. So much so that today there are government programs to onboard researchers into more automated and more automatically verified ways to do research. And we aren't talking about making an elite force here. These programs are meant for people in research who copy data from Excel sheets one data point at a time into another spreadsheet. It's that kind of bad.
My wife happened to work in such a government center, and that's how I know about what's going on inside these programs. And it's very sad that decisions about the preferred tools for research automation are made by people who, unlike most of their peers, had some exposure to what happens in the industry, but had no deeper understanding of the reasons any particular technology ended up in any particular niche, nor any independent ability to assess the capabilities of any particular tech. It's really sad what's going on there.
You couldn't be more wrong. Python is the language going forward. There is a reason why bleeding-edge ML stuff is done through Python, as well as it being the backend of several very popular web platforms and the second most used language on GitHub behind JS - and that's solely because JS is hard-tied to the web.

I have a feeling the hate for Python just comes from paradigms that are taught in extremely poor CS curricula in schools. If you think that Python is bad because of dynamic typing, you haven't been paying attention to the direction compute is going.
Language going forward... I hadn't had such a laugh in a while... It was dead in the '70s, long before it was born. Are you under the influence or something?
> There is a reason why bleeding edge ML stuff is done through Python
No, there isn't, and bleeding-edge ML isn't done in Python. It's done mostly in C++ and more seldom in C - which are also garbage languages, but that's just the reality we live in. Python is just a tiny fraction of what's being done, the tip of the tip of the iceberg.
And, no Python is not the backend of popular platforms, it is, again, a tiny little bit of what's going on in those "popular platforms", and, unlike you, I actually worked for those "popular platforms", so, would know how that is.
But, even, imagine that in your fairy-tale world Python is somehow so super-important and successful? Didn't I already explain how this is possible while still being a garbage language? -- I bet I did. You just rushed to spit your despair and frustration at me, because I offended something that you like... well, you think that you like, but really, you just don't know any better, just like other people using Python :( And that's really the sad part. You lack critical thinking to be really able to tell the quality of your tools. You substituted the ability to judge the quality of something by looking at what the majority is doing and following the herd.
And, Python wasn't yet taught in school in my day; the language of choice for intro to CS was Visual Basic :| It's also garbage... and the whole idea that the intro to CS has to be done by learning a random fashionable language of the day is just dumb. It's a misunderstanding of SICP, which to someone who didn't understand what the book is about looks as if the author tried to teach the students Scheme; later intro to CS classes were modeled on this book, but replaced Scheme with another language, completely forgetting that Scheme was used in the first place as a language with "little syntax", to avoid spending time on learning the language.
You are again confused when you use the expression "dynamic typing"; you don't even know what that is. But you will never accept it because, again, you don't have the ability to think critically; you follow the herd. You read a "definition" somewhere that says Python is a "dynamically typed language", and you believed this nonsense, even though you never really thought about what that might mean...
One thing we agree on though: the direction the compute is going is to utilize more low-skill programmers, and that means, beside other things, following fads more than doing any sensible work. Python is the current fad, and so, for a while, we are stuck with this garbage. And, the way things are going, at best, I can hope for another equally idiotic language to come to replace it. But, probably not very soon.
But specific fields often have their own, bespoke solutions. I've only ever dealt with math, but it has plenty of its own niche languages that are much better than, say, Sage. My personal choice was Maxima, but that's because I like Common Lisp.
Furthermore, it's just the situation today. It doesn't mean that this is what it has to be tomorrow. Any of the languages I used in this domain have their issues, and could still be improved. We are nowhere near a place where it's hard to imagine something better than what we have. So, I'd say, if you really want a very good language, you might as well start building one now -- you have a very good chance yours will be the best one so far.
(e.g. it doesn't need to be "as fast as possible", just fast enough to no longer be a workflow bottleneck)
I literally clicked in to read the article to see if they'd mention this:) But... unless I missed it, there wasn't really an answer? I thought numpy does do the heavy lifting in native code, so why is this faster? Does this version just push more of the logic into native code than numpy did?
Just that small change already gives 10x faster performance without ever leaving python/numpy land.
Scipy should have already implemented such a thing. Scikit-learn also - k-nearest-neighbors search is doing exactly this kind of work.
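Right - SciPy's cKDTree already handles this kind of radius query (a sketch, assuming the centers fit in one array):

```python
import numpy as np
from scipy.spatial import cKDTree

centers = np.random.rand(10000, 2)
point = np.array([0.2, 0.3])
max_dist = 0.1

tree = cKDTree(centers)                        # build once, query many times
idx = tree.query_ball_point(point, r=max_dist)  # indices within max_dist

# matches the brute-force scan
brute = np.where(np.linalg.norm(centers - point, axis=1) <= max_dist)[0]
assert sorted(idx) == list(brute)
```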
>It’s worth noting that converting parts of / everything to vectorized numpy might be possible for this toy library, but will be nearly impossible for the real library while making the code much less readable and modifiable...
diff --git a/poly_match_v1.py b/poly_match_v1.py
index 675c88a..4293a46 100644
--- a/poly_match_v1.py
+++ b/poly_match_v1.py
@@ -1,4 +1,5 @@
 from functools import cached_property
+from itertools import compress
 from typing import List, Tuple
 import numpy as np
 from dataclasses import dataclass
@@ -56,11 +57,8 @@ def generate_example() -> Tuple[List[Polygon], List[np.array]]:
 def find_close_polygons(
     polygon_subset: List[Polygon], point: np.array, max_dist: float
 ) -> List[Polygon]:
-    close_polygons = []
-    for poly in polygon_subset:
-        if np.linalg.norm(poly.center - point) < max_dist:
-            close_polygons.append(poly)
-
+    dist = np.linalg.norm([poly.center for poly in polygon_subset] - point, axis=1)
+    close_polygons = list(compress(polygon_subset, dist < max_dist))
     return close_polygons
10x faster than the original, without resorting to native code, and without a substantial increase in the complexity of the code base.

From a maintenance view, I much prefer folks write vectorized data frames vs numpy or low level bindings, but that comes from having lived with the alternative for a lot longer. All of our exceptions are pretty much slotted for deletion. (Our fast path is single or multi-GPU dataframes in Python.) Here's to hoping that one day we'll have dependent types in mypy!
That was a big part of my curiosity, because I'm not quite a good enough programmer to know whether the problem is just that the original code is bad or does something super inefficient, such that rewriting it in any language would let you make massive improvements.
Piqued my curiosity as my background almost lines up, and I'm interested in that kind of role... But I live near one of their offices only hiring for skill sets I don't really have.
This sort of hybrid is super common, you typically spend 90%+ of your time in computationally intensive problems in a very small subset of your code, typically the innermost loops. Optimizing those will have very good pay-off.
Traditionally we'd do this with the high level stuff in one language and then assembly for the performance critical parts; these days it is more likely a combination of a scripting language for the high level part and a compiled language for the low level parts (C, Rust, whatever). Java and such are less suitable for such optimization purposes, both because they come with a huge runtime and because they are hard to interface with other languages unless those happen to use the same underlying VM - but then there usually isn't much performance gain.
Another nice way to optimize computationally intensive code is by finding out if the code is suitable for adaptation to the GPU, which can give you many orders of magnitude speed improvement in some cases.
I wasn't able to see the numba comparison. Anyone know how much worse it was?
If you’re going to benchmark scripts or executables, use hyperfine.
"It is big and complex and very business critical and highly algorithmic, so that would take ~months of work, ..."
That's doing spatial data processing by exhaustive search, which is inherently slow. There are better algorithms. If the number of items to be searched is large, the spatial indices of MySQL could help.
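A toy uniform-grid index in pure Python to show the idea (a KD-tree or R-tree is what you'd actually use):

```python
import numpy as np
from collections import defaultdict

def build_grid(points, cell):
    # bucket each point index by its (integer) grid cell
    grid = defaultdict(list)
    for i, p in enumerate(points):
        grid[(int(p[0] // cell), int(p[1] // cell))].append(i)
    return grid

def query_radius(points, grid, cell, q, r):
    # only inspect cells that can intersect the query disc
    cx, cy = int(q[0] // cell), int(q[1] // cell)
    reach = int(np.ceil(r / cell))
    hits = []
    for gx in range(cx - reach, cx + reach + 1):
        for gy in range(cy - reach, cy + reach + 1):
            for i in grid.get((gx, gy), ()):
                d = points[i] - q
                if d @ d <= r * r:
                    hits.append(i)
    return hits

pts = np.random.rand(5000, 2)
grid = build_grid(pts, cell=0.1)
found = query_radius(pts, grid, 0.1, np.array([0.5, 0.5]), 0.1)
```

Instead of scanning all 5000 points, each query only touches a handful of cells.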
You need a more experienced Python engineer.
Hasn't he heard of ctypes? You can wrap C structs as Python objects, and have been able to since forever.
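e.g. a minimal ctypes sketch of an array-of-structs (names made up):

```python
import ctypes

class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]

# A contiguous C array of structs, usable from Python and passable to C code.
points = (Point * 3)()
points[1] = Point(1.5, 2.5)

assert points[1].x == 1.5
assert ctypes.sizeof(points) == 3 * ctypes.sizeof(Point)
```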
I assume, haven’t really messed with numpy for anything but I can’t imagine it wouldn’t work that way.