But Nim is only one of a whole suite of languages that easily cruise to a 10x performance win over Python. And that isn't counting multicore - if you count that you quickly get to a 100x improvement.
Personally I use Groovy for much of what I do for similar reasons (which is somewhat unusual), but it's just a placeholder for "use anything except Python".
From my experience in using Python at my last job, I'll also add that Python is decent at tasks that aren't CPU-bound.
I wrote a lot of scripts that polled large numbers of network devices for information and then did something with it (typically upsert the data into a database, either via direct SQL or a REST API to whatever service owns the database). All these tasks were heavily network-bound. The amount of time the CPU was doing any work was minuscule compared to the amount of time it was waiting to get data back from the network. I doubt Nim or any other language would have been a significant performance improvement in this case.
For what it's worth, that made these scripts excellent candidates for multithreading. I'd run them with 20+ threads, and it was glorious. At first I did multiprocessing, because of all the GIL horror stories, but multiprocessing made it very difficult to cache data, so eventually I said "well, all this is network-bound so the GIL doesn't even apply" and switched over to multiprocessing.dummy (which implements pools using the same API as multiprocessing but with threads instead of processes), and I never looked back.
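The pattern is tiny; here's a minimal sketch (the hostnames and the `poll_device` body are made up for illustration - `multiprocessing.dummy.Pool` is real and mirrors the `multiprocessing.Pool` API with threads):

```python
from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool, but threads

def poll_device(host):
    # Placeholder for the real network call (SNMP, SSH, REST, ...);
    # while a thread waits on the network, the GIL is released.
    return (host, len(host))

hosts = ["sw1.example.net", "sw2.example.net", "rtr1.example.net"]

# 20 worker threads is fine for network-bound work despite the GIL.
with Pool(20) as pool:
    results = pool.map(poll_device, hosts)  # preserves input order

print(results)
```

Because the threads share one process, any cache built in one place is visible everywhere - the thing that was painful with separate processes.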
Edit: For what it's worth, Nim sounds like a really cool language, and it's right up my alley in several ways, I just don't think Python is particularly slow at network-bound tasks that use very little CPU.
And suddenly you need to introduce quite a bit more technical complexity into this story, which is going to be hard to explain to management - all they see is that you can now insert a couple of million DB rows, and their Big Data consultants[TM] told them that nowadays this isn't even worth thinking about.
Point being: if your performance ceiling is low, you're gonna hit it sooner.
IO-bound tasks are almost by definition outside of your Python application's control. You yield control to the system to execute the actual task, and from that point on - you're no longer in control of how long the task will take to complete.
In other words, Python "being fast" by waiting on a socket to complete receiving data isn't a particularly impressive feat.
But as demonstrated, Nim is fast to write and fast to compile, so Python has little edge - just its huge ecosystem.
E.g. random example:
Sprinkle some cdefs in your Python and suddenly you're faster than C++
https://github.com/luizsol/PrimesResult
https://github.com/PlummersSoftwareLLC/Primes/blob/drag-race...
25.8 seconds down to 1.5
Still, getting Java level performance out of python is a huge improvement and should be enough for most cases.
Some may consider Jax and its XLA compiler, but unless you require gradients, numba will be significantly faster; an instance of this is available at [1].
XLA operates at a higher level than LLVM and therefore can't achieve the same optimizations as numba does using the latter. IIRC numba also has a Python-to-CUDA compiler, which is also very impressive.
[1] https://github.com/scikit-hep/iminuit/blob/develop/doc/tutor...
CPython's slowness doesn't boggle my mind at all. It's a bytecode interpreter for an incredibly dynamic language that states simplicity of implementation as a goal. I would say performance is actually pretty impressive considering all that. What _does_ boggle my mind is the performance of cutting-edge optimizing compilers like LLVM and V8!
At least there is a benefit to a simple implementation: Someone like me can dive into CPython's source and find out how things work.
No, Nim is truly among the top fastest languages when writing idiomatic code as shown in many benchmarks.
> But Nim is only one of a whole suite of languages that easily cruise to a 10x performance win over Python
...while also being very friendly to Python programmers, intuitive and expressive. Unlike many other languages.
Granted, but inside its optimised numerical science ecosystem, Python is, in fact, fast enough. If most of your program is calls into numpy, Python will get you where you need to go. In my experience, one scalar Python math operation takes about the same amount of time as the equivalent numpy operation on a million-element array. Linked against a recent libblas, numpy will even distribute work across multiple cores. So much for the GIL.
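To make the gap concrete, here's a rough sketch (the exact ratio depends on the machine and the BLAS numpy is linked against, but both functions compute the same thing):

```python
import numpy as np

xs = np.random.rand(1_000_000)

def py_sum_squares(values):
    # One interpreted bytecode dispatch per element: slow.
    total = 0.0
    for v in values:
        total += v * v
    return total

def np_sum_squares(values):
    # One Python call; the million multiply-adds run in compiled BLAS code.
    return float(np.dot(values, values))

# Same answer, wildly different runtime.
assert abs(py_sum_squares(xs) - np_sum_squares(xs)) < 1e-4
```

Time the two with `timeit` and you'll see why "is Python fast?" depends entirely on how much work each Python-level operation dispatches.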
Also, "awful" is too harsh. Probably 90% of Python code just doesn't need to be faster than it is.
Also I don't know how anyone could design a language in the 21st century and make basic mistakes like this:
> Nim treats identifiers as equal if they are the same after removing capitalization (except for the first letter) and underscore, which means that you can use whichever style you want.
If that's any indication of the sanity of the rest of Nim then I'd say steer well clear!
Nim's underlying, perhaps understated philosophy is that it lets you write code the way you want to write code. If you like snake case, use it. If you want camel case, sure. Write your code base how you want to write it, keep it internally consistent if you want, or don't. Nim doesn't really care.
(That philosophy extends far beyond naming conventions.)
What this avoids is being stuck with antiquated standard libraries that continue to do things contrary to the language's standards for the sake of backward compatibility (argh, Python!) and third-party libraries where someone chose a different standard because that's their preference (argh, Python! JavaScript! Literally every language!). Now you're stuck with screaming linters or random `# noqa` lines stuffed in your code, and that one variable that you're using from a library sticks out like a sore thumb.
Your code is inconsistent because someone else's code was inconsistent - that's simply not a problem in Nim.
Could Nim have forced everyone to snake_case naming structures for everything from the start? Well, sure, but then the people that have never actually written code in Nim would be whining about that convention instead and we'd be in the same place. After having actually used Nim, my opinion, and I would venture to say the opinion of most, is that its identity rules were a good decision for the developers who actually write Nim code.
Not entirely. Nim's benefit here is that it's superficially similar enough to Python that it's easy for people from that world to pick up and start using Nim.
> Also I don't know how anyone could design a language in the 21st century and make basic mistakes like this:

> If that's any indication of the sanity of the rest of Nim then I'd say steer well clear!
It may seem like a design mistake at first glance, but it's surprisingly useful. Its intent is to allow a given codebase to maintain a consistent style (e.g. camel vs snake case) even when making use of upstream libraries that use different styles. Not including the first letter avoids most of the annoyance of wantonly mixing all-caps constants with lower case, and linters keep teams from mismatching internal styles. Though mostly I forget it's there, as most idiomatic Nim code sticks with camel case. I'd say don't knock it until you've tried it.
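For illustration, the comparison rule can be sketched in Python (a sketch of the documented behavior, not Nim's actual implementation):

```python
def nim_ident_eq(a: str, b: str) -> bool:
    # Nim compares the first character case-sensitively; the rest is
    # compared case-insensitively with underscores removed.
    norm = lambda s: s[0] + s[1:].replace("_", "").lower()
    return norm(a) == norm(b)

assert nim_ident_eq("toLowerAscii", "to_lower_ascii")  # same identifier to Nim
assert not nim_ident_eq("Foo", "foo")                  # first letter still matters
```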
The rest of Nim’s design avoids many issues I consider actual blunders in a modern language such as Python’s treatment of if/else as statements rather than as expressions, and then adding things like the walrus operator etc to compensate.
With respect to the identifier resolution in Nim, it strikes me as more of a matter of preference. Especially given the universal function call syntax in Nim, at least it's consistent. For example, Nim treats "ATGCA".lowerCase() the same as lowercase("ATGCA"). I do appreciate the fact that you can use a chaining syntax instead of a nesting one when doing multiple function calls but this is also a matter of style more than substance.
[1] https://github.com/Benjamin-Lee/viroiddb/blob/main/scripts/c...
One of the big, big things for improving performance on DNA analysis of ANY kind is converting these large text files into binary (4 letters easily convert to a 2-bit encoding), which massively improves basically any analysis you're trying to do.
Not only does it compress your dataset (2 bits vs 16 bits), it allows absurdly faster numerical libraries to be used in lieu of string methods.
There’s no real point in showing off that a compiled language is faster at doing something the slow way…
[1] https://github.com/biocore/scikit-bio/blob/b470a55a8dfd054ae...
[2] https://en.wikipedia.org/wiki/Nucleic_acid_notation
[3] https://bioinformatics.stackexchange.com/questions/225/upper...
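A minimal sketch of that packing in Python (illustrative only - real tools do this in optimized C, and this handles plain ACGT, not the ambiguity codes from [2]):

```python
def pack_2bit(seq: str) -> bytes:
    """Pack an ACGT string into 2 bits per base, 4 bases per byte."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= code[base] << (2 * (i % 4))
    return bytes(out)

def unpack_2bit(data: bytes, n: int) -> str:
    """Inverse of pack_2bit; n is the original sequence length."""
    bases = "ACGT"
    return "".join(bases[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(n))

seq = "GATTACA"
packed = pack_2bit(seq)
assert len(packed) == 2                      # 7 bases in 2 bytes
assert unpack_2bit(packed, len(seq)) == seq  # round-trips losslessly
```

Once the data is packed like this, counting or comparing bases becomes integer bit-twiddling instead of string scanning.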
I’m surprised you need the full 4 bits to deal with ambiguous bases, but it probably makes sense at some lower level I don’t understand.
(As in GATTACA might be read as is, but might be read as GAT?ACA.)
Still, that's a minimum of 3 bits versus a much longer text encoding.
[Edit: I see another commenter with the same observation, more thoroughly explained!]
Because we use it as a nice syntactic frontend to numpy, a large and highly optimized library written in C++ and Fortran (sic). That is, we actually don't use "Python-native" code much, and numpy is essentially APL-like array-oriented thing where e.g. you don't normally need loops.
For native-language data processing, Python is slow; Nim or Julia would easily outperform it, while being comparably ergonomic.
The funny thing is that Nim and Julia libraries are still wrapping Fortran numerical libraries, while D beat the old and trusted Fortran library on its home turf five years back:
http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/...
You say that, but Julia is rapidly acquiring native numerical libraries that outperform OpenBLAS:
https://discourse.julialang.org/t/realistically-how-close-is...
For Nim, there’s also NimTorch which is interesting in that it builds on Nim’s C++ target to generate native PyTorch code. Even Python is technically a second class citizen for the C++ code. Most ML libraries are C++ all the way down.
https://github.com/YingboMa/RecursiveFactorization.jl/pull/2...
So a stiff ODE solve is pure Julia, LU-factorizations and all. This is what allows it to outperform the common C and Fortran libraries very consistently. See https://benchmarks.sciml.ai/html/MultiLanguage/wrapper_packa... and https://benchmarks.sciml.ai/html/Bio/BCR.html
https://news.ycombinator.com/item?id=28506531 - project allows creating pythonic bindings for your nim libraries pretty easily, which can be useful if you still want to write most of your toplevel code in python, but leverage nim's speed when it matters.
If you want to make your Nim code even more "pythonic" there is https://github.com/Yardanico/nimpylib, and for calling some Python code from Nim there is https://github.com/yglukhov/nimpy
However, in any case I would never replace Python with Nim, as it is too niche a language and you would struggle with recruiting. I could consider Julia if its popularity keeps growing.
That is the ultimate challenge of a language. It either needs a large backer (Go and Google) or has to be so good that it gets natural market adoption (Julia). As a manager I am reluctant to adopt yet another language unless there is a healthy job market for it.
Not all technologies require the full cycle and the normal risk management.
with open("orthocoronavirinae.fasta") as f:
    text = ''.join(line.rstrip() for line in f.readlines() if not line.startswith('>'))
gc = text.count('G') + text.count('C')
total = len(text)
Or if you want to be explicit, this is just as fast (and might scale better for particularly long genomes):

gc = 0
total = 0
with open("orthocoronavirinae.fasta") as f:
    for line in f.readlines():
        if not line.startswith('>'):
            line = line.rstrip()
            gc += line.count('C') + line.count('G')
            total += len(line)
I didn't test Nim but the author reports Nim is 30x faster than his Python implementation, so mine would be about 3x slower than his Nim.

Yes, you can implement a faster Python version, but notice also:
* This faster version reads the whole file into memory (except comment lines). The article mentions the data being 150MB, which should fit in memory, but for larger datasets this approach would be infeasible
* The faster version is actually delegating a lot of work to Python's C internals by using text.count('G'). All the internal looping and comparison is done in C, while in the original version it goes through Python
So yes, you can definitely write faster Python by delegating most of the work to C.
The point of the article is not about how to optimize Python, but about how given almost identical implementations in Python and Nim, Nim can outperform Python by 1 or 2 orders of magnitude without resorting to use C internals for basic things like looping or comparing characters.
To make it streaming, take the second version and remove the readlines (directly iterate over f).
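A sketch of that streaming variant, shown here against a small inline sample instead of the real 150MB file (iterating over the file object directly keeps only one line in memory at a time):

```python
import io

# Stand-in for open("orthocoronavirinae.fasta"); a real file iterates the same way.
f = io.StringIO(">header\nGATTACA\nGGCC\n")

gc = 0
total = 0
for line in f:  # no readlines(): lines are consumed lazily
    if not line.startswith(">"):
        line = line.rstrip()
        gc += line.count("C") + line.count("G")
        total += len(line)

print(gc / total)
```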
Delegating work to Python's C internals is fine IMO because "batteries included" is a key feature of Python. "Nim outperforms unidiomatic Python that deliberately ignores key language features" is perhaps true, but less flashy of a headline.
And to be honest, I mainly wrote this because the other top level Python implementations for this one were terrible at the time of the post.
import io
f = io.StringIO(
"""
AB
CD
EF
GH
"""
)
total = sum(map(lambda s: 0 if s[0]==">" else s.count('G') + s.count('C'), f.readlines()))
print(total)

Your first example takes 3.1 seconds, my previous comment takes 2.3 seconds, this one takes 1.4 seconds.
import time

start = time.perf_counter()
with open("orthocoronavirinae.fasta", "rb") as f:
    total = sum(map(lambda s: 0 if s[0] == ord(">") else s.count(b"G") + s.count(b"C"), f.readlines()))
end = time.perf_counter()
print(total, " total")
print(end - start, " seconds")

In my use case, I don't really see how Nim would make my life easier right now.
The main places you find it the other way are spreadsheets and shells.
Is there an explanation from the Nim authors as to why they made such an odd choice?
The answer for the latter is programmer time, and some things can be scaled easily using `joblib` or `dask`. It isn't as trivial as importing parallel iterators in Rust and changing `.into_iter` to `.into_par_iter`, but it still takes less time, and once it is done, I don't need to think about it again.
I don't write code only for myself.
How would I convince my employer to let me use Nim instead of a better known language?
And even if I could convince my employer: if we want to start a new project, how could we find programmers well-versed in Nim?
And even if we can find those people, it would mean we would have to write many things ourselves which in other languages we can take for granted, as they have libraries for almost anything.
So having a nice, performant and good language is just a small part of achieving your goals. You also need the people and the ecosystem.
Go, Rust, Kotlin, Swift and even Julia have the luck of having some industry heavyweights behind them, pushing the ecosystem and contributing with money and developers. Nim has only a bunch of passionate people behind it.
If a programmer can't pick up a language like Nim in a few weekends (from what I gather, it's similar to Python and not much different from most common languages, i.e. not something relatively exotic like Haskell) then I don't know. Our mainly PHP shop transitioned to Go quite effortlessly. Today we hire PHP juniors without any Go experience (easier to find), we teach them, and then they work on Go codebases already after a month of internship. So lack of "professional Nim programmers" doesn't look like a problem to me.
Lack of libraries is a good point, but from what I read, Nim compiles to C, so I understand they have access to tens (hundreds?) of thousands of C libraries without writing everything from scratch.
However, indeed, if you are to choose between, for example, Nim and Go for a new project, then I am not sure why would anyone prefer Nim. I'm really interested to know.
Same here, curious to know what HN crowd recommends between Nim vs Go for new projects.
Hiring for Nim skills can be a signal that a company has people who learn languages beyond the run-of-the-mill ones. A bunch of passionate people you might say. That would make the company promising to work for.
Why the phrase "only a bunch of passionate people"? This is how software gets written, parasitical corporations and their unproductive developers who are installed in existing OSS projects come later and mainly associate themselves with the result (speaking of Python again).
Nim's easy to learn if you have experience with any compiled language and can understand anything along the lines of C#, Kotlin or Python syntax. Also, because it compiles to C and JS, it's easy to add to a project incrementally in many cases.
This is a rephrasing of "nobody ever got fired for buying IBM".
Some organization prioritize innovation and technical acumen.
> So having a nice, performant and good language is just a small part of achieving your goals. You also need the people and the ecosystem.
Many applications don't need a large ecosystem. People can learn.
> Go, Rust, Kotlin, Swift and even Julia have the luck of having some industry heavyweights behind them
Python was never corporate-driven, thankfully, and it is successful.
That's horrifyingly slow for a compiler. The author mentions "modern languages look like Python but run as fast as C", which is a common promise those languages make that never really materializes except for a few happy-path cases the language was heavily optimised for. Julia, for example, makes this promise too, but compiles even slower than that and takes ridiculous amounts of RAM even for hello world.
Did the author post the data set they used for the examples? Would be nice to try it out on a few languages to see how fast that can compile and run on a mature language like Common Lisp (which is just as easy to write) or even node.js.
Nim's advantage is that it uses a good old C compiler for the backend (which has been hyperoptimized for decades), but the frontend (transpiler) is also pretty fast. Nim's compilation speed should improve a bit when incremental compilation support is added (which would probably solve a lot of other current issues for Nim, for example better IDE tooling)
[1] https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType...
Here's a comparison with Common Lisp:
~/fasta-dna $ time python3 run.py
0.3797277865097147
21.828 secs
~/fasta-dna $ time sbcl --script run.lisp
0.37972778
2.415 secs
~/fasta-dna $ ls -al nc_045512.2.fasta
-rw-r--r-- 1 156095639 2021-09-25 11:15 nc_045512.2.fasta
So, almost as fast as Nim (the time includes compilation time)?
Here's the Common Lisp code:
(with-open-file (in "nc_045512.2.fasta")
(loop for line = (read-line in nil)
while line
with gc = 0 with total = 0 do
(unless (eql (aref line 0) #\>)
(loop for i from 0 below (length line)
for ch = (char line i) do
(setf total (1+ total))
(when (or (eql ch #\C) (eql ch #\G))
(setf gc (1+ gc)))))
finally (format t "~f~%" (/ gc total))))
With a top-level function and some type declarations it could run even faster, I think.

EDIT: compiling the Lisp code to FASL and annotating the types brings the total runtime to 2.0 seconds. Running it from source increases the time very slightly, to 2.08 seconds, showing how the SBCL compiler is incredibly fast. Taking 0.7 seconds to compile a few lines of code is crazy, imagine when your project grows to many thousands of lines.
The Lisp code still can't really match Nim, which is really C at runtime, in speed when excluding compile time, but if you need a scripting language, CL is great (especially when used with the REPL and SLIME).
Last time I used it, I liked it but didn't use it long enough to have a strong opinion.
It's a compromise, but I always prioritise _my_ time over my computer's time, so if I can write something quickly and just go get a coffee while it runs - I will do that. I won't spend twice as long writing a single-run script just because it'll finish before the kettle has boiled.
Static types help with basic data munging when you haven't used a script for months and need to get back up to speed and make tweaks.
It’s a shame because I think Nim has some neat features that allow it to present as a serious competitor to Rust but it will ultimately have to compete against Python instead to secure its niche.
So is Golang.
My point, which apparently wasn't evident enough, is that you can get most of the benefits by doing nothing - just trying a different Python implementation - without the hassle of learning a niche language, as easy as it might be.
BTW, if you take compilation times into account the difference is even more meager, and in all fairness the PyPy warmup period should have been discounted.
This.
The general guideline has always been that Python is ideal for glue code and non-performance-critical code, and when performance became an issue, Python would simply be used as glue code to invoke specialized libraries. Perhaps the most popular example of this approach is numpy, which uses BLAS and LAPACK internally to handle linear algebra.
This Nim advertisement sounds awfully desperate with the way it resorts to what feels like a poorly assembled strawman, while giving absolutely nothing in return.
Python has never been one of my favorite languages, but easy support in Google Colab, AWS SageMaker, etc., as well as most of my professional deep learning work using TensorFlow + Keras, makes Python a go-to language for me. If you want a Lisp syntax on top of Python, you can try Hy (and get a free copy of my Hy book at https://leanpub.com/hy-lisp-python by setting the price to $0.00).
That said, for unpaid experiments I like Julia + Flux, which also solves the author's preference to avoid slow programming languages. Julia is really a nice language but no one has ever paid me to use it.
"Benchmarking programming languages/implementations for common tasks in Bioinformatics"
https://github.com/lh3/biofast#fqcnt
https://lh3.github.io/2020/05/17/fast-high-level-programming...
When you write C++, you kind of cheat because even code with high computational complexity is pretty fast. Whereas the equivalent code in Python will be awfully slow.
So, while it's true that Python requires less development time, this statement can't be used generally. I have spent hours optimizing Python code when in C++ I would have just moved on to my next task.
If Nim had cloud SDKs I would use it as my default language for pretty much everything.
in cases where what you want to do doesn't exactly fit standard operations, cython can be pretty nice. e.g. 200x -- 1000x speedups for translating C-oriented number crunching code from python to cython. but if you do want performance, you have to think about it while writing the code (avoid needlessly allocating memory in tight loops, data-oriented programming with simple arrays, statically type all of your variables, ...).
If I were writing something from scratch that dealt with data, I would probably use Nim though. It's super easy to write something fast in and is more pleasant than pretty much any other compiled language.
lines = (line for line in lines("orthocoronavirinae.fasta") if not line.startswith(">"))
gc_lines = (1 if ('G' in line or 'C' in line) else 0 for line in lines)
gc = sum(gc_lines)
total = len(list(gc_lines))
# Alternatively, a more "memory efficient" total would be:
total = sum(1 for _ in lines)
Edit: my code is not perfect (I'm typing from my phone; I'm surprised I could even match parentheses).

My point is: this is a highly I/O-bound program. The implementation matters. With the correct implementation there shouldn't be much difference between the languages.
That won't work properly; you've already exhausted the gc_lines generator in the previous line.
From what I gather, the author is a researcher in a bioinformatics-related field. This may indicate that they tend to work either alone or in a relatively small group. The domain is small-scope data processing/manipulation and research/exploratory code, likely short-lived or even one-off.
The progress in this context will possibly be governed by sheer processing speed (e.g. it’s unlikely anyone will delve deep into the code, a lot of iterations to ‘just get it done’ instead of testing etc.).
If this is more or less correct, the point that Nim might be more useful than Python for the author sounds very sensible to me. It’s a nice spot between command line tools and more functionality-loaded languages.
cat test.py | py2many --nim=1 -
http://dpaste.com//5ALVT7MK4

Yes, that is the Achilles heel of Python.
I am always torn between Python and PHP for new projects because of this.
The Python syntax plus its import system are huge advantages over PHP. On the other hand, you suffer a 6x slowdown if you go with Python. Decisions, decisions. I so dearly wish I could have the good parts of both worlds.
And for data processing of all things ...
PHP is also very slow, on top of being unpleasant and broken in many other ways.
I think that is pip.
Python is good where speed of development matters, where you write throw-away code testing some ideas and you want to do it fast, where you write glue code, for prototypes, for small code bases.
Once you are getting outside of that area, you better should use a language more suited for the task.
As for myself, even if I can use Python in some cases, I can churn out C# code almost as fast, so I prefer doing it that way in case I want to grow the code later or use it somewhere else. Being lazy, I dislike rewriting code.