On the other hand, in Java, with a few exceptions around Swing (which uses native painting for GUIs), almost everything is written in pure Java, so you can debug all the way down if need be. That is a huge help for understanding the foundation library and all of its edge cases (normal for any huge library). Modern Java IDEs like IntelliJ are so capable that they will decompile JARs and let you set breakpoints in, and step into, the decompiled code. It is mind-blowing when you are trying to debug a library you don't own the source code for (random quant lib, ancient auth lib, etc.).
A superset of Python that, once finished, can run all native Python code, but also adds language features to get closer to the “metal”, reaching the speed of C libraries while keeping the ability to properly debug and step through the code.
If this project goes the way it’s promised (its creator has great experience with language development), it could further cement Python, since it would solve Python's biggest criticisms. Very excited for it.
https://youtu.be/pdJQ8iVTwj8 https://www.modular.com/mojo https://en.m.wikipedia.org/wiki/Chris_Lattner
He has the magic touch. What you describe sounds amazing. And having LattBot behind it is even better.
I have watched one of his episodes with Lex before. He is absurdly humble about his accomplishments.
It also gives you more control over memory allocations than Java, which is nice when trying to squeeze more performance out of the code.
> It also gives you more control over memory allocations than Java
Cool! I didn't know about this. Can you share an example? It might foster some good discussion. I haven't written any serious C# in about 10 years now. I still love that language. To me, it's like Java with all the rough edges sanded down. In that era, Visual Studio was the only choice for dev and needed the IntelliJ plug-in. I always thought that was a bit goofy, but nothing against the language itself. And, the COM+ integration is legendary if you need to run on Win32.

I strongly doubt any production C# code is significantly faster than its Java counterpart.
You mean less focus on being “glue” between C code and more focus on optimizing the interpreter?
If your ML model takes hours when it can take minutes, or takes days when it can take hours, you will move away. You could move away to another language or a faster interpreter but that's a different discussion.
> more focus con optimizing the interpreter
This is good, but there's an upper bound to the performance of interpreted languages. Maybe the Python interpreter could be as fast as V8, but it is unlikely to be as fast as the JVM. People will still need to drop down to C / Fortran for whatever compute-intensive work they're doing.
You just install it.
> learning a little C and potentially fixing an up-to-date and beloved library
A romantic thought, but 99% of the time I'm just going to do a workaround or a local patch.
I don't use Java anymore, but I don't hate it. I think it has some verbose conventions, but I vastly prefer it to C's extremely terse conventions.
Nowadays I try to do as much in TypeScript as I can, because I find it a pleasure to use, and it has the same property where you can dive into any lib when debugging.
This is one big reason I love Pharo and other Smalltalk implementations: they are mostly written in themselves, down to the VM. You can take a deep dive and inspect everything without being afraid of smashing your head against the C bedrock underneath. And they still manage to do this while being reasonably performant and dynamic.
"Debugging a Mixed Python and C Language Stack" (2023) https://news.ycombinator.com/item?id=35710350
Aren’t most of these native libraries open source? I’m a C# dev so maybe this is a naive question based on my experience but is there not a way to bring in the source of the C library and debug into it in these “hit the wall” situations?
> This article is about optimizing a tiny bit of Python code by replacing it with its C++ counterpart.
So it's C++ rather than Python.
So, if anything, the title should say "Python, when linked with C++ code, can make (...)"
So if you’re writing Python that employs this technique and you still want to keep the “OS agnostic” characteristic of Python, does that mean you’d have to compile multiple C++ binaries and check the OS to see which one to load?
I'm not sure why the author implemented SHA1 and a base64 digest thereof manually rather than including a small library, but perhaps that was part of the challenge.
Python can generate a whole lot more keys per second if you enable SIMD, multithreading, or even GPU support. In fact, Ryzen / 11th+ Gen Intel / ARMv8-A have dedicated SHA-1 instructions that should significantly boost performance here. Together with something like https://github.com/WojciechMula/base64-avx512 I bet you could increase the performance by an order of magnitude if raw CPU speed were really a concern.
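As a rough sketch of just the multi-core angle (not the SIMD one), you can fan the hashing out across processes with only the stdlib — `accept_key` here is my own name for the RFC 6455 computation, not the article's function:

```python
import base64
import hashlib
from multiprocessing import Pool

# Fixed GUID from RFC 6455, appended to the client's Sec-WebSocket-Key
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(key: str) -> str:
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def accept_keys(keys, processes=4):
    # Per-digest work is tiny, so a large chunksize keeps IPC overhead
    # from eating the parallel speedup
    with Pool(processes) as pool:
        return pool.map(accept_key, keys, chunksize=4096)
```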
I suppose three million keys per second ought to be enough for any websocket server, especially for a relatively simple implementation of the code.
They replace Python code that makes 5 calls into native code by code that makes 1 call that makes those 5 calls, and get a speed up from 869k calls per second to 3.15m calls per second, so a snarky title could even be “Python-to-native calls are slow”.
They could even measure it by adding a C++ version of that
    def magic_accept(key: str) -> str:
        return 's3pPLMBiTxaQ9kYGzzhZRbK+xOo='
code and benchmarking that.

By inlining all the library code you use, yes indeed, you too can avoid making a single call.
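For reference, the non-snarky pure-Python version of the accept computation (per RFC 6455) is only a few lines; each chained call below is one of the small Python-to-native hops being counted above:

```python
import base64
import hashlib

# Fixed GUID from RFC 6455, appended to the client's Sec-WebSocket-Key
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(key: str) -> str:
    # encode -> concat -> sha1 -> digest -> b64encode -> decode:
    # several small crossings of the Python/C boundary
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

The RFC's own sample key, "dGhlIHNhbXBsZSBub25jZQ==", is what produces the "magic" constant above.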
Nice try!
[1] https://github.com/mayeut/pybase64
This particular one also includes b64encode_as_string, which would also reduce some work/copying.
- segfault the interpreter if you pass in something that's not a string
- read bogus memory if the length of the string is < 24
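Both failure modes can be fenced off with a thin Python wrapper before control crosses into native code — a sketch, where `native_fn` stands in for the article's C++ function (a hypothetical stand-in, not its real name):

```python
def safe_accept(key, native_fn):
    # Guard 1: a non-str argument would otherwise segfault the extension
    if not isinstance(key, str):
        raise TypeError("key must be str")
    # Guard 2: the native side reads a fixed 24 bytes, so shorter input
    # would make it read bogus memory
    if len(key) != 24:
        raise ValueError("Sec-WebSocket-Key must be exactly 24 characters")
    return native_fn(key)
```

This keeps the crashes as ordinary Python exceptions at the cost of two cheap checks per call.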