The current issue with the JVM is that every thread has a corresponding operating system thread. That, unfortunately, is really heavy, both memory-wise and on the OS context switcher.
Loom allows Java to have threads as lightweight as a goroutine. It's going to change the way everything works. You might still have a dedicated CPU-bound thread pool (the common fork-join pool exists and probably should be used for that). But otherwise, you'll just spin up virtual threads and do away with all the consternation over how to manage thread pools and what a thread pool should be used for.
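As a sketch of that model (assuming a Loom-era JDK, 21 or newer; `newVirtualThreadPerTaskExecutor` is the real JDK API, the task itself is just illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadsDemo {
    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        // No pool sizing decisions: one cheap virtual thread per task.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(done::incrementAndGet);
            }
        } // close() waits for all submitted tasks to finish
        System.out.println(done.get()); // prints 10000
    }
}
```

Ten thousand platform threads would be a problem; ten thousand virtual threads is routine.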
So, there was a time when a broad statement like that was pretty solid. These days, I don't think so. The default stack size (on 64-bit Linux) is 1MB, and you can make it smaller if you want. That's also virtual memory; the actual memory usage depends on your application. There was a time when 1MB was a lot of memory, but these days, in a lot of contexts, it's peanuts unless you have literally millions of threads (and even then...). Yes, you can be more memory efficient, but it wouldn't necessarily help that much. Similarly, at least in the case of blocking IO (which is normally why you'd have so many threads), the overhead on the OS context switcher isn't necessarily that significant: most threads will be blocked at any given time, and you're already going to have a context switch from the kernel to userspace. Depending on circumstances, polling IO models can lead to more context switching, not less.
There are certainly circumstances where threads significantly impede your application's efficiency, but if you are really in that situation you likely already know it. In the broad set of use cases, though, switching from a thread-based concurrency model to something else isn't going to be the big win people think it will be.
That time is approaching 20 years old at this point, too. Native threads haven't been "expensive" for a very, very long time now.
Maybe if you're in the camp of disabling overcommit it matters, but otherwise green threads are definitely a specialized niche, not generally useful.
> In the broad set of use cases though, switching from a thread-based concurrency model to something else isn't going to be the big win people think it will be.
I'd go even further and say it'll be a net loss in most cases, especially with modern complications like heterogeneous compute. If your use case is specifically spinning up thousands of threads for IO (i.e., you're a server and nothing else), then sure. But if you aren't, there's no win here, just complications (like times when you need native thread isolation for FFI reasons, like using OpenGL).
I don't know why; I personally never needed that feature, and good old threads were always enough for me. It's weird for me to watch non-JDBC drivers with async interfaces, when it was common knowledge that a JDBC data source should use something like 10-20 threads maximum (depending on DB CPU count); anything more is a sign of bad database design. And running 10-20 threads, obviously, is not an issue.
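To make that sizing concrete, a minimal sketch (the pool size and the stand-in task are hypothetical; a real setup would size against the database server's CPU count and hand the pool real JDBC work):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JdbcStylePool {
    public static void main(String[] args) throws Exception {
        // Hypothetical sizing: roughly the DB server's CPU count.
        int dbCpus = 16;
        ExecutorService jdbcPool = Executors.newFixedThreadPool(dbCpus);

        // Stand-in for a blocking JDBC call.
        Future<Integer> result = jdbcPool.submit(() -> 1 + 1);
        System.out.println(result.get()); // prints 2
        jdbcPool.shutdown();
    }
}
```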
But the demand is here. And lightweight threads are probably a better approach than async/await transformations.
The default thread stack size is 8 or 10 MB on most Linux distributions.
The exception is Alpine, where it's below 1 MB.
Granted, there are scenarios where you want 100,000 "threads of execution." And that is clearly going to be impractical for system threads.
But if you're worried about the overhead of your pool of 50 threads, stop it.
Which leads to dirty things like inserting sleep(0) at the top of loops, and dealing with really unbalanced scheduling when threads don't hit yields often enough. Plus, with Loom it might not be obvious that some function is a yield point, since it's meant to be transparent; so if you grab a lock and yield, you make everyone wait until you're scheduled again.
Green threads are great! I love them, and they're the only real solution to truly concurrent, IO-heavy workloads, but they're not a panacea; they trade one kind of discipline for another.
It just so happens that a large number of JVM users are working with IO bound problems. Once you start talking about CPU bound problems the JVM tends not to be the thing most people reach for.
Loom doesn't remove the CPU bound solution by adding the IO solution. Instead, it adds a good IO solution and keeps the old CPU solution when needed.
In fact, there's already a really good pool in the JVM for common CPU-bound tasks: `ForkJoinPool.commonPool()`.
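For instance, parallel streams already run their CPU-bound work on that common pool by default:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CommonPoolDemo {
    public static void main(String[] args) {
        // Parallel streams execute on ForkJoinPool.commonPool() by default.
        long sum = LongStream.rangeClosed(1, 1_000_000).parallel().sum();
        System.out.println(sum); // prints 500000500000
        // The common pool exists process-wide; no setup required.
        System.out.println(ForkJoinPool.commonPool().getParallelism() >= 1);
    }
}
```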
I see Project Loom as really providing all the benefits of single-threaded languages like Node (i.e. tons of scalability), but with the easier programming model that threads provide, as opposed to using async/await.
pwhile(() -> loopPredicate, () -> { loopBody });
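A minimal sketch of what such a `pwhile` helper could look like (the name comes from the snippet above; this implementation is a guess at its intent):

```java
import java.util.function.BooleanSupplier;

public class PWhile {
    // Loops while the predicate holds AND the current thread has not been
    // interrupted -- turning every iteration into a cancellation point.
    static void pwhile(BooleanSupplier predicate, Runnable body) {
        while (!Thread.currentThread().isInterrupted()
                && predicate.getAsBoolean()) {
            body.run();
        }
    }

    public static void main(String[] args) {
        int[] count = {0};
        pwhile(() -> count[0] < 5, () -> count[0]++);
        System.out.println(count[0]); // prints 5
    }
}
```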
All it does is add a `Thread.currentThread().isInterrupted()` check to the predicate. At this point, best to switch to Erlang!

------

There are basically two kinds of green threads:
(1) N:1, where one OS thread hosts all the application threads, and
(2) M:N, where M application threads are hosted on N OS threads.
Original Java (and Ruby, and lots of other systems built before every microcomputer was a multicore parallel system) used N:1 green threads, which provide concurrency but not parallelism. That's fine when your underlying system can't do real parallelism anyway.
Wanting to take advantage of multicore systems (at least, in the Ruby case, for underlying native code) drove a transition to native threads (which you could call an N:N threading model, as application and OS threads are identical.)
But this limits the level of concurrency to the level of parallelism, which can be a regression compared to N:1 models for applications where the level of concurrency that is useful is greater than the level of parallelism available.
What lots of newer systems are driving toward, to solve that, are M:N models, which can leverage all available parallelism but also provide a higher degree of concurrency.
Longer answer: devs back in the day couldn't really grok the difference between green and real threads. Java made its bones as an enterprise language, which can have smart programmers, but they tend not to be close to the metal, knowledge-wise. Too many devs back then expected a Java thread to be a real thread, so Java re-engineered to accommodate this.
I think the JDK/JVM teams also viewed it as a maturation of the JVM to be directly using OS resources so closely across platforms, rather than "hacking" it with green threads.
These days, our high-performance fanciness means devs are demanding green-thread analogues, and Go/Elixir/others seem superior because of them.
So to remain competitive in the marketplace, Java now needs threads that aren't threads even though Java used to have threads that weren't threads.
The new Loom threads will be much lighter weight than the original Java green threads. Further, the entire IO infrastructure of the JVM is being reworked for Loom to make sure the OS doesn't block the VM's thread. What's more, Loom does M:N threading.
Same concept, very different implementation.
I am working with Kafka and MongoDB and it is normal for my app to have a million in flight transactions at various stages of completion.
In the past it required a lot of planning (and a lot of code), but Reactor lets me build these processes as pipelines with whatever concurrency or scheduler I desire, at any stage of the processing.
We are even doing tricks like merging unrelated queries to MongoDB, so that sometimes thousands of identical queries are executed together (one query with a huge `in()` clause, or one bulk write rather than separate ones).
This is improving our throughputs by orders of magnitude while the pipeline pulls millions of documents per second from the database.
I just don't see how Loom helps.
Loom could help if you had blocking APIs to start, but you get much better results if you just resolve to use async, non-blocking wrapped in ReactiveX.
Now, I get that this part is hard for many folks to understand. Just like at my workplace, people think it's impossible to write a microservice without Spring Boot.
> Loom could help if you had blocking APIs to start, but you get much better results if you just resolve to use async, non-blocking wrapped in ReactiveX.
There might be billions of lines of legacy code that would adapt to Loom with minimal changes, but that would be impossible to turn into ReactiveX etc. without enormous investment and risk. Your ideas are rather simplistic for the real world.
Currently, for efficiency, you would need at least 2 pools: 1 small bounded pool for dequeuing the requests and creating the IO operation, and 1 unbounded pool for actually executing the IO operation.
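A rough sketch of that two-pool layout with plain `java.util.concurrent` (the pool sizes and the fake IO task are illustrative, not a production configuration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TwoPoolLayout {
    public static void main(String[] args) throws Exception {
        // Small bounded pool: dequeues requests and kicks off IO operations.
        ExecutorService dispatcher = Executors.newFixedThreadPool(2);
        // Unbounded pool: actually executes the (blocking) IO operations.
        ExecutorService ioPool = Executors.newCachedThreadPool();

        Future<String> response = dispatcher.submit(
                () -> ioPool.submit(() -> "io-result").get());
        System.out.println(response.get()); // prints io-result

        dispatcher.shutdown();
        ioPool.shutdown();
    }
}
```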
I've not looked into the goroutine implementation, so I couldn't tell you how it compares to what I've read loom is doing.
Loom is looking to have some extremely compact stacks, which means each new "virtual thread" (as they are calling them) will end up with only bytes' worth of memory allocated.
Another thing coming with loom that go lacks is "structured concurrency". It's the notion that you might have a group of tasks that need to finish before moving on from a method (rather than needing to worry about firing and forgetting causing odd things to happen at odd times).
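Loom's actual API for this is still settling, but the core idea can be approximated today with `invokeAll`, which blocks until every task in the group has finished:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WaitForGroup {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Callable<Integer>> tasks = List.of(() -> 1, () -> 2, () -> 3);

        // invokeAll does not return until all three tasks are done,
        // so nothing "fires and forgets" past this point.
        List<Future<Integer>> results = pool.invokeAll(tasks);

        int total = 0;
        for (Future<Integer> f : results) {
            total += f.get();
        }
        System.out.println(total); // prints 6
        pool.shutdown();
    }
}
```

Structured concurrency bakes this "children finish before the scope exits" guarantee into the language's task model instead of leaving it to one executor method.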
I don't think goroutines would need such information. The Go compiler knows that `int foobar` is currently stored in `rbx` and that it is also saved on the stack; therefore, `rbx` doesn't need to be saved.
------
Linux/NPTL threads don't know when they'll be interrupted, so all register state (including AVX512 state, if those registers are in use) needs to be saved. 32 AVX512 registers × 64 bytes is 2kB alone.
Even if a thread isn't using AVX512 (Linux detects when all the AVX512 registers are zero), RAX through R15 is 128 bytes, plus the SSE registers (XMM0-XMM15, another 256 bytes), so ~384 bytes of space that goroutines don't need. Plus whatever other process-specific information needs to be saved off (CPU time and other process/thread details Linux needs to decide which thread to run next).
You can expose a method on the scala side to enter the IO world that will take your arguments and run them in the IO environment, returning a result to you, or notifying some Java class using Observer/Observable. This can, of course take Java lambdas and datatypes, thus keeping your business code in Java should you so desire. It's clunky, though, and I wish Java had easy IO primitives like Scala.
Anyway, they have their place, but if you've got a fancy chain of micro services calling out to wherever, think hard before putting those calls in an unbounded thread pool.
Care to elaborate please? Seems like the author is recommending unbounded thread pools with bounded queues for blocking IO. Isn't that pretty similar?
Threadpooling only matters if you have neither of those things.
Otherwise, you should be using one or the other over a thread pool. You might still spin up a threadpool for CPU bound operations, but you wouldn't have one dedicated to IO.
As of C++ 20, there are coroutines which you should be looking at (IMO).
Coroutines / Goroutines and the like are probably better on I/O bound tasks where the CPU-effort in task-switching is significant.
--------
For example: Matrix Multiplication is better with a Threadpool. Handling 1000 simultaneous connections when you get Slashdotted (or "Hacker News hug of death") is better solved with coroutines.
Ha! Maybe in 20 years. Sadly, I'm still writing new code targeting C++98 on one project. The most current project I'm a part of is on C++11.
Not for languages with go/coroutines (e.g. Go, Clojure, Crystal) as those were designed specifically to help with the thread-per-IO constraint.
I'm bemused by this statement, and I can't figure out whether this is an assertion rooted in supreme confidence, or just idle, wishful thinking.
That being said, giving threading advice in a virtualized and containerized world is tricky. And while these three categories seem sensible, mapping the functions of any non-trivial system onto them is going to be difficult, unless the system was specifically designed around it.