The current issue with the JVM is that every thread has a corresponding operating system thread. That, unfortunately, is really heavy, both memory-wise and on the OS context switcher.
Loom allows Java to have threads as lightweight as a goroutine. It's going to change the way everything works. You might still have a dedicated CPU-bound thread pool (the common fork-join pool exists and probably should be used for that). But otherwise, you'll just spin up virtual threads and do away with all the consternation over how to manage thread pools and what a thread pool should be used for.
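As a sketch of that model (assuming a Loom-era JDK, 21 or newer; `newVirtualThreadPerTaskExecutor` is the real JDK API, the task itself is just illustrative):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadsDemo {
    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        // No pool sizing decisions: one cheap virtual thread per task.
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(done::incrementAndGet);
            }
        } // close() waits for all submitted tasks to finish
        System.out.println(done.get()); // prints 10000
    }
}
```

Ten thousand platform threads would be a problem; ten thousand virtual threads is routine.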
So, there was a time when a broad statement like that was pretty solid. These days, I don't think so. The default stack size (on 64-bit Linux) is 1MB, and you can make it smaller if you want. That's also virtual memory; the actual memory usage depends on your application. There was a time when 1MB was a lot of memory, but these days, in a lot of contexts, it's peanuts unless you have literally millions of threads (and even then...). Yes, you can be more memory efficient, but it wouldn't necessarily help that much. Similarly, at least in the case of blocking IO (which is normally why you'd have so many threads), the overhead on the OS context switcher isn't necessarily that significant: most threads will be blocked at any given time, and you're already going to have a context switch from the kernel to userspace. Depending on circumstances, polling IO models can lead to more context switching, not less.
There are certainly circumstances where threads significantly impede your application's efficiency, but if you are really in that situation you likely already know it. In the broad set of use cases, though, switching from a thread-based concurrency model to something else isn't going to be the big win people think it will be.
That time is approaching 20 years old at this point, too. Native threads haven't been "expensive" for a very, very long time now.
Maybe if you're in the camp of disabling overcommit it matters, but otherwise green threads are definitely a specialized niche, not generally useful.
> In the broad set of use cases though, switching from a thread-based concurrency model to something else isn't going to be the big win people think it will be.
I'd go even further and say it'll be a net loss in most cases, especially with modern complications like heterogeneous compute. If your use case is specifically spinning up thousands of threads for IO (i.e., you're a server and nothing else), then sure. But if you aren't, there's no win here, just complications (like times when you need native thread isolation for FFI reasons, like using OpenGL).
I don't know why; I personally never needed that feature, and good old threads were always enough for me. It's weird for me to watch non-JDBC drivers with async interfaces, when it was common knowledge that a JDBC data source should use something like 10-20 threads maximum (depending on DB CPU count); anything more is a sign of bad database design. And running 10-20 threads, obviously, is not an issue.
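To make that sizing concrete, a minimal sketch (the pool size and the stand-in task are hypothetical; a real setup would size against the database server's CPU count and hand the pool real JDBC work):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JdbcStylePool {
    public static void main(String[] args) throws Exception {
        // Hypothetical sizing: roughly the DB server's CPU count.
        int dbCpus = 16;
        ExecutorService jdbcPool = Executors.newFixedThreadPool(dbCpus);

        // Stand-in for a blocking JDBC call.
        Future<Integer> result = jdbcPool.submit(() -> 1 + 1);
        System.out.println(result.get()); // prints 2
        jdbcPool.shutdown();
    }
}
```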
But the demand is here. And lightweight threads are probably a better approach than async/await transformations.
The default thread stack size is 8 or 10 MB on most Linux distributions.
The exception is Alpine, where it's below 1 MB.
Granted, there are scenarios where you want 100,000 "threads of execution." And that is clearly going to be impractical for system threads.
But if you're worried about the overhead of your pool of 50 threads, stop it.
Which leads to dirty things like inserting sleep(0) at the top of loops, and dealing with really unbalanced scheduling when threads don't hit yields often enough. Plus, with Loom it might not be obvious that some function is a yield point, since it's meant to be transparent; so if you grab a lock and yield, you make everyone wait until you're scheduled again.
Green threads are great! I love them, and they're the only real solution to truly concurrent, IO-heavy workloads, but they're not a panacea; they trade one kind of discipline for another.
It just so happens that a large number of JVM users are working with IO bound problems. Once you start talking about CPU bound problems the JVM tends not to be the thing most people reach for.
Loom doesn't remove the CPU bound solution by adding the IO solution. Instead, it adds a good IO solution and keeps the old CPU solution when needed.
In fact, there's already a really good pool in the JVM for common CPU-bound tasks: `ForkJoinPool.commonPool()`.
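For instance, parallel streams already run their CPU-bound work on that common pool by default:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CommonPoolDemo {
    public static void main(String[] args) {
        // Parallel streams execute on ForkJoinPool.commonPool() by default.
        long sum = LongStream.rangeClosed(1, 1_000_000).parallel().sum();
        System.out.println(sum); // prints 500000500000
        // The common pool exists process-wide; no setup required.
        System.out.println(ForkJoinPool.commonPool().getParallelism() >= 1);
    }
}
```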
I see Project Loom as really providing all the benefits of single-threaded languages like Node (i.e. tons of scalability), but with the easier programming model that threads provide, as opposed to using async/await.
pwhile(() -> loopPredicate, () -> { loopBody });
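A minimal sketch of what such a `pwhile` helper could look like (the name comes from the snippet above; this implementation is a guess at its intent):

```java
import java.util.function.BooleanSupplier;

public class PWhile {
    // Loops while the predicate holds AND the current thread has not been
    // interrupted -- turning every iteration into a cancellation point.
    static void pwhile(BooleanSupplier predicate, Runnable body) {
        while (!Thread.currentThread().isInterrupted()
                && predicate.getAsBoolean()) {
            body.run();
        }
    }

    public static void main(String[] args) {
        int[] count = {0};
        pwhile(() -> count[0] < 5, () -> count[0]++);
        System.out.println(count[0]); // prints 5
    }
}
```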
All it does is add a `Thread.currentThread().isInterrupted()` check to the predicate. At this point, best to switch to Erlang!

------

There are basically two kinds of green threads:
(1) N:1, where one OS thread hosts all the application threads, and
(2) M:N, where M application threads are hosted on N OS threads.
Original Java (and Ruby, and lots of other systems built before every microcomputer was a multicore parallel system) used N:1 green threads, which provide concurrency but not parallelism. That's fine when your underlying system can't do real parallelism anyway.
Wanting to take advantage of multicore systems (at least, in the Ruby case, for underlying native code) drove a transition to native threads (which you could call an N:N threading model, as application and OS threads are identical.)
But this limits the level of concurrency to the level of parallelism, which can be a regression compared to N:1 models for applications where the level of concurrency that is useful is greater than the level of parallelism available.
What lots of newer systems are driving toward, to solve that, are M:N models, which can leverage all available parallelism but also provide a higher degree of concurrency.
Longer answer: devs back in the day couldn't really grok the difference between green and real threads. Java made its bones as an enterprise language, which can have smart programmers, but they tend not to be close to the metal, knowledge-wise. Too many devs back then expected a Java thread to be a real thread, so Java re-engineered to accommodate this.
I think the JDK/JVM teams also viewed it as a maturation of the JVM to be directly using OS resources so closely across platforms, rather than "hacking" it with green threads.
These days, our high-performance fanciness means devs are demanding green-thread analogues, and Go/Elixir/others seem superior because of them.
So to remain competitive in the marketplace, Java now needs threads that aren't threads even though Java used to have threads that weren't threads.
The new Loom threads will be much lighter weight than the original Java green threads. Further, the entire IO infrastructure of the JVM is being reworked for Loom to make sure the OS doesn't block the VM's thread. What's more, Loom does M:N threading.
Same concept, very different implementation.
I am working with Kafka and MongoDB and it is normal for my app to have a million in flight transactions at various stages of completion.
In the past it required a lot of planning (and a lot of code), but Reactor lets me build these processes as pipelines with whatever concurrency or scheduler I desire, at any stage of the processing.
We are even doing tricks like merging unrelated queries to MongoDB, so that sometimes thousands of identical queries are executed together (one query with a huge `in()` clause, or one bulk write rather than separate ones).
This is improving our throughputs by orders of magnitude while the pipeline pulls millions of documents per second from the database.
I just don't see how Loom helps.
Loom could help if you had blocking APIs to start, but you get much better results if you just resolve to use async, non-blocking wrapped in ReactiveX.
Now, I get that this part is hard for many folks to understand. Just like at my workplace, people think it's impossible to write a microservice without Spring Boot.
> Loom could help if you had blocking APIs to start, but you get much better results if you just resolve to use async, non-blocking wrapped in ReactiveX.
There might be billions of lines of legacy code that would adapt to Loom with minimal changes, but that would be impossible to turn into ReactiveX etc. without enormous investment and risk. Your ideas are rather simplistic for the real world.
Currently, for efficiency, you would need at least 2 pools: 1 small bounded pool for dequeuing the requests and creating the IO operation, and 1 unbounded pool for actually executing the IO operation.
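A rough sketch of that two-pool layout with plain `java.util.concurrent` (the pool sizes and the fake IO task are illustrative, not a production configuration):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TwoPoolLayout {
    public static void main(String[] args) throws Exception {
        // Small bounded pool: dequeues requests and kicks off IO operations.
        ExecutorService dispatcher = Executors.newFixedThreadPool(2);
        // Unbounded pool: actually executes the (blocking) IO operations.
        ExecutorService ioPool = Executors.newCachedThreadPool();

        Future<String> response = dispatcher.submit(
                () -> ioPool.submit(() -> "io-result").get());
        System.out.println(response.get()); // prints io-result

        dispatcher.shutdown();
        ioPool.shutdown();
    }
}
```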
I've not looked into the goroutine implementation, so I couldn't tell you how it compares to what I've read loom is doing.
Loom is looking to have some extremely compact stacks, which means each new "virtual thread" (as they are calling them) will end up with only bytes' worth of memory allocated.
Another thing coming with loom that go lacks is "structured concurrency". It's the notion that you might have a group of tasks that need to finish before moving on from a method (rather than needing to worry about firing and forgetting causing odd things to happen at odd times).
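Loom's actual API for this is still settling, but the core idea can be approximated today with `invokeAll`, which blocks until every task in the group has finished:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WaitForGroup {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Callable<Integer>> tasks = List.of(() -> 1, () -> 2, () -> 3);

        // invokeAll does not return until all three tasks are done,
        // so nothing "fires and forgets" past this point.
        List<Future<Integer>> results = pool.invokeAll(tasks);

        int total = 0;
        for (Future<Integer> f : results) {
            total += f.get();
        }
        System.out.println(total); // prints 6
        pool.shutdown();
    }
}
```

Structured concurrency bakes this "children finish before the scope exits" guarantee into the language's task model instead of leaving it to one executor method.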
I don't think goroutines would need such information. The Go compiler knows that `int foobar` is currently stored in `rbx` and that it is also saved on the stack; therefore, `rbx` doesn't need to be saved.
------
Linux/NPTL threads don't know when they'll be interrupted, so all register state (including AVX512 state, if those registers are in use) needs to be saved. 32 AVX512 registers × 64 bytes is 2kB alone.
Even if a thread isn't using AVX512 (Linux detects when all the AVX512 registers are zero), RAX through R15 is 128 bytes, plus the SSE registers (XMM0-XMM15, another 256 bytes), so ~384 bytes of space that goroutines don't need. Plus whatever other process-specific information needs to be saved off (CPU time and other process/thread details Linux needs to decide which thread to run next).
You can expose a method on the scala side to enter the IO world that will take your arguments and run them in the IO environment, returning a result to you, or notifying some Java class using Observer/Observable. This can, of course take Java lambdas and datatypes, thus keeping your business code in Java should you so desire. It's clunky, though, and I wish Java had easy IO primitives like Scala.
Anyway, they have their place, but if you've got a fancy chain of micro services calling out to wherever, think hard before putting those calls in an unbounded thread pool.
Care to elaborate please? Seems like the author is recommending unbounded thread pools with bounded queues for blocking IO. Isn't that pretty similar?
Threadpooling only matters if you have neither of those things.
Otherwise, you should be using one or the other over a thread pool. You might still spin up a threadpool for CPU bound operations, but you wouldn't have one dedicated to IO.
As of C++ 20, there are coroutines which you should be looking at (IMO).
Coroutines / Goroutines and the like are probably better on I/O bound tasks where the CPU-effort in task-switching is significant.
--------
For example: Matrix Multiplication is better with a Threadpool. Handling 1000 simultaneous connections when you get Slashdotted (or "Hacker News hug of death") is better solved with coroutines.
Ha! Maybe in 20 years. Sadly, I'm still writing new code targeting C++98 on one project. The most current project I'm a part of is on C++11.
Not for languages with go/coroutines (e.g. Go, Clojure, Crystal) as those were designed specifically to help with the thread-per-IO constraint.
I'm bemused by this statement, and I can't figure out whether this is an assertion rooted in supreme confidence, or just idle, wishful thinking.
That being said, giving threading advice in a virtualized and containerized world is tricky. And while these three categories seem sensible, mapping the functions of any non-trivial system onto them is going to be difficult, unless the system was specifically designed around it.