undefined | Better HN

0 pointsimmibis1y ago0 comments

So... What is it seeking to optimize? Why did you need a thread pool before but not any more? What resource was exhausted to prevent you from putting every request on a thread?

0 comments

chipdart1y ago

> So... What is it seeking to optimize?

The goal is to maximize the number of tasks you can run concurrently, while imposing on the developers a low cognitive load to write and maintain the code.

> Why did you need a thread pool before but not any more?

You still need a thread pool. Except with virtual threads you are no longer bound to run a single task per thread. This is specially desirable when workloads are IO-bound and will expectedly idle while waiting for external events. If you have a never-ending queue of tasks waiting to run, why should you block a thread consuming that task queue by running a task that stays idle while waiting for something to happen? You're better off starting the task and setting it aside the moment it awaits for something to happen.

> What resource was exhausted to prevent you from putting every request on a thread?

riku_iki1y ago

> why should you block a thread

if creating gazillion threads on modern hardware is super cheap why not? I have transparency and debuggability what threads are running, can check stacktrace of each and what are they blocked on.

virtual threads adds lots of magic under the hood, and if there will be some bug or lib in your infra with no vthreads support it is absolutely not clear how to debug it.

chipdart1y ago

> if creating gazillion threads on modern hardware is super cheap why not?

Virtual threads are a performance improvement over threads, no matter how cheap to create threads are. Virtual threads run on threads. If threads become cheaper to create, so do virtual threads. They are not mutually exclusive.

Virtual threads are on top of that a developer experience improvement. Code is easier to write and maintain.

Virtual threads improve throughput because the moment a task is waiting for anything like IO, the thread is able to service any other task in the queue.

1 more reply

stoperaticless1y ago

Each thread adds overhead.

Some usage types don’t care, some do.

From what I gather virtual threads are an alternative to “callback-hell” (js) or async coloring (python).

1 more reply

gregopet1y ago

It's mainly trying to make you not worry about how many threads you create (and not worry about the caveats that come with optimising how many threads you create, which is something you are very often forced to do).

You can create a thread in your code and not worry whether that thing will then be some day run in a huge loop or receive thousands of requests and therefore spend all your memory on thread overhead. Go and other languages (in Java's ecosystem there's Kotlin for example) employ similar mechanisms to avoid native thread overhead, but you have to think about them. Like, there's tutorial code where everything is nice & simple, and then there's real world code where a lot of it must run in these special constructs that may have little to do with what you saw in those first "Hello, world" samples.

Java's approach tries to erase the difference between virtual and real threads. The programmer should have to employ no special techniques when using virtual threads and should be able to use everything the language has to offer (this isn't true in many languages' virtual/green threads implementations). Old libraries should continue working and perhaps not even be aware they're being run on virtual threads (although, caveats do apply for low level/high performance stuff, see above posts). And libraries that you interact with don't have to care what "model" of green threading you're using or specifically expose "red" and "blue" functions.

giamma1y ago

You will still have to worry, too many virtual threads will imply too much context switching. However, virtual threads will be always interruptable on I/O, as they are not mapped to actual o.s. threads, but rather simulated by the JVM which will executed a number of instructions for each virtual thread.

This gives the chance to the JVM to use real threads more efficiently, avoiding that threads remain unused while waiting on I/O (e.g. a response from a stream). As soon as the JVM detects that a physical thread is blocked on I/O, a semaphore, a lock or anything, it will reallocate that physical thread to running a new virtual thread. This will reduce latency, context switch time (the switching is done by the JVM that already globally manages the memory of the Java process in its heap) and will avoid or at least largely reduce the chance that a real thread remains allocated but idle as it's blocked on I/O or something else.

frant-hartm1y ago

What do you mean by context switching?

My understanding is that virtual threads mostly eliminate context switching - for N CPUs JVM creates N platform threads and they run virtual threads as needed. There is no real context switching apart from GC and other JVM internal threads.

A platform thread picking another virtual thread to run after its current virtual thread is blocked on IO is not a context switch, that is an expensive OS-level operation.

3 more replies

immibisOP1y ago

It seems that the answer to the question was "memory". Stack allocations, presumably. You have answered by telling us that virtual threads are better than real threads because real threads suck, but you didn't say why they suck or why virtual threads don't suck in the same way.

mike_hearn1y ago

Real threads don't suck but they pay a price for generality. The kernel doesn't know what software you're going to run, and there's no standards for how that software might use the stack. So the kernel can't optimize by making any assumptions.

Virtual threads are less general than kernel threads. If you use a virtual thread to call out of the JVM you lose their benefits, because the JVM becomes like the kernel and can't make any assumptions about the stack.

But if you are running code controlled by the JVM, then it becomes possible to do optimizations (mostly stack related) that otherwise can't be done, because the GC and the compiler and the threads runtime are all developed together and work together.

Specifically, what HotSpot can do moving stack frames to and from the heap very fast, which interacts better with the GC. For instance if a virtual thread resumes, iterates in a loop and suspends again, then the stack frames are never copied out of the heap onto the kernel stack at all. Hotspot can incrementally "pages" stack frames out of the heap. Additionally, the storage space used for a suspended virtual thread stack is a lot smaller than a suspended kernel stack because a lot of administrative goop doesn't need to be saved at all.

brabel1y ago

OS Threads do not suck, they're great. But they are expensive to create as they require a syscall, and they're expensive to maintain as they consume quite a bit of memory just to exist, even if you don't need it (due to how they must pre-allocate a stack which apparently is around 2MB initially, and can't be made smaller as in most cases you will need even more, so it would make most cases worse).

Virtual Threads are very fast to create and allocate only the memory needed by the actual call stack, which can be much less than for OS Threads.

Also, blocking code is very simple compared to the equivalent async code. So using blocking code makes your code much easier to follow. Check out examples of reactive frameworks for Java and you will quickly understand why.

1 more reply

jmaker1y ago

Briefly: The cost of spawning schedulable entities, memory and the time to execution. Virtual threads, i.e., fibers, entertain lightweight stacks. You can spawn as many as you like immediately. Your runtime system won’t go out of memory as easily. In addition, the spawning happens much faster in user space. You’re not creating kernel threads, which is a limited and not cheap resource, whence the pooling you’re comparing it to. With virtual threads you can do thread per request explicitly. It makes most sense for IO-bound tasks.

davidgay1y ago

A thread per request has a high risk of overcommitting on CPU use, leading to a different set of problems. Virtual threads are scheduled on a fixed-size (based on number of cores) underlying (non-virtual) thread pool to avoid this problem.

immibisOP1y ago

Why can't virtual threads overcommit CPU use? If I have 4 CPUs and 4000 virtual threads running CPU-bound code, is that not overcommit? A system without overcommit would refuse to create the 5th thread.

detinho1y ago

I think parent is saying overcommit with OS threads. 4k requests = 4k OS threads. That would lead to the problems parent is talking about.

1 more reply

gifflar1y ago

This article nicely describes the differences between threads and virtual threads: https://www.infoq.com/articles/java-virtual-threads/

I think it’s definitely worth a read.

twic1y ago

The memory overhead of threads.

j / k navigate · click thread line to collapse

0 comments

chipdart1y ago

> So... What is it seeking to optimize?

The goal is to maximize the number of tasks you can run concurrently, while imposing on the developers a low cognitive load to write and maintain the code.

> Why did you need a thread pool before but not any more?

> What resource was exhausted to prevent you from putting every request on a thread?

riku_iki1y ago

> why should you block a thread

if creating gazillion threads on modern hardware is super cheap why not? I have transparency and debuggability what threads are running, can check stacktrace of each and what are they blocked on.

virtual threads adds lots of magic under the hood, and if there will be some bug or lib in your infra with no vthreads support it is absolutely not clear how to debug it.

chipdart1y ago

> if creating gazillion threads on modern hardware is super cheap why not?

Virtual threads are on top of that a developer experience improvement. Code is easier to write and maintain.

Virtual threads improve throughput because the moment a task is waiting for anything like IO, the thread is able to service any other task in the queue.

1 more reply

stoperaticless1y ago

Each thread adds overhead.

Some usage types don’t care, some do.

From what I gather virtual threads are an alternative to “callback-hell” (js) or async coloring (python).

1 more reply

gregopet1y ago

giamma1y ago

frant-hartm1y ago

What do you mean by context switching?

A platform thread picking another virtual thread to run after its current virtual thread is blocked on IO is not a context switch, that is an expensive OS-level operation.

3 more replies

immibisOP1y ago

mike_hearn1y ago

brabel1y ago

Virtual Threads are very fast to create and allocate only the memory needed by the actual call stack, which can be much less than for OS Threads.

1 more reply

jmaker1y ago

davidgay1y ago

immibisOP1y ago

detinho1y ago

I think parent is saying overcommit with OS threads. 4k requests = 4k OS threads. That would lead to the problems parent is talking about.

1 more reply

gifflar1y ago

This article nicely describes the differences between threads and virtual threads: https://www.infoq.com/articles/java-virtual-threads/

I think it’s definitely worth a read.

twic1y ago

The memory overhead of threads.

j / k navigate · click thread line to collapse