It solves the problem for environments where issues like interrupt latency and timing criticality usually show up: embedded and real-time systems. In many such systems, the set of running tasks is fixed; there are even some very simple real-time operating systems (such as some OSEK configurations in the automotive sector) which require the set of tasks to be statically defined at compile time. After all, you don't suddenly feel the urge to start a game of Doom on your car's ABS controller :) (though, of course, somebody will try to do this...).
The (early) XMOS chips, for example, ran at 500 MHz with four threads; if you needed more threads, you could instead configure the system to run eight threads at half the speed, IIRC. If you only used, say, three threads in the four-thread mode, the remaining execution time simply went unused: the time was not arbitrarily re-divided among however many threads happened to be active, so each thread's share stayed guaranteed.
For real-time-critical systems, you could then still run up to seven critical threads at guaranteed speed and reserve the remaining one for non-timing-critical tasks (which you could schedule using cooperative multitasking).
The RAM was fast on-chip SRAM, so there were none of the refresh and access-latency problems you get with DRAM. However, you were limited to 64 kB of RAM per core (probably not enough to run Doom...).
The XMOS development toolchain even includes a real-time analyzer for the C/C++ code you throw at it. Unfortunately, most of the XMOS toolchain is closed source.