Reread the proposal.
A top-level thread solves this by scanning only the opcodes, without decoding or writing anything, so it advances through the stream faster than the chunked decoding threads it progressively spawns.
It's not as fast as a format with a chunk table, but it's faster than a naive single-core decoder.
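A minimal sketch of the scan-then-spawn idea follows. It uses a hypothetical toy run-length opcode format, not the format under discussion. In that format, every opcode's encoded size can be read from its first byte, and chunks share no decoder state. The scanner only skips over opcodes. Every `chunk_opcodes` opcodes, it hands that byte range to a worker pool, and the results are joined in stream order. The names `encoded_size`, `decode_chunk` and `parallel_decode` are illustrative. In CPython the GIL prevents pure-Python decoding from truly running in parallel, so this shows the structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy opcode format (an assumption, NOT the real format):
#   0x00-0x7F: literal run, the next (op & 0x7F) + 1 bytes are copied as-is
#   0x80-0xFF: repeat run, the next byte is repeated (op & 0x7F) + 1 times
# An opcode's encoded size is known from its first byte, so the stream can
# be walked without decoding, and chunks can be decoded independently.


def encoded_size(data: bytes, pos: int) -> int:
    """Size in bytes of the opcode at pos, found without decoding it."""
    op = data[pos]
    return 1 + (op & 0x7F) + 1 if op < 0x80 else 2


def decode_chunk(data: bytes, start: int, end: int) -> bytes:
    """Fully decode the opcodes in data[start:end] (worker thread)."""
    out = bytearray()
    pos = start
    while pos < end:
        op = data[pos]
        count = (op & 0x7F) + 1
        if op < 0x80:
            out += data[pos + 1 : pos + 1 + count]
            pos += 1 + count
        else:
            out += bytes([data[pos + 1]]) * count
            pos += 2
    return bytes(out)


def parallel_decode(data: bytes, chunk_opcodes: int = 64, workers: int = 4) -> bytes:
    """Top-level scanner: skip opcodes only, spawning a decode task per chunk."""
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pos = chunk_start = 0
        n = 0
        while pos < len(data):
            pos += encoded_size(data, pos)  # no decoding, no writing
            n += 1
            if n == chunk_opcodes:
                # Workers start on this chunk while the scanner keeps going.
                futures.append(pool.submit(decode_chunk, data, chunk_start, pos))
                chunk_start, n = pos, 0
        if chunk_start < pos:
            futures.append(pool.submit(decode_chunk, data, chunk_start, pos))
        # Join decoded chunks in stream order.
        return b"".join(f.result() for f in futures)
```

For example, `parallel_decode(bytes([0x02, 0x61, 0x62, 0x63, 0x83, 0x7A]))` returns `b"abczzzz"`. A real format with cross-chunk state (back-references, running indexes) would also need the scanner to record or reconstruct that state at each chunk boundary.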