They then provide you with a very naive implementation, running on their (very simple) VLIW architecture, that you are to optimize.
If someone is still lost at the end of that, I think it is safe to say their goal was for that person to fail.
The problem is about pipelining memory loads and ALU operations, so why not just give clear documentation and state the task rather than "here's a kernel - optimize it"? ¯\_(ツ)_/¯
And perhaps a third purpose is to use the simulator to test your ability to reason about hardware that you are only just getting familiar with.
Maybe they specified the challenge in this half-assed way to deliberately test those sorts of skills (even if irrelevant to the job), or maybe it was just lazily put together.
The other thing to note is that if you look at what reference_kernel() is actually doing, it looks like a somewhat arbitrary synthetic task (hashes, wraparound). Any accurate task specification would need to be a "line by line" description of the steps, at which point you may as well just say "here's some code - do this".