Update - I'm still cautious about this paper, but I had the table numbers inverted in my head while thinking about it. The paper actually shows better perplexity than competing models at the larger parameter sizes, so I was wrong on that point.
I was pretty unhappy and suspicious for the same reason. Reporting a 70B network's efficiency while withholding its perplexity suggests that someone measured it and the result wasn't good enough to put in the paper.
"Is not fully trained" can also mean "we did not figure out how to reach an acceptable loss" or "training was unstable," both of which are common for ML systems.
One could forgive the lack of quality results for the 70B model, but apparently they also trained 7B and 13B versions, and they don't report perplexity for those either.