Forgive me, it was a Monday morning when we reshot this. I think the final cart was the original PRGROM + MMC3 with 32 CHRROM banks, so it was probably more like 270K total. The video playback I was commenting on fit entirely into the 256K. From my cursory understanding of the MMC3, I think that's the limit that still fits a normal mapper 4 cart and will run on the device. I know you can up the PRGROM (or add PRGRAM), but that would kill our direct cart --> PPU path for the fast playback.
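For anyone checking the arithmetic, here is a rough sketch of how those numbers add up. It assumes the usual iNES header units (PRG ROM counted in 16 KiB units, CHR ROM in 8 KiB units). The 32 CHR banks come from the figures above; the PRG unit count is only a guess, since I don't remember the exact size, and `rom_size_kib` is just a made-up helper name.

```python
# Assumed iNES header units: PRG ROM in 16 KiB units, CHR ROM in 8 KiB units.
PRG_UNIT_KIB = 16
CHR_UNIT_KIB = 8


def rom_size_kib(prg_units: int, chr_units: int) -> dict:
    """Return PRG, CHR and combined ROM size in KiB (header bytes excluded)."""
    prg = prg_units * PRG_UNIT_KIB
    chr_ = chr_units * CHR_UNIT_KIB
    return {"prg_kib": prg, "chr_kib": chr_, "total_kib": prg + chr_}


# 32 CHR banks is the figure from the comment above.
# prg_units=1 is a guess that happens to land near the "270K" total.
sizes = rom_size_kib(prg_units=1, chr_units=32)
print(sizes)
```

With two PRG units instead of one, the same helper gives a somewhat larger total, so treat "270K" as a ballpark rather than an exact figure.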
As for the processor, we were definitely pushing the limit with the design we were coding. We had to back off on the glitching quite a bit because we kept busting the number of cycles we had before the PPU. I'm sure we could do better with what we know now, or with a lot more assembly tuning, but there was the hard limit of being ready by hackday!