In person, a single proctor can monitor 200+ students; our eyes take in far more detail and a far wider field of view than a computer screen offers, and our peripheral vision is tuned for detecting unexpected motion. With Zoom, either you have one meeting with everyone in it, in which case students can see each other, or you have a separate meeting for every student, in which case you need a large number of devices, all visible at the same time. I can't see the former scaling past 49 (the most video tiles Zoom's gallery view shows on one screen), and I can't see the latter scaling past 20. Even if you had software designed specifically for this and several big screens, it would still be pretty hard to pay anywhere near as close attention as you can in person.
On top of that, there's always a bunch of annoying mucking about getting set up for Zoom invigilation: camera angle, lighting, checking the environment, etc. All of that needs communication to and fro, and it can take up to 5 minutes for a single student. Now multiply that by, say, 100 (a bit of parallelism is possible, but individual communication is still needed with each student).
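To put a rough number on that, here is a back-of-the-envelope sketch. It takes the 5-minute worst case at face value and assumes each proctor sets up one student at a time; the function name and the choice of 4 parallel proctors are made up purely for illustration.

```python
import math

def setup_minutes(students, minutes_per_student=5, parallel_proctors=1):
    """Worst-case wall-clock setup time in minutes.

    Assumption: each proctor handles one student at a time, and every
    student takes the full minutes_per_student.
    """
    rounds = math.ceil(students / parallel_proctors)
    return rounds * minutes_per_student

serial = setup_minutes(100)                          # one proctor
with_four = setup_minutes(100, parallel_proctors=4)  # hypothetical: four proctors
print(f"serial: {serial} min (~{serial / 60:.1f} h), four proctors: {with_four} min")
```

Even with four people doing setup in parallel, the worst case is still about two hours before the exam can start.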