Therefore, the scanning is local. There's really nothing more to it: the distinction rests on where the input is read from as well as where it is processed. Both happen inside the phone while you hold it in your hand.
It is scanning images locally.
This is totally unacceptable, and should never become acceptable.