It’s very common for scale-out architectures to read more data than is ultimately returned, because the former is pulled from individual shards and then some centralized filtering / post processing is applied in some API middleware layer.
Trying to fix that by pushing down more of the query/execution is sometimes but not always feasible or practical.