My heuristic is how much noise is in the closest vectors. Even if the top k matches seem good, if the following noise has practically identical distance scores, it is going to fail a lot in practice. Ideally you could calculate some constant threshold so that everything closer is relevant and everything further is irrelevant.