No, it's not either of those, it's just launching useless threads, plus all the down-stream effects of launching useless threads, e.g. if you have blending on, that will block the ROP unit which needs to wait for the threads for a given pixel in-order. If you have depth write on, that will move the write to late-Z.
More vertices is not a big problem, doubling your vertex count is not a big deal, since most GPUs process vertices in groups of 32 or more, and whether multiple instances get packed in the same group depends on the GPU vendor.