To the same extent that you need to prove the correctness of the compiler and the language runtime I guess. Arbitrary width integers are a core feature of quite a few languages.
Given Lean's raison d'etre it wouldn't surprise me if the relevant library code was verified. If using another language such as Python I think it would be fairly reasonable to take the underlying implementation on faith. If we're going to start questioning whether a core part of the runtime that implements basic arithmetic operations has bugs then why are we not also suspecting the operations provided by the CPU and other hardware?
We probably should suspect those things, but as a matter of practicality we're going to need a boundary at present.