The other day I was talking about something related to this with my daughter, who was learning about rounding in elementary school. We ended up discussing (in elementary school terms) the tradeoff between accuracy in calculations and the number of operations, and how in practice you're always rounding at some level, so that tradeoff always exists.
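That tradeoff shows up even in a tiny floating-point sketch (my example, not from the discussion): each arithmetic operation rounds, so more operations can mean more accumulated error.

```python
# Every floating-point addition rounds the result, so summing 0.1 ten
# times accumulates ten rounding errors...
total = sum(0.1 for _ in range(10))
print(total)  # 0.9999999999999999, not 1.0

# ...while a single multiplication rounds only once and happens to land
# exactly on 1.0 here.
print(0.1 * 10 == 1.0)  # True
```

The point is not that multiplication is always exact, just that the number of rounding steps matters.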
I also do research in information theory, and the topic Tao discusses seems related. In that area there's always some potential or actual loss of information due to informational and computational constraints: things are always being discretized, and every representation carries some information cost. What Tao is talking about is also an information cost, but cast in terms of numerical accuracy rather than stochastic terms.
This is all very vague in my head, but it seems like there is some path from stochastic information costs of representation to deterministic ones, along the lines of approximations and limits. People use probabilistic arguments in proofs, for instance, and there are pseudorandom numbers; I imagine both what Tao is talking about and more traditional information theory problems could be treated in the same framework.
Convergence is presented as just a pattern. It doesn't have to be economic, but the example naturally suggests convergence, so that's ok.
But continuity and differentiability didn't make sense to me in that framing. You don't "buy" continuity: there's no increasing value attached to smaller intervals, at least not in my understanding of it.
So, I'm way, way below the Olympic status of Terry Tao, but he might be abstracting a bit too much here. This may not help students understand the topic.
On a much more basic level, I plugged e into formulae throughout my schooling to the age of 18, and only later realised that $e is the balance you'd have after a year on a bank account of $1 at 100% interest, continuously compounded.
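A quick sketch of that picture: $1 at 100% annual interest, compounded n times per year, grows to (1 + 1/n)^n dollars after one year, and as n grows this tends to e.

```python
import math

# Balance after one year on $1 at 100% interest, compounded n times.
for n in (1, 12, 365, 10**6):
    print(n, (1 + 1/n) ** n)

# The limit of (1 + 1/n)^n as n -> infinity:
print(math.e)  # 2.718281828459045
```

Yearly compounding gives $2, monthly a bit more, and the continuous limit is $e.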
Systems of linear inequalities became transparent to me when I took a class on optimization and learned linear programming from the perspective of polytope geometry.
The basic concept is that you can define a halfspace by a linear inequality of the form a^T x <= b. Taking the intersection of multiple halfspaces is then the same as requiring multiple linear inequalities to hold simultaneously, which can be rewritten in matrix form as A x <= b. The intersection of two convex sets is again convex, and a halfspace is obviously convex, so it's clear that the solution set of A x <= b is a convex polytope (a polygon in 2D, a polyhedron in 3D).
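As a minimal illustration (my own toy example), the unit square is the intersection of four halfspaces, and testing a point against A x <= b is just checking each inequality in turn:

```python
# Unit square as A x <= b, one row per halfspace:
#   x <= 1, -x <= 0, y <= 1, -y <= 0
A = [(1, 0), (-1, 0), (0, 1), (0, -1)]
b = [1, 0, 1, 0]

def in_polytope(point):
    """A point lies in the polytope iff every inequality a_i^T x <= b_i holds."""
    return all(sum(a_j * x_j for a_j, x_j in zip(row, point)) <= b_i
               for row, b_i in zip(A, b))

print(in_polytope((0.5, 0.5)))  # True: inside the square
print(in_polytope((1.5, 0.5)))  # False: violates x <= 1
```

Each row of A picks out one halfspace; membership in the intersection is just all the checks passing at once.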
Systems of nonlinear inequalities are more complicated but you can sometimes approach them similarly.
This style of thinking is much more approachable for me because I have an easy time playing with these kinds of geometric objects in my head.