That’s the reason why linear (or affine) financial derivatives (such as forwards) can be priced without using volatility as an input, while products with convexity (such as options) require volatility as an input.
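A toy illustration of that point, not a pricing model: assume the terminal price is the forward plus or minus a spread with equal probability, and ignore discounting. The function name `expected_payoffs` and the two-point distribution are made up for this sketch, with the spread standing in for volatility.

```python
# Hypothetical two-point model: S_T is forward - spread or forward + spread,
# each with probability 1/2. The spread is a stand-in for volatility.
def expected_payoffs(forward, strike, spread):
    outcomes = [forward - spread, forward + spread]
    # Linear payoff of a forward contract: S_T - K
    linear = sum(s - strike for s in outcomes) / 2
    # Convex payoff of a call option: max(S_T - K, 0)
    convex = sum(max(s - strike, 0.0) for s in outcomes) / 2
    return linear, convex

low_vol = expected_payoffs(100.0, 100.0, 5.0)
high_vol = expected_payoffs(100.0, 100.0, 20.0)
print("spread 5: ", low_vol)
print("spread 20:", high_vol)
```

The expected linear payoff is the same for both spreads, while the expected convex payoff grows with the spread.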
(Side note: I think Delta One desks should rename to Gamma Zero…)
Here’s how I like to conceptualise it: bounding a product of mixed variables by a sum of single-variable terms is useful. Logarithms turn multiplication into addition. Jensen’s inequality moves addition from inside the argument of a convex function to outside it. Compose.
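One way to read that recipe (my interpretation, not necessarily what the comment had in mind) is Young's inequality: for x, y > 0 and 1/p + 1/q = 1, write xy = exp((1/p) log x^p + (1/q) log y^q) and apply Jensen to the convex function exp. A quick numerical sanity check, with `young_sides` as a made-up helper name:

```python
def young_sides(x, y, p):
    """Return (x*y, x**p/p + y**q/q) for x, y > 0 and p > 1."""
    q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1
    product = x * y
    # x*y = exp((1/p)*log(x**p) + (1/q)*log(y**q)); Jensen for the convex
    # function exp moves the weighted average outside the exponential.
    bound = x**p / p + y**q / q
    return product, bound

grid = [0.5, 1.0, 2.0, 3.0]
results = [young_sides(x, y, p) for x in grid for y in grid for p in (1.5, 2.0, 3.0)]
print(all(product <= bound + 1e-12 for product, bound in results))
```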
By linearity of expectation, both sides of f(E(X)) <= E(f(X)) are linear in f, and for affine f we have equality. Let's subtract the affine function whose graph is the tangent hyperplane to f at E(X). By the above, this does not change the validity of the inequality. But now the left-hand side is 0 and the right-hand side is non-negative by convexity, so we are done.
It's also now clear what the difference of the two sides is: it's the expectation of the gap between f(X) and the value of the tangent plane at X.
Now, in general, replace the tangent hyperplane with a supporting hyperplane given by a subderivative, to recover what wiki says.
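The tangent argument above can be checked numerically. This is a minimal sketch on a made-up three-point distribution with f(x) = x^2; the helper name `jensen_gap` and the example numbers are assumptions for illustration only.

```python
def jensen_gap(values, probs, f, df):
    """Return (f(E[X]), E[f(X)], E[f(X) - tangent(X)]) for a discrete X."""
    mean = sum(p * x for x, p in zip(values, probs))

    # Affine function whose graph is the tangent line to f at E[X]
    def tangent(x):
        return f(mean) + df(mean) * (x - mean)

    lhs = f(mean)
    rhs = sum(p * f(x) for x, p in zip(values, probs))
    # Expected gap between f and its tangent; non-negative when f is convex
    gap = sum(p * (f(x) - tangent(x)) for x, p in zip(values, probs))
    return lhs, rhs, gap

values = [0.0, 1.0, 4.0]
probs = [0.5, 0.25, 0.25]
lhs, rhs, gap = jensen_gap(values, probs, lambda x: x * x, lambda x: 2 * x)
print(lhs, rhs, gap)
```

For f(x) = x^2 the gap is just the variance of X, which is one way to see why the inequality is strict unless X is constant.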
If l <= f is affine then E[f(X)] >= E[l(X)] = l(E[X]). Taking the sup over all such l shows E[f(X)] >= f(E[X]), since a convex f is the pointwise sup of the affine functions below it.