Revisiting the Classics: Jensen's Inequality (2023)

(francisbach.com)

89 points · cpp_frog · 1y ago · 8 comments

FabHK 1y ago

And the extent to which the expectation of the function of the random variable exceeds the function of its expectation depends on the variable’s variability (or variance), as can be seen, e.g., by a Taylor expansion around the expectation.
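For reference, the Taylor argument mentioned above can be sketched as follows. This assumes f is twice differentiable, writes \mu = E[X] (notation introduced here, not in the comment), and drops terms beyond second order, so it is an approximation rather than an identity:

```latex
\begin{align*}
f(X) &\approx f(\mu) + f'(\mu)\,(X-\mu) + \tfrac{1}{2} f''(\mu)\,(X-\mu)^2, \\
E[f(X)] &\approx f(\mu) + f'(\mu)\,\underbrace{E[X-\mu]}_{=\,0}
          + \tfrac{1}{2} f''(\mu)\,E\!\left[(X-\mu)^2\right], \\
E[f(X)] - f(E[X]) &\approx \tfrac{1}{2} f''(\mu)\,\operatorname{Var}(X).
\end{align*}
```

For convex f the second derivative is non-negative, so to this order the gap is non-negative and grows with the variance.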

That’s the reason why linear (or affine) financial derivatives (such as forwards) can be priced without using volatility as an input, while products with convexity (such as options) require volatility as an input.

(Side note: I think Delta One desks should rename to Gamma Zero…)
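The forwards-versus-options point above can be checked with a small sketch. It assumes a Black-Scholes setting (no dividends, constant rate); the function names `forward_value` and `call_value` are made up for this illustration. The forward's value takes no volatility argument at all, while the call's value rises with volatility:

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def forward_value(spot: float, strike: float, rate: float, maturity: float) -> float:
    """Value today of a long forward with delivery price `strike`.

    The payoff S_T - K is affine in S_T, so no volatility input is needed.
    """
    return spot - strike * math.exp(-rate * maturity)


def call_value(spot: float, strike: float, rate: float, maturity: float, vol: float) -> float:
    """Black-Scholes value of a European call (assumed model, no dividends).

    The payoff max(S_T - K, 0) is convex in S_T, so volatility matters.
    """
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


if __name__ == "__main__":
    spot, strike, rate, maturity = 100.0, 100.0, 0.0, 1.0
    print("forward:", forward_value(spot, strike, rate, maturity))
    for vol in (0.1, 0.2, 0.4):
        print("call at vol", vol, ":", call_value(spot, strike, rate, maturity, vol))
```

The call is always worth at least as much as the forward with the same strike, and the difference widens as volatility grows, which is the Jensen gap showing up in a price.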

thehappyfellow 1y ago

The proof of Young’s inequality is pretty neat but has the "magically think of taking a log of an arbitrary expression which happens to work" step. But it clarifies why the reciprocals of the exponents have to sum to 1: they are interpreted as probabilities when calculating the expected value.

Here’s how I like to conceptualise it: bounding a mixed-variable product by a sum of single-variable terms is useful. Logarithms change multiplication to addition. Jensen’s inequality moves addition out of the argument of a convex (or concave) function. Compose.
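Spelled out, the composition described above looks roughly like this. The sketch assumes a, b > 0 and exponents p, q > 1 with 1/p + 1/q = 1 (symbols chosen here for illustration), and uses Jensen's inequality for the concave function log with weights 1/p and 1/q:

```latex
\begin{align*}
\log\!\left(\frac{1}{p}\,a^p + \frac{1}{q}\,b^q\right)
  &\ge \frac{1}{p}\,\log\!\left(a^p\right) + \frac{1}{q}\,\log\!\left(b^q\right)
  && \text{(Jensen, $\log$ concave)} \\
  &= \log a + \log b = \log(ab), \\
\text{hence}\quad ab &\le \frac{a^p}{p} + \frac{b^q}{q}
  && \text{(exponentiate both sides).}
\end{align*}
```

The weights 1/p and 1/q play the role of probabilities of a two-point distribution, which is why they must sum to 1.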

contravariant 1y ago

You've got a product on one side and what looks like a convex combination on the other, so taking the log and applying Jensen's inequality isn't as big a leap as it may sound.

thehappyfellow 1y ago

Agreed, provided you have both sides of the inequality. Coming up with that particular convex combination is a bit of a leap that’s not super intuitive to me.

SpaceManNabs 1y ago

if you work with a lot of convex optimization, it comes up pretty often. for example, if you learn fenchel conjugates, the lead up and motivation to learning them will often necessitate proving young's inequality with jensen's inequality. that is why learning different maths is cool. you intuit some ways to reshape the problem in order to make these "not super intuitive" connections.

contravariant 1y ago

It often happens that coming up with the right theorem is a lot harder than finding its proof, but that's life. You can't have everything be easy, otherwise we'd have finished by now.

maxmininflect 1y ago

A very natural explanation of "wikipedia proof 2" for differentiable functions seems to be missing:

By linearity of expectation, both sides are linear in f, and for linear f we have equality. Let's subtract the linear function whose graph is the tangent hyperplane to f at E(X). By the above, this does not change the validity of the inequality. But now the left-hand side, f(E(X)), is 0, and the right-hand side, E(f(X)), is non-negative by convexity, so we are done.

It's also now clear what the difference of the two sides is -- it's the expectation of the gap between f(X) and the value of the tangent plane at X.

Now in general, replace the tangent hyperplane with a supporting hyperplane given by a subgradient at E(X), to recover what the wiki says.
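The tangent-line argument above can be checked numerically. This is a minimal sketch, assuming f = exp as the convex function and a Gaussian sample standing in for X (both choices are illustrative, as is the helper name `jensen_gap_two_ways`). It computes the Jensen gap directly and as the average distance between f and its tangent at the sample mean:

```python
import math
import random


def f(x: float) -> float:
    """Convex test function (an arbitrary choice for this sketch)."""
    return math.exp(x)


def f_prime(x: float) -> float:
    """Derivative of f, used to build the tangent line."""
    return math.exp(x)


def jensen_gap_two_ways(samples):
    """Return (E[f(X)] - f(E[X]), E[f(X) - tangent(X)]) for the empirical distribution."""
    n = len(samples)
    mean_x = sum(samples) / n

    # Direct gap between the two sides of Jensen's inequality.
    direct = sum(f(x) for x in samples) / n - f(mean_x)

    # Tangent line to f at the mean; it is affine, so E[tangent(X)] = tangent(E[X]) = f(E[X]).
    def tangent(x: float) -> float:
        return f(mean_x) + f_prime(mean_x) * (x - mean_x)

    via_tangent = sum(f(x) - tangent(x) for x in samples) / n
    return direct, via_tangent


random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]
direct, via_tangent = jensen_gap_two_ways(samples)
print(direct, via_tangent)
```

The two numbers agree up to floating-point error, and every term f(x) - tangent(x) in the second average is non-negative, which is the whole proof in miniature.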

keithalewis 1y ago

A simpler definition of a convex function f is f(x) = sup { l(x) | l <= f where l is affine (linear plus a constant) }.

If l <= f is affine then E[f(X)] >= E[l(X)] = l(E[X]). Taking the sup over all such l shows E[f(X)] >= f(E[X]).
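The sup-of-minorants argument above can be illustrated with a small sketch. It assumes f(x) = x^2 and uses only the tangent lines at grid points as the family of affine minorants (a finite stand-in for the full sup; the names `tangent_at` and `sup_of_minorants` are made up here):

```python
def f(x: float) -> float:
    """Convex test function (an arbitrary choice for this sketch)."""
    return x * x


def tangent_at(t: float):
    """Affine minorant of f touching it at t: l_t(x) = f(t) + f'(t) (x - t) = 2 t x - t^2."""
    return lambda x: 2.0 * t * x - t * t


# A finite family of affine minorants, one per grid point in [-3, 3].
grid = [i / 10.0 for i in range(-30, 31)]
minorants = [tangent_at(t) for t in grid]


def sup_of_minorants(x: float) -> float:
    """Pointwise maximum of the family; equals f(x) whenever x is a grid point."""
    return max(l(x) for l in minorants)


# Jensen on a small sample: each affine l satisfies mean(l(X)) = l(mean(X)) <= mean(f(X)).
samples = [-2.0, 0.5, 1.0, 3.0]
mean_x = sum(samples) / len(samples)
mean_f = sum(f(x) for x in samples) / len(samples)
print(sup_of_minorants(mean_x), "<=", mean_f)
```

With the full family of affine minorants the left-hand number would be exactly f(E[X]); with the finite grid it is a slightly smaller lower bound.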
