Written by Alexei Gilchrist, updated months ago
Quick introduction to convex sets, convex functions and Jensen's inequality
Level: 2, Subjects: Convexity

1 A convex set

A set \(C\) is convex if for all \(a,b\in C\) and \(0<\lambda<1\), the element \(\lambda a + (1-\lambda)b\) is also in \(C\).

Geometrically, if a line is connected between \(a\) and \(b\) then every point on the line is also in the set as depicted below.

Convex set: every point on a line connecting to members is also in the set

2 A convex function

A function \(f\) is convex over the convex domain \(C\) if for all \(x_{1},x_{2}\in C\) and \(\lambda\in[0,1]\) \begin{equation} f(\lambda x_{1}+(1-\lambda)x_{2}) \le \lambda f(x_{1}) + (1-\lambda)f(x_{2}) \end{equation} The function is strictly convex if equality only holds for \(\lambda=0\) or \(\lambda=1\). Note that if \(-f\) is convex we call the function concave.

In a convex function every chord lies above or on the function.

If the domain is the real numbers and \(f\) is twice differentiable then \(d^{2}f/dx^{2}\ge 0\) implies \(f\) is convex, and if \(d^{2}f/dx^{2}> 0\) then \(f\) is strictly convex.

3 Jensen's inequality

For a convex function \(f(x)\), \begin{equation} \langle f(x) \rangle \ge f(\langle x\rangle) \end{equation} where \(\langle\cdot\rangle\) denotes average.

If equality holds \(\langle f(x) \rangle = f(\langle x\rangle)\) and \(f(x)\) is strictly convex, then \(x_{j}=\langle x\rangle\).

Jensen's inequality can be easily proved by induction on the number of terms in the average:

Proof: With \(m\) terms, the averages are \(\langle x\rangle = \sum_{j=1}^{m} p_{j} x_{j}\) and \(\langle f(x) \rangle= \sum_{j=1}^{m}p_{j} f(x_{j})\) where \(p_{j}\) is the probability of \(x_{j}\). The case \(m=1\) is trivially true. The case \(m=2\) is \begin{equation} p_{1}f(x_{1})+p_{2}f(x_{2}) \ge f(p_{1}x_{1}+p_{2}x_{2}) \end{equation} which is just the definition of convexity for \(f(x)\). Now assume the result holds for \(m=n\) and look at \(m=n+1\). First note that if \(p_{1}=1\) then all the other probabilities are zero and we are back to the case \(m=1\) so we can assume \(p_{1}<1\). \begin{align} \langle f(x) \rangle\equiv \sum_{j=1}^{m+1}p_{j} f(x_{j}) &= p_{1}f(x_{1})+ (1-p_{1})\sum_{j=2}^{n+1}\frac{p_{j}}{1-p_{1}}f(x_{j})\\ &\ge p_{1}f(x_{1})+(1-p_{1})f\left(\sum_{j=2}^{n+1}\frac{p_{j}}{1-p_{1}}x_{j}\right)\\ &\ge f\left(p_{1}x_{1}+(1-p_{1})\sum_{j=2}^{n+1}\frac{p_{j}}{1-p_{1}}x_{j} \right)\\ &= f\left(\sum_{j=1}^{m+1}x_{j}\right) \equiv f(\langle x \rangle) \end{align} The first inequality is from using the assumed expression for \(m=n\), and the second from the expression for \(m=2\).