Written by Alexei Gilchrist, updated months ago
Quick introduction to convex sets, convex functions and Jensen's inequality
Level: 2, Subjects: Convexity

## 1 A convex set

A set $$C$$ is convex if for all $$a,b\in C$$ and $$0<\lambda<1$$, the element $$\lambda a + (1-\lambda)b$$ is also in $$C$$.

Geometrically, if a line is connected between $$a$$ and $$b$$ then every point on the line is also in the set as depicted below. Convex set: every point on a line connecting to members is also in the set

## 2 A convex function

A function $$f$$ is convex over the convex domain $$C$$ if for all $$x_{1},x_{2}\in C$$ and $$\lambda\in[0,1]$$ \begin{equation} f(\lambda x_{1}+(1-\lambda)x_{2}) \le \lambda f(x_{1}) + (1-\lambda)f(x_{2}) \end{equation} The function is strictly convex if equality only holds for $$\lambda=0$$ or $$\lambda=1$$. Note that if $$-f$$ is convex we call the function concave. In a convex function every chord lies above or on the function.

If the domain is the real numbers and $$f$$ is twice differentiable then $$d^{2}f/dx^{2}\ge 0$$ implies $$f$$ is convex, and if $$d^{2}f/dx^{2}> 0$$ then $$f$$ is strictly convex.

## 3 Jensen's inequality

For a convex function $$f(x)$$, \begin{equation} \langle f(x) \rangle \ge f(\langle x\rangle) \end{equation} where $$\langle\cdot\rangle$$ denotes average.

If equality holds $$\langle f(x) \rangle = f(\langle x\rangle)$$ and $$f(x)$$ is strictly convex, then $$x_{j}=\langle x\rangle$$.

Jensen's inequality can be easily proved by induction on the number of terms in the average:

Proof: With $$m$$ terms, the averages are $$\langle x\rangle = \sum_{j=1}^{m} p_{j} x_{j}$$ and $$\langle f(x) \rangle= \sum_{j=1}^{m}p_{j} f(x_{j})$$ where $$p_{j}$$ is the probability of $$x_{j}$$. The case $$m=1$$ is trivially true. The case $$m=2$$ is \begin{equation} p_{1}f(x_{1})+p_{2}f(x_{2}) \ge f(p_{1}x_{1}+p_{2}x_{2}) \end{equation} which is just the definition of convexity for $$f(x)$$. Now assume the result holds for $$m=n$$ and look at $$m=n+1$$. First note that if $$p_{1}=1$$ then all the other probabilities are zero and we are back to the case $$m=1$$ so we can assume $$p_{1}<1$$. \begin{align} \langle f(x) \rangle\equiv \sum_{j=1}^{m+1}p_{j} f(x_{j}) &= p_{1}f(x_{1})+ (1-p_{1})\sum_{j=2}^{n+1}\frac{p_{j}}{1-p_{1}}f(x_{j})\\ &\ge p_{1}f(x_{1})+(1-p_{1})f\left(\sum_{j=2}^{n+1}\frac{p_{j}}{1-p_{1}}x_{j}\right)\\ &\ge f\left(p_{1}x_{1}+(1-p_{1})\sum_{j=2}^{n+1}\frac{p_{j}}{1-p_{1}}x_{j} \right)\\ &= f\left(\sum_{j=1}^{m+1}x_{j}\right) \equiv f(\langle x \rangle) \end{align} The first inequality is from using the assumed expression for $$m=n$$, and the second from the expression for $$m=2$$.