Quick introduction to convex sets, convex functions and Jensen's inequality

A set \(C\) is *convex* if for all \(a,b\in C\) and \(0<\lambda<1\), the element \(\lambda a + (1-\lambda)b\) is also in \(C\).

Geometrically, if a line is connected between \(a\) and \(b\) then every point on the line is also in the set as depicted below.

A function \(f\) is *convex* over the convex domain \(C\) if for all \(x_{1},x_{2}\in C\) and \(\lambda\in[0,1]\)
\begin{equation}
f(\lambda x_{1}+(1-\lambda)x_{2}) \le \lambda f(x_{1}) + (1-\lambda)f(x_{2})
\end{equation}
The function is *strictly convex* if equality only holds for \(\lambda=0\) or \(\lambda=1\). Note that if \(-f\) is convex we call the function *concave*.

If the domain is the real numbers and \(f\) is twice differentiable then \(d^{2}f/dx^{2}\ge 0\) implies \(f\) is convex, and if \(d^{2}f/dx^{2}> 0\) then \(f\) is strictly convex.

For a convex function \(f(x)\), \begin{equation} \langle f(x) \rangle \ge f(\langle x\rangle) \end{equation} where \(\langle\cdot\rangle\) denotes average.

If equality holds \(\langle f(x) \rangle = f(\langle x\rangle)\) and \(f(x)\) is strictly convex, then \(x_{j}=\langle x\rangle\).

Jensen's inequality can be easily proved by induction on the number of terms in the average:

Proof: With \(m\) terms, the averages are \(\langle x\rangle = \sum_{j=1}^{m} p_{j} x_{j}\) and \(\langle f(x) \rangle= \sum_{j=1}^{m}p_{j} f(x_{j})\) where \(p_{j}\) is the probability of \(x_{j}\). The case \(m=1\) is trivially true.
The case \(m=2\) is
\begin{equation}
p_{1}f(x_{1})+p_{2}f(x_{2}) \ge f(p_{1}x_{1}+p_{2}x_{2})
\end{equation}
which is just the definition of convexity for \(f(x)\). Now assume the result holds for \(m=n\) and look at \(m=n+1\). First note that if \(p_{1}=1\) then all the other probabilities are zero and we are back to the case \(m=1\) so we can assume \(p_{1}<1\).
\begin{align}
\langle f(x) \rangle\equiv \sum_{j=1}^{m+1}p_{j} f(x_{j}) &= p_{1}f(x_{1})+ (1-p_{1})\sum_{j=2}^{n+1}\frac{p_{j}}{1-p_{1}}f(x_{j})\\
&\ge p_{1}f(x_{1})+(1-p_{1})f\left(\sum_{j=2}^{n+1}\frac{p_{j}}{1-p_{1}}x_{j}\right)\\
&\ge f\left(p_{1}x_{1}+(1-p_{1})\sum_{j=2}^{n+1}\frac{p_{j}}{1-p_{1}}x_{j} \right)\\
&= f\left(\sum_{j=1}^{m+1}x_{j}\right) \equiv f(\langle x \rangle)
\end{align}
The first inequality is from using the assumed expression for \(m=n\), and the second from the expression for \(m=2\).