Proofs of Karamata inequality and Muirhead inequality

The main goal of this post is to prove the aforementioned inequalities via the separation theorem of functional analysis. The main result will be a proof of the following theorem of Birkhoff:

If $\displaystyle (x_1,x_2,\cdots,x_n) \succ (y_1,y_2,\cdots,y_n)$, where the arrays $(x_1,x_2,\cdots,x_n)$, $(y_1,y_2,\cdots,y_n) \in \mathbb{R}^n$, then there exist non-negative numbers $a_{\sigma}$ ($\sigma \in S_n$) with $\sum_{\sigma \in S_n} a_{\sigma} = 1$, such that

$\displaystyle \sum_{\sigma \in S_n} a_{\sigma} \left( x_{\sigma(1)}, x_{\sigma(2)},\cdots,x_{\sigma(n)} \right) = (y_1, y_2, \cdots, y_n)$.

1. Definition and tools

Let $\textbf{x} = (x_1,x_2,\cdots,x_n)$ and $\textbf{y} = (y_1,y_2,\cdots,y_n)$ be arrays in $\mathbb{R}^n$. We say that $\textbf{x}$ majorizes $\textbf{y}$, in symbols $\textbf{x} \succ \textbf{y}$, if

1. $x_1 \ge x_2 \ge \cdots \ge x_n$ and $y_1 \ge y_2 \ge \cdots \ge y_n$
2. $x_1 + \cdots + x_k \ge y_1 + \cdots + y_k$ for all $1 \leq k \leq n-1$
3. $x_1 + \cdots + x_n = y_1 + \cdots + y_n$

Examples:

1. $(10, 0, 0) \succ (4, 3, 3)$
2. $(n, 0, \cdots , 0) \succ (1, 1, \cdots , 1)$ (both arrays have n terms)
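The definition can be checked mechanically. The following Python sketch (the helper name `majorizes` is my own) verifies the two examples above:

```python
from itertools import accumulate

def majorizes(x, y, tol=1e-12):
    """Return True if x majorizes y (inputs may be in any order)."""
    xs = sorted(x, reverse=True)   # condition 1: sort both decreasingly
    ys = sorted(y, reverse=True)
    px = list(accumulate(xs))      # partial sums x_1 + ... + x_k
    py = list(accumulate(ys))
    # condition 2: partial sums of x dominate those of y ...
    # condition 3: ... with equality for the full sum
    return (all(a >= b - tol for a, b in zip(px[:-1], py[:-1]))
            and abs(px[-1] - py[-1]) <= tol)

print(majorizes((10, 0, 0), (4, 3, 3)))             # True
print(majorizes((5, 0, 0, 0, 0), (1, 1, 1, 1, 1)))  # True (n = 5)
```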

To prove the theorems in this note, we will need some theorems listed here:

• Strong Separation Theorem

Let A,B be disjoint, nonempty convex subsets of a topological vector space X. If A is compact, B is closed and X is locally convex, then there is $f \in X^*$ such that

$\displaystyle \sup\{ \textrm{Re}\, f(x): x \in A \} < \inf \{ \textrm{Re}\, f(y): y \in B \}$.

• Rearrangement Inequality

Let $x_1 \ge \cdots \ge x_n$, $y_1 \ge \cdots \ge y_n$ be two sequences of real numbers. Then

$\displaystyle \sum_{i=1}^n x_iy_i = \max_{\sigma, \tau \in S_n} \sum_{i=1}^n x_{\sigma(i)}y_{\tau(i)}$

$\displaystyle \sum_{i=1}^n x_iy_{n+1-i} = \min_{\sigma, \tau \in S_n} \sum_{i=1}^n x_{\sigma(i)}y_{\tau(i)}$
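For small $n$ the rearrangement inequality can be confirmed by brute force over all pairs of permutations; a quick Python check (the arrays are arbitrary examples of mine):

```python
from itertools import permutations

x = [5, 3, 2]   # decreasing
y = [4, 1, 0]   # decreasing

# All values of sum x_{sigma(i)} * y_{tau(i)} over sigma, tau in S_3.
values = [sum(a * b for a, b in zip(px, py))
          for px in permutations(x) for py in permutations(y)]

same_order = sum(a * b for a, b in zip(x, y))        # x_i paired with y_i
opposite = sum(a * b for a, b in zip(x, y[::-1]))    # x_i paired with y_{n+1-i}
print(same_order == max(values), opposite == min(values))  # True True
```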

• Jensen’s inequality

Let $f:\mathbb{R} \rightarrow \mathbb{R}$ be a convex function. Then for real numbers $a_1, \cdots , a_n \ge 0$ such that $a_1 + \cdots + a_n = 1$ and $x_1, \cdots, x_n \in \mathbb{R}$, we have

$\sum_{i=1}^n a_i f(x_i) \ge f \left(\sum_{i=1}^n a_ix_i \right)$

• Weighted AM-GM inequality

For real numbers $a_1, \cdots , a_n \ge 0$ such that $a_1 + \cdots + a_n = 1$ and $x_1, \cdots, x_n \ge 0$, we have

$\displaystyle \sum_{i=1}^n a_ix_i \ge \prod_{i=1}^n x_i^{a_i}$
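As a sanity check of the weighted AM-GM inequality, a short Python sketch (the helper `amgm_gap` is my own name for the difference of the two sides):

```python
import math

def amgm_gap(a, x):
    """Weighted arithmetic mean minus weighted geometric mean; >= 0
    whenever the weights a are nonnegative and sum to 1 and x >= 0."""
    lhs = sum(ai * xi for ai, xi in zip(a, x))
    rhs = math.prod(xi ** ai for ai, xi in zip(a, x))
    return lhs - rhs

print(amgm_gap([0.5, 0.5], [4.0, 16.0]))  # (2 + 8) - sqrt(4 * 16) = 2.0
```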

2. Main Theorem

Theorem (Birkhoff):

If $(x_1,x_2,\cdots,x_n) \succ (y_1,y_2,\cdots,y_n)$, where the arrays $(x_1,x_2,\cdots,x_n)$, $(y_1,y_2,\cdots,y_n) \in \mathbb{R}^n$, then there exist non-negative numbers $a_{\sigma}$ ($\sigma \in S_n$) with $\sum_{\sigma \in S_n} a_{\sigma} = 1$, such that

$\displaystyle \sum_{\sigma \in S_n} a_{\sigma} \left( x_{\sigma(1)}, x_{\sigma(2)},\cdots,x_{\sigma(n)} \right) = (y_1, y_2, \cdots, y_n)$.

Geometrically, this means that the set of sequences majorized by $(x_1,x_2,\cdots,x_n) \in \mathbb{R}^n$ is exactly the convex hull of the points whose coordinates are the permutations of $(x_1,x_2,\cdots,x_n)$.
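For $n = 2$ this convex-hull description is completely explicit: $S_2$ has only two permutations, so the weight of the identity permutation solves a single linear equation. A Python sketch (the function name is my own; for general $n$ one would solve a linear system or linear program instead):

```python
def birkhoff_weights_2d(x, y):
    """Solve y = a*(x1, x2) + (1 - a)*(x2, x1) for the weight a,
    assuming x majorizes y and x1 != x2."""
    x1, x2 = x
    y1, y2 = y
    a = (y1 - x2) / (x1 - x2)
    return a, 1 - a

a, b = birkhoff_weights_2d((5, 0), (4, 1))
print(a, b)                          # 0.8 0.2
print(a * 5 + b * 0, a * 0 + b * 5)  # 4.0 1.0, i.e. y is recovered
```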

Proof of Birkhoff's theorem

Let $\textbf{x} = (x_1,x_2,\cdots,x_n)$ and $\textbf{y} = (y_1,y_2,\cdots,y_n)$ be arrays in $\mathbb{R}^n$ with $\textbf{x} \succ \textbf{y}$, and suppose for contradiction that $\textbf{y}$ does not lie in the convex hull $S$ of the points $(x_{\sigma (1)}, \cdots, x_{\sigma (n)})$, $\sigma \in S_n$. Since $S$ is compact and convex and $\{\textbf{y}\}$ is closed and convex, the strong separation theorem (take $A = S$ and $B = \{\textbf{y}\}$) yields a linear functional $f \in \left(\mathbb{R}^n\right)^*$ such that

$\displaystyle \sup\{ \textrm{Re}\, f(x): x \in A \} < \inf \{ \textrm{Re}\, f(y): y \in B \}$

Note that a linear functional on $\mathbb{R}^n$ is of the form $f(x_1,\cdots, x_n) = c_1x_1+ \cdots + c_nx_n$ for some constants $c_1, \cdots ,c_n$. By adding a suitable constant $c$ to both sides if needed, we see that

$\displaystyle \max_{\sigma \in S_n} \{ c_1 x_{\sigma ( 1 )} + \cdots + c_n x_{\sigma ( n )} + c \} < 0 < c_1y_1 + \cdots + c_ny_n + c$

for some constants $c, c_1, \cdots, c_n$. Subtracting $c$ throughout, we obtain $\max_{\sigma \in S_n} \{ c_1 x_{\sigma(1)} + \cdots + c_n x_{\sigma(n)} \} < c_1y_1 + \cdots + c_ny_n$.

Let $d_1 \ge d_2 \ge \cdots \ge d_n$ be a rearrangement of $c_1, \cdots, c_n$. By the rearrangement inequality, we have

$\displaystyle d_1y_1 + \cdots + d_ny_n \\ \ge c_1y_1 + \cdots + c_ny_n \\ > d_1x_1 + \cdots + d_nx_n$.

The last step holds because

$\displaystyle d_1x_1 + \cdots + d_nx_n = \max_{\sigma \in S_n} \{ c_1x_{\sigma(1)} + \cdots + c_n x_{\sigma(n)} \}$

by the rearrangement inequality (both $d_i$ and $x_i$ are sorted in decreasing order).

On the other hand, summation by parts together with the majorization conditions gives

$\displaystyle d_1x_1 + \cdots + d_nx_n \\ = d_n(x_1 + \cdots + x_n) + (d_{n-1} - d_n)(x_1 + \cdots + x_{n-1}) + \cdots + (d_1 - d_2)x_1 \\ \ge d_n(y_1 + \cdots + y_n) + (d_{n-1} - d_n)(y_1 + \cdots + y_{n-1}) + \cdots + (d_1 - d_2)y_1 \\ = d_1y_1 + \cdots + d_ny_n$

since each coefficient $d_k - d_{k+1}$ is non-negative, the partial sums of $\textbf{x}$ dominate those of $\textbf{y}$, and the full sums are equal. This contradicts the strict inequality $d_1y_1 + \cdots + d_ny_n > d_1x_1 + \cdots + d_nx_n$ obtained from the separation above. Hence $\textbf{y} \in S$, which is precisely the claimed convex combination. $\square$
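The summation-by-parts identity used in the first equality above is easy to verify numerically; a small Python sketch (the helper `abel_form` is my own):

```python
def abel_form(d, z):
    """Rewrite d1*z1 + ... + dn*zn in terms of the partial sums of z:
    d_n*(z1+...+zn) + (d_{n-1}-d_n)*(z1+...+z_{n-1}) + ... + (d1-d2)*z1."""
    n = len(d)
    partial = [sum(z[:k]) for k in range(1, n + 1)]
    total = d[-1] * partial[-1]
    for k in range(n - 1):
        total += (d[k] - d[k + 1]) * partial[k]
    return total

d, z = [4, 2, 1], [3, 1, 0]
print(abel_form(d, z), sum(a * b for a, b in zip(d, z)))  # 14 14
```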

3. Application 1: Karamata inequality (Majorization inequality)

Theorem (Karamata)

If $(x_1,x_2,\cdots,x_n) \succ (y_1,y_2,\cdots,y_n)$, where the arrays $(x_1,x_2,\cdots,x_n)$, $(y_1,y_2,\cdots,y_n) \in \mathbb{R}^n$, then for any convex function $f: \mathbb{R} \rightarrow \mathbb{R}$, we have

$\displaystyle f(x_1) + \cdots + f(x_n) \ge f(y_1) + \cdots + f(y_n)$

Proof

By Birkhoff’s theorem, we have

$\displaystyle \sum_{\sigma \in S_n} a_{\sigma} \left( x_{\sigma(1)}, x_{\sigma(2)},\cdots,x_{\sigma(n)} \right) = (y_1, y_2, \cdots, y_n)$.

By Jensen’s inequality,

$\displaystyle f(y_1) + \cdots + f(y_n) \\ = f\left(\sum_{\sigma \in S_n} a_{\sigma} x_{\sigma(1)}\right) + \cdots + f\left(\sum_{\sigma \in S_n} a_{\sigma} x_{\sigma(n)}\right) \\ \leq \sum_{\sigma \in S_n} a_{\sigma} \left(f(x_{\sigma(1)}) + \cdots + f(x_{\sigma(n)})\right) \\ = f(x_1) + \cdots + f(x_n)$

where the last equality holds because $f(x_{\sigma(1)}) + \cdots + f(x_{\sigma(n)}) = f(x_1) + \cdots + f(x_n)$ for every $\sigma$ and $\sum_{\sigma \in S_n} a_{\sigma} = 1$. $\square$
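Karamata's inequality is easy to test on the first example of majorization above with the convex function $f(t) = t^2$ (the helper name is mine):

```python
def karamata_gap(x, y, f):
    """f(x_1)+...+f(x_n) minus f(y_1)+...+f(y_n); nonnegative
    whenever x majorizes y and f is convex."""
    return sum(map(f, x)) - sum(map(f, y))

# (10, 0, 0) majorizes (4, 3, 3), and t -> t^2 is convex:
print(karamata_gap((10, 0, 0), (4, 3, 3), lambda t: t * t))  # 100 - 34 = 66
```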

4. Application 2: Muirhead inequality

Theorem (Muirhead)

If $(x_1,x_2,\cdots,x_n) \succ (y_1,y_2,\cdots,y_n)$, where the arrays $(x_1,x_2,\cdots,x_n)$, $(y_1,y_2,\cdots,y_n) \in \mathbb{R}^n$, then for any array of positive numbers $(e_1, \cdots, e_n)$, we have

$\displaystyle \sum_{\tau \in S_n} e_{\tau ( 1)}^{x_1} \cdots e_{\tau (n)}^{x_n} \ge \sum_{\tau \in S_n} e_{\tau ( 1)}^{y_1} \cdots e_{\tau (n)}^{y_n}$

Proof

By Birkhoff’s theorem, we have

$\displaystyle \sum_{\sigma \in S_n} a_{\sigma} \left( x_{\sigma(1)}, x_{\sigma(2)},\cdots,x_{\sigma(n)} \right) = (y_1, y_2, \cdots, y_n)$.

Then by the weighted AM-GM inequality,

$\displaystyle e_{\tau ( 1)}^{y_1} \cdots e_{\tau (n)}^{y_n} \\ = e_{\tau ( 1 )}^{\sum_{\sigma \in S_n} a_{\sigma} x_{\sigma(1)}} \cdots e_{\tau (n)}^{\sum_{\sigma \in S_n} a_{\sigma} x_{\sigma(n)}} \\ = \prod_{\sigma \in S_n} \left(e_{\tau(1)}^{x_{\sigma(1)}} \cdots e_{\tau(n)}^{x_{\sigma(n)}}\right)^{a_{\sigma}} \\ \leq \sum_{\sigma \in S_n} a_{\sigma} e_{\tau(1)}^{x_{\sigma(1)}} \cdots e_{\tau(n)}^{x_{\sigma(n)}}$

Summing over all the possible $\tau \in S_n$, we have the desired inequality.
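The inequality can be checked directly on a small example; the Python sketch below (the helper `sym_sum` is my own name for the symmetric sum $\sum_{\tau \in S_n} e_{\tau(1)}^{a_1} \cdots e_{\tau(n)}^{a_n}$) uses the fact that $(2,0,0) \succ (1,1,0)$:

```python
import math
from itertools import permutations

def sym_sum(exponents, e):
    """Sum over all tau in S_n of e_{tau(1)}^{a_1} * ... * e_{tau(n)}^{a_n}."""
    n = len(e)
    return sum(math.prod(pe[i] ** exponents[i] for i in range(n))
               for pe in permutations(e))

e = (1.0, 2.0, 3.0)
# (2, 0, 0) majorizes (1, 1, 0), so the first symmetric sum dominates:
print(sym_sum((2, 0, 0), e), sym_sum((1, 1, 0), e))  # 28.0 22.0
```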

References

1. Albert W. Marshall and Ingram Olkin, Inequalities: Theory of Majorization and Its Applications. Academic Press, 1979.
2. Fedor Petrov, “Majorization Sequences”, http://www.mathlinks.ro/viewtopic.php?search_id=1697861374&t=5849