Math Problem Statement

Let A \in \mathbb{R}^{n \times n} be a constant matrix and b \in \mathbb{R}^{n} a constant vector, and let z \in \mathbb{R}^{n}. Consider the function g(z) defined as g(z) = \frac{1}{2}\|A z - b\|_{2}^{2} + \frac{1}{2}\|z\|_{2}^{2}. Consider the problem of finding the minimum of g(z) using gradient descent. Derive the optimal step size \alpha_{k} to be used at iteration number k of the gradient descent scheme, given the estimate z_{k-1} from the previous iteration.

Solution

To solve the optimization problem using gradient descent, we start by computing the gradient of the function g(z) with respect to z.

Step 1: Compute the gradient \nabla g(z)

Given the function:

g(z) = \frac{1}{2}\|A z - b\|_{2}^{2} + \frac{1}{2}\|z\|_{2}^{2}

We first need to express the norm terms explicitly:

\|A z - b\|_{2}^{2} = (A z - b)^\top (A z - b)

\|z\|_{2}^{2} = z^\top z

Thus, the function g(z) becomes:

g(z) = \frac{1}{2}(A z - b)^\top (A z - b) + \frac{1}{2} z^\top z

Now, we compute the gradient \nabla g(z):

\nabla g(z) = \frac{\partial}{\partial z} \left[ \frac{1}{2}(A z - b)^\top (A z - b) + \frac{1}{2} z^\top z \right]

Step 2: Gradient calculation

Let's compute the gradient term by term.

First term:

\frac{\partial}{\partial z} \left[ \frac{1}{2}(A z - b)^\top (A z - b) \right]

Expanding the quadratic form:

(A z - b)^\top (A z - b) = z^\top A^\top A z - 2 b^\top A z + b^\top b

Taking the derivative with respect to z:

\frac{\partial}{\partial z} \left( z^\top A^\top A z - 2 b^\top A z + b^\top b \right) = 2 A^\top A z - 2 A^\top b

The factor of \frac{1}{2} cancels the 2, so:

\frac{\partial}{\partial z} \left[ \frac{1}{2}(A z - b)^\top (A z - b) \right] = A^\top A z - A^\top b

Second term:

\frac{\partial}{\partial z} \left( \frac{1}{2} z^\top z \right) = z

So the total gradient is:

\nabla g(z) = A^\top A z - A^\top b + z = (A^\top A + I) z - A^\top b
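As a sanity check, the analytic gradient can be verified numerically against central finite differences. This is a minimal sketch with randomly generated A, b, and z (all names here are illustrative, not part of the problem statement):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
z = rng.standard_normal(n)

def g(z):
    # g(z) = 1/2 ||Az - b||^2 + 1/2 ||z||^2
    return 0.5 * np.sum((A @ z - b) ** 2) + 0.5 * np.sum(z ** 2)

def grad_g(z):
    # analytic gradient: A^T A z - A^T b + z
    return A.T @ (A @ z - b) + z

# central finite-difference approximation, one coordinate at a time
eps = 1e-6
fd = np.array([(g(z + eps * e) - g(z - eps * e)) / (2 * eps)
               for e in np.eye(n)])

print(np.allclose(fd, grad_g(z), atol=1e-5))
```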

Step 3: Gradient descent update

The gradient descent update rule is:

z_k = z_{k-1} - \alpha_k \nabla g(z_{k-1})

Substitute the expression for the gradient:

z_k = z_{k-1} - \alpha_k \left( A^\top A z_{k-1} - A^\top b + z_{k-1} \right)
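A single update step can be sketched as follows. Here a conservative fixed step \alpha = 1/(\|A\|_2^2 + 1), i.e. one over the Lipschitz constant of \nabla g, is used purely for illustration; with any step below 2/(\|A\|_2^2 + 1), one iteration is guaranteed to decrease g:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

def g(z):
    return 0.5 * np.sum((A @ z - b) ** 2) + 0.5 * np.sum(z ** 2)

z_prev = np.zeros(n)
grad = A.T @ (A @ z_prev - b) + z_prev           # gradient at z_{k-1}
alpha = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1.0)  # 1/L, a safe fixed step
z_next = z_prev - alpha * grad                   # gradient descent update

print(g(z_next) < g(z_prev))
```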

Step 4: Derive the optimal step size \alpha_k

To find the optimal step size \alpha_k, minimize g along the direction of the negative gradient:

\alpha_k = \arg\min_\alpha \, g(z_{k-1} - \alpha \nabla g(z_{k-1}))

Substituting z = z_{k-1} - \alpha \nabla g(z_{k-1}) into g(z), we get:

For brevity, write g_k = \nabla g(z_{k-1}) and H = A^\top A + I, so that g(z) = \frac{1}{2} z^\top H z - b^\top A z + \frac{1}{2} b^\top b and \nabla g(z) = H z - A^\top b. Since g is quadratic with Hessian H, expanding gives:

g(z_{k-1} - \alpha g_k) = g(z_{k-1}) - \alpha \, g_k^\top g_k + \frac{\alpha^2}{2} \, g_k^\top H g_k

This is a quadratic in \alpha with positive leading coefficient (H = A^\top A + I is positive definite), so it is minimized where its derivative with respect to \alpha vanishes:

\frac{d}{d\alpha} \, g(z_{k-1} - \alpha g_k) = -g_k^\top g_k + \alpha \, g_k^\top H g_k = 0

Solving for \alpha gives the optimal step size at iteration k:

\alpha_k = \frac{g_k^\top g_k}{g_k^\top (A^\top A + I) g_k} = \frac{\|\nabla g(z_{k-1})\|_2^2}{\|A \nabla g(z_{k-1})\|_2^2 + \|\nabla g(z_{k-1})\|_2^2}
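For this quadratic objective the exact line-search step is \alpha_k = \|g_k\|_2^2 / (\|A g_k\|_2^2 + \|g_k\|_2^2). A minimal sketch of gradient descent using it, checked against the closed-form minimizer z^* = (A^\top A + I)^{-1} A^\top b (random data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

def grad_g(z):
    return A.T @ (A @ z - b) + z

z = np.zeros(n)
for _ in range(300):
    gk = grad_g(z)
    if gk @ gk < 1e-30:   # converged: gradient is (numerically) zero
        break
    # optimal step: alpha_k = ||g_k||^2 / (||A g_k||^2 + ||g_k||^2)
    alpha = (gk @ gk) / (np.sum((A @ gk) ** 2) + gk @ gk)
    z = z - alpha * gk

# closed-form minimizer of g: solve (A^T A + I) z* = A^T b
z_star = np.linalg.solve(A.T @ A + np.eye(n), A.T @ b)
print(np.allclose(z, z_star, atol=1e-6))
```

Exact line search makes each iteration as greedy as possible along the steepest-descent direction; convergence is still linear, at a rate governed by the condition number of A^\top A + I.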

Math Problem Analysis

Mathematical Concepts

Gradient Descent
Matrix Operations
Quadratic Forms

Formulas

Gradient of a quadratic function

Suitable Grade Level

Advanced Undergraduate