Math Problem Statement

Let A \in \mathbb{R}^{n \times n} be a constant matrix and b \in \mathbb{R}^{n} a constant vector, and let z \in \mathbb{R}^{n}. Consider the function g(z) defined as g(z) = \frac{1}{2}\|A z - b\|_{2}^{2} + \frac{1}{2}\|z\|_{2}^{2}. Consider the problem of finding the minimum of g(z) using gradient descent. Derive the optimal step size \alpha_{k} to be used at iteration number k of the gradient descent scheme, given the estimate z_{k-1} from the previous iteration.

Solution

To solve the optimization problem using gradient descent, we start by computing the gradient of the function g(z) with respect to z.

Step 1: Compute the gradient \nabla g(z)

Given the function:

g(z) = \frac{1}{2}\|A z - b\|_{2}^{2} + \frac{1}{2}\|z\|_{2}^{2}

We first need to express the norm terms explicitly:

\|A z - b\|_{2}^{2} = (A z - b)^\top (A z - b)

\|z\|_{2}^{2} = z^\top z

Thus, the function g(z) becomes:

g(z) = \frac{1}{2}(A z - b)^\top (A z - b) + \frac{1}{2} z^\top z

Now, we compute the gradient \nabla g(z):

\nabla g(z) = \frac{\partial}{\partial z} \left[ \frac{1}{2}(A z - b)^\top (A z - b) + \frac{1}{2} z^\top z \right]

Step 2: Gradient calculation

Let's compute the gradient term by term.

First term:

\frac{\partial}{\partial z} \left[ \frac{1}{2}(A z - b)^\top (A z - b) \right]

Expanding the quadratic form:

(A z - b)^\top (A z - b) = z^\top A^\top A z - 2 b^\top A z + b^\top b

Taking the derivative with respect to z:

\frac{\partial}{\partial z} \left( z^\top A^\top A z - 2 b^\top A z + b^\top b \right) = 2 A^\top A z - 2 A^\top b

The factor of \frac{1}{2} cancels the 2, so:

\frac{\partial}{\partial z} \left[ \frac{1}{2}(A z - b)^\top (A z - b) \right] = A^\top A z - A^\top b

Second term:

\frac{\partial}{\partial z} \left( \frac{1}{2} z^\top z \right) = z

So the total gradient is:

\nabla g(z) = A^\top A z - A^\top b + z = (A^\top A + I) z - A^\top b
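As a sanity check, the analytic gradient can be verified numerically against central finite differences. This is a minimal sketch with randomly generated A, b, and z (all names here are illustrative, not part of the problem statement):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
z = rng.standard_normal(n)

def g(z):
    # g(z) = 1/2 ||Az - b||^2 + 1/2 ||z||^2
    return 0.5 * np.sum((A @ z - b) ** 2) + 0.5 * np.sum(z ** 2)

def grad_g(z):
    # analytic gradient: A^T A z - A^T b + z
    return A.T @ (A @ z - b) + z

# central finite-difference approximation, one coordinate at a time
eps = 1e-6
fd = np.array([(g(z + eps * e) - g(z - eps * e)) / (2 * eps)
               for e in np.eye(n)])

print(np.allclose(fd, grad_g(z), atol=1e-5))
```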

Step 3: Gradient descent update

The gradient descent update rule is:

z_k = z_{k-1} - \alpha_k \nabla g(z_{k-1})

Substitute the expression for the gradient:

z_k = z_{k-1} - \alpha_k \left( A^\top A z_{k-1} - A^\top b + z_{k-1} \right)
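A single update step can be sketched as follows. Here a conservative fixed step \alpha = 1/(\|A\|_2^2 + 1), i.e. one over the Lipschitz constant of \nabla g, is used purely for illustration; with any step below 2/(\|A\|_2^2 + 1), one iteration is guaranteed to decrease g:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

def g(z):
    return 0.5 * np.sum((A @ z - b) ** 2) + 0.5 * np.sum(z ** 2)

z_prev = np.zeros(n)
grad = A.T @ (A @ z_prev - b) + z_prev           # gradient at z_{k-1}
alpha = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1.0)  # 1/L, a safe fixed step
z_next = z_prev - alpha * grad                   # gradient descent update

print(g(z_next) < g(z_prev))
```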

Step 4: Derive the optimal step size \alpha_k

To find the optimal step size \alpha_k, minimize g along the direction of the negative gradient:

\alpha_k = \arg\min_\alpha \, g(z_{k-1} - \alpha \nabla g(z_{k-1}))

Substituting z = z_{k-1} - \alpha \nabla g(z_{k-1}) into g(z), we get:

For brevity, write g_k = \nabla g(z_{k-1}) and H = A^\top A + I, so that g(z) = \frac{1}{2} z^\top H z - b^\top A z + \frac{1}{2} b^\top b and \nabla g(z) = H z - A^\top b. Since g is quadratic with Hessian H, expanding gives:

g(z_{k-1} - \alpha g_k) = g(z_{k-1}) - \alpha \, g_k^\top g_k + \frac{\alpha^2}{2} \, g_k^\top H g_k

This is a quadratic in \alpha with positive leading coefficient (H = A^\top A + I is positive definite), so it is minimized where its derivative with respect to \alpha vanishes:

\frac{d}{d\alpha} \, g(z_{k-1} - \alpha g_k) = -g_k^\top g_k + \alpha \, g_k^\top H g_k = 0

Solving for \alpha gives the optimal step size at iteration k:

\alpha_k = \frac{g_k^\top g_k}{g_k^\top (A^\top A + I) g_k} = \frac{\|\nabla g(z_{k-1})\|_2^2}{\|A \nabla g(z_{k-1})\|_2^2 + \|\nabla g(z_{k-1})\|_2^2}
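For this quadratic objective the exact line-search step is \alpha_k = \|g_k\|_2^2 / (\|A g_k\|_2^2 + \|g_k\|_2^2). A minimal sketch of gradient descent using it, checked against the closed-form minimizer z^* = (A^\top A + I)^{-1} A^\top b (random data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

def grad_g(z):
    return A.T @ (A @ z - b) + z

z = np.zeros(n)
for _ in range(300):
    gk = grad_g(z)
    if gk @ gk < 1e-30:   # converged: gradient is (numerically) zero
        break
    # optimal step: alpha_k = ||g_k||^2 / (||A g_k||^2 + ||g_k||^2)
    alpha = (gk @ gk) / (np.sum((A @ gk) ** 2) + gk @ gk)
    z = z - alpha * gk

# closed-form minimizer of g: solve (A^T A + I) z* = A^T b
z_star = np.linalg.solve(A.T @ A + np.eye(n), A.T @ b)
print(np.allclose(z, z_star, atol=1e-6))
```

Exact line search makes each iteration as greedy as possible along the steepest-descent direction; convergence is still linear, at a rate governed by the condition number of A^\top A + I.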

Math Problem Analysis

Mathematical Concepts

Gradient Descent
Matrix Operations
Quadratic Forms

Formulas

Gradient of a quadratic function

Suitable Grade Level

Advanced Undergraduate