Math Problem Statement

Let $A \in \mathbb{R}^{n \times n}$ be a constant matrix and $b \in \mathbb{R}^n$ be a constant vector.

Let $z \in \mathbb{R}^n$. Consider the function $g(z)$ defined as

$$g(z) = \frac{1}{2} \|Az - b\|_2^2 + \frac{1}{2} \|z\|_2^2$$

Consider the problem of finding the minimum of $g(z)$ using gradient descent. Derive the optimal step size $\alpha_k$ to be used at iteration number $k$ of the gradient descent scheme, assuming the estimate $z_{k-1}$ from the previous iteration.

Solution

To derive the optimal step size $\alpha_k$ for minimizing the function $g(z)$ using gradient descent, we first compute the gradient of $g(z)$ and then find the step size that minimizes $g(z)$ along the direction of the negative gradient at each iteration.

Step 1: Define the function and compute its gradient

The function $g(z)$ is given by:

$$g(z) = \frac{1}{2} \|Az - b\|_2^2 + \frac{1}{2} \|z\|_2^2$$

This can be expanded as:

$$g(z) = \frac{1}{2} (Az - b)^T (Az - b) + \frac{1}{2} z^T z$$

Expanding the first term:

$$(Az - b)^T (Az - b) = z^T A^T A z - 2 b^T A z + b^T b$$

Thus, $g(z)$ becomes:

$$g(z) = \frac{1}{2} \left( z^T A^T A z - 2 b^T A z + b^T b \right) + \frac{1}{2} z^T z$$
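Collecting terms (and using $b^T A z = (A^T b)^T z$), this is a standard quadratic form, from which the gradient can be read off directly:

$$g(z) = \frac{1}{2} z^T (A^T A + I) z - (A^T b)^T z + \frac{1}{2} b^T b$$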

The gradient of $g(z)$ with respect to $z$ is:

$$\nabla g(z) = \frac{\partial g(z)}{\partial z} = A^T A z - A^T b + z$$

So,

$$\nabla g(z) = (A^T A + I) z - A^T b$$
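Incidentally, since $A^T A + I$ is symmetric positive definite (its eigenvalues are at least 1), setting $\nabla g(z) = 0$ yields the unique minimizer in closed form, which is useful later as a reference for checking a gradient descent implementation:

$$z^\star = (A^T A + I)^{-1} A^T b$$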

Step 2: Gradient descent update rule

In gradient descent, the update rule is given by:

$$z_k = z_{k-1} - \alpha_k \nabla g(z_{k-1})$$

where $\alpha_k$ is the step size at iteration $k$.

Step 3: Derive the optimal step size $\alpha_k$

The optimal step size $\alpha_k$ is determined by minimizing $g(z)$ along the direction of the negative gradient. This requires minimizing the one-dimensional function:

$$\phi(\alpha) = g\left(z_{k-1} - \alpha \nabla g(z_{k-1})\right)$$

Substituting $z_k = z_{k-1} - \alpha \nabla g(z_{k-1})$ into $g(z)$:

$$\phi(\alpha) = \frac{1}{2} \|A(z_{k-1} - \alpha \nabla g(z_{k-1})) - b\|_2^2 + \frac{1}{2} \|z_{k-1} - \alpha \nabla g(z_{k-1})\|_2^2$$

This expands to:

$$\phi(\alpha) = \frac{1}{2} \left\{ \|Az_{k-1} - \alpha A \nabla g(z_{k-1}) - b\|_2^2 + \|z_{k-1} - \alpha \nabla g(z_{k-1})\|_2^2 \right\}$$

To find the optimal $\alpha_k$, differentiate $\phi(\alpha)$ with respect to $\alpha$ and set the derivative to zero:

$$\frac{d\phi(\alpha)}{d\alpha} = -\nabla g(z_{k-1})^T \left( A^T(Az_{k-1} - b) + z_{k-1} \right) + \alpha\, \nabla g(z_{k-1})^T (A^T A + I) \nabla g(z_{k-1})$$

Since $A^T(Az_{k-1} - b) + z_{k-1} = (A^T A + I) z_{k-1} - A^T b = \nabla g(z_{k-1})$, setting the derivative equal to zero gives:

$$0 = -\nabla g(z_{k-1})^T \nabla g(z_{k-1}) + \alpha\, \nabla g(z_{k-1})^T (A^T A + I) \nabla g(z_{k-1})$$

Solving for $\alpha$:

$$\alpha_k = \frac{\nabla g(z_{k-1})^T \nabla g(z_{k-1})}{\nabla g(z_{k-1})^T (A^T A + I) \nabla g(z_{k-1})}$$
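As a quick numerical sanity check (a sketch only, with $A$, $b$, and the current iterate drawn at random for illustration), the closed-form step size can be compared against a direct one-dimensional minimization of $\phi(\alpha)$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))   # illustrative data, not from the problem
b = rng.standard_normal(n)
z = rng.standard_normal(n)        # stands in for the current iterate z_{k-1}

H = A.T @ A + np.eye(n)           # Hessian of g
grad = H @ z - A.T @ b            # gradient of g at z_{k-1}

# Closed-form optimal step size derived above
alpha_formula = (grad @ grad) / (grad @ H @ grad)

# Direct numerical minimization of phi(alpha) = g(z - alpha * grad)
g = lambda v: 0.5 * np.sum((A @ v - b) ** 2) + 0.5 * np.sum(v ** 2)
alpha_numeric = minimize_scalar(lambda a: g(z - a * grad)).x

print(alpha_formula, alpha_numeric)  # the two should agree to high precision
```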

Conclusion

The optimal step size $\alpha_k$ for the gradient descent at iteration $k$ is given by:

$$\alpha_k = \frac{\|\nabla g(z_{k-1})\|_2^2}{\nabla g(z_{k-1})^T (A^T A + I) \nabla g(z_{k-1})}$$

This is the exact line-search (steepest-descent) step: at each iteration it achieves the largest possible decrease of $g$ along the negative-gradient direction. Note that the denominator is always positive, since $A^T A + I$ is positive definite, so $\alpha_k$ is well defined whenever the gradient is nonzero.
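For completeness, here is a minimal Python/NumPy sketch of the full scheme (the data $A$ and $b$, the starting point, and the stopping tolerance below are illustrative assumptions, not part of the problem statement):

```python
import numpy as np

def gradient_descent_exact(A, b, z0, tol=1e-10, max_iter=1000):
    """Minimize g(z) = 0.5*||Az - b||^2 + 0.5*||z||^2 by steepest descent
    with the exact line-search step size derived above."""
    H = A.T @ A + np.eye(A.shape[1])   # Hessian of g (constant for a quadratic)
    c = A.T @ b
    z = z0.astype(float).copy()
    for _ in range(max_iter):
        grad = H @ z - c                           # gradient at z_{k-1}
        if np.linalg.norm(grad) < tol:             # converged
            break
        alpha = (grad @ grad) / (grad @ H @ grad)  # optimal step size alpha_k
        z = z - alpha * grad                       # z_k = z_{k-1} - alpha_k * grad
    return z

# Illustrative usage with random data
rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
z_hat = gradient_descent_exact(A, b, np.zeros(n))

# Check against the closed-form minimizer (A^T A + I)^{-1} A^T b noted earlier
print(np.allclose(z_hat, np.linalg.solve(A.T @ A + np.eye(n), A.T @ b)))
```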


Would you like further details on any part of this derivation or have additional questions? Here are some related questions for further exploration:

  1. How does the optimal step size $\alpha_k$ affect the convergence rate of gradient descent?
  2. What happens if we use a fixed step size instead of the optimal $\alpha_k$?
  3. How can the convergence be analyzed using the eigenvalues of $A^T A + I$?
  4. What would the gradient descent update look like if we introduced an additional regularization term in $g(z)$?
  5. Can you implement this gradient descent method in a programming language like Python or MATLAB?

Tip: When optimizing functions using gradient descent, the choice of step size is crucial for convergence and efficiency, and can be problem-specific.


Math Problem Analysis

Mathematical Concepts

Matrix Algebra
Gradient Descent

Formulas

Gradient of a Quadratic Function


Suitable Grade Level

Advanced Undergraduate