Math Problem Statement

Let $A \in \mathbb{R}^{n \times n}$ be a constant matrix and $b \in \mathbb{R}^{n}$ be a constant vector. Let $z \in \mathbb{R}^{n}$. Consider the function $g(z)$ defined as

$$g(z) = \frac{1}{2} \|Az - b\|_2^2 + \frac{1}{2} \|z\|_2^2$$

Consider the problem of finding the minimum of $g(z)$ using gradient descent. Derive the optimal step size $\alpha_k$ to be used at iteration number $k$ of the gradient descent scheme, assuming the estimate $z_{k-1}$ from the previous iteration.

Solution

To find the optimal step size $\alpha_k$ for the gradient descent scheme to minimize the function $g(z)$, we start by analyzing the function and its gradient.

Step 1: Define the Function and Compute the Gradient

The function $g(z)$ is defined as:

$$g(z) = \frac{1}{2} \|Az - b\|_2^2 + \frac{1}{2} \|z\|_2^2$$

Expanding the norm terms, we have:

$$g(z) = \frac{1}{2} (Az - b)^\top (Az - b) + \frac{1}{2} z^\top z$$
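For later reference, multiplying out the products shows that $g$ is a strictly convex quadratic whose Hessian is $A^\top A + I$, the same matrix that will appear in the optimal step size:

$$g(z) = \frac{1}{2} z^\top (A^\top A + I) z - b^\top A z + \frac{1}{2} b^\top b$$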

Now, let’s compute the gradient of $g(z)$ with respect to $z$:

$$\nabla g(z) = \nabla \left( \frac{1}{2} \|Az - b\|_2^2 + \frac{1}{2} \|z\|_2^2 \right)$$

For the first term:

$$\nabla \left( \frac{1}{2} \|Az - b\|_2^2 \right) = \nabla \left( \frac{1}{2} (Az - b)^\top (Az - b) \right)$$

Using the chain rule, we have:

$$\nabla \left( \frac{1}{2} (Az - b)^\top (Az - b) \right) = A^\top (Az - b)$$
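The same result follows by differentiating the expanded quadratic form directly:

$$\nabla_z \left( \frac{1}{2} z^\top A^\top A z - b^\top A z + \frac{1}{2} b^\top b \right) = A^\top A z - A^\top b = A^\top (Az - b)$$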

For the second term:

$$\nabla \left( \frac{1}{2} \|z\|_2^2 \right) = \nabla \left( \frac{1}{2} z^\top z \right) = z$$

Thus, the gradient of $g(z)$ is:

$$\nabla g(z) = A^\top (Az - b) + z$$
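As a quick sanity check of this expression (not part of the derivation itself), here is a minimal NumPy sketch that compares the analytic gradient against a central finite-difference approximation; the dimension and the random $A$, $b$ are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))   # illustrative constant matrix
b = rng.standard_normal(n)        # illustrative constant vector

def g(z):
    # g(z) = 1/2 ||Az - b||^2 + 1/2 ||z||^2
    return 0.5 * np.linalg.norm(A @ z - b) ** 2 + 0.5 * np.linalg.norm(z) ** 2

def grad_g(z):
    # Analytic gradient derived above: A^T (Az - b) + z
    return A.T @ (A @ z - b) + z

# Central finite-difference check of the gradient at a random point.
z = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(g(z + eps * e) - g(z - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.max(np.abs(fd - grad_g(z))))  # should be on the order of 1e-8 or smaller
```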

Step 2: Gradient Descent Update Rule

The gradient descent update rule at iteration $k$ is:

$$z_k = z_{k-1} - \alpha_k \nabla g(z_{k-1})$$

Substituting the gradient:

$$z_k = z_{k-1} - \alpha_k \left( A^\top (A z_{k-1} - b) + z_{k-1} \right)$$
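In code, a single update with a given step size might look like the following sketch, reusing the hypothetical `np`, `n`, `g`, and `grad_g` from the previous snippet; the step size here is an arbitrary small value for illustration only:

```python
def gd_step(z_prev, alpha):
    # One gradient descent update: z_k = z_{k-1} - alpha_k * grad g(z_{k-1})
    return z_prev - alpha * grad_g(z_prev)

z0 = np.zeros(n)
z1 = gd_step(z0, alpha=0.05)   # arbitrary small step size, not the optimal one
print(g(z0), g(z1))            # for a sufficiently small step, the objective decreases
```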

Step 3: Find the Optimal Step Size $\alpha_k$

To find the optimal step size $\alpha_k$, we minimize $g(z)$ along the direction of the negative gradient. Define the univariate function:

$$h(\alpha) = g(z_{k-1} - \alpha \nabla g(z_{k-1}))$$

Differentiate $h(\alpha)$ with respect to $\alpha$ and set it to zero to find the optimal $\alpha_k$.

Substitute $z = z_{k-1} - \alpha \nabla g(z_{k-1})$ into $g(z)$:

$$h(\alpha) = \frac{1}{2} \|A(z_{k-1} - \alpha \nabla g(z_{k-1})) - b\|_2^2 + \frac{1}{2} \|z_{k-1} - \alpha \nabla g(z_{k-1})\|_2^2$$

Expanding the norms:

$$h(\alpha) = \frac{1}{2} \left( A(z_{k-1} - \alpha \nabla g(z_{k-1})) - b \right)^\top \left( A(z_{k-1} - \alpha \nabla g(z_{k-1})) - b \right) + \frac{1}{2} \left( z_{k-1} - \alpha \nabla g(z_{k-1}) \right)^\top \left( z_{k-1} - \alpha \nabla g(z_{k-1}) \right)$$

Taking the derivative with respect to $\alpha$ and setting it to zero:

$$\frac{d h(\alpha)}{d \alpha} = -\nabla g(z_{k-1})^\top \left( A^\top (A z_{k-1} - b) + z_{k-1} \right) + \alpha\, \nabla g(z_{k-1})^\top \left( A^\top A + I \right) \nabla g(z_{k-1}) = 0$$

Since $A^\top (A z_{k-1} - b) + z_{k-1} = \nabla g(z_{k-1})$, the first term equals $-\nabla g(z_{k-1})^\top \nabla g(z_{k-1})$, and solving for $\alpha$ gives the optimal step size:

$$\alpha_k = \frac{\nabla g(z_{k-1})^\top \nabla g(z_{k-1})}{\nabla g(z_{k-1})^\top \left( A^\top A + I \right) \nabla g(z_{k-1})}$$

This expression provides the optimal step size $\alpha_k$ for each iteration of the gradient descent method.

Would you like more details on any part of the derivation, or do you have any questions?

Here are 5 related questions you might find interesting:

1. How does the convergence rate of gradient descent change with different step sizes?
2. Can the optimal step size $\alpha_k$ be constant for all iterations, and under what conditions?
3. How does the matrix $A$ affect the optimization problem and the gradient descent convergence?
4. What are the implications of choosing a step size $\alpha_k$ that is too large or too small?
5. How would you apply a line search method to determine the step size in a more general setting?

**Tip:** When implementing gradient descent, it’s important to monitor the objective function's value at each step to ensure the algorithm is converging as expected.
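Tying the tip to the derivation, here is a minimal sketch of the full scheme with the exact line-search step size derived above, again reusing the hypothetical `np`, `n`, `A`, `b`, `g`, and `grad_g` from the earlier snippets. It prints the objective at each iteration, which should decrease monotonically, and compares the final iterate to the closed-form minimizer $(A^\top A + I) z^* = A^\top b$:

```python
def optimal_step(grad):
    # alpha_k = (grad^T grad) / (grad^T (A^T A + I) grad)
    Hg = A.T @ (A @ grad) + grad        # (A^T A + I) grad, without forming A^T A explicitly
    return (grad @ grad) / (grad @ Hg)

z = np.zeros(n)
for k in range(1, 21):
    grad = grad_g(z)
    alpha = optimal_step(grad)
    z = z - alpha * grad
    print(f"iter {k:2d}  alpha = {alpha:.4f}  g(z) = {g(z):.6f}")

# Sanity check against the closed-form minimizer of this strictly convex quadratic.
z_star = np.linalg.solve(A.T @ A + np.eye(n), A.T @ b)
print("distance to minimizer:", np.linalg.norm(z - z_star))
```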


Math Problem Analysis

Mathematical Concepts

Matrix algebra
Gradient descent
Norms in vector spaces

Formulas

Gradient of a quadratic function


Suitable Grade Level

Advanced undergraduate level