Math Problem Statement

Q1) Let $A \in \mathbb{R}^{n \times n}$ be a constant matrix and $b \in \mathbb{R}^n$ be a constant vector. Let $z \in \mathbb{R}^n$. Consider the function $g(z)$ defined as $g(z) = \frac{1}{2}\|Az - b\|_2^2 + \frac{1}{2}\|z\|_2^2$.

  1. Find the gradient of $g(z)$ with respect to $z$.
  2. Consider the problem of finding the minimum of $g(z)$ using gradient descent. Derive the optimal step size $\alpha_k$ to be used at iteration $k$ of the gradient descent scheme, assuming the estimate $z_{k-1}$ from the previous iteration.

Solution

1. Gradient of $g(z)$ with respect to $z$

Given the function:

$$g(z) = \frac{1}{2} \|Az - b\|^2_2 + \frac{1}{2} \|z\|^2_2$$

where $A$ is an $n \times n$ matrix, $b$ is an $n$-dimensional vector, and $z$ is an $n$-dimensional vector, we want to find the gradient $\nabla g(z)$ with respect to $z$.

Step 1: Expand the terms

The first term $\|Az - b\|_2^2$ is the squared Euclidean norm, which can be written as:

$$\|Az - b\|^2_2 = (Az - b)^\top (Az - b)$$

Expanding this:

$$\|Az - b\|^2_2 = z^\top A^\top A z - 2b^\top A z + b^\top b$$

The second term is:

$$\|z\|^2_2 = z^\top z$$

So the function $g(z)$ becomes:

$$g(z) = \frac{1}{2} \left( z^\top A^\top A z - 2b^\top A z + b^\top b \right) + \frac{1}{2} z^\top z$$
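
To make the expansion concrete, here is a minimal NumPy sanity check that the expanded quadratic form agrees with the norm definition; the random $A$, $b$, and $z$ are illustrative test data, not part of the original problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
z = rng.standard_normal(n)

# g(z) computed directly from the norm definition
g_norm = 0.5 * np.linalg.norm(A @ z - b) ** 2 + 0.5 * np.linalg.norm(z) ** 2

# g(z) computed from the expanded quadratic form above
g_expanded = 0.5 * (z @ A.T @ A @ z - 2 * b @ A @ z + b @ b) + 0.5 * (z @ z)

assert np.isclose(g_norm, g_expanded)  # the two forms agree
```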

Step 2: Differentiate with respect to $z$

Now, we differentiate each term with respect to $z$.

  • The derivative of $\frac{1}{2} z^\top A^\top A z$ with respect to $z$ is $A^\top A z$ (using the symmetry of $A^\top A$).
  • The derivative of $- b^\top A z$ with respect to $z$ is $- A^\top b$.
  • The derivative of $\frac{1}{2} z^\top z$ with respect to $z$ is $z$.
  • The term $\frac{1}{2} b^\top b$ is independent of $z$ and vanishes upon differentiation.

Therefore, the gradient $\nabla g(z)$ is:

$$\nabla g(z) = A^\top A z - A^\top b + z$$

This can be simplified to:

$$\nabla g(z) = (A^\top A + I)z - A^\top b$$

where $I$ is the $n \times n$ identity matrix.
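
As a quick check, the sketch below compares this closed-form gradient against a central finite-difference approximation; the random test data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
z = rng.standard_normal(n)

def g(z):
    return 0.5 * np.linalg.norm(A @ z - b) ** 2 + 0.5 * np.linalg.norm(z) ** 2

# Closed-form gradient derived above: (A^T A + I) z - A^T b
grad_closed = (A.T @ A + np.eye(n)) @ z - A.T @ b

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([(g(z + eps * e) - g(z - eps * e)) / (2 * eps) for e in np.eye(n)])

assert np.allclose(grad_closed, grad_fd, atol=1e-5)
```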

2. Optimal Step Size $\alpha_k$ in Gradient Descent

In gradient descent, the update rule is:

$$z_k = z_{k-1} - \alpha_k \nabla g(z_{k-1})$$

where $\alpha_k$ is the step size to be determined.

To find the optimal step size $\alpha_k$, we minimize $g(z_k)$ with respect to $\alpha_k$. Substituting the update rule into $g(z_k)$:

$$g(z_k) = \frac{1}{2} \|A(z_{k-1} - \alpha_k \nabla g(z_{k-1})) - b\|^2_2 + \frac{1}{2} \|z_{k-1} - \alpha_k \nabla g(z_{k-1})\|^2_2$$

Because $g$ is quadratic, $g(z_k)$ is a quadratic function of $\alpha_k$, so the optimal step size can be obtained in closed form by differentiating $g(z_k)$ with respect to $\alpha_k$ and setting the result to zero (an exact line search).

Writing $d_k = \nabla g(z_{k-1})$ and $H = A^\top A + I$ (the Hessian of $g$), expanding $g$ along the search direction gives

$$g(z_{k-1} - \alpha_k d_k) = g(z_{k-1}) - \alpha_k\, d_k^\top d_k + \frac{\alpha_k^2}{2}\, d_k^\top H d_k$$

Setting the derivative of this expression with respect to $\alpha_k$ to zero yields the optimal step size:

$$\alpha_k = \frac{\nabla g(z_{k-1})^\top \nabla g(z_{k-1})}{\nabla g(z_{k-1})^\top (A^\top A + I)\nabla g(z_{k-1})}$$

This choice of $\alpha_k$ performs an exact line search: it minimizes $g$ along the negative-gradient direction at each iteration, given the current gradient.
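
Here is a minimal sketch of the resulting scheme, assuming random problem data $A$ and $b$. Since setting the gradient to zero gives $(A^\top A + I)z^* = A^\top b$, the iterates can be checked against that closed-form minimizer.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
H = A.T @ A + np.eye(n)        # Hessian of g
z = np.zeros(n)                # initial point z_0

for k in range(200):
    grad = H @ z - A.T @ b     # gradient at the current iterate
    if np.linalg.norm(grad) < 1e-12:
        break                  # already at the minimizer
    alpha = (grad @ grad) / (grad @ H @ grad)  # optimal step size alpha_k
    z = z - alpha * grad       # gradient descent update

# Closed-form minimizer: (A^T A + I) z* = A^T b
z_star = np.linalg.solve(H, A.T @ b)
assert np.allclose(z, z_star, atol=1e-8)
```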


Do you want details or have any questions?

Here are some related questions:

  1. What is the significance of the identity matrix $I$ in the gradient expression?
  2. How does the condition number of $A$ affect the convergence of gradient descent in this scenario?
  3. Can you apply gradient descent to a non-quadratic function? How would the step size change?
  4. How would the gradient descent update rule change if $g(z)$ had an additional regularization term?
  5. How does the choice of initial point $z_0$ impact the convergence of gradient descent?

Tip: When choosing a step size in gradient descent, it's often useful to perform a line search to find the optimal value, particularly when dealing with non-quadratic functions.
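
For the non-quadratic case the tip refers to, here is a minimal backtracking (Armijo) line-search sketch; the test function, its gradient, and the constants `beta = 0.5` and `c = 1e-4` are conventional illustrative choices, not prescribed by the problem.

```python
import numpy as np

def backtracking_step(f, grad_f, z, alpha0=1.0, beta=0.5, c=1e-4):
    """Shrink the step until the Armijo sufficient-decrease condition holds."""
    g = grad_f(z)
    alpha = alpha0
    while f(z - alpha * g) > f(z) - c * alpha * (g @ g):
        alpha *= beta
    return alpha

# Hypothetical smooth test function, for illustration only
f = lambda z: np.sum(z ** 4) / 4
grad_f = lambda z: z ** 3
z = np.array([1.0, -2.0])
alpha = backtracking_step(f, grad_f, z)
z_next = z - alpha * grad_f(z)   # one descent step with the accepted alpha
```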

