Math Problem Statement

Derive the gradient of $\frac{1}{2} (Ax - b)^\top (Ax - b)$.

Solution

Let's derive the gradient of the function:

$$f(x) = \frac{1}{2} \left( (Ax - b)^\top (Ax - b) \right)$$

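where $A \in \mathbb{R}^{m \times n}$ is a matrix, $x \in \mathbb{R}^n$ is a vector, and $b \in \mathbb{R}^m$ is a vector, so that $Ax - b$ is defined and $f(x)$ is a scalar.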

Step 1: Expand the Expression

First, we expand the quadratic expression inside the function:

$$f(x) = \frac{1}{2} \left( (Ax - b)^\top (Ax - b) \right)$$

This can be expanded as:

$$f(x) = \frac{1}{2} \left( x^\top A^\top A x - x^\top A^\top b - b^\top A x + b^\top b \right)$$

Since $x^\top A^\top b$ is a scalar, it equals its own transpose: $x^\top A^\top b = (x^\top A^\top b)^\top = b^\top A x$. The two middle terms therefore combine, and the expression simplifies to:

$$f(x) = \frac{1}{2} \left( x^\top A^\top A x - 2 x^\top A^\top b + b^\top b \right)$$
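To sanity-check the expansion and the scalar-transpose step numerically, the sketch below evaluates both forms of $f$ on random data. It assumes NumPy; the helper names `f_original` and `f_expanded` and the random shapes are illustrative choices, not part of the original derivation.

```python
import numpy as np

# Hypothetical example data: A is m x n, x has length n, b has length m.
rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = rng.standard_normal(m)

def f_original(A, x, b):
    """f(x) = 1/2 (Ax - b)^T (Ax - b), computed directly from the residual."""
    r = A @ x - b
    return 0.5 * r @ r

def f_expanded(A, x, b):
    """f(x) = 1/2 (x^T A^T A x - 2 x^T A^T b + b^T b), the simplified expansion."""
    return 0.5 * (x @ A.T @ A @ x - 2 * x @ A.T @ b + b @ b)

# The two cross terms are the same scalar: x^T A^T b == b^T A x.
cross_1 = x @ A.T @ b
cross_2 = b @ A @ x
print(np.isclose(cross_1, cross_2))                          # True
print(np.isclose(f_original(A, x, b), f_expanded(A, x, b)))  # True
```

Both checks agree only up to floating-point rounding, which is why `np.isclose` is used instead of `==`.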

Step 2: Compute the Gradient with Respect to $x$

To find the gradient $\nabla_x f(x)$, we differentiate $f(x)$ term by term with respect to $x$.

  1. Gradient of $\frac{1}{2} x^\top A^\top A x$:

    The term $x^\top A^\top A x$ is a quadratic form with the symmetric matrix $A^\top A$. For a symmetric matrix $M$, $\nabla_x (x^\top M x) = 2Mx$, so the gradient of $\frac{1}{2} x^\top A^\top A x$ with respect to $x$ is:

    $$\nabla_x \left( \frac{1}{2} x^\top A^\top A x \right) = A^\top A x$$

  2. Gradient of $-x^\top A^\top b$:

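    After the factor $\frac{1}{2}$ cancels the 2, this term is $-x^\top A^\top b = -(A^\top b)^\top x$, a linear function of $x$. Since $\nabla_x (c^\top x) = c$ for any constant vector $c$, its gradient with respect to $x$ is: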

    $$\nabla_x \left( -x^\top A^\top b \right) = -A^\top b$$

  3. Gradient of $\frac{1}{2} b^\top b$:

    The term $\frac{1}{2} b^\top b$ is a constant with respect to $x$, so its gradient is:

    $$\nabla_x \left( \frac{1}{2} b^\top b \right) = 0$$
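The two differentiation rules used in items 1 and 2 can be checked component by component. The derivation below writes $M = A^\top A$ and $c = A^\top b$ (shorthand introduced only for this check) and differentiates with respect to a single coordinate $x_k$:

```latex
\begin{aligned}
\frac{\partial}{\partial x_k} \left( x^\top M x \right)
  &= \frac{\partial}{\partial x_k} \sum_{i} \sum_{j} x_i M_{ij} x_j
   = \sum_{j} M_{kj} x_j + \sum_{i} x_i M_{ik}
   = (Mx)_k + (M^\top x)_k, \\
\frac{\partial}{\partial x_k} \left( c^\top x \right)
  &= \frac{\partial}{\partial x_k} \sum_{i} c_i x_i = c_k .
\end{aligned}
```

Because $M = A^\top A$ satisfies $M^\top = M$, the first line equals $2(A^\top A x)_k$, and the factor $\frac{1}{2}$ leaves $(A^\top A x)_k$. The second line, with the minus sign attached, gives $-(A^\top b)_k$. Stacking over all $k$ reproduces items 1 and 2.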

Step 3: Combine the Results

The gradient of the entire function is obtained by summing the gradients of the individual terms:

$$\nabla_x f(x) = A^\top A x - A^\top b$$
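A common way to confirm an analytic gradient is to compare it with a central-difference approximation. The sketch below does that for $A^\top A x - A^\top b$, assuming NumPy; the helper names `grad_f` and `numerical_grad`, the step `h`, and the random data are illustrative, not from the original.

```python
import numpy as np

def f(A, x, b):
    """Objective f(x) = 1/2 ||Ax - b||^2."""
    r = A @ x - b
    return 0.5 * r @ r

def grad_f(A, x, b):
    """Analytic gradient from Step 3: A^T A x - A^T b."""
    return A.T @ A @ x - A.T @ b

def numerical_grad(func, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function."""
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (func(x + e) - func(x - e)) / (2 * h)
    return g

# Hypothetical random test data (not from the original text).
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
x = rng.standard_normal(4)

analytic = grad_f(A, x, b)
numeric = numerical_grad(lambda z: f(A, z, b), x)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

For a quadratic objective the central difference has no truncation error, so any mismatch beyond rounding would point to a mistake in the analytic formula.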

Final Result

Factoring out $A^\top$, the gradient of $f(x) = \frac{1}{2} \left( (Ax - b)^\top (Ax - b) \right)$ with respect to $x$ is:

$$\nabla_x f(x) = A^\top (Ax - b)$$

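The gradient is $A^\top$ applied to the residual $r = Ax - b$. Setting it to zero gives the normal equations $A^\top A x = A^\top b$, whose solutions are exactly the minimizers of $f$ (the least-squares solutions).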
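Because $\nabla_x f(x) = A^\top (Ax - b)$ vanishes exactly when $A^\top A x = A^\top b$, the gradient can drive a simple least-squares solver. The sketch below is a minimal gradient-descent illustration, assuming NumPy and a matrix $A$ with full column rank; the function name `least_squares_gd`, the step size $1/\|A\|_2^2$, and the iteration count are illustrative choices, not taken from the original. It compares the result with a direct solve of the normal equations.

```python
import numpy as np

def least_squares_gd(A, b, num_iters=5000):
    """Minimize f(x) = 1/2 ||Ax - b||^2 by gradient descent with grad f = A^T (Ax - b).

    Step size 1/L, where L = ||A||_2^2 is the largest eigenvalue of A^T A
    (the Lipschitz constant of the gradient). This is one common choice,
    not the only one.
    """
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        x = x - (A.T @ (A @ x - b)) / L
    return x

# Hypothetical random data with more rows than columns.
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)

x_gd = least_squares_gd(A, b)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)  # solves A^T A x = A^T b
print(np.allclose(x_gd, x_normal))  # True
```

In practice `np.linalg.lstsq` is usually preferred over forming $A^\top A$ explicitly, since squaring $A$ squares its condition number; the normal-equation solve here only serves as a reference point.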

Follow-Up Questions

  1. How would the gradient change if the function were $f(x) = (Ax - b)^\top (Ax - b)$ without the $\frac{1}{2}$ factor?
  2. How does this gradient expression simplify if $A$ is a symmetric matrix?
  3. What is the Hessian matrix of $f(x)$?
  4. How does the gradient change if $A$ is a diagonal matrix?
  5. Can this gradient result be extended to the case where $A$ is a tensor rather than a matrix?

Tip

When dealing with a quadratic form $x^\top M x$, use the identity $\nabla_x (x^\top M x) = (M + M^\top) x$, which reduces to $2Mx$ when $M$ is symmetric. Because $A^\top A$ is always symmetric, $\nabla_x (x^\top A^\top A x) = 2 A^\top A x$, and the factor $\frac{1}{2}$ in $f$ cancels the 2.
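For a general square matrix $M$, the gradient of $x^\top M x$ is $(M + M^\top)x$, and the shortcut $2Mx$ holds only when $M$ is symmetric. The sketch below checks this numerically, assuming NumPy; `quad_form`, `quad_form_grad`, and the random matrices are hypothetical names and data chosen for illustration.

```python
import numpy as np

def quad_form(M, x):
    """q(x) = x^T M x for a square (not necessarily symmetric) matrix M."""
    return x @ M @ x

def quad_form_grad(M, x):
    """Gradient of x^T M x: (M + M^T) x, which equals 2 M x only when M is symmetric."""
    return (M + M.T) @ x

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))  # generically not symmetric
x = rng.standard_normal(4)

# Central differences along each unit vector; exact for a quadratic up to rounding.
h = 1e-6
numeric = np.array([
    (quad_form(M, x + h * e) - quad_form(M, x - h * e)) / (2 * h)
    for e in np.eye(4)
])
print(np.allclose(quad_form_grad(M, x), numeric))  # True
print(np.allclose(2 * M @ x, numeric))             # False for this non-symmetric M

# For M = A^T A (always symmetric), the two formulas agree.
A = rng.standard_normal((5, 4))
S = A.T @ A
print(np.allclose(quad_form_grad(S, x), 2 * S @ x))  # True
```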


Math Problem Analysis

Mathematical Concepts

Matrix Algebra
Gradient Calculus
Quadratic Forms

Formulas

Matrix transpose and multiplication
Quadratic form expansion

Suitable Grade Level

Advanced Undergraduate