Derive the gradient of 1/2 (Ax - b)^T (Ax - b)

Let's derive the gradient of the function:

\[
f(x) = \frac{1}{2} \left( (Ax - b)^\top (Ax - b) \right)
\]

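where \( A \) is an \( m \times n \) matrix, \( x \) is a vector in \( \mathbb{R}^n \), and \( b \) is a vector in \( \mathbb{R}^m \), so that the residual \( Ax - b \) is well defined.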

### Step 1: Expand the Expression
Distributing the transpose, \( (Ax - b)^\top = x^\top A^\top - b^\top \), and multiplying out the product gives:

\[
f(x) = \frac{1}{2} \left( x^\top A^\top Ax - x^\top A^\top b - b^\top A x + b^\top b \right)
\]

Since \( x^\top A^\top b \) is a scalar, it is equal to its transpose, i.e., \( x^\top A^\top b = (x^\top A^\top b)^\top = b^\top A x \). Thus, the expression simplifies to:

\[
f(x) = \frac{1}{2} \left( x^\top A^\top Ax - 2x^\top A^\top b + b^\top b \right)
\]
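
The expansion can be checked numerically. The sketch below is only an illustration, not part of the derivation: the values of `A`, `b`, and `x` are arbitrary, and the helper functions `matvec`, `dot`, and `transpose` are hypothetical names written in plain Python to avoid external dependencies.

```python
# Illustrative check that the residual form and the expanded form agree.
# A, b, x are arbitrary example values, not taken from the problem statement.

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def dot(u, v):
    """Inner product u^T v."""
    return sum(a * c for a, c in zip(u, v))

def transpose(M):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

def f_residual(A, b, x):
    """f(x) = 1/2 (Ax - b)^T (Ax - b), computed from the residual."""
    r = [ax_i - b_i for ax_i, b_i in zip(matvec(A, x), b)]
    return 0.5 * dot(r, r)

def f_expanded(A, b, x):
    """f(x) = 1/2 (x^T A^T A x - 2 x^T A^T b + b^T b)."""
    At = transpose(A)
    quad = dot(x, matvec(At, matvec(A, x)))   # x^T A^T A x
    cross = dot(x, matvec(At, b))             # x^T A^T b
    return 0.5 * (quad - 2.0 * cross + dot(b, b))

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 x 2 example matrix
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]

print(f_residual(A, b, x), f_expanded(A, b, x))
```

Both printed values should agree up to floating-point rounding.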

### Step 2: Compute the Gradient with Respect to \( x \)
To find the gradient \( \nabla_x f(x) \), we differentiate \( f(x) \) with respect to \( x \).

1. **Gradient of \( \frac{1}{2} x^\top A^\top Ax \)**:

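   The term \( x^\top A^\top Ax \) is a quadratic form. For any square matrix \( M \), \( \nabla_x \left( x^\top M x \right) = (M + M^\top) x \). Here \( M = A^\top A \) is symmetric, so the gradient of \( x^\top A^\top A x \) is \( 2 A^\top A x \), and the factor \( \frac{1}{2} \) cancels the 2: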

   \[
   \nabla_x \left( \frac{1}{2} x^\top A^\top Ax \right) = A^\top A x
   \]

2. **Gradient of \( -x^\top A^\top b \)**:

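   After the factor \( \frac{1}{2} \) is applied to \( -2x^\top A^\top b \), the cross term is \( -x^\top A^\top b \), which is linear in \( x \). The gradient of a linear term \( c^\top x \) is \( c \), so with \( c = -A^\top b \):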

   \[
   \nabla_x \left( -x^\top A^\top b \right) = -A^\top b
   \]

3. **Gradient of \( \frac{1}{2} b^\top b \)**:

   The term \( \frac{1}{2} b^\top b \) is a constant with respect to \( x \), so its gradient is:

   \[
   \nabla_x \left( \frac{1}{2} b^\top b \right) = 0
   \]
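
As a cross-check on the two non-constant terms, the same gradients can be recovered entry by entry. This sketch introduces the shorthand \( M = A^\top A \) and \( c = A^\top b \), which do not appear in the original statement; \( k \) indexes the components of \( x \):

\[
\frac{\partial}{\partial x_k} \left( \frac{1}{2} \sum_{i,j} M_{ij} x_i x_j \right) = \frac{1}{2} \left( \sum_j M_{kj} x_j + \sum_i M_{ik} x_i \right) = (Mx)_k
\]

\[
\frac{\partial}{\partial x_k} \left( -\sum_i c_i x_i \right) = -c_k
\]

The last equality in the first line uses \( M_{ik} = M_{ki} \), which holds because \( A^\top A \) is symmetric.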

### Step 3: Combine the Results
The gradient of the entire function is obtained by summing the gradients of the individual terms:

\[
\nabla_x f(x) = A^\top A x - A^\top b
\]

### Final Result
Factoring out \( A^\top \), the gradient of the function \( f(x) = \frac{1}{2} \left( (Ax - b)^\top (Ax - b) \right) \) with respect to \( x \) is:

\[
\nabla_x f(x) = A^\top (Ax - b)
\]

In words, the gradient is \( A^\top \) applied to the residual \( Ax - b \). Setting it to zero gives the normal equations \( A^\top A x = A^\top b \) of the least-squares problem.
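
A finite-difference comparison is a common way to sanity-check a derived gradient. The following is a minimal sketch under stated assumptions: the values of `A`, `b`, and `x` are arbitrary, the step size `h` is a typical but not tuned choice, and the helpers `matvec`, `dot`, and `transpose` are hypothetical plain-Python functions rather than library calls.

```python
# Compare the analytic gradient A^T (Ax - b) with central differences.
# A, b, x and the step size h are illustrative assumptions.

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def dot(u, v):
    """Inner product u^T v."""
    return sum(a * c for a, c in zip(u, v))

def transpose(M):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

def f(A, b, x):
    """f(x) = 1/2 (Ax - b)^T (Ax - b)."""
    r = [ax_i - b_i for ax_i, b_i in zip(matvec(A, x), b)]
    return 0.5 * dot(r, r)

def grad_f(A, b, x):
    """Analytic gradient from the derivation: A^T (Ax - b)."""
    r = [ax_i - b_i for ax_i, b_i in zip(matvec(A, x), b)]
    return matvec(transpose(A), r)

def numerical_grad(A, b, x, h=1e-6):
    """Central-difference estimate of each partial derivative of f."""
    g = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        g.append((f(A, b, x_plus) - f(A, b, x_minus)) / (2.0 * h))
    return g

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 x 2 example matrix
b = [1.0, 0.0, -1.0]
x = [0.5, -0.25]

analytic = grad_f(A, b, x)
numeric = numerical_grad(A, b, x)
print(analytic)
print(numeric)
```

The two printed vectors should match to several decimal places; a large discrepancy would point to an error in either the derivation or the code.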

### Follow-Up Questions
1. How would the gradient change if the function were \( f(x) = (Ax - b)^\top (Ax - b) \) without the \( \frac{1}{2} \) factor?
2. How does this gradient expression simplify if \( A \) is a symmetric matrix?
3. What is the Hessian matrix of \( f(x) \)?
4. How does the gradient change if \( A \) is a diagonal matrix?
5. Can this gradient result be extended to the case where \( A \) is not a matrix but a tensor?

### Tip
For a quadratic form \( x^\top M x \), the gradient is \( (M + M^\top) x \), which reduces to \( 2Mx \) when \( M \) is symmetric. The matrix \( A^\top A \) is always symmetric, so the gradient of \( x^\top A^\top A x \) is \( 2 A^\top A x \) for any \( A \).

