Math Problem Statement
Is my question 1 correct?

Definitions:
- $w_{jk}$ = the weight between hidden neuron $k$ and output neuron $j$
- $y_j$ = the output of neuron $j$, computed as $y_j = f(net_j) = f\left(\sum_k w_{jk}\, y_k\right)$
- Objective function: $E = \frac{c^2}{2}\log_e\left(\left(\frac{t_i - y_i}{c} + b\right)^2\right) + \sum_k^{w}\frac{w_k^2/\theta}{1 + w_k^2/\theta}$
- $\eta$ is the learning rate

With respect to $w_{jk}$, the derivative of $E$ is computed via the chain rule:

$\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{jk}}$
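For concreteness, here is a minimal NumPy sketch of these definitions. The constant values for $c$, $b$, $\theta$ and the example weights/activations are illustrative assumptions, not part of the question; the activation $f(x) = x/(1+|x|)$ is the one used later in this derivation.

```python
import numpy as np

def f(x):
    # output activation used later in the derivation: f(x) = x / (1 + |x|)
    return x / (1.0 + np.abs(x))

def forward_and_objective(w_j, y_hidden, t_j, c=1.0, b=2.0, theta=0.5):
    net_j = np.dot(w_j, y_hidden)                              # net_j = sum_k w_jk * y_k
    y_j = f(net_j)                                             # y_j = f(net_j)
    data_term = (c**2 / 2.0) * np.log(((t_j - y_j) / c + b)**2)
    reg_term = np.sum((w_j**2 / theta) / (1.0 + w_j**2 / theta))
    return y_j, data_term + reg_term

# illustrative values only
y_j, E = forward_and_objective(np.array([0.2, -0.5, 0.1]),    # w_jk
                               np.array([0.3, 0.8, -0.4]),    # y_k
                               t_j=0.7)
print(y_j, E)
```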
Calculating $\frac{\partial E}{\partial y_j}$:

$E(y_i) = \frac{c^2}{2}\log_e\left(\left(\frac{t_i - y_i}{c} + b\right)^2\right) + \sum_k^{w}\frac{w_k^2/\theta}{1 + w_k^2/\theta}$

Using the log property $\log(x^2) = 2\log(x)$:

$E(y_i) = \frac{c^2}{2}\cdot 2\log_e\left(\frac{t_i - y_i}{c} + b\right) + \sum_k^{w}\frac{w_k^2/\theta}{1 + w_k^2/\theta}$

$E(y_i) = c^2\log_e\left(\frac{t_i - y_i}{c} + b\right) + \sum_k^{w}\frac{w_k^2/\theta}{1 + w_k^2/\theta}$
Let's differentiate with respect to $y_j$ (here $i$ indexes the same output neuron as $j$, so $y_i \equiv y_j$).

Term 1: $c^2\log_e\left(\frac{t_i - y_i}{c} + b\right)$

$c^2$ is a constant, so it stays in front. Let $x = \frac{t_i - y_i}{c} + b$:

$\frac{\partial}{\partial y_j}\left[c^2\log_e x\right] = c^2\cdot\frac{1}{x}\cdot\frac{\partial x}{\partial y_j} = c^2\cdot\frac{1}{x}\cdot\left(\frac{0 - 1}{c} + 0\right) = c^2\cdot\frac{1}{x}\cdot\left(-\frac{1}{c}\right) = -\frac{c}{x}$

Remembering that $x = \frac{t_i - y_i}{c} + b$:

$= -\frac{c}{\frac{t_i - y_i}{c} + b} = -\frac{c}{\frac{t_i - y_i + bc}{c}}$

Dividing by a fraction (keep the numerator, change division to multiplication, flip the fraction):

$= -c\cdot\frac{c}{t_i - y_i + bc} = -\frac{c^2}{t_i - y_i + bc}$
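As a quick sanity check on this Term 1 result, a short SymPy sketch (the symbol names mirror the derivation; this is just a verification, not part of the original working) confirms the hand derivation:

```python
import sympy as sp

t, y, c, b = sp.symbols('t_i y_i c b', real=True)

# Term 1 of the objective after the log property has been applied
term1 = c**2 * sp.log((t - y) / c + b)

# Differentiate with respect to y_i and compare with the hand-derived result
derived = sp.diff(term1, y)
expected = -c**2 / (t - y + b * c)

print(sp.simplify(derived - expected) == 0)   # True -> the hand derivation matches
```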
Term 2: $\sum_k^{w}\frac{w_k^2/\theta}{1 + w_k^2/\theta}$

Since this term contains no instance of $y_j$, its derivative with respect to $y_j$ is 0.

Therefore:

$\frac{\partial E}{\partial y_j} = -\frac{c^2}{t_i - y_i + bc}$

Calculating $\frac{\partial y_j}{\partial net_j}$:

Output-layer activation: $f(x) = \frac{x}{1 + |x|}$
Let $x = net_j$. Using the quotient rule $\left(\frac{u}{v}\right)' = \frac{u'v - v'u}{v^2}$ with $u = x$, $v = 1 + |x|$ (and $\frac{d|x|}{dx} = \operatorname{sgn}(x)$ for $x \neq 0$):

$\frac{\partial y_j}{\partial net_j} = f'(net_j) = \frac{1\cdot(1 + |net_j|) - \operatorname{sgn}(net_j)\cdot net_j}{(1 + |net_j|)^2}$

Since $\operatorname{sgn}(net_j)\cdot net_j = |net_j|$:

$\frac{\partial y_j}{\partial net_j} = f'(net_j) = \frac{1 + |net_j| - |net_j|}{(1 + |net_j|)^2} = \frac{1}{(1 + |net_j|)^2}$
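Because of the absolute value in $f(x) = x/(1+|x|)$, it is worth confirming numerically that the derivative is $\frac{1}{(1+|x|)^2}$ on both sides of zero. A small finite-difference sketch (the test points are chosen arbitrarily):

```python
import numpy as np

def f(x):
    return x / (1.0 + np.abs(x))

def f_prime_analytic(x):
    # hand-derived derivative: 1 / (1 + |x|)^2
    return 1.0 / (1.0 + np.abs(x))**2

eps = 1e-6
for x in [-2.5, -0.3, 0.4, 1.7]:                     # arbitrary test points (x != 0)
    numeric = (f(x + eps) - f(x - eps)) / (2 * eps)  # central finite difference
    print(x, numeric, f_prime_analytic(x), np.isclose(numeric, f_prime_analytic(x)))
```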
Calculating $\frac{\partial net_j}{\partial w_{jk}}$: since $net_j = \sum_k w_{jk}\, y_k$, only the term containing $w_{jk}$ survives, so

$\frac{\partial net_j}{\partial w_{jk}} = y_k$
$\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial y_j}\cdot\frac{\partial y_j}{\partial net_j}\cdot\frac{\partial net_j}{\partial w_{jk}}$
$\frac{\partial E}{\partial w_{jk}} = \left(-\frac{c^2}{t_i - y_i + bc}\right)\cdot\frac{1}{(1 + |net_j|)^2}\cdot y_k = -\frac{c^2\, y_k}{(t_i - y_i + bc)\,(1 + |net_j|)^2}$
Weight update:

$w_{jk}^{new} = w_{jk}^{old} - \eta\,\frac{\partial E}{\partial w_{jk}} = w_{jk}^{old} - \eta\left(-\frac{c^2\, y_k}{(t_i - y_i + bc)(1 + |net_j|)^2}\right) = w_{jk}^{old} + \eta\,\frac{c^2\, y_k}{(t_i - y_i + bc)(1 + |net_j|)^2}$
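Finally, the full chain $\frac{\partial E}{\partial w_{jk}}$ can be checked against a finite-difference gradient. The sketch below covers only the data-fit term of $E$ (the term the derived formula describes, ignoring the regularisation sum); the values of $c$, $b$, $\eta$, $t_i$, the weights and the hidden activations $y_k$ are illustrative assumptions.

```python
import numpy as np

def f(x):
    # output activation f(x) = x / (1 + |x|)
    return x / (1.0 + np.abs(x))

def data_term(w, y_hidden, t, c, b):
    # first (data-fit) term of E for a single output neuron j
    net = np.dot(w, y_hidden)          # net_j = sum_k w_jk * y_k
    y = f(net)                         # y_j = f(net_j)
    return (c**2 / 2.0) * np.log(((t - y) / c + b)**2)

# toy values -- assumptions purely for the check
rng = np.random.default_rng(0)
w = rng.normal(size=3)                 # w_jk for k = 0, 1, 2
y_hidden = rng.normal(size=3)          # hidden activations y_k
t, c, b, eta = 0.7, 1.0, 2.0, 0.1

net = np.dot(w, y_hidden)
y = f(net)

# analytic gradient from the derivation above:
# dE/dw_jk = -c^2 * y_k / ((t - y_j + b*c) * (1 + |net_j|)^2)
grad_analytic = -c**2 * y_hidden / ((t - y + b * c) * (1.0 + np.abs(net))**2)

# central finite-difference gradient of the data term
eps = 1e-6
grad_numeric = np.zeros_like(w)
for k in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[k] += eps
    w_minus[k] -= eps
    grad_numeric[k] = (data_term(w_plus, y_hidden, t, c, b)
                       - data_term(w_minus, y_hidden, t, c, b)) / (2 * eps)

print(np.allclose(grad_analytic, grad_numeric))   # True if the derivation holds

# gradient-descent update, matching the last line of the derivation
w_new = w - eta * grad_analytic
print(w_new)
```

If the printed check is `True`, the analytic gradient matches the numerical one, and the last two lines apply the same update rule derived above.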
Solution
Your solution for Question 1 seems mostly correct, but there are some key points to review and verify for accuracy. I'll break down the process and the components to ensure everything aligns properly.
Breakdown:
- Objective Function and its Derivative:
  - Your objective function is $E = \frac{c^2}{2}\log_e\left(\left(\frac{t_i - y_i}{c} + b\right)^2\right) + \sum_k^{w}\frac{w_k^2/\theta}{1 + w_k^2/\theta}$.
  - You correctly used the log property $\log(x^2) = 2\log(x)$ to rewrite the first term as $c^2\log_e\left(\frac{t_i - y_i}{c} + b\right)$.
- Partial Derivative with Respect to $y_j$:
  - You found the partial derivative of the first term of the objective function with respect to $y_j$: $\frac{\partial E}{\partial y_j} = -\frac{c^2}{t_i - y_i + bc}$.
  - This is accurate. Your handling of the constants and your application of the chain rule are correct.
- Derivative of the Second Term:
  - Since the second term involves only the weights and does not depend on $y_j$, you correctly concluded that its derivative with respect to $y_j$ is zero.
- Derivative with Respect to $net_j$:
  - You applied the quotient rule to the output activation $f(x) = \frac{x}{1 + |x|}$.
  - The derivative you found, $\frac{\partial y_j}{\partial net_j} = \frac{1}{(1 + |net_j|)^2}$, is correct, provided the absolute value from $|x|$ is kept in the denominator.
- Derivative of $net_j$ with Respect to $w_{jk}$:
  - You identified that $\frac{\partial net_j}{\partial w_{jk}} = y_k$, which is also correct.
- Putting it All Together:
  - You combined the partial derivatives to obtain $\frac{\partial E}{\partial w_{jk}} = -\frac{c^2\, y_k}{(t_i - y_i + bc)(1 + |net_j|)^2}$.
  - This expression is correct, and the final update rule for the weights follows from it.
Final Steps:
You concluded with the weight update rule $w_{jk}^{new} = w_{jk}^{old} + \eta\,\frac{c^2\, y_k}{(t_i - y_i + bc)(1 + |net_j|)^2}$. This result is correct, given the expressions derived earlier.
Summary:
- The logical steps and mathematical operations you've performed are mostly correct.
- However, make sure each step is carried out precisely, particularly the chain rule and the quotient rule: keep the absolute value when differentiating $\frac{x}{1+|x|}$, and note that if the regularisation sum $\sum_k$ includes $w_{jk}$ itself, its direct derivative with respect to $w_{jk}$ also contributes to $\frac{\partial E}{\partial w_{jk}}$. With that in mind, the weight-update formula you derived is correct given the process you outlined.
Would you like any specific details or further verification on any part of this derivation?
5 Related Questions:
- How would changing the activation function affect the derivatives in your solution?
- Can you derive the weight update rule if the objective function were a different type of loss function, such as Mean Squared Error?
- How does the learning rate influence the convergence of the weight update rule?
- Could you explain how to generalize this derivative for a multi-layer neural network?
- How would introducing regularization terms affect the derivatives and the weight update rule?
Tip:
Always double-check the differentiation steps, particularly when dealing with complex functions involving multiple variables and constants.
Math Problem Analysis
Mathematical Concepts
Chain Rule
Quotient Rule
Logarithmic Differentiation
Activation Functions
Formulas
Objective Function E
Derivative of E with respect to y_j
Derivative of y_j with respect to net_j
Derivative of net_j with respect to w_jk
Theorems
-
Suitable Grade Level
Advanced Mathematics