Math Problem Statement

Is my Question 1 correct?

Definitions:
$w_{jk}$ = the weight between hidden neuron $k$ and output neuron $j$
$y_j$ = the output of neuron $j$, which is calculated as $y_j = f_{\text{out}} = f(\text{net}_j) = f\left(\sum_k w_{jk}\, y_k\right)$
Objective function: $E = \frac{c^2}{2}\,\log_e\!\left(\left(\frac{t_i - y_i}{c} + b\right)^2\right) + \sum_k^{w} \frac{w_k^2/\theta}{1 + w_k^2/\theta}$
$\eta$ is the learning rate

With respect to $w_{jk}$, the derivative of $E$ is computed via the chain rule:
$\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial \text{net}_j} \cdot \frac{\partial \text{net}_j}{\partial w_{jk}}$
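To make these definitions concrete, here is a minimal Python sketch of the forward pass and the objective. The numeric values and layer size are hypothetical, chosen purely for illustration; the activation $f(x) = x/(1+|x|)$ is taken from the output-layer activation given later in the derivation.

```python
import numpy as np

# Output-layer activation given later in the derivation: f(x) = x / (1 + |x|)
def f(x):
    return x / (1.0 + np.abs(x))

# Hypothetical sizes and values, purely to make the definitions concrete
y_hidden = np.array([0.2, -0.5, 0.7])   # y_k: outputs of the hidden neurons k
w_jk     = np.array([0.1,  0.4, -0.3])  # w_jk: weights from hidden neurons k to output neuron j
c, b, theta = 1.5, 0.1, 2.0             # constants in the objective function
t_i = 0.8                               # target value t_i

net_j = np.dot(w_jk, y_hidden)          # net_j = sum_k w_jk * y_k
y_j = f(net_j)                          # y_j = f(net_j)

# E = c^2/2 * log_e(((t_i - y_i)/c + b)^2) + sum_k (w_k^2/theta) / (1 + w_k^2/theta)
error_term = 0.5 * c**2 * np.log(((t_i - y_j) / c + b) ** 2)
reg_term = np.sum((w_jk**2 / theta) / (1.0 + w_jk**2 / theta))
E = error_term + reg_term
print(net_j, y_j, E)
```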

Calculating $\frac{\partial E}{\partial y_j}$:
$E(y_i) = \frac{c^2}{2}\,\log_e\!\left(\left(\frac{t_i - y_i}{c} + b\right)^2\right) + \sum_k^{w} \frac{w_k^2/\theta}{1 + w_k^2/\theta}$
Using the log property $\log(x^2) = 2\log(x)$:
$E(y_i) = \frac{c^2}{2} \cdot 2\,\log_e\!\left(\frac{t_i - y_i}{c} + b\right) + \sum_k^{w} \frac{w_k^2/\theta}{1 + w_k^2/\theta}$
$E(y_i) = c^2\,\log_e\!\left(\frac{t_i - y_i}{c} + b\right) + \sum_k^{w} \frac{w_k^2/\theta}{1 + w_k^2/\theta}$

Let's differentiate with respect to $y_j$.

Term 1: $c^2\,\log_e\!\left(\frac{t_i - y_i}{c} + b\right)$. Here $c^2$ is a constant, so it stays as a factor. Let $x = \frac{t_i - y_i}{c} + b$.
$\frac{\partial}{\partial y_j}\left[c^2\,\log_e x\right] = c^2 \cdot \frac{1}{x} \cdot \frac{\partial x}{\partial y_j} = c^2 \cdot \frac{1}{x} \cdot \left(\frac{0 - 1}{c} + 0\right) = c^2 \cdot \frac{1}{x} \cdot \left(-\frac{1}{c}\right) = -\frac{c^2}{c} \cdot \frac{1}{x} = -\frac{c}{x}$
Remembering that $x = \frac{t_i - y_i}{c} + b$:
$= \frac{-c}{\frac{t_i - y_i}{c} + b}$
To simplify, write $b$ as $\frac{bc}{c}$:
$= \frac{-c}{\frac{t_i - y_i + bc}{c}}$
Using KCF (Keep, Change, Flip) to divide by the fraction:
$= -c \cdot \frac{c}{t_i - y_i + bc} = \frac{-c^2}{t_i - y_i + bc}$
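A quick symbolic check of this step, as a sketch using SymPy (the symbols are declared positive so the logarithm is well defined; this is an illustrative check, not part of the original working):

```python
import sympy as sp

t, y, c, b = sp.symbols('t y c b', positive=True)

term1 = c**2 * sp.log((t - y) / c + b)    # c^2 * log_e((t_i - y_i)/c + b)
dterm1_dy = sp.diff(term1, y)

# Expected hand result: -c^2 / (t - y + b*c)
expected = -c**2 / (t - y + b * c)
print(sp.simplify(dterm1_dy - expected))  # prints 0 if the hand derivation matches
```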

Term 2: $\sum_k^{w} \frac{w_k^2/\theta}{1 + w_k^2/\theta}$. Since this term contains no instance of $y_j$, its derivative is zero: $\frac{\partial}{\partial y_j} = 0$.

Therefore: $\frac{\partial E}{\partial y_j} = \frac{-c^2}{t_i - y_i + bc}$

Calculating $\frac{\partial y_j}{\partial \text{net}_j}$. The output-layer activation is $f(x) = \frac{x}{1 + |x|}$.

Let $x = \text{net}_i^{\text{out}}$. Using the quotient rule $\left(\frac{u}{v}\right)' = \frac{u'v - v'u}{v^2}$:

$\frac{\partial y_j}{\partial \text{net}_j} = f'(\text{net}_i^{\text{out}}) = \frac{(\text{net}_i^{\text{out}})' \cdot (1 + \text{net}_i^{\text{out}}) - (1 + \text{net}_i^{\text{out}})' \cdot \text{net}_i^{\text{out}}}{(1 + \text{net}_i^{\text{out}})^2}$

$= \frac{1 \cdot (1 + \text{net}_i^{\text{out}}) - (0 + 1) \cdot \text{net}_i^{\text{out}}}{(1 + \text{net}_i^{\text{out}})^2}$

$= \frac{\text{net}_i^{\text{out}} + 1 - \text{net}_i^{\text{out}}}{(1 + \text{net}_i^{\text{out}})^2} = \frac{1}{(1 + \text{net}_i^{\text{out}})^2}$
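A symbolic check of this result, as a sketch: the net input is declared positive so that $|x| = x$, which matches the implicit assumption made when the absolute value is dropped (for general $x$ the derivative is $\frac{1}{(1+|x|)^2}$).

```python
import sympy as sp

net = sp.symbols('net', positive=True)     # assume net > 0, so |net| = net

f_net = net / (1 + net)                    # f(x) = x / (1 + |x|) with the absolute value dropped
df = sp.simplify(sp.diff(f_net, net))

print(df)                                  # (net + 1)**(-2), i.e. 1 / (1 + net)^2
print(sp.simplify(df - 1 / (1 + net)**2))  # prints 0, confirming the quotient-rule result
```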

Calculating $\frac{\partial \text{net}_j}{\partial w_{jk}}$:
$\frac{\partial \text{net}_j}{\partial w_{jk}} = y_k$

$\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial \text{net}_j} \cdot \frac{\partial \text{net}_j}{\partial w_{jk}}$

$\frac{\partial E}{\partial w_{jk}} = \left(\frac{-c^2}{t_i - y_i + bc}\right) \cdot \frac{1}{(1 + \text{net}_i^{\text{out}})^2} \cdot y_k = \frac{-c^2\, y_k}{(t_i - y_i + bc)\,(1 + \text{net}_i^{\text{out}})^2}$

$w_{jk}^{\text{new}} = w_{jk}^{\text{old}} - \eta\,\frac{\partial E}{\partial w_{jk}} = w_{jk}^{\text{old}} - \eta\,\frac{-c^2\, y_k}{(t_i - y_i + bc)\,(1 + \text{net}_i^{\text{out}})^2} = w_{jk}^{\text{old}} + \eta\,\frac{c^2\, y_k}{(t_i - y_i + bc)\,(1 + \text{net}_i^{\text{out}})^2}$
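To sanity-check the assembled gradient numerically, here is a minimal sketch with hypothetical values. It compares the closed-form $\frac{\partial E}{\partial w_{jk}}$ above against central finite differences of the error term only (the part of $E$ that flows through $y_j$); the values are chosen so that $\text{net}_j > 0$.

```python
import numpy as np

def f(x):
    return x / (1.0 + np.abs(x))

# Hypothetical values (all positive so that net_j > 0 and |net_j| = net_j)
y_hidden = np.array([0.2, 0.5, 0.7])   # y_k
w = np.array([0.1, 0.4, 0.3])          # w_jk
c, b, t_i = 1.5, 0.1, 0.8

def error_term(w):
    """Only the first term of E -- the part that depends on y_j."""
    net_j = np.dot(w, y_hidden)
    y_j = f(net_j)
    return 0.5 * c**2 * np.log(((t_i - y_j) / c + b) ** 2)

# Closed-form gradient from the derivation above
net_j = np.dot(w, y_hidden)
y_j = f(net_j)
analytic = -c**2 * y_hidden / ((t_i - y_j + b * c) * (1.0 + net_j) ** 2)

# Central finite differences for comparison
eps = 1e-6
numeric = np.zeros_like(w)
for k in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[k] += eps
    w_minus[k] -= eps
    numeric[k] = (error_term(w_plus) - error_term(w_minus)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # should be tiny (~1e-8 or smaller)
```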

Solution

Your solution for Question 1 is mostly correct, but a few points are worth reviewing for accuracy. I'll break down each component to make sure everything aligns.

Breakdown:

  1. Objective Function $E$ and its Derivative:

    • Your objective function $E$ is given by: $E(y_i) = \frac{c^2}{2}\,\log_e\!\left(\left(\frac{t_i - y_i}{c} + b\right)^2\right) + \sum_k^{w} \frac{w_k^2/\theta}{1 + w_k^2/\theta}$
    • You correctly used the log property $\log(x^2) = 2\log(x)$ (valid when the argument is positive) to rewrite the function.
  2. Partial Derivative with Respect to $y_j$:

    • You found the partial derivative of the first term in the objective function with respect to $y_j$: $\frac{\partial E}{\partial y_j} = \frac{-c^2}{t_i - y_i + bc}$
    • This is accurate. Your handling of the constant terms and the application of the chain rule is correct.
  3. Derivative of the Second Term:

    • Since the second term involves the weights $w_k$ and does not depend on $y_j$, you correctly concluded that its derivative with respect to $y_j$ is zero.
  4. Derivative with Respect to $\text{net}_j$:

    • You applied the quotient rule to differentiate the activation function $f(x) = \frac{x}{1 + |x|}$.
    • The derivative you found, $\frac{\partial y_j}{\partial \text{net}_j} = \frac{1}{(1 + \text{net}_j)^2}$, is correct for $\text{net}_j \ge 0$, where $|\text{net}_j| = \text{net}_j$; in general, $f'(x) = \frac{1}{(1 + |x|)^2}$.
  5. Derivative of $\text{net}_j$ with Respect to $w_{jk}$:

    • You identified that $\frac{\partial \text{net}_j}{\partial w_{jk}} = y_k$, which is also correct.
  6. Putting it All Together:

    • You combined all the partial derivatives to express: $\frac{\partial E}{\partial w_{jk}} = \frac{-c^2\, y_k}{(t_i - y_i + bc)\,(1 + \text{net}_i^{\text{out}})^2}$
    • This expression is correct for the error term of $E$ (the part that flows through $y_j$). One caveat: if the regularization sum $\sum_k^{w}$ runs over all weights including $w_{jk}$, it contributes an additional direct term $\frac{\partial}{\partial w_{jk}}\left[\frac{w_{jk}^2/\theta}{1 + w_{jk}^2/\theta}\right] = \frac{2\theta\, w_{jk}}{(\theta + w_{jk}^2)^2}$ to $\frac{\partial E}{\partial w_{jk}}$, which the chain-rule product above does not capture (see the sketch after this list). With that caveat, the final update rule for the weights $w_{jk}$ follows from here.
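To make the caveat in point 6 concrete, here is a small SymPy sketch (an illustrative check, not part of your derivation) of the regularization summand's own derivative with respect to a weight:

```python
import sympy as sp

w, theta = sp.symbols('w theta', positive=True)

reg = (w**2 / theta) / (1 + w**2 / theta)    # one summand of the regularization term
dreg_dw = sp.diff(reg, w)

expected = 2 * theta * w / (theta + w**2) ** 2
print(sp.simplify(dreg_dw - expected))       # prints 0, confirming the extra gradient term
```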

Final Steps:

You concluded with the weight update rule: $w_{jk}^{\text{new}} = w_{jk}^{\text{old}} + \eta\,\frac{c^2\, y_k}{(t_i - y_i + bc)\,(1 + \text{net}_i^{\text{out}})^2}$. This result is correct, given the expressions you derived earlier.
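As an illustration of how the update rule would be applied, here is a minimal Python sketch. The function name and the numeric values are hypothetical; it implements the rule exactly as written, i.e., using only the error-term gradient.

```python
def update_weight(w_old, eta, c, b, t_i, y_i, net_out, y_k):
    """One gradient-descent step for w_jk using the derived error-term gradient."""
    grad = -c**2 * y_k / ((t_i - y_i + b * c) * (1.0 + net_out) ** 2)
    return w_old - eta * grad  # same as w_old + eta * c^2 * y_k / (...)

# Example call with hypothetical values
w_new = update_weight(w_old=0.4, eta=0.05, c=1.5, b=0.1,
                      t_i=0.8, y_i=0.3, net_out=0.43, y_k=0.5)
print(w_new)
```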

Summary:

  • The logical steps and mathematical operations you've performed are mostly correct.
  • However, make sure each step, particularly the applications of the chain rule and the quotient rule, is carried out precisely; in particular, keep track of the absolute value in $f(x) = \frac{x}{1+|x|}$ and of the regularization term's direct dependence on $w_{jk}$. With those caveats, the weight-update formula you derived is correct.

Would you like any specific details or further verification on any part of this derivation?

5 Related Questions:

  1. How would changing the activation function $f(x)$ affect the derivatives in your solution?
  2. Can you derive the weight update rule if the objective function $E$ were a different type of loss function, such as Mean Squared Error?
  3. How does the learning rate $\eta$ influence the convergence of the weight update rule?
  4. Could you explain how to generalize this derivative for a multi-layer neural network?
  5. How would introducing regularization terms affect the derivatives and the weight update rule?

Tip:

Always double-check the differentiation steps, particularly when dealing with complex functions involving multiple variables and constants.

Math Problem Analysis

Mathematical Concepts

Chain Rule
Quotient Rule
Logarithmic Differentiation
Activation Functions

Formulas

Objective Function E
Derivative of E with respect to y_j
Derivative of y_j with respect to net_j
Derivative of net_j with respect to w_jk

Suitable Grade Level

Advanced Mathematics