$$y = \frac{1}{1 + e^{-z}}$$

The neuron computes the weighted sum of its inputs, the logit, $z = \sum_k w_k x_k$. It then feeds $z$ into the logistic function to compute $y$, its final output. These functions have very nice derivatives, which makes learning easy! For learning, we want to compute the gradient of the error function with respect to the weights. To do so, we start by taking the derivative of the logit, $z$, with respect to the inputs and the weights. By linearity of the logit:

$$\frac{\partial z}{\partial w_k} = x_k \qquad\qquad \frac{\partial z}{\partial x_k} = w_k$$
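To make the setup concrete, here is a minimal NumPy sketch of the forward pass, with a finite-difference check of the linearity claim (the toy weights, inputs, and variable names are my own illustration, not from the original post):

```python
import numpy as np

def forward(w, x):
    """Sigmoidal neuron: logit z = sum_k w_k x_k, then output y = 1/(1 + e^-z)."""
    z = np.dot(w, x)
    y = 1.0 / (1.0 + np.exp(-z))
    return z, y

w = np.array([0.5, -0.3, 0.8])   # hypothetical weights
x = np.array([1.0, 2.0, -1.0])   # hypothetical inputs
z, y = forward(w, x)

# By linearity of the logit, nudging w_k by h should move z by h * x_k.
h = 1e-6
for k in range(len(w)):
    w_plus = w.copy()
    w_plus[k] += h
    z_plus, _ = forward(w_plus, x)
    print(f"dz/dw_{k} ~ {(z_plus - z) / h:+.6f}   x_{k} = {x[k]:+.6f}")
```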

Also, quite surprisingly, the derivative of the output with respect to the logit is simple when you express it in terms of the output. Verifying this is left as an exercise for the reader:

$$\frac{dy}{dz} = y(1 - y)$$
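A quick numerical check of this identity (a sketch; the sample points are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Compare y * (1 - y) against a centered finite difference of the sigmoid.
h = 1e-6
for z in [-2.0, 0.0, 3.0]:
    y = sigmoid(z)
    analytic = y * (1.0 - y)
    numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)
    print(f"z={z:+.1f}  y(1-y)={analytic:.6f}  finite diff={numeric:.6f}")
```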

We then use the chain rule to get the derivative of the output with respect to each weight:

$$\frac{\partial y}{\partial w_k} = \frac{\partial z}{\partial w_k} \frac{dy}{dz} = x_k\, y(1 - y)$$
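In code, the chain rule gives the whole gradient of the output in one vectorized line (again a sketch with made-up numbers):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -0.3, 0.8])
x = np.array([1.0, 2.0, -1.0])
y = sigmoid(np.dot(w, x))

# dy/dw_k = x_k * y * (1 - y), computed for all k at once.
grad_y = x * y * (1.0 - y)

# Finite-difference check of each component.
h = 1e-6
for k in range(len(w)):
    w_plus = w.copy()
    w_plus[k] += h
    numeric = (sigmoid(np.dot(w_plus, x)) - y) / h
    print(f"k={k}  analytic={grad_y[k]:+.6f}  numeric={numeric:+.6f}")
```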

Putting all of this together, we can now compute the derivative of the error function with respect to each weight. Using the squared-error function $E = \frac{1}{2} \sum_i \left(t^{(i)} - y^{(i)}\right)^2$, where $t^{(i)}$ is the target output for the $i$-th training example:

$$\frac{\partial E}{\partial w_k} = \sum_i \frac{\partial y^{(i)}}{\partial w_k} \frac{\partial E}{\partial y^{(i)}} = -\sum_i x_k^{(i)}\, y^{(i)} \left(1 - y^{(i)}\right) \left(t^{(i)} - y^{(i)}\right)$$
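Here is the same gradient computed over a small batch, checked against a finite difference of the error itself (the toy batch X, targets t, and helper names are mine, and the squared-error form of E above is assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy batch: one row of X per training example, one target each.
X = np.array([[1.0,  2.0, -1.0],
              [0.5, -1.5,  2.0]])
t = np.array([1.0, 0.0])
w = np.array([0.5, -0.3, 0.8])

def error(w):
    """Assumed squared error: E = 0.5 * sum_i (t_i - y_i)^2."""
    y = sigmoid(X @ w)
    return 0.5 * np.sum((t - y) ** 2)

# dE/dw_k = -sum_i x_k^(i) * y^(i) * (1 - y^(i)) * (t^(i) - y^(i))
y = sigmoid(X @ w)
grad_E = -(X.T @ (y * (1.0 - y) * (t - y)))

h = 1e-6
for k in range(len(w)):
    w_plus = w.copy()
    w_plus[k] += h
    numeric = (error(w_plus) - error(w)) / h
    print(f"k={k}  analytic={grad_E[k]:+.6f}  numeric={numeric:+.6f}")
```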

Thus, the final rule for modifying the weights becomes:

$$\Delta w_k = \sum_i \epsilon\, x_k^{(i)}\, y^{(i)} \left(1 - y^{(i)}\right) \left(t^{(i)} - y^{(i)}\right)$$

where $\epsilon$ is the learning rate.
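And the update rule as a tiny training loop on the same toy data (a sketch under the assumptions above, not the post's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1.0,  2.0, -1.0],
              [0.5, -1.5,  2.0]])   # hypothetical training inputs
t = np.array([1.0, 0.0])            # hypothetical targets
w = np.zeros(3)
eps = 0.5                           # learning rate epsilon

for step in range(2000):
    y = sigmoid(X @ w)
    # Delta_w_k = sum_i eps * x_k^(i) * y^(i) * (1 - y^(i)) * (t^(i) - y^(i))
    w += eps * (X.T @ (y * (1.0 - y) * (t - y)))

print("weights:", w)
print("outputs:", sigmoid(X @ w), "targets:", t)
```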

As you may notice, the new modification rule is just like the delta rule, except with extra multiplicative terms included to account for the logistic component of the sigmoidal neuron.
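To see that relationship concretely, here is a side-by-side sketch of the two update terms on the same toy batch, assuming the delta rule takes the linear-neuron form $\Delta w_k = \sum_i \epsilon\, x_k^{(i)} (t^{(i)} - y^{(i)})$; the only difference is the extra $y^{(i)}(1 - y^{(i)})$ factor per example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1.0,  2.0, -1.0],
              [0.5, -1.5,  2.0]])
t = np.array([1.0, 0.0])
w = np.array([0.5, -0.3, 0.8])
eps = 0.1

y = sigmoid(X @ w)
delta_rule   = eps * (X.T @ (t - y))                    # assumed linear-neuron delta rule
sigmoid_rule = eps * (X.T @ (y * (1.0 - y) * (t - y)))  # same, scaled by y(1-y) per example

print("delta rule:  ", delta_rule)
print("sigmoid rule:", sigmoid_rule)
```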