diff --git a/2014/12/29/deep-learning-in-a-nutshell/index.xhtml b/2014/12/29/deep-learning-in-a-nutshell/index.xhtml index a2ac941..8beff10 100644 --- a/2014/12/29/deep-learning-in-a-nutshell/index.xhtml +++ b/2014/12/29/deep-learning-in-a-nutshell/index.xhtml @@ -522,11 +522,199 @@

Putting all of this together, we can now compute the derivative of the error function with respect to each weight:

$$\frac{\partial E}{\partial w_k} = \sum_i \frac{\partial y^{(i)}}{\partial w_k} \frac{\partial E}{\partial y^{(i)}} = -\sum_i x_k^{(i)} y^{(i)} \left(1 - y^{(i)}\right) \left(t^{(i)} - y^{(i)}\right)$$

Thus, the final rule for modifying the weights becomes:

$$\Delta w_k = \sum_i \epsilon\, x_k^{(i)} y^{(i)} \left(1 - y^{(i)}\right) \left(t^{(i)} - y^{(i)}\right)$$

As you may notice, the new modification rule is just like the delta rule, except with extra multiplicative terms included to account for the logistic component of the sigmoidal neuron.
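To make the learning rule concrete, here is a minimal NumPy sketch of this sigmoidal delta rule applied to a small batch of training examples. The toy data, the array names (`X`, `t`, `w`), and the learning rate `epsilon` are illustrative assumptions, not anything defined in the post.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy batch: 4 training examples with 3 input dimensions each (illustrative).
X = np.array([[0.0, 0.5, 1.0],
              [1.0, 0.2, 0.3],
              [0.5, 0.9, 0.1],
              [0.2, 0.4, 0.8]])
t = np.array([0.0, 1.0, 1.0, 0.0])   # targets t^(i)
w = np.zeros(3)                       # weights w_k
epsilon = 0.5                         # learning rate

for step in range(1000):
    y = sigmoid(X @ w)  # outputs y^(i) for every example in the batch
    # Delta w_k = sum_i epsilon * x_k^(i) * y^(i) * (1 - y^(i)) * (t^(i) - y^(i))
    w += epsilon * X.T @ (y * (1 - y) * (t - y))
```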

@@ -540,46 +728,104 @@

[Figure: Reference diagram for the derivation of the backpropagation algorithm]


The subscript we use will refer to the layer of the neuron. The symbol y will refer to the activity of a neuron, as usual. Similarly the symbol z will refer to the logit of a neuron. We start by taking a look at the base case of the dynamic programming problem, the error function derivatives at the output layer:

$$E = \frac{1}{2} \sum_{j \in \text{output}} \left(t_j - y_j\right)^2 \qquad\qquad \frac{\partial E}{\partial y_j} = -\left(t_j - y_j\right)$$


Now we tackle the inductive step. Let's presume we have the error derivatives for layer j. We now aim to calculate the error derivatives for the layer below it, layer i. To do so, we must accumulate information for how the output of a neuron in layer i affects the logits of every neuron in layer j. This can be done as follows, using the fact that the partial derivative of the logit with respect to the incoming output data from the layer beneath is merely the weight of the connection $w_{ij}$:

$$\frac{\partial E}{\partial y_i} = \sum_j \frac{\partial z_j}{\partial y_i} \frac{\partial E}{\partial z_j} = \sum_j w_{ij} \frac{\partial E}{\partial z_j}$$

Now we can use the following to complete the inductive step:

$$\frac{\partial E}{\partial z_j} = \frac{\partial y_j}{\partial z_j} \frac{\partial E}{\partial y_j} = y_j \left(1 - y_j\right) \frac{\partial E}{\partial y_j}$$

Combining these two together, we can finally express the partial derivatives of layer i in terms of the partial derivatives of layer j:

$$\frac{\partial E}{\partial y_i} = \sum_j w_{ij}\, y_j \left(1 - y_j\right) \frac{\partial E}{\partial y_j}$$
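This recursion is easy to vectorize. Below is a small NumPy sketch of the step that carries the error derivatives from layer j back to layer i; the function and argument names (`backprop_layer`, `W`, `y_j`, `dE_dy_j`) are illustrative assumptions rather than anything from the post.

```python
import numpy as np

def backprop_layer(W, y_j, dE_dy_j):
    """Propagate error derivatives one layer down.

    W       : weight matrix, W[i, j] is the connection from neuron i
              (layer below) to neuron j (layer above).
    y_j     : activities of the neurons in layer j.
    dE_dy_j : dE/dy for every neuron in layer j.
    Returns dE/dy for every neuron in layer i.
    """
    dE_dz_j = y_j * (1 - y_j) * dE_dy_j   # dE/dz_j = y_j (1 - y_j) dE/dy_j
    return W @ dE_dz_j                    # dE/dy_i = sum_j w_ij dE/dz_j
```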

Once we've gone through the whole dynamic programming routine, having filled up the table with all of our partial derivatives (of the error function with respect to the hidden unit activities), we can determine how the error changes with respect to the weights. This tells us how to modify the weights after each training example:

$$\frac{\partial E}{\partial w_{ij}} = \frac{\partial z_j}{\partial w_{ij}} \frac{\partial E}{\partial z_j} = y_i\, y_j \left(1 - y_j\right) \frac{\partial E}{\partial y_j}$$

In order to do backpropagation with batching of training examples, we merely sum up the partial derivatives over all the training examples in the batch. This gives us the following modification formula:

$$\Delta w_{ij} = -\sum_{k \in \text{dataset}} \epsilon\, y_i^{(k)} y_j^{(k)} \left(1 - y_j^{(k)}\right) \frac{\partial E}{\partial y_j^{(k)}}$$

We have succeeded in deriving the backpropagation algorithm for a feed-forward neural net utilizing sigmoidal neurons!
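To tie the derivation together, here is a minimal NumPy sketch of batched backpropagation for a feed-forward net with a single hidden layer of sigmoidal neurons. The network shape, the toy data, and all variable names are illustrative assumptions (biases are also omitted for brevity); the sketch simply applies the formulas derived above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy batch: 4 examples, 2 inputs, 1 output (illustrative only).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

# W1[i, j] connects input i to hidden neuron j; W2 connects hidden to output.
W1 = rng.normal(size=(2, 3))
W2 = rng.normal(size=(3, 1))
epsilon = 0.5

for step in range(5000):
    # Forward pass: activities at every layer.
    y1 = sigmoid(X @ W1)              # hidden activities
    y2 = sigmoid(y1 @ W2)             # output activities

    # Base case: dE/dy at the output layer is -(t - y).
    dE_dy2 = -(T - y2)

    # Inductive step: dE/dz_j = y_j (1 - y_j) dE/dy_j, then
    # dE/dy_i = sum_j w_ij dE/dz_j for the layer below.
    dE_dz2 = y2 * (1 - y2) * dE_dy2
    dE_dy1 = dE_dz2 @ W2.T
    dE_dz1 = y1 * (1 - y1) * dE_dy1

    # Weight derivatives summed over the batch: dE/dw_ij = y_i dE/dz_j.
    dE_dW2 = y1.T @ dE_dz2
    dE_dW1 = X.T @ dE_dz1

    # Gradient descent: delta w_ij = -epsilon * dE/dw_ij.
    W2 -= epsilon * dE_dW2
    W1 -= epsilon * dE_dW1
```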