From b8bf2db5465c8d1d1d0f344e17eb3ad0d3708394 Mon Sep 17 00:00:00 2001
From: Charles Iliya Krempeaux
Date: Sun, 17 Dec 2023 08:33:54 -0800
Subject: [PATCH] mathml
---
 .../deep-learning-in-a-nutshell/index.xhtml | 137 +++++++++++++++---
 1 file changed, 117 insertions(+), 20 deletions(-)

diff --git a/2014/12/29/deep-learning-in-a-nutshell/index.xhtml b/2014/12/29/deep-learning-in-a-nutshell/index.xhtml
index 354127d..a2ac941 100644
--- a/2014/12/29/deep-learning-in-a-nutshell/index.xhtml
+++ b/2014/12/29/deep-learning-in-a-nutshell/index.xhtml


The neuron computes the weighted sum of its inputs, the logit, z. It then feeds z into the logistic function to compute y, its final output. These functions have very nice derivatives, which makes learning easy! For learning, we want to compute the gradient of the error function with respect to the weights. To do so, we start by taking the derivative of the logit, z, with respect to the inputs and the weights. By linearity of the logit:

$$\frac{\partial z}{\partial w_k} = x_k \qquad\qquad \frac{\partial z}{\partial x_k} = w_k$$
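These identities follow directly from writing out the logit as a weighted sum. For reference, here are the definitions this section relies on, restated from the neuron diagram above (the bias-free form of z is an assumption; any bias can be folded in as a weight on a constant input, and the logistic form of y is what produces the derivative used below):

$$z = \sum_k w_k x_k, \qquad y = \sigma(z) = \frac{1}{1 + e^{-z}}$$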

Also, quite surprisingly, the derivative of the output with respect to the logit is quite simple if you express it in terms of the output. Verifying this is left as an exercise for the reader:

$$\frac{dy}{dz} = y\,(1 - y)$$
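For readers who want to work through the exercise, here is one way to verify the identity, assuming the logistic output defined above:

$$\frac{dy}{dz} = \frac{d}{dz}\left(\frac{1}{1 + e^{-z}}\right) = \frac{e^{-z}}{\left(1 + e^{-z}\right)^{2}} = \underbrace{\frac{1}{1 + e^{-z}}}_{y}\cdot\underbrace{\frac{e^{-z}}{1 + e^{-z}}}_{1 - y} = y\,(1 - y)$$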

We then use the chain rule to get the derivative of the output with respect to each weight:

$$\frac{\partial y}{\partial w_k} = \frac{\partial z}{\partial w_k}\,\frac{dy}{dz} = x_k\, y\,(1 - y)$$
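As a quick numerical sanity check (not part of the original derivation), the short sketch below compares the analytic expression $x_k\, y\,(1 - y)$ with a central-difference estimate of $\partial y / \partial w_k$; the function and variable names, and the use of NumPy, are illustrative choices of mine:

import numpy as np

def neuron_output(w, x):
    # Sigmoidal neuron: logit z = w . x, output y = 1 / (1 + exp(-z)).
    z = np.dot(w, x)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.normal(size=5)
x = rng.normal(size=5)

y = neuron_output(w, x)
analytic = x * y * (1.0 - y)              # dy/dw_k = x_k * y * (1 - y)

eps = 1e-6
numeric = np.empty_like(w)
for k in range(len(w)):
    w_hi, w_lo = w.copy(), w.copy()
    w_hi[k] += eps
    w_lo[k] -= eps
    numeric[k] = (neuron_output(w_hi, x) - neuron_output(w_lo, x)) / (2.0 * eps)

print(np.max(np.abs(analytic - numeric)))  # should print something tiny, e.g. ~1e-10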

Putting all of this together, and taking the error over the training set to be the squared error $E = \tfrac{1}{2}\sum_i \left(t^{(i)} - y^{(i)}\right)^{2}$, where $t^{(i)}$ is the target and $y^{(i)}$ the neuron's output for training example $i$, we can now compute the derivative of the error function with respect to each weight:

$$\frac{\partial E}{\partial w_k} = \sum_i \frac{\partial E}{\partial y^{(i)}}\,\frac{\partial y^{(i)}}{\partial w_k} = -\sum_i x_k^{(i)}\, y^{(i)}\left(1 - y^{(i)}\right)\left(t^{(i)} - y^{(i)}\right)$$
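The sum over training examples and the leading minus sign come from differentiating the squared error defined above with respect to each output:

$$\frac{\partial E}{\partial y^{(i)}} = \frac{\partial}{\partial y^{(i)}}\left[\frac{1}{2}\sum_j \left(t^{(j)} - y^{(j)}\right)^{2}\right] = -\left(t^{(i)} - y^{(i)}\right)$$

Multiplying by $\partial y^{(i)} / \partial w_k = x_k^{(i)}\, y^{(i)}\left(1 - y^{(i)}\right)$ from the previous step and summing over $i$ gives the expression above.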

Thus, writing $\epsilon$ for the learning rate, the final rule for modifying the weights becomes:

$$\Delta w_k = \sum_i \epsilon\, x_k^{(i)}\, y^{(i)}\left(1 - y^{(i)}\right)\left(t^{(i)} - y^{(i)}\right)$$

As you may notice, the new modification rule is just like the delta rule, except with extra multiplicative terms included to account for the logistic component of the sigmoidal neuron.
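To make the rule concrete, here is a minimal batch-gradient-descent sketch for a single sigmoidal neuron that applies the update above on a toy dataset; the dataset, learning rate, epoch count, and names are illustrative assumptions rather than anything specified in the text:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_neuron(X, t, epsilon=0.5, epochs=5000):
    # Batch update: delta_w_k = sum_i epsilon * x_k_i * y_i * (1 - y_i) * (t_i - y_i)
    # X has shape (n_examples, n_inputs); t has shape (n_examples,).
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y = sigmoid(X @ w)                               # outputs for all examples
        w += epsilon * X.T @ (y * (1.0 - y) * (t - y))   # the modification rule above
    return w

# Toy usage: an AND-like function of two inputs; the constant third column
# lets one weight act as a bias so the function is learnable.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([0.0, 0.0, 0.0, 1.0])
w = train_sigmoid_neuron(X, t)
print(np.round(sigmoid(X @ w), 2))   # outputs should move toward [0, 0, 0, 1]

Compared with the delta rule, the only difference in the code is the extra $y\,(1 - y)$ factor inside the parenthesized term of the weight update.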