From 704e1e5e1c717b03498cfd3d1f1eb19d24e89dc9 Mon Sep 17 00:00:00 2001
From: Charles Iliya Krempeaux
Date: Sun, 17 Dec 2023 07:34:19 -0800
Subject: [PATCH] mathml

---
 .../deep-learning-in-a-nutshell/index.xhtml | 183 +++++++++++++++++-
 1 file changed, 178 insertions(+), 5 deletions(-)

diff --git a/2014/12/29/deep-learning-in-a-nutshell/index.xhtml b/2014/12/29/deep-learning-in-a-nutshell/index.xhtml
index ab3e390..ebfafdd 100644
--- a/2014/12/29/deep-learning-in-a-nutshell/index.xhtml
+++ b/2014/12/29/deep-learning-in-a-nutshell/index.xhtml
@@ -99,7 +99,42 @@
-    neuron in the kth layer with the jth neuron in the (k+1)st layer:
+    neuron in the
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>k</mi><mrow><mi>t</mi><mi>h</mi></mrow></msup></math>
+    layer with the
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>j</mi><mrow><mi>t</mi><mi>h</mi></mrow></msup></math>
+    neuron in the
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mrow><mi>k</mi><mo>+</mo><mn>1</mn></mrow><mrow><mi>s</mi><mi>t</mi></mrow></msup></math>
+    layer:

Neural Net @@ -110,7 +145,45 @@

Similar to how neurons are generally organized in layers in the human brain, neurons in neural nets are often organized in layers as well, where neurons on the bottom layer receive signals from the inputs, where neurons in the top layers have their outlets connected to the "answer," and where there are usually no connections between neurons in the same layer (although this is an optional restriction; more complex connectivities require more involved mathematical analysis). We also note that in this example, there are no connections that lead from a neuron in a higher layer to a neuron in a lower layer (i.e., no directed cycles). These neural networks are called feed-forward neural networks, as opposed to their counterparts, which are called recurrent neural networks (again, these are much more complicated to analyze and train). For the sake of simplicity, we focus only on feed-forward networks throughout this discussion. Here are some more important notes to keep in mind:
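To make the layered, cycle-free connectivity concrete, here is a minimal Python sketch of a feed-forward pass. The layer sizes, random weights, and sigmoid activation are illustrative assumptions, not details from the article:

```python
import numpy as np

def sigmoid(z):
    # Squashing nonlinearity, as used by the sigmoidal neurons discussed earlier.
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, weights, biases):
    """Propagate an input vector through successive layers.

    With no connections inside a layer and none leading backward,
    each layer is just one matrix multiply followed by the nonlinearity.
    """
    activation = x
    for W, b in zip(weights, biases):
        activation = sigmoid(W @ activation + b)
    return activation

# Illustrative 3-input -> 4-hidden -> 2-output network with random weights.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [np.zeros(4), np.zeros(2)]
output = feed_forward(np.array([0.5, -0.2, 0.1]), weights, biases)
print(output.shape)  # (2,)
```

Each entry of `output` lies strictly between 0 and 1, since every neuron's outlet passes through the sigmoid.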

1) Although every layer has the same number of neurons in this example, this is not necessary.

2) It is not required that a neuron has its outlet connected to the inputs of every neuron in the next layer. In fact, selecting which neurons to connect to which other neurons in the next layer is an art that comes from experience. Allowing maximal connectivity will more often than not result in overfitting, a concept which we will discuss in more depth later.

-

3) The inputs and outputs are vectorized representations. For example, you might imagine a neural network where the inputs are the individual pixel RGB values in an image represented as a vector. The last layer might have 2 neurons which correspond to the answer to our problem: [0, 1] if the image contains a dog, [1, 0] if the image contains a cat, [0, 0] if it contains neither, and [1, 1] if it contains both.

+

+    3) The inputs and outputs are vectorized representations. For example, you might imagine a neural network where the inputs are the individual pixel RGB values in an image represented as a vector. The last layer might have 2 neurons which correspond to the answer to our problem:
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>[</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>]</mo></mrow></math>
+    if the image contains a dog,
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>[</mo><mn>1</mn><mo>,</mo><mn>0</mn><mo>]</mo></mrow></math>
+    if the image contains a cat,
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>[</mo><mn>0</mn><mo>,</mo><mn>0</mn><mo>]</mo></mrow></math>
+    if it contains neither, and
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>[</mo><mn>1</mn><mo>,</mo><mn>1</mn><mo>]</mo></mrow></math>
+    if it contains both.
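The two-neuron output encoding above can be written as a tiny helper. The function name `label_vector` is hypothetical, purely for illustration of the encoding:

```python
def label_vector(has_cat: bool, has_dog: bool):
    # First output neuron fires for "cat", second for "dog", so:
    # [0, 1] = dog only, [1, 0] = cat only, [0, 0] = neither, [1, 1] = both.
    return [int(has_cat), int(has_dog)]

print(label_vector(has_cat=False, has_dog=True))  # [0, 1]
```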

4) The layers of neurons that lie sandwiched between the first layer of neurons (input layer) and the last layer of neurons (output layer), are called hidden layers. This is because this is where most of the magic is happening when the neural net tries to solve problems. Taking a look at the activities of hidden layers can tell you a lot about the features the network has learned to extract from the data.

@@ -125,11 +198,111 @@ The neuron we want to train for the Dining Hall Problem
-

Let's say we have a bunch of training examples. Then we can calculate what the neural network will output on the ith training example using the simple formula in the diagram. We want to train the neuron so that we pick the optimal weights possible - the weights that minimize the errors we make on the training examples. In this case, let's say we want to minimize the square error over all of the training examples that we encounter. More formally, if we know that t(i) is the true answer for the ith training example and y(i) is the value computed by the neural network, we want to minimize the value of the error function E:

- +

+    Let's say we have a bunch of training examples. Then we can calculate what the neural network will output on the
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>i</mi><mrow><mi>t</mi><mi>h</mi></mrow></msup></math>
+    training example using the simple formula in the diagram. We want to train the neuron so that we pick the optimal weights possible - the weights that minimize the errors we make on the training examples. In this case, let's say we want to minimize the square error over all of the training examples that we encounter. More formally, if we know that
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>t</mi><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msup></math>
+    is the true answer for the
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>i</mi><mrow><mi>t</mi><mi>h</mi></mrow></msup></math>
+    training example and
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>y</mi><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msup></math>
+    is the value computed by the neural network, we want to minimize the value of the error function
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>E</mi></math>:

+    <math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><mrow><mi>E</mi><mo>=</mo><mfrac><mn>1</mn><mn>2</mn></mfrac><munder><mo>&#x2211;</mo><mi>i</mi></munder><msup><mrow><mo>(</mo><msup><mi>t</mi><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msup><mo>&#x2212;</mo><msup><mi>y</mi><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msup><mo>)</mo></mrow><mn>2</mn></msup></mrow></math>
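In code, the squared-error function is a one-liner; this sketch assumes the true answers and network outputs are given as plain sequences of numbers, one entry per training example:

```python
import numpy as np

def squared_error(t, y):
    # E = 1/2 * sum over i of (t_i - y_i)^2
    return 0.5 * float(np.sum((np.asarray(t, dtype=float) - np.asarray(y, dtype=float)) ** 2))

print(squared_error([1.0, 0.0, 1.0], [0.8, 0.1, 0.9]))  # 0.5 * (0.2^2 + 0.1^2 + 0.1^2), i.e. about 0.03
```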

Now at this point you might be thinking, wait up... Why do we need to bother ourselves with this error function nonsense when we have a bunch of variables (weights) and we have a set of equations (one for each training example)? Couldn't we just solve this problem by setting up a system of linear equations? That would automatically give us an error of zero, assuming that we have a consistent set of training examples, right?

That's a smart observation, but the insight unfortunately doesn't generalize well. Remember that although we're using a linear neuron here, linear neurons aren't used very much in practice because they're constrained in what they can learn. And the moment you start using nonlinear neurons like the sigmoidal neurons we talked about, we can no longer set up a system of linear equations!

-

So maybe we can use an iterative approach instead that generalizes to nonlinear examples. Let's try to visualize how we might minimize the squared error over all of the training examples by simplifying the problem. Let's say we're dealing with a linear neuron with only two inputs (and thus only two weights, w1 and w2). Then we can imagine a 3-dimensional space where the horizontal dimensions correspond to the weights w1 and w2, and there is one vertical dimension that corresponds to the value of the error function E. So in this space, points in the horizontal plane correspond to different settings of the weights, and the height at those points corresponds to the error that we're incurring, summed over all training cases. If we consider the errors we make over all possible weights, we get a surface in this 3-dimensional space, in particular a quadratic bowl:

+

+    So maybe we can use an iterative approach instead that generalizes to nonlinear examples. Let's try to visualize how we might minimize the squared error over all of the training examples by simplifying the problem. Let's say we're dealing with a linear neuron with only two inputs (and thus only two weights,
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>w</mi><mn>1</mn></msub></math>
+    and
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>w</mi><mn>2</mn></msub></math>).
+    Then we can imagine a 3-dimensional space where the horizontal dimensions correspond to the weights
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>w</mi><mn>1</mn></msub></math>
+    and
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>w</mi><mn>2</mn></msub></math>,
+    and there is one vertical dimension that corresponds to the value of the error function
+    <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>E</mi></math>.
+    So in this space, points in the horizontal plane correspond to different settings of the weights, and the height at those points corresponds to the error that we're incurring, summed over all training cases. If we consider the errors we make over all possible weights, we get a surface in this 3-dimensional space, in particular a quadratic bowl:
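The quadratic bowl can be reproduced numerically: for a linear neuron with output w1*x1 + w2*x2, evaluate the summed squared error at every point of a grid of (w1, w2) settings. The toy training set here is an assumption made up for illustration:

```python
import numpy as np

# Toy training set for a two-input linear neuron (illustrative values,
# generated by the "true" weights w1 = w2 = 1).
X = np.array([[1.0, 2.0], [2.0, 1.0], [0.5, -1.0]])
t = np.array([3.0, 3.0, -0.5])

# Grid of weight settings: the horizontal plane of the 3-D picture.
w1, w2 = np.meshgrid(np.linspace(-2, 4, 121), np.linspace(-2, 4, 121))

# E(w1, w2) = 1/2 * sum over i of (t_i - (w1*x_i1 + w2*x_i2))^2,
# i.e. the height of the error surface at each grid point.
E = 0.5 * sum((t[i] - (w1 * X[i, 0] + w2 * X[i, 1])) ** 2 for i in range(len(X)))

# The surface is a paraboloid; its lowest point sits at the true weights.
j, i = np.unravel_index(np.argmin(E), E.shape)
print(w1[j, i], w2[j, i])  # close to (1.0, 1.0)
```

Plotting `E` over `(w1, w2)` (for example with a surface plot) shows exactly the bowl in the figure below.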

Quadratic Error Surface