From 704e1e5e1c717b03498cfd3d1f1eb19d24e89dc9 Mon Sep 17 00:00:00 2001
From: Charles Iliya Krempeaux
Let's say we have a bunch of training examples. Then we can calculate what the neural network will output on the training example using the simple formula in the diagram. We want to train the neuron so that we pick the optimal weights possible - the weights that minimize the errors we make on the training examples. In this case, let's say we want to minimize the square error over all of the training examples that we encounter. More formally, if we know that is the true answer for the training example and is the value computed by the neural network, we want to minimize the value of the error function :
+Let's say we have a bunch of training examples. Then we can calculate what the neural network will output on each training example using the simple formula in the diagram. We want to train the neuron to pick the best possible weights: the weights that minimize the errors we make on the training examples. In this case, let's say we want to minimize the squared error over all of the training examples that we encounter. More formally, given the true answer for each training example and the value the neural network computes for it, we want to minimize the error function, which adds up the squared difference between the two over all training examples.
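One way to write the error function down, using notation that is assumed here rather than taken from the text: let $t^{(i)}$ be the true answer for the $i$-th training example and $y^{(i)}$ the value the neuron computes for it. The squared error might then read:

```latex
E = \frac{1}{2} \sum_{i} \left( t^{(i)} - y^{(i)} \right)^2
```

The factor of $\frac{1}{2}$ is a common convention, also an assumption here, that tidies up the derivative. It scales the sum without changing which weights minimize it.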
+Now at this point you might be thinking, wait up... Why do we need to bother ourselves with this error function nonsense when we have a bunch of variables (weights) and a set of equations (one for each training example)? Couldn't we just solve this problem by setting up a system of linear equations? That would automatically give us an error of zero, assuming that we have a consistent set of training examples, right?
That's a smart observation, but the insight unfortunately doesn't generalize well. Remember that although we're using a linear neuron here, linear neurons aren't used very much in practice because they're constrained in what they can learn. And the moment you start using nonlinear neurons like the sigmoidal neurons we talked about, we can no longer set up a system of linear equations!
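To make the linear-system idea concrete, here is a minimal sketch for a linear neuron with two weights and two training examples. The data and the helper names `solve_two_weights` and `linear_neuron` are hypothetical, chosen only for illustration, and the factor of 1/2 in the error is an assumed convention. With two consistent, linearly independent examples, Cramer's rule recovers weights that give zero error.

```python
def solve_two_weights(x_a, t_a, x_b, t_b):
    """Solve w1*x[0] + w2*x[1] = t for two training examples via Cramer's rule."""
    det = x_a[0] * x_b[1] - x_a[1] * x_b[0]
    if det == 0:
        raise ValueError("examples are linearly dependent; no unique solution")
    w1 = (t_a * x_b[1] - x_a[1] * t_b) / det
    w2 = (x_a[0] * t_b - t_a * x_b[0]) / det
    return w1, w2


def linear_neuron(weights, x):
    """Output of a linear neuron: the weighted sum of its inputs."""
    return sum(w * xi for w, xi in zip(weights, x))


# Hypothetical training set: (inputs, true answer) pairs.
examples = [((1.0, 2.0), 8.0), ((3.0, 1.0), 9.0)]

(x_a, t_a), (x_b, t_b) = examples
weights = solve_two_weights(x_a, t_a, x_b, t_b)

# Squared error of the solved weights over the training set.
squared_error = 0.5 * sum((t - linear_neuron(weights, x)) ** 2 for x, t in examples)
print(weights, squared_error)
```

This shortcut depends on the output being linear in the weights. Pass the weighted sum through a sigmoid and the equations stop being linear, so the direct solve no longer applies.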
-So maybe we can use an iterative approach instead that generalizes to nonlinear examples. Let's try to visualize how we might minimize the squared error over all of the training examples by simplifying the problem. Let's say we're dealing with a linear neuron with only two inputs (and thus only two weights, and ). Then we can imagine a 3-dimensional space where the horizontal dimensions correspond to the weights and , and there is one vertical dimension that corresponds to the value of the error function . So in this space, points in the horizontal plane correspond to different settings of the weights, and the height at those points corresponds to the error that we're incurring, summed over all training cases. If we consider the errors we make over all possible weights, we get a surface in this 3-dimensional space, in particular a quadratic bowl:
+So maybe we can use an iterative approach instead that generalizes to nonlinear examples. Let's try to visualize how we might minimize the squared error over all of the training examples by simplifying the problem. Let's say we're dealing with a linear neuron with only two inputs (and thus only two weights). Then we can imagine a 3-dimensional space where the two horizontal dimensions correspond to the two weights, and the one vertical dimension corresponds to the value of the error function. So in this space, points in the horizontal plane correspond to different settings of the weights, and the height at those points corresponds to the error that we're incurring, summed over all training cases. If we consider the errors we make over all possible weights, we get a surface in this 3-dimensional space: a quadratic bowl.
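The bowl can be made tangible with a small sketch that evaluates the squared error on a grid of weight settings. The training data, the grid, and the function name `squared_error` are illustrative assumptions, as is the factor of 1/2 in the error. The data are generated from the weights (2, 3), so the lowest point on the grid should sit there.

```python
def squared_error(w1, w2, examples):
    """Half the summed squared difference between true answers and neuron outputs."""
    return 0.5 * sum((t - (w1 * x1 + w2 * x2)) ** 2 for (x1, x2), t in examples)


# Hypothetical training set consistent with weights w1 = 2, w2 = 3.
examples = [((1.0, 2.0), 8.0), ((3.0, 1.0), 9.0), ((2.0, 2.0), 10.0)]

# Weight settings from 0.0 to 5.0 in steps of 0.5 along each horizontal axis.
grid = [i * 0.5 for i in range(11)]

# Height of the error surface above each point in the weight plane.
surface = {(w1, w2): squared_error(w1, w2, examples) for w1 in grid for w2 in grid}

# The bottom of the bowl is the weight setting with the smallest error.
best = min(surface, key=surface.get)
print(best, surface[best])
```

Stepping away from the bottom raises the error quadratically: in this sketch, doubling the distance from `best` along one weight axis quadruples the height of the surface.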