Backpropagation Programming


What is a Neural Network?

Since it's a lot to explain, I will try to stay on subject and talk only about the backpropagation algorithm. So, what is backpropagation? Backpropagation is a supervised-learning method used to train neural networks by adjusting the weights and the biases of each neuron.

Before we get started with the how of building a Neural Network, we need to understand the what first.

Neural networks can be intimidating, especially for people new to machine learning. However, this tutorial will break down how exactly a neural network works and you will have a working flexible neural network by the end. Let's get started!

Understanding the process

With approximately 100 billion neurons, the human brain transmits signals at speeds of up to 268 mph! In essence, a neural network is a collection of neurons connected by synapses. This collection is organized into three main layers: the input layer, the hidden layer, and the output layer. You can have many hidden layers, which is where the term deep learning comes into play. In an artificial neural network, a neuron receives several inputs, called features, and produces a single output, called a label.


[Diagram via Kabir Shah]

The circles represent neurons while the lines represent synapses. The role of a synapse is to multiply the inputs by the weights. You can think of weights as the 'strength' of the connection between neurons. Weights primarily define the output of a neural network; however, they are highly flexible. Afterwards, an activation function is applied to return an output.

Here's a brief overview of how a simple feedforward neural network works:

  1. Takes inputs as a matrix (2D array of numbers)

  2. Multiplies the inputs by a set of weights (performs a dot product, a.k.a. matrix multiplication)

  3. Applies an activation function

  4. Returns an output

  5. Error is calculated by taking the difference between the desired output from the data and the predicted output. This gives us a gradient we can descend to alter the weights

  6. The weights are then adjusted slightly according to the error.

  7. To train, this process is repeated 1,000+ times. The more iterations the network is trained for, the more accurate our outputs will be.

At its core, a neural network is simple. It just performs a dot product of the input and weights and applies an activation function. When the weights are adjusted via the gradient of the loss function, the network adapts to produce more accurate outputs.

Our neural network will have two inputs, a single hidden layer of three neurons, and one output. The network will predict a test score based on how many hours we studied and how many hours we slept the day before. Here's the sample data we'll be training our Neural Network on:

As you may have noticed, the ? in this case represents what we want our neural network to predict. In this case, we are predicting the test score of someone who studied for four hours and slept for eight hours based on their prior performance.

Forward Propagation

Let's start coding this bad boy! Open up a new Python file. You'll want to import numpy, as it will help us with certain calculations.

First, let's import our data as numpy arrays using np.array. We'll also want to normalize our units as our inputs are in hours, but our output is a test score from 0-100. Therefore, we need to scale our data by dividing by the maximum value for each variable.
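The data table isn't reproduced in this copy; as a sketch, the setup described above might look like this. Note that only the first row (2 hours studied, 9 hours slept, score 92) is given in the text; the other two training rows are placeholder values for illustration:

```python
import numpy as np

# Training data: each row is (hours studied, hours slept).
# The first row (2, 9 -> 92) is the example worked through in the text;
# the remaining rows are placeholder values.
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)  # test scores out of 100

# Normalize the units: scale each input column by its maximum value,
# and scale the scores by 100, so everything lies between 0 and 1.
X = X / np.amax(X, axis=0)
y = y / 100
```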

Next, let's define a python class and write an init function where we'll specify our parameters such as the input, hidden, and output layers.

It is time for our first calculation. Remember that our synapses perform a dot product, or matrix multiplication, of the input and weights. Note that the weights are generated randomly (np.random.randn draws from a standard normal distribution, so they are not strictly between 0 and 1).

The calculations behind our network

In the data set, our input data, X, is a 3x2 matrix. Our output data, y, is a 3x1 matrix. Each element in matrix X needs to be multiplied by a corresponding weight and then added together with all the other results for each neuron in the hidden layer. Here's how the first input data element (2 hours studying and 9 hours sleeping) would calculate an output in the network:

This image breaks down what our neural network actually does to produce an output. First, the products of the randomly generated weights (.2, .6, .1, .8, .3, .7) on each synapse and the corresponding inputs are summed to arrive at the first values of the hidden layer. These sums are in a smaller font as they are not the final values for the hidden layer.

To get the final value for the hidden layer, we need to apply the activation function. The role of an activation function is to introduce nonlinearity. An added advantage is that the output is mapped to a range between 0 and 1, making it easier to alter the weights in the future.

There are many activation functions out there. In this case, we'll stick to one of the more popular ones - the sigmoid function.

Now, we need to use matrix multiplication again, with another set of random weights, to calculate our output layer value.

Lastly, to normalize the output, we just apply the activation function again.

And, there you go! Theoretically, with those weights, our neural network will calculate .85 as our test score! However, our target was .92. Our result wasn't poor, it just isn't the best it can be; we got a little lucky when choosing the random weights for this example.

How do we train our model to learn? Well, we'll find out very soon. For now, let's continue coding our network.

If you are still confused, I highly recommend you check out this informative video which explains the structure of a neural network with the same example.

Implementing the calculations

Now, let's generate our weights randomly using np.random.randn(). Remember, we'll need two sets of weights. One to go from the input to the hidden layer, and the other to go from the hidden to output layer.
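The listing isn't shown in this copy; a sketch of the class setup just described, using the layer sizes from the example (2 inputs, 3 hidden neurons, 1 output) and the names W1/W2 for the two weight sets, might look like:

```python
import numpy as np

class Neural_Network(object):
    def __init__(self):
        # Layer sizes from the example: 2 inputs, 3 hidden neurons, 1 output.
        self.inputSize = 2
        self.hiddenSize = 3
        self.outputSize = 1

        # Two sets of random weights: input -> hidden, and hidden -> output.
        self.W1 = np.random.randn(self.inputSize, self.hiddenSize)   # 2x3
        self.W2 = np.random.randn(self.hiddenSize, self.outputSize)  # 3x1
```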

Once we have all the variables set up, we are ready to write our forward propagation function. Let's pass in our input, X, and use the variable z to represent the activity between the input and hidden layers. As explained, we need to take a dot product of the inputs and weights, apply an activation function, take another dot product of the hidden layer and the second set of weights, and lastly apply a final activation function to receive our output:

Lastly, we need to define our sigmoid function:
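The code itself isn't reproduced in this copy; a sketch of the sigmoid together with the forward pass just described, written as free functions for brevity (the article builds them as class methods), might look like:

```python
import numpy as np

def sigmoid(s):
    # Activation function: squashes any real number into the range (0, 1).
    return 1 / (1 + np.exp(-s))

def forward(X, W1, W2):
    z = np.dot(X, W1)    # dot product of inputs and first set of weights
    z2 = sigmoid(z)      # hidden-layer activation
    z3 = np.dot(z2, W2)  # dot product of hidden layer and second weights
    o = sigmoid(z3)      # final activation: the predicted output
    return o
```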

And, there we have it! An (untrained) neural network capable of producing an output.

As you may have noticed, we need to train our network to calculate more accurate results.

Backpropagation

The 'learning' of our network

Since we started with a random set of weights, we need to alter them so that our predicted outputs match the corresponding target outputs from our data set. This is done through a method called backpropagation.

Backpropagation works by using a loss function to calculate how far the network was from the target output.

Calculating error

One way of representing the loss function is by using the mean sum squared loss function:

In this function, o is our predicted output, and y is our actual output. Now that we have the loss function, our goal is to get it as close as we can to 0. That means we will need to have close to no loss at all. As we are training our network, all we are doing is minimizing the loss.
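The formula itself isn't reproduced above; as a sketch, it computes the mean of the squared differences between the actual and predicted outputs:

```python
import numpy as np

def loss(y, o):
    # Mean squared difference between actual output y and prediction o.
    return np.mean(np.square(y - o))
```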

To figure out which direction to alter our weights in, we need to find the rate of change of our loss with respect to our weights. In other words, we need to use the derivative of the loss function to understand how the weights affect the output.

In this case, we will be using a partial derivative to allow us to take into account another variable.


[Illustration via Kabir Shah]

Gradient Descent

This method is known as gradient descent. By knowing which way to alter our weights, our outputs become more and more accurate.

Here's how we will calculate the incremental change to our weights:

1) Find the margin of error of the output layer (o) by taking the difference between the actual output (y) and the predicted output

2) Apply the derivative of our sigmoid activation function to the output layer error. We call this result the delta output sum.

3) Use the delta output sum of the output layer error to figure out how much our z2 (hidden) layer contributed to the output error by performing a dot product with our second weight matrix. We can call this the z2 error.

4) Calculate the delta output sum for the z2 layer by applying the derivative of our sigmoid activation function (just like step 2).


5) Adjust the weights for the first layer by performing a dot product of the input layer with the hidden (z2) delta output sum. For the second weight, perform a dot product of the hidden(z2) layer and the output (o) delta output sum.

Calculating the delta output sum and then applying the derivative of the sigmoid function are very important to backpropagation. The derivative of the sigmoid, also known as sigmoid prime, gives us the rate of change, or slope, of the activation function at the output sum.

Let's continue to code our Neural_Network class by adding a sigmoidPrime (derivative of sigmoid) function:
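The function isn't shown in this copy; a minimal version, assuming (as the slope formula above implies) that it receives values that have already been passed through the sigmoid, might be:

```python
def sigmoidPrime(s):
    # Derivative of the sigmoid for already-activated values:
    # if y = sigmoid(x), then dy/dx = y * (1 - y).
    return s * (1 - s)
```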

Then, we'll want to create our backward propagation function that does everything specified in the steps above:
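The listing isn't reproduced here; the five steps might be sketched like this, written as a free function with the weights passed in explicitly rather than stored on a class:

```python
import numpy as np

def sigmoidPrime(s):
    # Derivative of the sigmoid for already-activated values.
    return s * (1 - s)

def backward(X, y, z2, o, W1, W2):
    # Step 1: margin of error of the output layer.
    o_error = y - o
    # Step 2: delta output sum = error scaled by the sigmoid's slope.
    o_delta = o_error * sigmoidPrime(o)
    # Step 3: how much the hidden (z2) layer contributed to the error.
    z2_error = o_delta.dot(W2.T)
    # Step 4: delta output sum for the hidden layer.
    z2_delta = z2_error * sigmoidPrime(z2)
    # Step 5: adjust both weight matrices.
    W1 += X.T.dot(z2_delta)
    W2 += z2.T.dot(o_delta)
    return W1, W2
```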

We can now define our output by initiating forward propagation, and initiate the backward pass by calling the backward function inside a train function:

To run the network, all we have to do is call the train function. Of course, we'll want to do this many, perhaps thousands of, times. So, we'll use a for loop.

Here's the full 60 lines of awesomeness:
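The full listing isn't reproduced in this copy; assembled from the pieces described above, it might look like this (as before, the two training rows beyond (2, 9 → 92) are placeholder values):

```python
import numpy as np

# Data: (hours studied, hours slept) -> test score.
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)

# Scale units: inputs by their column maxima, scores by 100.
X = X / np.amax(X, axis=0)
y = y / 100

class Neural_Network(object):
    def __init__(self):
        self.inputSize = 2
        self.hiddenSize = 3
        self.outputSize = 1
        # Random weights for input->hidden and hidden->output.
        self.W1 = np.random.randn(self.inputSize, self.hiddenSize)
        self.W2 = np.random.randn(self.hiddenSize, self.outputSize)

    def sigmoid(self, s):
        return 1 / (1 + np.exp(-s))

    def sigmoidPrime(self, s):
        # Derivative of the sigmoid for already-activated values.
        return s * (1 - s)

    def forward(self, X):
        self.z = np.dot(X, self.W1)        # input -> hidden dot product
        self.z2 = self.sigmoid(self.z)     # hidden activations
        self.z3 = np.dot(self.z2, self.W2) # hidden -> output dot product
        return self.sigmoid(self.z3)       # predicted output

    def backward(self, X, y, o):
        self.o_error = y - o
        self.o_delta = self.o_error * self.sigmoidPrime(o)
        self.z2_error = self.o_delta.dot(self.W2.T)
        self.z2_delta = self.z2_error * self.sigmoidPrime(self.z2)
        self.W1 += X.T.dot(self.z2_delta)
        self.W2 += self.z2.T.dot(self.o_delta)

    def train(self, X, y):
        o = self.forward(X)
        self.backward(X, y, o)

NN = Neural_Network()
for i in range(1000):  # repeat the train step many times
    NN.train(X, y)

print("Predicted output:", NN.forward(X))
print("Loss:", np.mean(np.square(y - NN.forward(X))))
```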

There you have it! A full-fledged neural network that can learn from inputs and outputs. While we thought of our inputs as hours studying and sleeping, and our outputs as test scores, feel free to change these to whatever you like and observe how the network adapts! After all, all the network sees are the numbers. The calculations we made, as complex as they seemed to be, all played a big role in our learning model. If you think about it, it's super impressive that your computer, an object, managed to learn by itself!

Stay tuned for more machine learning tutorials on other models like Linear Regression and Classification!

This tutorial was originally posted on Enlight, a site to learn by building projects. For more content like this, be sure to check out Enlight!

References

Special thanks to Kabir Shah for his contributions to the development of this tutorial


I have been trying to get a simple double-XOR neural network to work, and I am having problems getting backpropagation to train a really simple feedforward neural network.
I have mostly been trying to follow this guide, but at best I have made programs that learn at an extremely slow rate.

As I understand neural networks:

  1. Values are computed by taking the result of a sigmoid function applied to the sum of all inputs to that neuron. This is then fed to the next layer using the weight for each neuron
  2. At the end of a run, the error is computed for the output neurons; then, using the weights, the error is propagated back by simply multiplying the values and then summing at each neuron
  3. When all of the errors are computed the weights are adjusted by the delta = weight of connection * derivative of the sigmoid (value of Neuron weight is going to) * value of Neuron that connection is to * error of neuron * amount of output error of neuron going to * beta (some constant for learning rate)

This is my current muck of code that I am trying to get working. I have a lot of other attempts somewhat mixed in, but the main backpropagation function that I am trying to get working is on line 293 in Net.cpp

Matthew FL

4 Answers

Have a look at 15 Steps to implement a Neural Network; it should get you started.

Gregory Pakosz

I wrote a simple 'Tutorial' that you can check out below.

It is a simple implementation of the perceptron model. You can imagine a perceptron as a neural network with only one neuron. There is of course code that you can test out, which I wrote in C++. I go through the code step by step, so you shouldn't have any issues.

Although the perceptron isn't really a 'Neural Network', it is really helpful if you want to get started, and it might help you better understand how a full Neural Network works.

Hope that helps! Cheers! ^_^

In this example I will go through the implementation of the perceptron model in C++ so that you can get a better idea of how it works.

First things first it is a good practice to write down a simple algorithm of what we want to do.

Algorithm:

  1. Make a vector for the weights and initialize it to 0 (don't forget to add the bias term)
  2. Keep adjusting the weights until we get 0 errors or a low error count.
  3. Make predictions on unseen data.

Having written a super simple algorithm let's now write some of the functions that we will need.

  • We will need a function to calculate the net input (i.e. *x · wT*, multiplying the inputs by the weights)
  • A step function, so that we get a prediction of either 1 or -1
  • And a function that finds the ideal values for the weights.

So without further ado let's get right into it.

Let's start simple by creating a perceptron class:

Now let's add the functions that we will need.

Notice how the function fit takes as an argument a vector of vector< float >. That is because our training dataset is a matrix of inputs. Essentially, we can imagine that matrix as a number of vectors x stacked one on top of another, with each column of the matrix being a feature.

Finally, let's add the members that our class needs: the vector w to hold the weights, the number of epochs (the number of passes we will make over the training dataset), and the constant eta, the learning rate by which we multiply each weight update. Dialing eta up can make the training procedure faster, and if eta is too high we can dial it down to get the ideal result (for most applications of the perceptron I would suggest an eta value of 0.1).

Now, with our class set up, it's time to write each one of the functions.

We will start from the constructor ( perceptron(float eta,int epochs); )

As you can see, what we will be doing is very simple stuff. So let's move on to another simple function: the predict function ( int predict(vector X); ). Remember, all the predict function does is take the net input and return a value of 1 if the netInput is bigger than 0, and -1 otherwise.

Notice that we used an inline if statement to make our lives easier. Here's how the inline if statement works:

condition ? if_true : else

So far so good. Let's move on to implementing the netInput function( float netInput(vector X); )

The netInput function does the following: it multiplies the input vector by the transpose of the weights vector

*x · wT*

In other words, it multiplies each element of the input vector x by the corresponding element of the weights vector w, takes their sum, and adds the bias:

*(x1 · w1 + x2 · w2 + ... + xn · wn) + bias*

*bias = 1 · w0*

Alright, we are now pretty much done. The last thing we need to do is write the fit function, which modifies the weights.

So that was essentially it. With only 3 functions we now have a working perceptron class that we can use to make predictions!
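The answer's original C++ listings are not included in this copy. Purely as an illustration of the same three functions (netInput, predict, fit), a compact Python sketch of the perceptron described above might look like this:

```python
# Hypothetical Python sketch of the perceptron model described above
# (the original code is C++ and is not reproduced here).
class Perceptron:
    def __init__(self, eta=0.1, epochs=50):
        self.eta = eta          # learning rate
        self.epochs = epochs    # passes over the training dataset
        self.w = []             # weights; w[0] is the bias weight

    def net_input(self, X):
        # (x1*w1 + ... + xn*wn) + bias, where bias = 1 * w0
        return self.w[0] + sum(x * w for x, w in zip(X, self.w[1:]))

    def predict(self, X):
        # Step function: 1 if the net input is positive, else -1.
        return 1 if self.net_input(X) > 0 else -1

    def fit(self, X, y):
        # Initialize the weights to 0, including the bias term.
        self.w = [0.0] * (len(X[0]) + 1)
        for _ in range(self.epochs):
            for xi, target in zip(X, y):
                # Perceptron rule: update proportional to the error.
                update = self.eta * (target - self.predict(xi))
                self.w[0] += update
                for j in range(len(xi)):
                    self.w[j + 1] += update * xi[j]
```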

In case you want to copy-paste the code and try it out. Here is the entire class (I added some extra functionality such as printing the weights vector and the errors in each epoch as well as added the option to import/export weights.)

Here is the code:

The class header:

The class .cpp file with the functions:

Also if you want to try out an example here is an example I made:

main.cpp:

The way I imported the iris dataset isn't really ideal but I just wanted something that worked.

The data files can be found here.

I hope that you found this helpful!

Note: The code above is there only as an example. As noted by juzzlin, it is important that you use const vector<float> &X, and in general pass the vector/vector<vector> objects by reference, because the data can be very large and passing it by value makes a copy (which is inefficient).

Panos

Sounds to me like you are struggling with backprop. What you describe above doesn't quite match how I understand it to work, and your description is a bit ambiguous.

You calculate the output error term to backpropagate as the difference between the prediction and the actual value, multiplied by the derivative of the transfer function. It is that error value which you then propagate backwards. The derivative of a sigmoid is calculated quite simply as y(1-y), where y is your output value. There are lots of proofs of that available on the web.

For a node on the inner layer, you multiply that output error by the weight between the two nodes, and sum all those products as the total error from the outer layer being propagated to the node in the inner layer. The error associated with the inner node is then multiplied by the derivative of the transfer function applied to the original output value. Here's some pseudocode:
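The pseudocode itself is not included in this copy; a hedged reconstruction of the inner-layer error computation just described, written as runnable Python, might read:

```python
def sigmoid_derivative(y):
    # y is the node's original output value: d/dx sigmoid(x) = y * (1 - y).
    return y * (1 - y)

def hidden_error(outer_errors, weights_to_outer, hidden_output):
    # Sum each outer-layer error term weighted by the connecting weight...
    total = sum(e * w for e, w in zip(outer_errors, weights_to_outer))
    # ...then multiply by the transfer function's derivative applied to
    # this node's original output value.
    return total * sigmoid_derivative(hidden_output)
```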

This error is then propagated backwards in the same manner right back through the input layer weights.

The weights are adjusted using these error terms and the output values of the nodes

The learning rate is important because the weight change is calculated for every pattern/row/observation in the input data. You want to moderate the weight change for each row so that the weights don't get unduly changed by any single row, and so that all rows have an effect on the weights. The learning rate gives you that: you adjust the weight change by multiplying by it.

It is also normal to remember these changes between epochs (iterations) and to add a fraction of it to the change. The fraction added is called momentum and is supposed to speed you up through regions of the error surface where there is not much change and slow you down where there is detail.

There are algorithms for adjusting the learning rate and momentum as the training proceeds.

The weight is then updated by adding the change
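Putting the pieces just described together, a single weight change (learning rate times the error term and node output, plus a momentum fraction of the previous change) might be sketched as follows; the function name and argument names are my own:

```python
def weight_change(error_term, node_output, learning_rate, momentum, prev_change):
    # Scale the raw adjustment by the learning rate, then add a fraction
    # (the momentum) of the previous epoch's change for this weight.
    return learning_rate * error_term * node_output + momentum * prev_change

# The weight is then updated by adding the change:
# w = w + weight_change(...)
```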


I had a look through your code, but rather than correct it and post that I thought it was better to describe back prop for you so you can code it up yourself. If you understand it you'll be able to tune it for your circumstances too.

HTH and good luck.

Simon

How about this open-source code? It defines a simple net with 1 hidden layer (2 inputs, 2 hidden, 1 output) and solves the XOR problem:

Oleg Shirokikh
