NEURAL NETWORK
PROJECT # 4
Problem
This problem deals with using the backpropagation algorithm to classify a given set of points in a square. The purpose of this classification is to separate all "Type A" points (those inside the circle) from the "Type B" points outside the circle. As a real-life example, backpropagation can be used in robotics to classify and separate different patterns in the environment a robot has to navigate, for example, classifying a surface into different element types such as grass, dirt, and rocks. The advantage of the backpropagation method is that the network's outputs are compared to the desired results to produce an "error" signal, which is fed back through the network iteratively until the errors are minimized or reach zero, achieving the desired output.
The architecture of the network consists of three layers: input units, hidden units, and output units.
The input layer takes the x and y coordinates of the points on the Cartesian plane.
The middle layer consists of eight hidden units, and the connections into and out of them carry the adjustable weights of the network. Through a process of trial and error, the errors are reduced with each iteration to reach a desired result with little or no error.
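The 2-8-2 architecture described above (two coordinate inputs, eight hidden units, two outputs) can be sketched as follows. This is an illustrative sketch, not the actual assignment code; class, field, and constant names are mine, and the sigmoid activation is assumed.

```java
// Minimal sketch of the 2-8-2 network described above.
// Names and the sigmoid activation are assumptions, not the course code.
public class Net {
    static final int IN = 2, HID = 8, OUT = 2;
    double[][] wIH = new double[HID][IN + 1];  // input->hidden weights (+ bias)
    double[][] wHO = new double[OUT][HID + 1]; // hidden->output weights (+ bias)

    static double sigmoid(double s) { return 1.0 / (1.0 + Math.exp(-s)); }

    // Feed the (x, y) coordinates forward through the network.
    double[] forward(double[] xy) {
        double[] h = new double[HID];
        for (int j = 0; j < HID; j++) {
            double s = wIH[j][IN];                        // bias weight
            for (int i = 0; i < IN; i++) s += wIH[j][i] * xy[i];
            h[j] = sigmoid(s);
        }
        double[] out = new double[OUT];
        for (int k = 0; k < OUT; k++) {
            double s = wHO[k][HID];                       // bias weight
            for (int j = 0; j < HID; j++) s += wHO[k][j] * h[j];
            out[k] = sigmoid(s);
        }
        return out;
    }
}
```

With all weights at zero, every unit outputs sigmoid(0) = 0.5, which is why the initial weights must be small random values (or, as reported later, 0.001) before training begins.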
Backpropagation algorithm 

GUI user interface 
Backpropagation Algorithm Application:
D. Describe how the training set was chosen and how the training was done.
The training set was randomly selected from both types A and B of input vectors:
trainSet[i][j] = Math.random() * 4 - 2.0;
The x and y coordinates of the training set are fed forward through the network and tested for classification errors; the desired output for Type A points is (1,0) and for Type B points is (0,1). The errors are fed back through the network to adjust its connection weights, trying to reduce the sum-squared error (SSE). The process keeps cycling through the network until the classification errors are very small or equal to zero, at which point the network is trained.
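The training-set construction and the SSE measure can be sketched as below. The circle radius (1.0, centred at the origin), the number of points, and all names are assumptions for illustration; only the `Math.random() * 4 - 2.0` sampling comes from the report.

```java
// Illustrative sketch: building the random training set with its
// (1,0)/(0,1) targets, plus the sum-squared error (SSE) measure.
// Circle radius 1.0 and point count 200 are assumptions.
public class TrainSetDemo {
    // SSE over all patterns and output units: sum of (target - out)^2.
    static double sse(double[][] target, double[][] out) {
        double e = 0;
        for (int i = 0; i < target.length; i++)
            for (int k = 0; k < target[i].length; k++) {
                double d = target[i][k] - out[i][k];
                e += d * d;
            }
        return e;
    }

    public static void main(String[] args) {
        int n = 200;
        double[][] trainSet = new double[n][2];
        double[][] target = new double[n][2];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < 2; j++)
                trainSet[i][j] = Math.random() * 4 - 2.0; // square [-2,2]x[-2,2]
            boolean typeA = trainSet[i][0] * trainSet[i][0]
                          + trainSet[i][1] * trainSet[i][1] <= 1.0;
            target[i][0] = typeA ? 1.0 : 0.0; // Type A desired output (1,0)
            target[i][1] = typeA ? 0.0 : 1.0; // Type B desired output (0,1)
        }
        System.out.println("SSE of perfect outputs: " + sse(target, target));
    }
}
```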
E. Value of the “step size”:
The default step size was used in this simulation: 0.1. In Lippmann's notation, the step size or "gain factor" enters the weight update as follows:
W_{ij}(t+1) = W_{ij}(t) + η·δ_{j}·x_{i}, where η is the "gain factor".
Step Size   SSE     Percentage   Remarks
0.1         2.94    97%
0.3         3.0     98%
0.5         19.6    99%          In some cases the SSE went up to 52; the error graph shows instability.
0.7         7.7     99%          Process went quickly to completion.
0.9         7.87    96%          Process went very quickly to completion.
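For a single weight, the Lippmann update without momentum can be sketched as below; the method and parameter names are illustrative only.

```java
// One-weight form of the gain-factor update: W(t+1) = W(t) + eta * delta * x.
// Names are illustrative, not the assignment code.
public class StepUpdate {
    static double update(double w, double eta, double delta, double x) {
        return w + eta * delta * x;
    }
}
```

The table above shows the effect of varying eta: small values converge slowly but stably, while large values speed up training at the cost of stability.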
C. Value of the “momentum” term:
The default momentum was used in this simulation: 0.5. This parameter has a smoothing effect:
W_{ij}(t+1) = W_{ij}(t) + η·δ_{j}·x_{i} + α·(W_{ij}(t) − W_{ij}(t−1)),
where α·(W_{ij}(t) − W_{ij}(t−1)) is the "momentum term".
Momentum   SSE    Percentage   Remarks
0.1        1.08   96%
0.3        5.79   96%
0.5        5.81   98%
0.7        0.4    100%         The best result.
0.9        9.07   96%          The error graph shows instability.
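The momentum update can likewise be sketched for a single weight; `dwPrev` stands for the previous weight change W_{ij}(t) − W_{ij}(t−1), and all names are illustrative.

```java
// One-weight form of the momentum update:
// W(t+1) = W(t) + eta * delta * x + alpha * dwPrev,
// where dwPrev = W(t) - W(t-1). Names are illustrative.
public class MomentumUpdate {
    static double update(double w, double eta, double delta, double x,
                         double alpha, double dwPrev) {
        return w + eta * delta * x + alpha * dwPrev;
    }
}
```

Carrying a fraction alpha of the previous change forward is what produces the smoothing effect noted above: successive changes in the same direction reinforce each other, while oscillating changes partly cancel.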
The backpropagation step can be performed either after each pattern the network is shown or after the end of a set of patterns. In this program I chose to update after each point was shown. To complete the weight-update process, the algorithm first updates the weights from the input units to the hidden layer, and then the weights from the hidden layer to the output units.
Whether the network learns the difference between Types A and B depends on simulation parameters such as the momentum, gain, and initial weights. After experimenting with different combinations of these parameters, I concluded that the network takes 3940 epochs to classify Types A and B. The following parameters were used to get this result: gain 0.1, momentum 0.7, and initial weights 0.001.
After the network was trained, the percentage of time it got the classification correct was 99%.
In reality, the ideal values are impossible to achieve. For example, the network would need to go through very many epochs before the SSE reached the ideal value of zero, and the outputs for Types A and B cannot be obtained exactly as (1,0) or (0,1). Therefore, we have to define a set of thresholds to bridge these gaps and meet the requirements for the network to classify fully. In this assignment I used a decision threshold of 0.1 and a training threshold of 0.1.
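One way the 0.1 decision threshold described above could be applied is sketched below: an output pattern counts as a type only when both output units are within the threshold of the ideal targets (1,0) or (0,1). This interpretation of the threshold, and all names, are assumptions, not the assignment's actual rule.

```java
// Illustrative decision rule for a decision threshold t (here 0.1):
// accept a classification only when both outputs are within t of the
// ideal targets (1,0) for Type A or (0,1) for Type B.
public class Decision {
    static int classify(double[] out, double t) {
        if (out[0] >= 1 - t && out[1] <= t) return 0; // Type A
        if (out[1] >= 1 - t && out[0] <= t) return 1; // Type B
        return -1; // not yet confidently classified
    }
}
```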
References:
Previous students' work, particularly that of Susan Zabaronick.
Wolfe, W. Neural Networks course materials.