
Artificial Neural Networks (ANN) – Introduction

An Artificial Neural Network is an emulation of the more complex biological neural system. Why do we need such an abstract model? Although today's computing performance is very high, there are certain tasks that a common microprocessor is unable to perform well. In some of these cases, the ANN approach can provide better results.

Biological Neural Network

You can see how our brain is structured. A huge number of neurons are interconnected through their synapses. A neuron is activated when the sum of its inputs (electrical signals) is greater than a threshold. If a neuron is activated, it produces an output (another electrical signal) which is propagated through the network. No reaction (inhibition) means no electrical signal.

This model can be easily copied in Computer Science. Take a look at the next images:

Brain neuron

Artificial neuron

Now we can give a better definition of an ANN: a group of simple and identical computing units (closely related to biological neurons) which operate in parallel and are able to perform complex tasks. Each link between two neurons has a weight associated with it. We can say that the weights represent the neural network's knowledge.

This approach has a lot of advantages, but also some disadvantages.

ANN advantages:

  • Fault tolerance: when an element, a group of elements, or a link of the neural network fails, the network's performance, in most cases, won't be affected.
  • Adaptive learning (from examples)
  • A neural network learns and does not need to be reprogrammed.
  • A neural network can perform tasks that a linear program cannot.
  • Real-time operation after the learning process is finished (due to the parallel structure)
  • Self-organizing capacity: an ANN can create its own organization and representation of the data during the training process.
  • An ANN can be integrated into almost any application without great effort. Once you have your own ANN source, many problems can be mapped onto it.

Disadvantages:

  • The neural network needs training to operate and, sometimes, training data is not available.
  • The architecture of a neural network is different from the architecture of microprocessors and therefore needs to be emulated.
  • Large neural networks require high processing time.

For a better understanding of the way an ANN works, let's analyze the mathematical model of a neuron.

mathematical model of a neuron

The synapses of the neuron are modeled as weights. The strength of the connection between an input and a neuron is given by the value of the weight. Negative weight values reflect inhibitory connections, while positive values designate excitatory connections [Haykin].

An adder sums up all the inputs modified by their respective weights.

The activation function controls the amplitude of the output of the neuron. An acceptable range of output is usually between 0 and 1, or -1 and 1.

A well-known activation function is the Threshold Function, which takes the value “0” if the summed input is less than a certain threshold value and the value “1” if the summed input is greater than or equal to the threshold value.

The sigmoid function uses the formula: f(t) = 1 / (1 + e^-t).
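The adder and the two activation functions above can be sketched in a few lines of Java (the class and method names here are my own, for illustration only):

```java
public class ActivationDemo {

    // Adder: sum over i of w[i] * x[i]
    static double weightedSum(double[] w, double[] x) {
        double sum = 0.0;
        for (int i = 0; i < w.length; i++) {
            sum += w[i] * x[i];
        }
        return sum;
    }

    // Threshold function: 1 if the summed input reaches the threshold, else 0
    static double threshold(double t, double theta) {
        return t >= theta ? 1.0 : 0.0;
    }

    // Sigmoid function: f(t) = 1 / (1 + e^-t), output always between 0 and 1
    static double sigmoid(double t) {
        return 1.0 / (1.0 + Math.exp(-t));
    }

    public static void main(String[] args) {
        double[] w = {0.5, -0.3};           // one excitatory, one inhibitory weight
        double[] x = {1.0, 1.0};
        double t = weightedSum(w, x);       // ≈ 0.2
        System.out.println(threshold(t, 0.0)); // prints 1.0
        System.out.println(sigmoid(t));        // ≈ 0.55
    }
}
```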

Graphs of the activation functions can be seen below.

activation functions

It’s important to distinguish three types of units (neurons):

  • input units (i) – receive data from outside the neural network
  • output units (o) – send data out of the neural network
  • hidden units (h) – their input and output signals remain within the neural network

During training, the weights can be updated either synchronously or asynchronously.

There are two types of neural network topologies:

  • Feed-forward neural networks, where the data flow from input to output units is strictly feed-forward; no feedback connections are used.
  • Feedback networks, where the network's output (from a training step) is propagated backward in order to update the neurons' weights. The most representative networks in this category are the Back-Propagation networks, which I will detail in the next article.

The training process can be performed using:

  • supervised learning – learning from a set of examples – the network is trained by providing it with input and matching output patterns
  • unsupervised learning – or self-organizing neural networks – the system must develop its own representation of the input stimuli because there is no classified set of examples to learn from. A well-known unsupervised network is the SOM (Self-Organizing Map), which can be used in many domains to group (cluster) similar units. I will dedicate a separate article to this type of ANN.

Finally, let's see what an ANN looks like:

Neural Network example

Notice that the middle layer is called the hidden layer, and we can have a different number of neurons on each layer, as well as a different number of hidden layers. Actually, there is no rule for choosing these numbers; you will have to discover by yourself the best configuration for the problem you want to solve.

This method of “tuning” is used in every Machine Learning algorithm and consists of changing different constants, running the algorithm on different data sets, and finally choosing the best values.

Also, you should already know that every link in the previous image has a weight associated with it. The final goal of the training phase is to set the right weights (usually initialized with small random numbers).

The Perceptron is the simplest neural network. Actually, it is less than a neural network: it is a single computational unit with a threshold value (Theta) which, for inputs x1, x2, …, xn and weights w1, w2, …, wn, produces a +1 output if the sum of wi*xi (for i = 1..n) is >= Theta, and a 0 (or -1) output otherwise.
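In Java, this unit can be sketched as follows (a minimal illustration; the class name and fields are my own, not part of the interfaces at the end of this post):

```java
public class Perceptron {

    private final double[] weights; // w1..wn
    private final double theta;     // threshold value (Theta)

    public Perceptron(double[] weights, double theta) {
        this.weights = weights;
        this.theta = theta;
    }

    // Output 1 if sum(wi * xi) >= Theta, 0 otherwise
    public int compute(double[] x) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * x[i];
        }
        return sum >= theta ? 1 : 0;
    }

    public static void main(String[] args) {
        Perceptron p = new Perceptron(new double[]{0.7, 0.7}, 1.0);
        System.out.println(p.compute(new double[]{1, 1})); // prints 1 (1.4 >= 1.0)
        System.out.println(p.compute(new double[]{1, 0})); // prints 0 (0.7 <  1.0)
    }
}
```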

Perceptron


The Perceptron divides the input space into two regions (one for points with +1 output, and one for points with 0 output).

A limitation of the Perceptron is that it can only compute linearly separable functions like the OR function:


OR function using a Perceptron
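For example, with weights w1 = w2 = 1 and Theta = 0.5, a single unit computes the OR function. Here is a self-contained sketch (the names are mine, and the weights are hand-picked rather than learned):

```java
public class OrPerceptron {

    // Hand-picked weights and threshold that realize the OR function
    static final double[] W = {1.0, 1.0};
    static final double THETA = 0.5;

    // Perceptron output: 1 if sum(wi * xi) >= Theta, else 0
    static int compute(double[] x) {
        double sum = 0.0;
        for (int i = 0; i < W.length; i++) {
            sum += W[i] * x[i];
        }
        return sum >= THETA ? 1 : 0;
    }

    public static void main(String[] args) {
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        for (double[] in : inputs) {
            // prints the OR truth table: 0, 1, 1, 1
            System.out.println((int) in[0] + " OR " + (int) in[1]
                    + " = " + compute(in));
        }
    }
}
```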

But even with such a primitive network, we can perform interesting tasks like edge detection, corner detection, and even character recognition (using pattern matching). Here are some examples:


Corner detection

Character recognition


These are the Neuron and NeuralNetwork Java interfaces which I'll implement in the next article for the Back-Propagation neural network. You are free to use them, and you can try to implement a Perceptron to recognize the letter T.

/**
 * Neuron representation for the NeuralNetwork algorithm
 * @param <E>  type of data used by the network
 * @author Octavian Sima
 */
public interface Neuron<E> {

    /**
     * Initialize the neuron
     * @param parent                   network which contains the neuron
     * @param inputsNumber             number of neuron inputs
     * @param initialWeightMinValue    min value used in initial weight computing
     * @param initialWeightMaxValue    max value used in initial weight computing
     */
    void initNeuron(NeuralNetwork<E> parent, int inputsNumber,
            E initialWeightMinValue, E initialWeightMaxValue);

    /**
     * Initialize the neuron with default values
     * @param parent          network which contains the neuron
     * @param inputsNumber    number of neuron inputs
     */
    void initNeuron(NeuralNetwork<E> parent, int inputsNumber);

    /**
     * Neuron reaction (computes the output generated by an input array)
     * @param input    input values
     * @return         output value
     */
    E compute(E[] input);

    /**
     * Neuron activation function
     * @param value    summed input value
     * @return         activation function output
     */
    double activationFunction(E value);

    /**
     * Activation function first derivative
     * @param value    summed input value
     * @return         derivative value
     */
    double activationFunctionDerivative(E value);
}

/**
 * NeuralNetwork template
 * @param <E>  type of data used by the network
 * @author Octavian Sima
 */
public interface NeuralNetwork<E> {

    /**
     * Initialize the neural network
     * @param layersNumber            the network's number of layers
     * @param neuronsNumberOnLayer    array with the number of neurons on each layer
     * @param learningRate            neuron learning rate
     * @param learnCalmingRate        calming rate of learningRate (1.0 = constant)
     */
    void initNetwork(int layersNumber, int[] neuronsNumberOnLayer,
            double learningRate, double learnCalmingRate);

    /**
     * Initialize the neural network with default values
     * @param layersNumber            the network's number of layers
     * @param neuronsNumberOnLayer    array with the number of neurons on each layer
     */
    void initNetwork(int layersNumber, int[] neuronsNumberOnLayer);

    /**
     * Train the network on a given data set
     * @param inputs              input data set
     * @param outputs             expected results
     * @param maxAcceptedError    maximum accepted error for the training phase to stop
     * @param maxSteps            maximum number of training steps
     * @return                    number of steps actually used before convergence
     */
    int train(E[][] inputs, E[][] outputs, double maxAcceptedError, int maxSteps);

    /**
     * Train the network on a given data set with the default maxSteps
     * @param inputs              input data set
     * @param outputs             expected results
     * @param maxAcceptedError    maximum accepted error for the training phase to stop
     * @return                    number of steps actually used before convergence
     */
    int train(E[][] inputs, E[][] outputs, double maxAcceptedError);

    /**
     * Run the trained network on a given input
     * @param input    input data
     * @return         network output
     */
    E[] test(E[] input);
}