Quick summaries of research papers around dynamically generating network structure

I’ve been reading a lot of research papers on how the structure of ANNs can be generated/discovered. Here are some quick summaries of interesting papers in that area.

Dynamic Node Creation in Backpropagation Networks

  • Looks at finding a good number of hidden nodes for a network by starting small and growing until it has roughly the right number of hidden nodes
  • Claims that the approach of training the network then freezing existing nodes before adding new unfrozen nodes does not work well.
  • Adding new nodes then retraining the whole network appears to work better
  • The technique (a rough sketch follows this list):
    • A network with one hidden layer and a predefined (small) number of hidden nodes is created
    • The network is trained; when the error stops decreasing over a number of iterations, a new hidden node is added
    • Nodes stop being added either when a certain precision is reached or when the error starts increasing on a validation set
  • Results: Doesn’t seem to take significantly longer to train than fixed networks and tends to find near-optimal solutions
  • Remaining questions they want to investigate are how big the node-adding window should be and where nodes should be added in multi-layer networks
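
To make the growing loop above concrete, here is a minimal sketch of the grow-on-plateau idea, assuming a one-hidden-layer sigmoid network trained by full-batch gradient descent on XOR. The window size, improvement threshold and target error are placeholders of my own, not values from the paper (which also stops adding nodes based on a validation set).

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init(n_hidden):
    return {"W1": rng.normal(scale=0.5, size=(2, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(scale=0.5, size=(n_hidden, 1)), "b2": np.zeros(1)}

def train_step(p, lr=0.5):
    # One full-batch gradient-descent step on squared error; returns the MSE.
    h = sigmoid(X @ p["W1"] + p["b1"])
    out = sigmoid(h @ p["W2"] + p["b2"])
    err = out - y
    d_out = err * out * (1 - out)
    d_h = (d_out @ p["W2"].T) * h * (1 - h)
    p["W2"] -= lr * h.T @ d_out
    p["b2"] -= lr * d_out.sum(axis=0)
    p["W1"] -= lr * X.T @ d_h
    p["b1"] -= lr * d_h.sum(axis=0)
    return float((err ** 2).mean())

def add_hidden_node(p):
    # Grow one hidden unit with small random weights; existing weights are kept
    # and the whole network keeps training (the paper found freezing works badly).
    p["W1"] = np.hstack([p["W1"], rng.normal(scale=0.5, size=(2, 1))])
    p["b1"] = np.append(p["b1"], 0.0)
    p["W2"] = np.vstack([p["W2"], rng.normal(scale=0.5, size=(1, 1))])

params = init(n_hidden=1)                     # start small
window, min_drop, target = 200, 1e-4, 0.01    # placeholder hyperparameters
best, stalled = np.inf, 0
for epoch in range(20000):
    mse = train_step(params)
    if best - mse > min_drop:
        best, stalled = mse, 0
    else:
        stalled += 1
    if mse < target:                          # good enough: stop growing
        break
    if stalled >= window:                     # error has plateaued: add a node
        add_hidden_node(params)
        best, stalled = np.inf, 0

print("hidden nodes:", params["W1"].shape[1], "final MSE:", round(mse, 4))
```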

Optimal Brain Damage

  • After training a standard ANN, we can potentially improve speed and generalization performance by deleting some of the nodes/connections
  • Optimal Brain Damage deletes parameters (sets them to 0 and freezes them from future use) based on their impact on the error of the model (a rough sketch of the deletion step follows this list).
  • Results: Against the MNIST data it was shown that up to a fifth of the parameters could be deleted with no degradation in training performance and some improvement in generalization.
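
As a rough illustration of the deletion step mentioned above: OBD ranks each parameter by the saliency s = ½·h·w², where h is the corresponding diagonal entry of the Hessian of the error, and deletes the lowest-ranked parameters. How the Hessian diagonal is estimated depends on the network, so the sketch below just takes it as an input; the toy data and the 20% deletion fraction are mine, not the paper’s.

```python
import numpy as np

def obd_prune(weights, hessian_diag, fraction=0.2):
    """Zero out the lowest-saliency fraction of parameters and return a mask
    that freezes them (mask == 0) for the rest of training."""
    w = weights.ravel()
    h = hessian_diag.ravel()
    saliency = 0.5 * h * w ** 2                    # OBD saliency per parameter
    n_delete = int(fraction * w.size)
    delete_idx = np.argsort(saliency)[:n_delete]   # least important parameters
    mask = np.ones_like(w)
    mask[delete_idx] = 0.0
    return (w * mask).reshape(weights.shape), mask.reshape(weights.shape)

# Toy usage: random weights and a made-up diagonal-Hessian estimate.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
H = np.abs(rng.normal(size=(4, 3)))                # stand-in for d²E/dw² estimates
W_pruned, mask = obd_prune(W, H, fraction=0.2)
# During further training, multiply the gradient by `mask` so deleted weights stay 0.
print("parameters deleted:", int((mask == 0).sum()), "of", mask.size)
```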

The Cascade-Correlation Algorithm

  • A different approach to evolving structure for neural networks. 
  • Starts with a network of just input and output nodes, fully connected. These connections are trained using Quickprop until the error rate stops decreasing.
  • Then multiple candidate nodes are created, each with different random weights, connected to every node except the output nodes (including any hidden nodes added in earlier rounds)
  • These are all trained to try to maximize the correlation between their output and the network’s residual output error (see the sketch after this list).
  • The candidate with the best correlation is selected
  • Repeat training and adding candidate nodes until a stopping point is reached
  • Results: Appears to do very well, producing good results on things like XOR and various other functions that are difficult for normal ANNs to learn.
  • It has two problems:
    • Overfitting
    • Because the eventual network is a cascade of neurons (later neurons are effectively very deep since they depend on every previous neuron), it runs a lot slower than a normal ANN
  • Can also be extended to work on recurrent networks
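
The candidate scoring referenced in the list uses Fahlman and Lebiere’s covariance measure S = Σ_o |Σ_p (V_p − V̄)(E_p,o − Ē_o)|, where V_p is a candidate’s output on pattern p and E_p,o is the residual error at output o. The sketch below only scores an already-trained pool and picks the winner; the Quickprop gradient ascent that trains the candidates to maximize S is omitted, and the toy numbers are made up.

```python
import numpy as np

def candidate_score(V, E):
    """V: (patterns,) candidate outputs.  E: (patterns, outputs) residual errors."""
    Vc = V - V.mean()
    Ec = E - E.mean(axis=0)
    return float(np.abs(Vc @ Ec).sum())   # sum over output units of |covariance|

def pick_best_candidate(candidate_values, residual_errors):
    scores = [candidate_score(V, residual_errors) for V in candidate_values]
    return int(np.argmax(scores)), scores

# Toy usage with random numbers standing in for trained candidate activations.
rng = np.random.default_rng(0)
residual_errors = rng.normal(size=(8, 2))            # 8 patterns, 2 output units
candidates = [rng.normal(size=8) for _ in range(4)]  # a pool of 4 candidates
best, scores = pick_best_candidate(candidates, residual_errors)
print("install candidate", best, "with score", round(scores[best], 3))
```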

Cascade-Correlation Neural Networks: A Survey

  • Presents some approaches to reducing the problems with cascade correlation.
  • Sibling/descendant cascade attempts to reduce the problem of networks growing too deep.
    • There are two pools of candidate neurons: siblings (only connected to neurons in previous layers) and descendants (connected as in normal cascade correlation); see the sketch after this list.
    • Often siblings do get selected ahead of descendants, reducing the depth of the network
  • Says Optimal Brain Damage is effective at pruning the network after the structure is discovered, which helps with overfitting.
  • Knowledge-based CCNNs are another approach, where candidate nodes can be ordinary functions or fully fledged ANNs of their own. This approach sounds like a lot of fun; I’d love more info on how successful it is.
  • Rule-based CCNNs are similar to the above, but operators like OR and AND are used as candidates. Again, sounds like a lot of fun; I’d love to know how it performs.
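
For the sibling/descendant idea, the only structural difference between the two candidate pools is which existing units feed each candidate, roughly as sketched below. The pool size, layer bookkeeping and names are my own; both pools are still trained and compared with the usual correlation score, and the winning candidate either joins the current deepest layer (sibling) or starts a new one (descendant).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_candidate(fan_in):
    # A candidate is represented here by just its vector of incoming weights.
    return rng.normal(scale=0.5, size=fan_in)

def candidate_pools(n_inputs, hidden_layers, pool_size=4):
    """hidden_layers: list of hidden-layer sizes, earliest layer first."""
    earlier = n_inputs + sum(hidden_layers[:-1])   # inputs + all but the deepest layer
    deepest = hidden_layers[-1] if hidden_layers else 0
    siblings    = [make_candidate(earlier)           for _ in range(pool_size)]
    descendants = [make_candidate(earlier + deepest) for _ in range(pool_size)]
    return siblings, descendants

siblings, descendants = candidate_pools(n_inputs=3, hidden_layers=[2, 2])
print("sibling fan-in:", siblings[0].size)        # 3 inputs + earlier hidden layer = 5
print("descendant fan-in:", descendants[0].size)  # also sees the deepest layer = 7
```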