Quick summaries of research papers around dynamically generating network structure

I’ve been reading a lot of research papers on how the structure of ANN can be generated/detected. Here are some quick summaries of interesting papers in that area.

Dynamic Node creation in backpropagation networks

Optimal Brain Damage

  • After training a standard ANN we can potentially improve speed and generalization performance if we delete some of the nodes/connections
  • Optimal brain damage deletes parameters (sets them to 0 and freezes them from future use) based on there impact on the error of the model.
  • Results: Against the MNIST data it was shown that up to a 5th of parameters could be deleted with no degradation in the training performance and some improvement in the generalization.

The Cascade Correlation Algorithms

  • A different approach to evolving structure for neural networks. 
  • Starts with a network with just fully connected input and output nodes. These are trained using quickprop until error rate stops decreasing.
  • Then multiple candidate nodes are created connected to every node except the output nodes(eventually hidden nodes will be added) with different random weights
  • These are then all trained to try and maximize the correlation with the output error.
  • The candidate with the best correlation is selected
  • Repeat training and adding in candidate nodes until stopping point
  • Results: Appears to do very well and producing results for things like xor and various other functions that are difficult for normal ANN’s to do.
  • It has 2 problems
    • Over fitting
    • Because the eventual network is a cascade of neurons(later neurons are really deep because they depend on every previous neuron) it runs a lot slower than a normal ANN
  • Can also be extended to work on recurrent networks

Cascade-Correlation Neural Networks: A Survey

  • Presents some approaches to reducing the problems with cascade correlation.
  • Sibling/descendant cascade attempts to reduce the problem with networks going too deep.
    • There are 2 pools of candidate neurons, sibling(only connected to neurons in previous layers) and descendant(same as for normal cascade). 
    • Often siblings do get selected ahead of descendants, reducing the depth of the network
  • Says Optimal Brain Damage is effective at pruning the network after structure is discovered this helps with over fitting.
  • Knowledge based CCNN are another approach where candidates nodes can be normal functions or fully fledged ANN of there own. This approach sounds a lot of fun, would love more info on how successful it is.
  • Rule based CCNN, similar to the above but things like OR and AND operators are used. Again sounds alot of fun would love to know how it performs.
Exit mobile version