Quick summaries of research papers around dynamically generating network structure

I’ve been reading a lot of research papers on how the structure of ANNs can be generated/discovered. Here are some quick summaries of interesting papers in that area.

Dynamic Node Creation in Backpropagation Networks

  • Looks at finding a good number of hidden nodes for a network by starting small and growing until it has roughly the right number of hidden nodes
  • Claims that the approach of training the network then freezing existing nodes before adding new unfrozen nodes does not work well.
  • Adding new nodes then retraining the whole network appears to work better
  • The technique (a rough sketch follows this list):
    • A network with one hidden layer and a predefined (small) number of hidden nodes is created
    • The network is trained; when the error stops decreasing over a number of iterations, a new hidden node is added
    • Nodes stop being added either when a certain precision is reached or when the error starts increasing on a validation set
  • Results: Doesn’t seem to take significantly longer to train than fixed networks and tends to find near-optimal solutions
  • Remaining questions they want to investigate are how big the node-adding window should be and where nodes should be added in multi-layer networks
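
To make the growing loop above concrete, here is a minimal sketch of the grow-on-plateau idea, assuming a one-hidden-layer sigmoid network trained by full-batch gradient descent on XOR. The window size, improvement threshold and target error are placeholders of my own, not values from the paper (which also stops adding nodes based on a validation set).

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init(n_hidden):
    return {"W1": rng.normal(scale=0.5, size=(2, n_hidden)), "b1": np.zeros(n_hidden),
            "W2": rng.normal(scale=0.5, size=(n_hidden, 1)), "b2": np.zeros(1)}

def train_step(p, lr=0.5):
    # One full-batch gradient-descent step on squared error; returns the MSE.
    h = sigmoid(X @ p["W1"] + p["b1"])
    out = sigmoid(h @ p["W2"] + p["b2"])
    err = out - y
    d_out = err * out * (1 - out)
    d_h = (d_out @ p["W2"].T) * h * (1 - h)
    p["W2"] -= lr * h.T @ d_out
    p["b2"] -= lr * d_out.sum(axis=0)
    p["W1"] -= lr * X.T @ d_h
    p["b1"] -= lr * d_h.sum(axis=0)
    return float((err ** 2).mean())

def add_hidden_node(p):
    # Grow one hidden unit with small random weights; existing weights are kept
    # and the whole network keeps training (the paper found freezing works badly).
    p["W1"] = np.hstack([p["W1"], rng.normal(scale=0.5, size=(2, 1))])
    p["b1"] = np.append(p["b1"], 0.0)
    p["W2"] = np.vstack([p["W2"], rng.normal(scale=0.5, size=(1, 1))])

params = init(n_hidden=1)                     # start small
window, min_drop, target = 200, 1e-4, 0.01    # placeholder hyperparameters
best, stalled = np.inf, 0
for epoch in range(20000):
    mse = train_step(params)
    if best - mse > min_drop:
        best, stalled = mse, 0
    else:
        stalled += 1
    if mse < target:                          # good enough: stop growing
        break
    if stalled >= window:                     # error has plateaued: add a node
        add_hidden_node(params)
        best, stalled = np.inf, 0

print("hidden nodes:", params["W1"].shape[1], "final MSE:", round(mse, 4))
```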

Optimal Brain Damage

  • After training a standard ANN, we can potentially improve speed and generalization performance by deleting some of the nodes/connections
  • Optimal Brain Damage deletes parameters (sets them to 0 and freezes them from future use) based on their impact on the error of the model (a rough sketch of the deletion step follows this list).
  • Results: Against the MNIST data it was shown that up to a fifth of the parameters could be deleted with no degradation in training performance and some improvement in generalization.
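
As a rough illustration of the deletion step mentioned above: OBD ranks each parameter by the saliency s = ½·h·w², where h is the corresponding diagonal entry of the Hessian of the error, and deletes the lowest-ranked parameters. How the Hessian diagonal is estimated depends on the network, so the sketch below just takes it as an input; the toy data and the 20% deletion fraction are mine, not the paper’s.

```python
import numpy as np

def obd_prune(weights, hessian_diag, fraction=0.2):
    """Zero out the lowest-saliency fraction of parameters and return a mask
    that freezes them (mask == 0) for the rest of training."""
    w = weights.ravel()
    h = hessian_diag.ravel()
    saliency = 0.5 * h * w ** 2                    # OBD saliency per parameter
    n_delete = int(fraction * w.size)
    delete_idx = np.argsort(saliency)[:n_delete]   # least important parameters
    mask = np.ones_like(w)
    mask[delete_idx] = 0.0
    return (w * mask).reshape(weights.shape), mask.reshape(weights.shape)

# Toy usage: random weights and a made-up diagonal-Hessian estimate.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
H = np.abs(rng.normal(size=(4, 3)))                # stand-in for d²E/dw² estimates
W_pruned, mask = obd_prune(W, H, fraction=0.2)
# During further training, multiply the gradient by `mask` so deleted weights stay 0.
print("parameters deleted:", int((mask == 0).sum()), "of", mask.size)
```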

The Cascade-Correlation Algorithm

  • A different approach to evolving structure for neural networks. 
  • Starts with a network of just input and output nodes, fully connected. These connections are trained using Quickprop until the error rate stops decreasing.
  • Then multiple candidate nodes are created, each with different random weights, connected to every node except the output nodes (including any hidden nodes added in earlier rounds)
  • These are all trained to try to maximize the correlation between their output and the network’s residual output error (see the sketch after this list).
  • The candidate with the best correlation is selected
  • Repeat training and adding candidate nodes until a stopping point is reached
  • Results: Appears to do very well, producing good results on things like XOR and various other functions that are difficult for normal ANNs to learn.
  • It has two problems:
    • Overfitting
    • Because the eventual network is a cascade of neurons (later neurons are effectively very deep since they depend on every previous neuron), it runs a lot slower than a normal ANN
  • Can also be extended to work on recurrent networks
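
The candidate scoring referenced in the list uses Fahlman and Lebiere’s covariance measure S = Σ_o |Σ_p (V_p − V̄)(E_p,o − Ē_o)|, where V_p is a candidate’s output on pattern p and E_p,o is the residual error at output o. The sketch below only scores an already-trained pool and picks the winner; the Quickprop gradient ascent that trains the candidates to maximize S is omitted, and the toy numbers are made up.

```python
import numpy as np

def candidate_score(V, E):
    """V: (patterns,) candidate outputs.  E: (patterns, outputs) residual errors."""
    Vc = V - V.mean()
    Ec = E - E.mean(axis=0)
    return float(np.abs(Vc @ Ec).sum())   # sum over output units of |covariance|

def pick_best_candidate(candidate_values, residual_errors):
    scores = [candidate_score(V, residual_errors) for V in candidate_values]
    return int(np.argmax(scores)), scores

# Toy usage with random numbers standing in for trained candidate activations.
rng = np.random.default_rng(0)
residual_errors = rng.normal(size=(8, 2))            # 8 patterns, 2 output units
candidates = [rng.normal(size=8) for _ in range(4)]  # a pool of 4 candidates
best, scores = pick_best_candidate(candidates, residual_errors)
print("install candidate", best, "with score", round(scores[best], 3))
```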

Cascade-Correlation Neural Networks: A Survey

  • Presents some approaches to reducing the problems with cascade correlation.
  • Sibling/descendant cascade attempts to reduce the problem of networks growing too deep.
    • There are two pools of candidate neurons: siblings (only connected to neurons in previous layers) and descendants (connected as in normal cascade correlation); see the sketch after this list.
    • Often siblings do get selected ahead of descendants, reducing the depth of the network
  • Says Optimal Brain Damage is effective at pruning the network after the structure is discovered, which helps with overfitting.
  • Knowledge-based CCNNs are another approach, where candidate nodes can be ordinary functions or fully fledged ANNs of their own. This approach sounds like a lot of fun; I’d love more info on how successful it is.
  • Rule-based CCNNs are similar to the above, but operators like OR and AND are used as candidates. Again, sounds like a lot of fun; I’d love to know how it performs.
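
For the sibling/descendant idea, the only structural difference between the two candidate pools is which existing units feed each candidate, roughly as sketched below. The pool size, layer bookkeeping and names are my own; both pools are still trained and compared with the usual correlation score, and the winning candidate either joins the current deepest layer (sibling) or starts a new one (descendant).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_candidate(fan_in):
    # A candidate is represented here by just its vector of incoming weights.
    return rng.normal(scale=0.5, size=fan_in)

def candidate_pools(n_inputs, hidden_layers, pool_size=4):
    """hidden_layers: list of hidden-layer sizes, earliest layer first."""
    earlier = n_inputs + sum(hidden_layers[:-1])   # inputs + all but the deepest layer
    deepest = hidden_layers[-1] if hidden_layers else 0
    siblings    = [make_candidate(earlier)           for _ in range(pool_size)]
    descendants = [make_candidate(earlier + deepest) for _ in range(pool_size)]
    return siblings, descendants

siblings, descendants = candidate_pools(n_inputs=3, hidden_layers=[2, 2])
print("sibling fan-in:", siblings[0].size)        # 3 inputs + earlier hidden layer = 5
print("descendant fan-in:", descendants[0].size)  # also sees the deepest layer = 7
```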