I’ve been reading a lot of research papers on how the structure of ANNs can be generated/detected. Here are some quick summaries of interesting papers in that area.
Dynamic Node Creation in Backpropagation Networks
- Looks at finding a good number of hidden nodes for a network by starting small and growing the hidden layer until it is big enough
- Claims that the approach of training the network, then freezing the existing nodes before adding new unfrozen ones, does not work well
- Adding new nodes and then retraining the whole network appears to work better
- The technique (there's a toy sketch of the loop after this summary):
- A network with one hidden layer is created with a small predefined number of hidden nodes
- The network is trained; when the error stops decreasing over a number of iterations, a new hidden node is added
- Nodes stop being added either when a target precision is reached or when the error on a validation set starts increasing
- Results: Training doesn't seem to take significantly longer than for fixed-size networks, and it tends to find near-optimal solutions
- Remaining questions they want to investigate are how big the node-adding window should be and where nodes should be added in multi-layer networks
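To make the loop concrete, here's a toy NumPy sketch of the growing procedure as I understand it. It isn't the paper's code: the XOR task, learning rate, plateau window and thresholds are all my own arbitrary choices, and I've left out the validation-set stopping rule to keep it short.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GrowingNet:
    """One-hidden-layer network whose hidden layer can grow over time."""

    def __init__(self, n_in, n_hidden, n_out):
        self.W1 = rng.normal(0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)
        return sigmoid(self.h @ self.W2 + self.b2)

    def train_step(self, X, y, lr=0.5):
        # Plain backprop on squared error, updating *all* weights --
        # the paper found retraining everything beats freezing old nodes.
        out = self.forward(X)
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ self.W2.T) * self.h * (1 - self.h)
        self.W2 -= lr * self.h.T @ d_out
        self.b2 -= lr * d_out.sum(axis=0)
        self.W1 -= lr * X.T @ d_h
        self.b1 -= lr * d_h.sum(axis=0)
        return np.mean((out - y) ** 2)

    def add_node(self):
        # The new hidden node gets small random weights; nothing is frozen.
        n_in, n_out = self.W1.shape[0], self.W2.shape[1]
        self.W1 = np.hstack([self.W1, rng.normal(0, 0.5, (n_in, 1))])
        self.b1 = np.append(self.b1, 0.0)
        self.W2 = np.vstack([self.W2, rng.normal(0, 0.5, (1, n_out))])

# Grow on XOR: add a node whenever training error plateaus over a window.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
net, window, prev_err = GrowingNet(2, 1, 1), 500, np.inf
for step in range(1, 30001):
    err = net.train_step(X, y)
    if err < 0.01:
        break                          # required precision reached
    if step % window == 0:
        if prev_err - err < 1e-4:      # error has stopped decreasing
            net.add_node()
        prev_err = err
print(f"error {err:.4f} with {net.W1.shape[1]} hidden nodes")
```

Whether the toy run actually hits the error target depends on the random seed, but it shows the shape of the method: train everything, detect a plateau, widen the hidden layer, repeat.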
Optimal Brain Damage
- After training a standard ANN, we can potentially improve speed and generalization performance if we delete some of the nodes/connections
- Optimal Brain Damage deletes parameters (sets them to 0 and freezes them from future use) based on their estimated impact, or "saliency", on the error of the model, which it computes from second derivatives of the error. There's a rough sketch of the pruning step below.
- Results: On handwritten digit data it was shown that up to a fifth of the parameters could be deleted with no degradation in training performance and some improvement in generalization
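As a rough illustration of the pruning step, here's a small sketch. I'm assuming the network's weights have been flattened into a single vector and that `loss_fn` evaluates the training error for a given weight vector (both are my framing, not the paper's); the paper computes the diagonal second derivatives with an extra backprop-like pass, whereas I just finite-difference them, which only makes sense for a toy-sized model.

```python
import numpy as np

def obd_prune(weights, loss_fn, frac=0.2, eps=1e-4):
    """Rank parameters by OBD saliency s_k = 0.5 * h_kk * w_k**2 and zero
    out the least salient fraction.  Returns the pruned weights plus a
    mask that keeps the deleted parameters frozen during retraining.
    h_kk is approximated here by a central finite difference; the paper
    derives it analytically with a backprop-style pass."""
    w = np.array(weights, dtype=float)
    base = loss_fn(w)
    h_diag = np.empty_like(w)
    for k in range(w.size):
        w[k] += eps
        up = loss_fn(w)
        w[k] -= 2 * eps
        down = loss_fn(w)
        w[k] += eps                                  # restore original value
        h_diag[k] = (up - 2 * base + down) / eps ** 2
    saliency = 0.5 * h_diag * w ** 2
    n_prune = int(frac * w.size)
    mask = np.ones_like(w)
    mask[np.argsort(saliency)[:n_prune]] = 0.0       # least important first
    return w * mask, mask
```

During retraining you'd multiply the gradient by `mask` so the deleted weights stay at zero; the paper alternates training and pruning rather than pruning everything in one go.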
The Cascade-Correlation Algorithm
- A different approach to evolving structure for neural networks.
- Starts with a network with just fully connected input and output nodes. These are trained using quickprop until the error stops decreasing.
- Then multiple candidate nodes are created, each connected with its own random weights to every node except the output nodes (the inputs at first, plus any hidden nodes added later)
- Each candidate is then trained to maximize the correlation between its activation and the network's output error (see the sketch after this list)
- The candidate with the best correlation is selected and added to the network, with its incoming weights frozen
- Training the output weights and adding candidate nodes is repeated until a stopping point is reached
- Results: Appears to do very well, producing good results on things like XOR and various other functions that are difficult for normal ANNs
- It has two problems:
- Overfitting
- Because the eventual network is a cascade of neurons (later neurons are effectively very deep, since each depends on every previous neuron), it runs a lot slower than a normal ANN
- Can also be extended to work on recurrent networks
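The candidate-training step is the interesting bit, so here's how I understand it in NumPy. The score S is really a summed covariance (the paper calls it a correlation), and the real algorithm trains a whole pool of candidates with quickprop; I'm using plain gradient ascent and my own names (`A` for everything the candidate can see, `E` for the residual output errors), so treat it as a sketch rather than Fahlman's exact procedure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def correlation_score(v, E):
    """Candidate objective S: summed |covariance| between the candidate's
    activation v (one value per training pattern) and the residual error
    E (patterns x outputs) of the current network."""
    return np.abs((v - v.mean()) @ (E - E.mean(axis=0))).sum()

def train_candidate(A, E, steps=300, lr=0.1, seed=0):
    """Train one candidate's incoming weights by gradient ascent on S.
    A: activations the candidate sees (bias + inputs + existing hidden
       units), shape (patterns, fan_in).
    E: residual errors of the current network, shape (patterns, outputs)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0, 0.1, A.shape[1])
    Ec = E - E.mean(axis=0)
    for _ in range(steps):
        v = sigmoid(A @ w)
        cov = (v - v.mean()) @ Ec                # one covariance per output
        # dS/dw = sum_p [sum_o sign(cov_o) * Ec[p,o]] * f'(net_p) * A[p,:]
        dv = (Ec * np.sign(cov)).sum(axis=1) * v * (1 - v)
        w += lr * A.T @ dv
    return w, correlation_score(sigmoid(A @ w), E)
```

In the full algorithm you'd train several candidates from different random starts, install the highest-scoring one, freeze its incoming weights, give it a trainable connection to every output, and go back to training the output weights.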
Cascade-Correlation Neural Networks: A Survey
- Presents some approaches to reducing the problems with cascade correlation.
- Sibling/descendant cascade attempts to reduce the problem of networks growing too deep (there's a small sketch at the end of this list).
- There are two pools of candidate neurons: siblings (connected only to neurons in previous layers) and descendants (connected to every existing neuron, as in normal cascade-correlation).
- Siblings often do get selected ahead of descendants, reducing the depth of the network
- Says Optimal Brain Damage is effective at pruning the network after the structure is discovered, which helps with overfitting.
- Knowledge-based CCNNs are another approach, where candidate nodes can be ordinary functions or fully fledged ANNs of their own. This approach sounds like a lot of fun; I would love more info on how successful it is.
- Rule-based CCNNs are similar to the above, but operators like OR and AND are used as candidates. Again, sounds like a lot of fun; I would love to know how it performs.
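To make the sibling/descendant distinction concrete, here's a tiny sketch of which existing units each kind of candidate would see. The bookkeeping (a list of layers, each a list of unit indices, with `layers[0]` holding the inputs) is purely my own framing for illustration, not anything from the survey.

```python
def candidate_sources(layers, kind):
    """Return the units feeding a candidate in sibling/descendant
    cascade-correlation.  A descendant sees every existing unit, so if it
    wins it starts a new, deeper layer; a sibling sees everything except
    the current deepest layer, so if it wins it joins that layer and the
    network stays the same depth."""
    if kind == "descendant":
        sources = layers
    elif kind == "sibling":
        sources = layers[:-1] if len(layers) > 1 else layers
    else:
        raise ValueError(f"unknown candidate kind: {kind}")
    return [unit for layer in sources for unit in layer]

# Example: inputs are units 0-2, first hidden layer is [3, 4], deepest is [5].
layers = [[0, 1, 2], [3, 4], [5]]
print(candidate_sources(layers, "sibling"))     # [0, 1, 2, 3, 4]
print(candidate_sources(layers, "descendant"))  # [0, 1, 2, 3, 4, 5]
```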