Following on from part 1 and part 2, here are even more quick summaries of research papers.
Generative NeuroEvolution for Deep Learning
- Applies HyperNEAT to deep learning
- Uses the MNIST handwritten digits dataset; trains both a normal deep network and a convolutional network
- HyperNEAT on its own performs very badly.
- Using HyperNEAT to generate a number of layers and then running backprop on the final layer achieves 58.4% for normal ANNs
- Using HyperNEAT to generate a number of layers and then running backprop on the final layer of a convolutional neural net achieves 92.1%
- This paper seems to miss that one of the advantages of HyperNEAT is its ability to scale to different sizes of input, so it would be nice to see how it performs when given images with different dimensions vs a traditional approach, which has to do a standard Photoshop resize and then work off that image.
- Also would love to see some details of exactly how the algorithms were implemented
ES-HyperNEAT
- A big problem with HyperNEAT is that though it can express a potentially infinite number of nodes and connections, the number of hidden nodes must be decided in advance
- ES-HyperNEAT is an attempt to extend HyperNEAT so it can determine how many hidden nodes it should have.
- ES-HyperNEAT uses the connection weight values themselves to determine node placement; areas with more complex (higher variance) weight patterns should be given more nodes.
- There is an addition for HyperNEAT called Link Expression Output (LEO), where whether or not there is a connection between two nodes is calculated not from the magnitude of the weight but from another evolved parameter. This also works with ES-HyperNEAT
- Algorithm is:
- When creating a connection between 2 nodes (e.g. an input and an output), rather than just calculating a single weight value, we create 4 hidden nodes (quadtree style) and calculate the weights for each
- If the variance of the weights is above some threshold we keep all four nodes
- Otherwise we just have the single connection
- When doing the connections for the 4 sub-nodes we may do more subdivisions (up to some max depth)
- Ran experiments on maze navigation, multi-modal problems and the retina problem
- Results: Outperformed HyperNEAT
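The quadtree division step above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `cppn_weight` is a toy stand-in for the evolved CPPN, and the threshold, max depth and coordinate ranges are made-up values.

```python
import statistics

# Hypothetical stand-in for the evolved CPPN, which in ES-HyperNEAT maps
# (x1, y1, x2, y2) node coordinates to a connection weight.
def cppn_weight(x1, y1, x2, y2):
    return x1 * y2 - x2 * y1  # placeholder function for illustration

def divide(x, y, width, depth, max_depth=4, variance_threshold=0.03,
           source=(0.0, 0.0)):
    """Recursively subdivide a square region centred at (x, y), keeping
    sub-regions (candidate hidden nodes) only where the weight pattern
    from the source node has high variance."""
    half = width / 2
    # Centres of the four quadrants (quadtree style)
    quads = [(x - half, y - half), (x + half, y - half),
             (x - half, y + half), (x + half, y + half)]
    weights = [cppn_weight(source[0], source[1], qx, qy) for qx, qy in quads]
    if depth >= max_depth or statistics.pvariance(weights) < variance_threshold:
        # Low variance: region is uniform, keep a single point/connection
        return [(x, y)]
    # High variance: recurse into all four quadrants
    nodes = []
    for qx, qy in quads:
        nodes.extend(divide(qx, qy, half, depth + 1, max_depth,
                            variance_threshold, source))
    return nodes

nodes = divide(0.0, 0.0, 1.0, 0, source=(0.3, 0.7))
```

With this toy CPPN the variance shrinks as the squares get smaller, so the recursion bottoms out on its own before hitting the depth cap.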
Deep Learning using Genetic Algorithms
- Tests using GAs to encode and then decode features of an image
- Seems to have some success
- They encounter some problems that may be better solved using NEAT/HyperNEAT
Novelty Search and the Problem with Objectives
- Normally genetic algorithms are trained with an objective (fitness) function and the agents that perform best against it are selected for
- This leads to problems of local minima and will often result in lots of similar behaviors with small tweaks being selected for, not allowing for the more complex sets of interactions required.
- Novelty search instead adds a measure of the distinctness or novelty of the result of a behavior to the fitness function.
- Gives the example of teaching a robot to walk: a fitness function would start off just selecting the robot that fell furthest towards the goal, which would likely never lead to complex interesting behavior.
- But if you reward different kinds of falls, some may arise that balance for a bit in any direction; over time these can adapt to move towards the actual objective
- Recommends average distance to k nearest neighbors as a good measure of novelty.
- Novelty search shows very good results vs objective search in maze problems and bi-pedal walking among others
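The recommended novelty measure is simple enough to sketch. Here behaviors are represented as 2D end-positions (e.g. where a maze agent finished), which is an assumption for illustration; the distance function and `k` would be domain-specific.

```python
import math

def novelty(behavior, archive, k=3):
    """Novelty of a behavior = average distance to its k nearest
    neighbours among previously seen behaviors."""
    distances = sorted(math.dist(behavior, other) for other in archive)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)

archive = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0)]
# A behavior near the cluster at the origin scores low novelty...
low = novelty((0.05, 0.05), archive)
# ...while one far from everything seen so far scores high.
high = novelty((3.0, 3.0), archive)
```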
Unsupervised Feature Learning through Divergent Discriminative Feature Accumulation
- Uses HyperNEAT with novelty search as the unsupervised layer in deep learning
- Number of hidden nodes is not set in advance; rather, successive features are learned based on their difference from previous features until a certain number is reached
- Only a 2-layer network is trained, learning 1500 to 3000 features
- Results: Against MNIST data does amazingly well, incredibly impressive
- Looks like this is a good alternative to restricted Boltzmann machines
- Would love to see how it might do with more layers/convolutional architecture, scaling to different image sizes
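The accumulation idea can be sketched as a simple loop: keep generating candidate features and only accept one if it is sufficiently different from every feature kept so far, until the target count is reached. This is an illustrative guess at the scheme, not the paper's method: in the paper candidates come from HyperNEAT, whereas here they are random weight vectors, and the distance threshold is made up.

```python
import math
import random

def distance(a, b):
    # Plain Euclidean distance between two feature (weight) vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def accumulate_features(generate, target_count, min_distance):
    """Accept a candidate feature only if it differs from all
    accumulated features by at least min_distance."""
    features = []
    while len(features) < target_count:
        candidate = generate()
        if all(distance(candidate, f) >= min_distance for f in features):
            features.append(candidate)
    return features

random.seed(0)
gen = lambda: [random.uniform(-1, 1) for _ in range(8)]
features = accumulate_features(gen, target_count=20, min_distance=0.5)
```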