Even more quick summaries of research papers

Following on from part 1 and part 2, here are even more quick summaries of research papers.

Generative NeuroEvolution for Deep Learning

  • Applies HyperNEAT to deep learning
  • Uses the MNIST handwritten digits dataset and trains both a normal deep network and a convolutional network
  • Results: 
    • HyperNEAT on its own performs very badly.
    • Using HyperNEAT to generate a number of layers and then running backprop on the final layer achieves 58.4% for normal ANNs
    • Using HyperNEAT to generate a number of layers and then running backprop on the final layer of a convolutional neural net achieves 92.1%
  • This paper seems to miss that one of the advantages of HyperNEAT is the ability to scale to different input sizes, so it would be nice to see how it performs when given images with different dimensions vs a traditional approach, which has to do a standard Photoshop resize and then work off that image.
  • Also would love to see some details of exactly how the algorithms were implemented; a rough sketch of the indirect-encoding idea is shown below
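To make the "generate layers with HyperNEAT, backprop only the final layer" idea concrete, here is a minimal sketch of the indirect-encoding step, assuming a toy hand-written CPPN in place of an evolved one. The function names, coordinate layout and least-squares readout are my own illustration, not the paper's implementation.

```python
# Minimal sketch of HyperNEAT-style indirect encoding: a small "CPPN-like"
# function maps substrate coordinates to connection weights, so a whole weight
# matrix is generated from a handful of (normally evolved) parameters. A simple
# least-squares readout stands in for "backprop on the final layer".
import numpy as np

def cppn_weight(x1, y1, x2, y2, params):
    """Toy CPPN: a fixed composition of basis functions of the coordinates.
    'params' plays the role of the evolved CPPN weights."""
    a, b, c, d = params
    return a * np.sin(b * (x1 - x2)) + c * np.cos(d * (y1 - y2))

def generate_layer(in_shape, out_shape, params):
    """Builds a weight matrix by querying the CPPN at every pair of
    source/target coordinates on the substrate."""
    in_coords = [(x / in_shape[0], y / in_shape[1])
                 for x in range(in_shape[0]) for y in range(in_shape[1])]
    out_coords = [(x / out_shape[0], y / out_shape[1])
                  for x in range(out_shape[0]) for y in range(out_shape[1])]
    return np.array([[cppn_weight(x1, y1, x2, y2, params)
                      for (x1, y1) in in_coords]
                     for (x2, y2) in out_coords])

# Usage: generate a hidden layer for 28x28 inputs, then fit only the readout.
rng = np.random.default_rng(0)
params = rng.normal(size=4)            # would be evolved by NEAT in practice
W_hidden = generate_layer((28, 28), (10, 10), params)
X = rng.normal(size=(64, 28 * 28))     # stand-in for flattened images
y = rng.integers(0, 10, size=64)
H = np.tanh(X @ W_hidden.T)            # hidden activations from generated weights
W_out, *_ = np.linalg.lstsq(H, np.eye(10)[y], rcond=None)  # fit output layer only
```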
ES-HyperNEAT

  • A big problem with HyperNEAT is that, although it can express a potentially infinite number of nodes and connections, the number of hidden nodes must be decided in advance
  • ES-HyperNEAT is an attempt to extend HyperNEAT so that it can determine how many hidden nodes it should have.
  • ES-HyperNEAT uses the connection weight values themselves to determine node placement: areas with more complex (higher variance) weight patterns are given more nodes.
  • There is an addition to HyperNEAT called Link Expression Output (LEO), in which whether or not there is a connection between two nodes is calculated not from the magnitude of the weight but from a separately evolved output. This also works with ES-HyperNEAT
  • The algorithm is (a rough sketch in code follows below):
    • When creating a connection between two nodes (e.g. an input and an output), rather than just calculating a single weight value, we create four hidden-node candidates (quadtree style) and calculate the weight for each
    • If the variance of the weights is above some threshold we create all four nodes
    • Otherwise we just keep the single connection
    • When creating the connections for the four sub-nodes we may do further subdivisions (up to some maximum depth)
  • Ran experiments on maze navigation, multi-modal problems and the retina problem
    • Results: Outperformed HyperNEAT
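A rough sketch of the quadtree-style subdivision described above, with an illustrative stand-in for the CPPN query and made-up thresholds; this shows the general idea rather than the paper's actual ES-HyperNEAT implementation.

```python
# Quadtree-style node placement: sample the weight at the four quadrants of a
# region; if the sampled weights have high variance, keep subdividing (up to a
# maximum depth), otherwise keep just the single point/connection.
import numpy as np

def sample_weight(x, y, src):
    """Stand-in for a CPPN query: weight of a connection from 'src' to the
    hidden-node candidate at (x, y)."""
    sx, sy = src
    return np.sin(5 * (x - sx)) * np.cos(3 * (y - sy))

def subdivide(cx, cy, width, src, depth, max_depth=3, var_threshold=0.03):
    """Returns hidden-node positions for the region centred at (cx, cy)."""
    offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    quads = [(cx + dx * width / 2, cy + dy * width / 2) for dx, dy in offsets]
    weights = [sample_weight(qx, qy, src) for qx, qy in quads]
    if depth >= max_depth or np.var(weights) < var_threshold:
        # Low variance: no extra structure here, a single point suffices.
        return [(cx, cy)]
    nodes = []
    for qx, qy in quads:
        nodes.extend(subdivide(qx, qy, width / 2, src, depth + 1,
                               max_depth, var_threshold))
    return nodes

# Usage: place hidden nodes between an input at (-1, -1) and the unit region.
nodes = subdivide(0.0, 0.0, 1.0, src=(-1.0, -1.0), depth=0)
print(len(nodes), "hidden nodes placed")
```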

Deep Learning using Genetic Algorithms

  • Tests using GAs to encode and then decode features of an image (a toy sketch of this idea follows below)
  • Seems to have some success
  • They encounter some problems that may be better solved using NEAT/HyperNEAT
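As a toy illustration of the encode/decode idea (my own sketch, since the paper's exact setup isn't described here), a GA can evolve a small projection matrix whose fitness is how well the image can be reconstructed from its code:

```python
# Toy GA for evolving an image encoding: each individual is a projection
# matrix; the image is encoded by projecting onto it, decoded with the
# pseudo-inverse, and fitness is the negative reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random(64)                      # stand-in for a flattened image
POP, DIM, CODE = 30, 64, 8                  # population size, input dim, code size

def fitness(W):
    code = W @ image                        # encode
    recon = np.linalg.pinv(W) @ code        # decode
    return -np.sum((image - recon) ** 2)    # lower error -> higher fitness

pop = [rng.normal(size=(CODE, DIM)) for _ in range(POP)]
for gen in range(50):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]                # truncation selection
    children = [p + rng.normal(scale=0.05, size=p.shape) for p in parents]
    pop = parents + children                # mutation-only reproduction

best = max(pop, key=fitness)
print("best reconstruction error:", -fitness(best))
```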

Novelty Search and the Problem with Objectives

  • Normally, genetic algorithms are trained with an objective (fitness) function, and the agents that perform best against it are selected for
  • This leads to problems of local optima, and often results in lots of similar behaviours with small tweaks being selected for, never allowing the more complex sets of interactions required to emerge.
  • Novelty search instead scores individuals on the distinctness (novelty) of the behaviour they produce, using this in place of, or alongside, the objective fitness.
  • Gives the example of teaching a robot to walk: an objective fitness function would start off simply selecting the robot that fell furthest towards the goal, which would likely never lead to complex, interesting behaviour.
  • But if you reward different kinds of falls, some may arise that balance for a bit in any direction, and over time these can adapt to move towards the actual objective
  • Recommends average distance to the k nearest neighbours as a good measure of novelty (a minimal sketch follows below).
  • Novelty search shows very good results vs objective search in maze problems and bipedal walking, among others
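A minimal sketch of that novelty measure: the novelty of an individual is the average distance from its behaviour descriptor to its k nearest neighbours among behaviours seen so far (population plus archive). The (x, y) end-position descriptor and the k value here are illustrative assumptions.

```python
# Novelty as average distance to the k nearest neighbours in behaviour space.
import numpy as np

def novelty(behavior, others, k=15):
    """Average Euclidean distance from 'behavior' to its k nearest
    neighbours among 'others' (previously seen behaviour descriptors)."""
    dists = np.linalg.norm(np.asarray(others) - np.asarray(behavior), axis=1)
    return float(np.sort(dists)[:k].mean())

# Usage: score a candidate's final (x, y) position against an archive of
# previously seen final positions (e.g. maze end points).
rng = np.random.default_rng(0)
archive = rng.random((200, 2))           # past behaviours
candidate = np.array([0.9, 0.1])         # this individual's end position
print("novelty score:", novelty(candidate, archive))
```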

Unsupervised Feature Learning through Divergent Discriminative Feature Accumulation

  • Uses HyperNEAT with novelty search as the unsupervised layer in deep learning
  • The number of hidden nodes is not set in advance; rather, successive features are learned based on their difference from previous features until a set number is reached (see the sketch below)
  • Only a two-layer network is trained, learning 1,500–3,000 features
  • Results: does amazingly well against the MNIST data, incredibly impressive
  • Looks like this is a good alternative to restricted Boltzmann machines
  • Would love to see how it might do with more layers / a convolutional architecture, and scaling to different image sizes
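A rough, heavily simplified sketch of the feature-accumulation idea: candidate features are proposed one at a time and kept only if their responses over the data differ enough from every feature already accepted. Per the summary above, the candidates come from HyperNEAT in the paper; here random projections and an arbitrary distance threshold stand in for them.

```python
# Accumulate "divergent" features: keep a candidate feature only if its
# response pattern over the dataset is sufficiently different from the
# response patterns of all previously accepted features.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 784))                  # stand-in for flattened MNIST digits
TARGET, MIN_DIST = 100, 4.0                 # number of features, distance threshold

features, responses = [], []
while len(features) < TARGET:
    w = rng.normal(size=784)                # candidate feature (a CPPN query in the paper)
    r = np.tanh(X @ w)                      # its response pattern over the dataset
    if all(np.linalg.norm(r - prev) > MIN_DIST for prev in responses):
        features.append(w)                  # accepted: different enough from the rest
        responses.append(r)

H = np.tanh(X @ np.array(features).T)       # learned feature layer (500 x 100)
print("accumulated", len(features), "divergent features")
```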
