Quick summaries of research papers around NEAT Part 2

Daniel Slater

11 years ago

Continuing on from my previous post, here are some more quick summaries of research papers.

Evolving Reusable Neural Modules

Attempts to improve on NEAT by dividing the evolved nets into modules.
These modules are in themselves smaller neural nets with input, output and hidden nodes.
Blueprints are used to combine the modules into the final neural nets. Blueprints contain lists of modules to be used and mappings between the real inputs/outputs and the module inputs/outputs.
Both blueprints and modules are evolved and speciated.
The idea is that having modules reduces the number of dimensions in the search space. It is somewhat analogous to the modules being chromosomes and blueprints being arrangements of chromosomes.
Experiment: NEAT and modular NEAT were run on a board game (very roughly like Go).
Results: Modular NEAT evolved better solutions, and those better solutions appeared about 4 times faster, though this level of improvement is possibly quite tied to the kind of task being learned.
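To make the blueprint idea concrete, here is a minimal Python sketch of how a blueprint might wire modules into one network. It assumes a fixed-topology tanh module and that module outputs mapped to the same real output are simply summed; the class names (Module, ModuleSlot, Blueprint) are hypothetical, and in the paper the modules are themselves evolved, so evolution and speciation are not shown here.

```python
import math
from dataclasses import dataclass


@dataclass
class Module:
    """A small neural net: inputs -> hidden (tanh) -> outputs (tanh)."""
    w_in_hidden: list[list[float]]   # [hidden][input]
    w_hidden_out: list[list[float]]  # [output][hidden]

    def activate(self, inputs: list[float]) -> list[float]:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
                  for row in self.w_in_hidden]
        return [math.tanh(sum(w * h for w, h in zip(row, hidden)))
                for row in self.w_hidden_out]


@dataclass
class ModuleSlot:
    module: Module
    input_map: list[int]   # input_map[i] = real input feeding module input i
    output_map: list[int]  # output_map[j] = real output fed by module output j


@dataclass
class Blueprint:
    slots: list[ModuleSlot]
    n_outputs: int

    def activate(self, inputs: list[float]) -> list[float]:
        outputs = [0.0] * self.n_outputs
        for slot in self.slots:
            module_in = [inputs[i] for i in slot.input_map]
            for j, value in enumerate(slot.module.activate(module_in)):
                outputs[slot.output_map[j]] += value  # assumption: contributions are summed
        return outputs


# The same module object fills two slots, each wired to different real inputs/outputs.
shared = Module(w_in_hidden=[[1.0, -1.0]], w_hidden_out=[[2.0]])
net = Blueprint(slots=[ModuleSlot(shared, [0, 1], [0]),
                       ModuleSlot(shared, [2, 3], [1])], n_outputs=2)
print(net.activate([0.5, 0.5, 1.0, 0.0]))
```

Because one Module object can fill several slots, improving that module would improve every blueprint slot that uses it, which is the kind of reuse the paper's title points at.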

Transfer Learning Across Heterogeneous Tasks Using Behavioural Genetic Principles

Transfer learning is applying what was learned in one domain to a related but different domain, e.g. going from speech recognition on male voices to speech recognition on female voices.
Four challenges of transfer learning:

Successfully learning related tasks from source tasks
Determining task relatedness
Avoiding negative transfer
More closely imitating human learning

Approach steps:

Have a set of potentially related tasks and choose one as the source; the aim is to learn them all.
Encode all the parameters for a neural net for a genetic algorithm, including the number of hidden nodes and the learning rate.
Create a population of x pairs of identical neural nets (identical twins) and x pairs of neural nets that share 50% of their genes (non-identical twins).
Create a unique training set for each individual in the population by randomly filtering out a subset of the training data.
Train every individual on the source task and then on each other task independently.
After training, measure the results and calculate how much of the performance was down to genes and how much to environment by comparing the performance of identical and non-identical twins.
Select from the identical twin population, taking into account how much of their performance was down to genes rather than environment.
Repeat selection until convergence.
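The twin bookkeeping in the steps above can be sketched in Python as follows. Two things here are my assumptions rather than details from the summary: heritability is estimated with Falconer's formula, h^2 = 2 * (r_MZ - r_DZ), the textbook behavioural-genetics estimate from identical (MZ) and non-identical (DZ) twin correlations, and a plain Pearson correlation stands in for the intraclass correlation usually used in twin studies. All function names are hypothetical, and toy_score replaces actually training a net on a filtered training set.

```python
import random


def make_identical_pair(genome):
    """Identical (MZ) twins: two copies of the same genome."""
    return list(genome), list(genome)


def make_fraternal_pair(genome, new_gene, rng):
    """Non-identical (DZ) twins: the second twin keeps a random half of the
    gene positions and redraws the rest with new_gene(rng)."""
    n = len(genome)
    shared = set(rng.sample(range(n), n // 2))
    sibling = [g if i in shared else new_gene(rng) for i, g in enumerate(genome)]
    return list(genome), sibling


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def falconer_heritability(mz_scores, dz_scores):
    """h^2 = 2 * (r_MZ - r_DZ); each argument is a list of (twin_a, twin_b) scores."""
    r_mz = pearson(*zip(*mz_scores))
    r_dz = pearson(*zip(*dz_scores))
    return 2 * (r_mz - r_dz)


# --- toy demo: score = genetic value + environmental noise ---
def new_gene(rng):
    return rng.gauss(0, 1)


def toy_score(genome, rng):
    # hypothetical stand-in for "train on a randomly filtered training set, then test"
    return sum(genome) + rng.gauss(0, 1)


rng = random.Random(0)
mz, dz = [], []
for _ in range(500):
    a, b = make_identical_pair([new_gene(rng) for _ in range(10)])
    mz.append((toy_score(a, rng), toy_score(b, rng)))
    a, b = make_fraternal_pair([new_gene(rng) for _ in range(10)], new_gene, rng)
    dz.append((toy_score(a, rng), toy_score(b, rng)))
print(round(falconer_heritability(mz, dz), 2))
```

In the toy run the genetic variance is 10 and the environmental variance is 1, so the estimate should come out somewhere near 10/11. A selection step would then favour genomes whose performance is both high and largely down to genes.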

The tasks were:

Learning past tenses of English words
Mapping patterns to identical patterns
Categorizing patterns into
Patterns with errors
Arbitrary patterns (since these are random, there should be no generalization)

Results: The networks were able to use the direction of change in heritability (how much of the performance resulted from genes) to indicate task relatedness.
Related tasks were learned better than with standard methods.
It would be interesting to see how NEAT would work with this method.

Transfer learning approach for financial applications

This applies the approach from the paper above to three financial data sets:

Statlog – Australian credit approval
Statlog – German credit data
Banknote authentication

Results: Seems reasonably successful

How Important is Weight Symmetry in Backpropagation?

When they say weight symmetry, they are referring to the weights used when feeding forward vs the weights used when doing back propagation.
Interesting food for thought: if weight symmetry is not important, this could mitigate the vanishing gradient problem in deep neural nets…
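To make "weight symmetry" concrete, here is a small pure-Python sketch of one backward pass through a two-layer net where the matrix that carries the output error back to the hidden layer is a parameter. "symmetric" is ordinary backprop (the transpose of the forward weights), "sign" keeps only the signs of the forward weights (sign-concordant feedback, the kind of asymmetry I understand the paper to look at), and "random" is a fixed random matrix unrelated to the forward weights (the feedback alignment idea). The network shape, squared-error loss and function names are my own illustrative choices.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def sign(x):
    return (x > 0) - (x < 0)


def forward(w1, w2, x):
    h = [math.tanh(z) for z in matvec(w1, x)]
    return h, matvec(w2, h)


def feedback_matrix(w2, mode, rng=None):
    """Matrix ([hidden][outputs]) used to send output errors back to the hidden layer."""
    if mode == "symmetric":  # ordinary backprop: reuse the forward weights
        return transpose(w2)
    if mode == "sign":       # sign-concordant: only the signs of the forward weights
        return [[float(sign(w)) for w in row] for row in transpose(w2)]
    if mode == "random":     # fixed random feedback: draw once, reuse for all of training
        return [[rng.gauss(0, 1) for _ in w2] for _ in w2[0]]
    raise ValueError(mode)


def gradients(w1, w2, x, target, feedback):
    """Gradients of 0.5 * ||y - target||^2, routing the backward pass through `feedback`."""
    h, y = forward(w1, w2, x)
    err = [yi - ti for yi, ti in zip(y, target)]           # dE/dy
    g2 = [[e * hj for hj in h] for e in err]               # dE/dW2, exact in every mode
    back = matvec(feedback, err)                           # error sent to hidden layer
    delta_h = [b * (1 - hj * hj) for b, hj in zip(back, h)]  # through tanh derivative
    g1 = [[d * xi for xi in x] for d in delta_h]
    return g1, g2


rng = random.Random(1)
w1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]  # 3 inputs -> 4 hidden
w2 = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2)]  # 4 hidden -> 2 outputs
x, target = [0.5, -0.2, 0.1], [0.3, -0.7]
for mode in ("symmetric", "sign", "random"):
    g1, _ = gradients(w1, w2, x, target, feedback_matrix(w2, mode, rng))
    print(mode, [round(g, 3) for g in g1[0]])
```

Only the error passed down to earlier layers depends on the feedback matrix; the output-layer gradient g2 is the same in every mode. In a deep net this substitution would happen at every layer.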
They ran 15 different data sets in the experiment, all of which may be worth looking at for other experiments.
Results: It seemed pretty convincing that weight symmetry was not important; in particular, an update rule they called Batch-Manhattan actually outperformed standard SGD.
The Batch-Manhattan update rule is:

import random
import numpy as np

# select a random set of samples to be our mini-batch
mini_batch = random.sample(dataset_samples, mini_batch_size)

# keep only the sign of the summed mini-batch gradient, then add momentum and weight decay
update_magnitude = -np.sign(sum(weight_derivative(x) for x in mini_batch)) + momentum * previous_update_magnitude - decay * current_weight
new_weight = current_weight + learning_rate * update_magnitude

previous_update_magnitude = update_magnitude

One thing to note about the above is that the function weight_derivative potentially does use the weights in the back propagation step. This is where I would love to see the actual source used to generate these results.
The magnitude of the update was not found to be that important, but (unsurprisingly) the sign was.
I would love to see more analysis of how removing the weights from back prop might affect very deep networks.
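Here is a small runnable version of the Batch-Manhattan update, fitting a two-weight linear model by least squares. It is my reading of the rule as written, with the momentum term added rather than multiplied and the learning rate scaling the whole update; the paper's exact formulation may differ, the gradients here come from ordinary (symmetric) backprop, and the toy data set and default hyperparameters are hypothetical rather than taken from the paper.

```python
import random


def sign(x):
    return (x > 0) - (x < 0)


def batch_manhattan_fit(samples, n_weights, lr=0.01, momentum=0.9, decay=1e-4,
                        batch_size=8, steps=2000, seed=0):
    """Fit y ~ w.x with a Batch-Manhattan style rule: only the sign of the
    summed mini-batch gradient is used, plus a momentum term and weight decay."""
    rng = random.Random(seed)
    w = [0.0] * n_weights
    prev = [0.0] * n_weights
    for _ in range(steps):
        batch = rng.sample(samples, batch_size)  # random mini-batch
        grad = [0.0] * n_weights
        for x, y in batch:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y  # derivative of 0.5*err^2 w.r.t. prediction
            for i, xi in enumerate(x):
                grad[i] += err * xi                          # summed over the mini-batch
        for i in range(n_weights):
            update = -sign(grad[i]) + momentum * prev[i] - decay * w[i]
            w[i] += lr * update
            prev[i] = update
    return w


# Hypothetical toy problem: noise-free targets from y = 2*x0 - 3*x1.
rng = random.Random(42)
true_w = [2.0, -3.0]
data = []
for _ in range(200):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, sum(a * b for a, b in zip(true_w, x))))

w = batch_manhattan_fit(data, n_weights=2)
print([round(v, 2) for v in w])
```

Because only the sign of each summed gradient is used, the learning rate and momentum alone set how far a weight moves per step, however large or small the gradient is. With these settings the fitted weights should end up close to [2.0, -3.0].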

