The Big Bang vs. The Kluge: perturbing our learnome

Connectome (http://www.humanconnectomeproject.org) is a term often used to indicate the set of connections between the neurons in our brain, a sort of highway map of the brain. A complete connectome still eludes the scientific community, but more important, perhaps, is the search for a ‘learnome’ or an ‘algorithm-ome’, i.e. the set of learning algorithms with which our brain operates: how neuron assemblies “learn” and are able to derive general principles from sparse examples.

Neural networks have been around for decades, and their performance has steadily increased. Deep learning works by modeling neurons as interconnected nodes whose connection weights are adjusted iteratively through different learning rules, depending on whether the network is used for supervised or unsupervised learning. From back-propagation to restricted Boltzmann machines, artificial neural networks seek to emulate brain operation by creating networks of nodes that learn new rules in order to make better predictions.
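To make the idea of ‘connection weights iteratively defined through a learning rule’ concrete, here is a minimal sketch using the classic perceptron rule on a single artificial neuron. It is only an illustration: the function names and the toy task (the logical AND of two inputs) are our own assumptions, and modern deep networks are trained with far more elaborate rules such as back-propagation.

```python
# A single artificial neuron trained with the perceptron learning rule.
# All names here are illustrative; this is not how deep networks are trained.

def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs is positive, else stay silent (0)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Nudge the weights after every wrong answer, in proportion to the error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Toy supervised task: learn the logical AND of two binary inputs.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
```

Nobody tells the neuron which weights to use; it arrives at them by repeatedly correcting its own errors on the examples.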

AlphaGo, the Google artificial intelligence that beat the Go world champion Lee Sedol 4-1, and that recently reappeared under pseudonyms to dominate some 50-odd matches against master players, has attracted much media attention, as did the IBM computer DeepBlue in the 1990s when it beat the world chess champion Kasparov. However, there is a major difference: AlphaGo was not taught move by move how to play Go by smart programmers, but instead learnt in large part by playing many thousands of games against itself. This is so-called reinforcement learning.

While AlphaGo has the ability to learn, i.e. to modify its behaviour to better its results, DeepBlue was simply applying the rules it had been taught. DeepBlue was acting like an animal following its instinct, while AlphaGo has the ability to modify its ‘beliefs’. This means that DeepBlue was simply choosing from a set of pre-defined rules: it did have the ability to know ‘which’ rule to choose and ‘when’, but could not create new ones. AlphaGo, on the other hand, could also decide whether old rules (or behaviours) would suffice or whether new rules needed to be ‘invented’ to satisfy a new requirement or a new playing situation.

In the early 1980s, Australian jewel beetles were found trying to mate with empty beer bottles (http://onlinelibrary.wiley.com/doi/10.1111/j.1440-6055.1983.tb01846.x/epdf). Apparently the colour and shape of the discarded bottles attracted the beetles, never mind that the bottles were not beetles. Instinct leads some animals to perform the same task regardless of whether it ‘makes sense’ or not, because those animals are unable to learn new behaviours.

Intelligence, instead, allows for learning, i.e. the ability to gather and process new information to acquire a new understanding of reality, possibly triggering a different response. A standard computer program resembles a jewel beetle, in that it will repeat the same behaviour over and over (and, if this ‘bug’ has a bug, it will crash at the very same point). A classic chess program may end up losing the same game over and over, repeating the very same mistakes. On the other hand, a computer program based on deep learning and artificial intelligence will learn from its own mistakes and will try to correct them.

However, despite huge advances in neuroscience, we are still far away from being able to build a computer as intelligent as a human. It is true that AlphaGo was able to beat Lee Sedol, but it is also true that AlphaGo cannot drive a car, or summarise the content of a book, for example. It is also true that, should the rules of Go be changed slightly, or should the board size be changed, it is likely that Lee Sedol would be able to learn the new rules faster than AlphaGo and therefore get his revenge. This is the difference between “AI” and the Artificial General Intelligence (AGI) to which Gary Marcus, a professor of psychology and neural science at New York University who was also CEO of Geometric Intelligence, and other researchers refer.

Perturbation theory is a method for finding approximate solutions to a problem by starting from the exact solution of a related but easier problem and adding small corrections (‘perturbations’) to it. Basically, one or more perturbations approximate the more difficult problem by building on the simple, solvable ‘core’. Along these lines, ‘playing on a slightly-modified game board’ can be seen as a different problem, although close to the problem whose solution we have found. While humans seem to be good at adapting to small semantic ‘perturbations’, it is unclear how to define a neural network that can adapt similarly. In other words, the ‘learnome’ must be able to switch on different subroutines to find approximate solutions starting from others that are known, and this must be done not only with the existing data, but with the rules that manage the data as well.
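As a toy illustration of what perturbing a solvable problem means, consider the equation x^2 + eps*x - 1 = 0. For eps = 0 it is trivially solved by x = 1; for small eps we can look for a solution of the form x = 1 + a*eps + b*eps^2 and match powers of eps, which gives a = -1/2 and b = 1/8. The sketch below compares this approximation with the exact root. The equation and the function names are our own choice of example and have nothing to do with Go or with neural networks.

```python
import math

def exact_root(eps):
    """Positive root of x**2 + eps*x - 1 = 0, from the quadratic formula."""
    return (-eps + math.sqrt(eps * eps + 4.0)) / 2.0

def perturbative_root(eps, order=2):
    """Approximate the same root by correcting the eps = 0 solution x = 1.

    The coefficients come from substituting x = 1 + a*eps + b*eps**2 into the
    equation and requiring each power of eps to cancel separately.
    """
    coefficients = [1.0, -0.5, 0.125]  # orders 0, 1 and 2 in eps
    return sum(c * eps ** k for k, c in enumerate(coefficients[: order + 1]))

eps = 0.1
errors = [abs(perturbative_root(eps, order) - exact_root(eps)) for order in range(3)]
# Each added correction brings the approximation closer to the exact root.
```

The approach works because the perturbed problem stays close to the solved one; how to give a learning system an analogous notion of ‘closeness’ between rule sets is exactly the open question.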

Computers can do many things better or faster (or both) than humans, largely because they focus on a single well-defined task, but they still lack the ability to generalize their abilities and their learning in order to adapt to different rules and different situations. Humans are remarkably flexible in adapting to new tasks, unlike any of the neural networks we have built so far.

This is due in part to our inability to operationally define ‘generalizing’ and our knock-on inability to build that generalizability into machines. Even as we work towards modelling the full human brain, it is clear that we will not immediately be able to build machines that match its learning powers. The lack of algorithmic targets against which to gauge our progress towards this objective makes the task more difficult and our progress harder to define and measure. A rigorous perturbation theory of learning would clear the way for better-defined measurable states and calibration of the networks.

Much of why we cannot build machines that think like we do is simply that we do not fully and deeply know how we think. Philosophers and natural scientists long ago understood the basic outline of brain behavior (i.e. association), but, below that level, there is no general consensus on how the brain works, or what general rules it follows to learn. The hope is that understanding the ‘learnome’ will allow us to understand how the brain works and what general rules it uses to create a model of reality that it applies to different situations.

The search for a ‘general rule’ or an organizing structure is something that science strives to achieve in all disciplines. For example, Chemistry has its periodic table, and for Biology the discovery of the structure of DNA has been the holy grail, allowing us to decode the genetic coding of life. Even more importantly, the periodic table did not only echo the world as it was known then, but went further, using a sensed pattern to predict as-yet unknown elements which were in fact found later. It was well received because of its power to generalise. In this way, though shaped by the external world, our brain has the ability to accept and generate models that do more than simply match direct experience: they generalize coherently, in ways that we do not yet fully understand.

This process of generalising through the conception of unifying theories also applies in Physics, where the search for a theory unifying the physical forces has been the quest of many generations of theoretical physicists.

In Physics we study a reality comprising subatomic particles that mediate fields through which other particles interact, and we are looking for a general theory that can elegantly explain all phenomena. Just as Hans Christian Oersted demonstrated the connection between electricity and magnetism, we can now test many physical theories with powerful particle accelerators. Though we do not yet have a single model, since relativity and quantum mechanics have proved difficult to merge into one theory, there is general optimism that we can one day achieve a unifying theory in Physics. If, as we believe, our universe was created by a ‘Big Bang’ 13.8 billion years ago, it is possible to believe that from this unique event a single set of laws emerged, and that this is what shapes our current universe.

Unifying rules are something humans naturally tend to look for. Our brain, after all, is quite good at generalisations, and we can quickly extrapolate the most important facts; we can, in other words, see a forest where there are thousands of different trees. It is thus in our nature to look for the abstraction underlying the reality we come in touch with.

Many hold similar hopes of finding a unifying set of laws when trying to understand how our brain works. They hope to find a similar ‘grand rule’ that governs our thoughts, a hard-wiring of our brain that is suited to quickly creating an internal model of reality in order to process information and make predictions. However, finding this ‘grand rule’ has so far proved elusive. Gary Marcus has described the brain as a ‘kluge’, insofar as it may not follow a single simple rule, but may be made up of many different algorithms, patched together over time, which are then applied to different situations. This makes sense if we consider that our brain was not created in a ‘big bang’, but has undergone millions of years of evolution and layered new algorithms on top of others.

It is like creating a computer that is made up of different parts: one that knows how to drive a car, another that can tell cats from people, and another that can play Go, all in one. This machine would be better than any human at any of these tasks, but it is unclear whether it would be better at adapting to yet further new situations. An analogy that is often made is with Google, the search engine. While its main underlying algorithm is PageRank, Google keeps its dominance in the search-engine competition by adding many different sub-routines that also check for other factors, such as the location from which the search is performed. Similarly, our brain may have developed, through millions of years of evolution, thousands of sub-routines to adapt quickly to different tasks.
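For readers who have not met PageRank, the sketch below shows the textbook form of the idea: a page is important if important pages link to it, and the scores are found by repeatedly passing each page's score along its outgoing links. This is only the published, simplified algorithm on a made-up three-page web; the function name, the damping value of 0.85 and the toy graph are assumptions for illustration, and it says nothing about how Google's production search actually works.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Every page that
    is linked to must also appear as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Share this page's score equally among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical toy web: pages "a" and "b" both link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
```

In this picture, the ‘sub-routines’ mentioned above would be extra signals layered on top of such a core score, not replacements for it.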

Unifying rules, though, apply to different layers of reality: for each layer of complexity we adopt a local theory that best approximates it. In other words, we don’t explain elasticity starting from the physics of particles, but by using simple general laws about the elasticity of different materials. This means that even if we did find a standard model unifying and predicting all known laws of Physics, we would still use a kluge of different theories depending on the complexity of the problem at hand, to which we might apply a perturbation to solve harder and more complex problems, as with the three-body problem (the number of known periodic solutions to which keeps growing, incidentally, thanks to numerical searches on supercomputers).

Most of the functionality we humans can think of lies in the realm of the physical reality we live in, and therefore we should naturally excel at it, since we were naturally selected for it. After all, the jewel beetles of Australia might soon have gone extinct if laws had not been enacted to change the shape of beer bottles. Similarly, it makes sense that we are not able to remember a string of 15 digits we have only glanced at, since, in the wild, it is unlikely that such a skill would make the difference between life and death. On the other hand, spotting the difference between a cat and a tiger, or between a friend and an enemy, is much more significant for our own survival. Humans are good at extrapolating and inferring new notions from what we already know, i.e. we excel at inductive reasoning more than at deductive reasoning, which is what computers do faster than us. Inductive reasoning is a better model for survival than deductive reasoning (and including abductive reasoning adds yet more power), but our computer models tend to follow a deductive reasoning model.

Our brain is made of hundreds of different and highly specialised cell types, which would hardly make sense if there were a general single rule and, in addition, cognitive science has identified many specialised circuits in the brain.

On the other hand, billions of different people have brains that work similarly, which implies the existence of a common set of rules under which our brain operates. The situation resembles that of search engines in general: they all work on some analogue of the PageRank algorithm, which is their common underpinning. It is true that Google may have some more aces up its sleeve and may outperform its competitors by applying a ‘kluge’ of algorithms to better its predictions, but that does not mean it does not have a general underlying algorithm. In addition, while evolution has likely added many sub-routines on top of others, it has proceeded along the same general lines, without throwing out old ones. All beings need to be able to recognise reality (though not necessarily consciously) in order to find food and water, flee from danger, and mate. The same underlying mechanism must be in force for all organisms, even though humans may have a few more arrows in their quiver.

The human brain may appear confusing, disordered, a set of unrelated and competing rules, a mix of competing and confusing feelings wanting us to quit and soon after light a new cigarette, but it can also be a logical machine able to abstract mathematical rules from the observation of leaf arrangements in plants. It follows logical rules in learning but is not likely to be a single, unified, clean machine.

It is an irony that the machine most likely to be the best at generalising will be made up of a ‘kluge’ of different algorithms. This may well be what is needed to make sense of a reality layered through different expressions.

(A collaboration with Ross Mohan, CEO of Real Data Machines)

Valentino Zocca
