The following is a rough transcript which has not been revised by The Jim Rutt Show or by Ken Stanley. Please check with us before using any quotations from this transcript. Thank you.
Jim: Today’s guest is Ken Stanley. Ken leads a research team at OpenAI on the challenge of open-endedness. He was previously professor of computer science at the University of Central Florida, and was also a co-founder of Geometric Intelligence, Inc., which was acquired by Uber. Welcome, Ken.
Ken: Thank you, Jim. Glad to be back.
Jim: It’s great to have you back. Ken also appeared recently on EP 130, wherein we discussed his extremely interesting book “Why Greatness Cannot Be Planned,” not technical, but some big ideas. If you like this episode, you might want to check that one out too. I really enjoyed that book and our conversation. Ken is also one of the leading practitioners and thought leaders, I dare say, in the domain of neuroevolution, which is a topic we’re going to talk about today. Those people who listen to the podcast know that neuroevolution is one of my obsessions.
Jim: When I retired from business back in 2001 and set up my lab, the very first project I took on was to design a neuroevolutionary system to grow players for the game of Othello. And there was even a New York Times article written about it, something like internet CEO Dodger what it said. When I first saw the things actually getting smarter on their own, the hair on the back of my neck stood up because it was like, wow. I had been reading for a couple of years about evolutionary algorithms, and had even slipped off to GECCO one year, but this was my first attempt to actually build something, and it worked, and it’s been my kind of go-to tool.
Jim: Neuroevolution, I used to say, is the weakest of all methods, but it can get some traction on almost any problem. So anyway, it’s a real honor to talk to Ken Stanley, one of the dudes from the field. So welcome. We’re going to talk fairly broadly today, but a recent paper that Ken was coauthor on is one of the anchors, I would say, of the conversation. It’s from Nature Machine Intelligence and it’s called “Designing neural networks through neuroevolution.” Ken was one of four authors. The others are Jeff Clune, Joel Lehman and Risto Miikkulainen. Is that how you pronounce his name?
Ken: I’m probably going to butcher it, but I think it’s Miikkulainen.
Jim: I know he taught me how to say it once. We’ve met a few times. In fact, we had him at the Santa Fe Institute at a workshop on evolutionary computing for three days. But yeah, it’s a tough one. You either have one too many or one too few syllables.
Ken: In grad school he was my advisor and everybody just called him Risto, so we didn’t even bother to try.
Jim: Risto is kind of like the godfather of neuroevolution. It’s amazing how many people you track their career back and they started out at the UT lab. You were one of them, right?
Ken: It was a great place for that.
Jim: In fact, my own work, I actually based it on work of two other people there, Brad Fullmer and David Moriarty.
Ken: Right, right. Predecessors for me.
Jim: They had something called marker-based encoding, which was kind of a simple-minded but remarkably general method of folding a linear genetic algorithm sequence into a neural net that had both arbitrary weights and arbitrary topology, which was kind of fun. So anyway, before we get started, let’s talk a little bit about some definitional things. What is neuroevolution, kind of at the highest level? And then we’ll kind of dig down from there a little bit before we start talking about the current state of play.
Ken: So neuroevolution is basically a combination of two fields. One of them is neural networks, which these days everybody calls deep learning, but neural networks, which means artificial neural networks, are inspired in some loose way by the fact that we have neural networks inside of our head, which we call our brain. And of course that’s been a real focus of excitement in recent years in artificial intelligence in general.
Ken: And then the other side of neuroevolution is evolutionary computation, which means that you try to write programs that take some inspiration from how organisms evolved over the course of natural evolution, but now try to translate that into an algorithm so that you can actually evolve things inside the computer. And so what neuroevolution is, is basically trying to evolve these neural networks, or you could think of it as kind of like evolving brains. And in that sense, it’s a bit inspired by just the idea that brains did evolve. I mean, I think that’s why I initially found it very intriguing. It’s intriguing to think that we could actually do something like what happened in nature inside of a computer and evolve, hopefully, increasingly complex brains.
Jim: It is pretty cool. Now it is interesting that the field, particularly neuroevolution but even evolutionary computing, has always been kind of a bit of a sideline. It found some glory in the nineties in things like machine control, and I know it had a lot of interesting results in robotics, et cetera, but it’s never quite gotten the glory that some of these other techniques have, even though it’s remarkably general and remarkably powerful if you’re willing to dedicate enough computation to it. In fact, one of your colleagues, Jeff Clune, wrote an article, I couldn’t find the title, in which he basically recommended that if you’re working in this field, try not to use the word evolution. Any theory on why it’s kind of been a quiet space?
Ken: That’s fair, that’s a fair characterization. To explain things like that is complicated, because it’s really a historical question, like why did the field develop the way that it did? I think part of what happened over the history of artificial intelligence is that there was a lot of fascination and respect for formalization and the ability to in effect prove things. And one of the things about evolution, when you think about it, is that it’s sort of contrary to that kind of philosophy. It’s more kind of fuzzy and foggy and you sort of just say, well, let’s just throw these ingredients together and hope for the best. And so it feels a little ad hoc, and that word, ad hoc, is not really something that people want to be associated with.
Ken: And so it kind of grew up outside the mainstream because of that, like the field of evolutionary computation in general, outside of the mainstream of AI or machine learning. And I think that there was a certain sub-community of people who were just fascinated, despite all that, by the idea that you could evolve a brain. So all of these things didn’t really matter. And that was a small community though, because it was hard to really break into the field if that was how you dedicated your time, just to evolving stuff. And so when you combine evolution with neural networks, I think it’s a curve ball in general for the field, because neural networks have become extremely popular, but evolution remains kind of on the outside, and so it takes a little processing for people to decide what their position is on that.
Ken: But I think the real issue though, is that it’s very important to articulate motivations. Like why are we doing this? Because I think ultimately why there wasn’t more initial interest than there was, was because there was a failure perhaps to explain or to motivate, why would we want to evolve a neural network? Because basically the response would be that there are better things to do that are more principled. Like you can follow a gradient with gradient-based algorithms like backpropagation, and it’s principled because we can compute the gradient and then we know which direction to go, rather than what people would say is just random, like flipping around and mutations and just rolling the dice and hoping for the best. And the problem is that that deserves a response, but it didn’t really get a response. So we’re sort of left with this assumption.
Ken: That assumption, which is not really true, is that neuroevolution is basically an optimization algorithm which is inferior, and that there are good optimization algorithms and then there’s this kind of roll-the-dice type of random stuff. But the real issue is that neuroevolution, to me, is not really about optimization. Neuroevolution is about how did we get something astronomically complex out of effectively nothing, in a completely unguided process. And it’s true that it took maybe a billion years or more than a billion years, but it’s still an incredible story.
Ken: It’s the only way we’ve ever gotten something like our brain with its 100 trillion connections. And we want to understand, how is this possible? On the face of it, it doesn’t sound like it could be possible. The gap between us and our nearest common ancestor with the apes, which is something like 300,000 generations ago, is pretty small. Like 300,000 generations to get to a human brain, that’s a pretty strange thing. And so to understand how that few steps can be involved in such profound changes, that’s the kind of thing I think neuroevolution is about, and ultimately understanding how complexity arises from simplicity.
Jim: One of my favorite topics, and one of the reasons I got sucked into the area, because it just seemed like, wow, I mean, we have an existence proof, right? We know that high intelligence came from pond scum. Right. And it got there via evolution. At least if you’re not a believer that somewhere along the line a miracle occurred, which I am not. And so yeah, why not do some work on seeing if we can duplicate this one existence proof that we have? And then I will just make one little factual adjustment to what you said. You talk about billions of years of evolution. It’s in some ways even more amazing, because the neuron only arose about 550 million years ago at the latest, right at the time of the Cambrian explosion. Some people argue just before, some people say as part of. So it’s a mere 550 million years that we’ve gone from the simplest responsive neuron to humans. And that’s not all that much time.
Ken: It’s remarkable. And you ultimately have to think about it generationally, like how many generations have passed. And it’s getting to the point where that number of generations is not beyond the possibility of actually running on a computer. And yet I don’t think that if we ran an algorithm today for that many generations, it would develop a human brain. So in some way we’re missing something, and that’s what’s intriguing about it. It’s like, what is it that we don’t know here? Evolution is a theory that’s kind of well articulated in a biology textbook, but that’s different from actually understanding how to implement it, to be able to do something at a similar level of grandiosity, like all of life on earth. We don’t know how that’s possible. And that to me was probably inspiring initially, that there’s some really missing information here that could be low-hanging fruit, at least some of it, where we might discover some insights that ultimately help us on the road to AI as well, since after all evolution led to intelligence.
Jim: Very cool. Let’s do a couple of definitional things for our audience before we get started. You mentioned that this field of neuroevolution is essentially the junction of evolutionary computing and neural nets. Maybe you could start by describing the simplest generally used evolutionary algorithm, the genetic algorithm, which was developed by my dear friend, now passed, and colleague at SFI, John Holland. If you could describe a genetic algorithm in just the simplest basic form, I think that would be useful for the audience to understand what we’re talking about later.
Ken: Sure. And I didn’t know you knew John Holland, so that’s cool.
Jim: Oh yeah. He was one of my dear friends. Great guy.
Ken: Wow. Really interesting. So to try to do justice to the genetic algorithm, it’s a kind of very simple attempt to take inspiration from what you see in nature in evolution. And you can think of it a little bit like an automatic breeding algorithm. It’s like if you take a group of things and they are encoded in something that’s like DNA. So we would call that a genome, but we’re talking about artificial stuff. So in the computer, you have a genome. So if that’s too abstract, maybe you can think about it as the genome is a description, which explains what something is. So maybe it’s like a description of a body plan or something like that. Maybe we were trying to evolve robot body plans.
Ken: Let’s just say for the sake of example. So we have these little descriptions, each one a little bit different, maybe a slightly different parametrization. And we could say, let’s generate a hundred of these randomized descriptions. They’re pretty terrible probably, because they’re randomized, but because of statistical variation, some are better than others. And so what we’re going to do, just like what breeders would do, is pick things based on how good they are, or how fit they are, I guess you would say in evolutionary parlance. And so we’re going to sort of rank them or score them. And then we’re going to say, let’s choose some subset of these to be parents, and those parents are going to have children. And we can do this in a computer because we can take those genomes, which are just little descriptions, and we can perturb them or mutate them slightly to get a slightly different description.
Ken: And of course we don’t know whether it’s better or worse because it’s a mutation, but sometimes it is better. And also with crossover, we can combine two things as well, like when you think of two parents and a child. So we can have crossover and mutation, and in the next generation, we let those chosen parents have children. Then we can test all of those and see what their fitness is, or how well they perform. And so now we have slightly different robot body plans that are in some cases slightly better. And we do the breeding process again, and then we have another generation. And so now we’re basically just breeding over generations automatically, in effect. And that’s what the genetic algorithm accomplishes. And it’s really cool if you think about the fact that if you just run this thing, there’s no human intervention, no human in the loop. Eventually you expect that things are going to improve just on their own and you’ll have something better at the end.
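As a rough illustration of the breeding loop described above (a population of randomized genomes, scoring, choosing parents, crossover, mutation, and repeating over generations), here is a minimal Python sketch. The toy fitness function, the genome length, the top-20% parent rule, the mutation settings, and carrying the best genome over unchanged are all assumptions made for illustration; only the population of a hundred comes from the conversation.

```python
import random

def fitness(genome):
    # Toy stand-in for "how good is this body plan": higher is better,
    # with the best possible score (0.0) when every gene equals 0.5.
    return -sum((g - 0.5) ** 2 for g in genome)

def crossover(mom, dad, rng):
    # One-point crossover: the child takes the front of one parent
    # and the back of the other.
    point = rng.randrange(1, len(mom))
    return mom[:point] + dad[point:]

def mutate(genome, rng, rate=0.1, scale=0.1):
    # Perturb each gene slightly, with a small probability per gene.
    return [g + rng.gauss(0, scale) if rng.random() < rate else g for g in genome]

def evolve(pop_size=100, genome_len=8, generations=50, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(genome_len)] for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 5]   # assumed rule: top 20% become parents
        children = [ranked[0]]              # assumed elitism: keep the best unchanged
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            children.append(mutate(crossover(mom, dad, rng), rng))
        population = children
    return max(population, key=fitness), history

best, history = evolve()
print(round(history[0], 4), round(fitness(best), 4))
```

Because the best genome is carried over unchanged, the best fitness in this sketch can never get worse from one generation to the next; without that choice it usually still improves, but it can fluctuate.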
Jim: It usually does. And it’s actually interesting, when I actually got down to writing my own GA code, and this may go back to why there’s somewhat of a disrespect for the field, the actual technology is remarkably simple. To write a basic GA, I think it took me two hours to get it working, right? It took me probably half an hour to write the algorithm. And then of course my fat-finger fumbling and idiot misunderstandings took me another hour and a half to debug.
Jim: But yeah, I literally had a simple GA running in an hour and a half or two hours, and adding different kinds of crossover algorithms and different mutation rates and thinking about the genome as circular or linear, you know, all those things are actually very easy. And so that may actually be one of the reasons for the lack of respect: it’s too goddamn easy, at least to do the inner algorithm. But that, of course, isn’t where the magic lies. So on the other side of it, let’s talk about neural nets. You know, talk very briefly about the neural net as a summation and output mechanism. And then maybe a little bit about the current state of the art of backprop, gradient descent, stochastic gradient descent.
Ken: So neural networks are, like I said, at some level kind of inspired by brains and how they work. And this statement is kind of endlessly controversial, unfortunately. So lots of people love to then jump up and say, it’s nothing like the brain at all. And other people might argue there actually are similarities to the brain, but I don’t think it really matters. It’s just inspired by that at some loose level, and it does have appealing computational properties that seem in some ways reminiscent of what brains can do. Nobody, I think, really thinks it’s identical to what brains do. We can try to push in that direction though. But how it works with a neural network is that you can think of the fundamental unit as what we might call a neuron. And so obviously it’s not the same as a biological neuron, but we use that nomenclature.
Ken: And so the neuron is a little computational unit that if it is sufficiently excited, it itself will fire. And it will output something that we would call its activation level. And so the neurons then are connected in a network. And so some neurons can have connections into other neurons and so if a neuron that’s excited has a connection into another neuron downstream, then that excitement can then trigger the neuron that’s further downstream. And those connections generally have something called weight, so it basically says well how strongly does the signal going along this connection influence the target of the connection and those weights you can think of as ultimately determining the functionality of the network. So there’s all of these different connections between all these different neurons signal is moving down from the inputs to the outputs and the inputs you could think of like sensors, like what you hear, what you see.
Ken: And the outputs are like control signals, like movement or something like that, or talking is another kind of output. And then the cascade of activation which travels from the inputs to the outputs ultimately determines the behavior. And it’s the weights that determine ultimately what that behavior will be. And so weights are sort of what we’re looking for when we talk about neural networks. Like, if you want this kind of stimulus to cause this kind of response, well, it depends on whether there are weights between those two points and what those weights are. And so what then happens is that if you think of some big constellation of neurons, all connected to each other, how can you make it do anything useful? Of course, just connecting a bunch of things together doesn’t make it do anything. You can send activation through that mess, but it’s just spaghetti.
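To make the “spaghetti” point concrete, here is a minimal Python sketch of activation cascading from inputs to outputs through weighted connections. The sigmoid squashing function, the layered 3-4-2 layout, and all names are assumptions for illustration; the conversation only specifies neurons, connections, and weights.

```python
import math
import random

def neuron_output(inputs, weights, bias):
    # A neuron sums its weighted inputs and squashes the total into an
    # activation level between 0 and 1 (sigmoid is an assumed choice).
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def layer_output(inputs, layer):
    # A layer is a list of (weights, bias) pairs, one pair per neuron.
    return [neuron_output(inputs, w, b) for w, b in layer]

def forward(inputs, layers):
    # Activation cascades from the inputs, layer by layer, to the outputs.
    activation = inputs
    for layer in layers:
        activation = layer_output(activation, layer)
    return activation

def random_layer(n_inputs, n_neurons, rng):
    # Random weights: the network is wired up but has not learned anything.
    return [([rng.uniform(-1, 1) for _ in range(n_inputs)], rng.uniform(-1, 1))
            for _ in range(n_neurons)]

rng = random.Random(0)
network = [random_layer(3, 4, rng), random_layer(4, 2, rng)]  # 3 inputs, 4 hidden, 2 outputs
outputs = forward([0.2, 0.9, -0.4], network)
print(outputs)  # arbitrary values, since the weights are random
```

The same inputs with different weights give different outputs, which is why finding good weights is the whole problem.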
Ken: So it’ll just do something arbitrary and random. And so the job of algorithms that try to optimize these networks to make them functional is to find weights that actually do the job that you want them to do, and so finding weights is the problem. And there are algorithms for doing that, like the gradient descent that you just mentioned. Probably the most popular algorithm is stochastic gradient descent, which basically is about saying, if I know what I want the output to be, so say I have some examples, I can show it one and say, what is this? Is this a dog or a cat? And then it answers the question with a dog or cat response. And then I can say, was it right or was it wrong? Of course an initially randomized neural network will be wrong most of the time.
Ken: But what I can do then is say, well, I know what the answer should have been, so I can compute the difference between what it said and what it should have said. And then I can do something called backpropagation, which basically means sending the signal back from that difference. So I’ve got this delta, which says what it should have been, and I can send it back through the network, and for every neuron in the network I can push it or nudge it a little bit more towards what it should have done by altering the weights that go into that neuron. And if we keep on doing that over and over again, iteratively, then we can gradually nudge the neuron towards the kind of behavior we want. Well, not the neuron, all the neurons, so the whole neural network. And we then have the hope that it will eventually be good on what we just showed it, which we call the training set.
Ken: And the real hope is that it will generalize to something we would call a test set, which means things it hasn’t seen when it was trained. And so if you give it enough examples of dogs and cats, then if I show it a cat it never saw before, hopefully it’s generalized. [inaudible 00:19:21] is what we would say, and hopefully at that point it knows what a cat is. And there’s a little bit of magic there, I think, which is interesting: you’ve just shown it a bunch of examples, and you never have to explain anything. You don’t know how it internally works. It just automatically, through gradient descent, optimized those weights. And amazingly, you’ve got a cat detector at the end. And so that’s why this is exciting, I think, to a lot of people.
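The nudge-by-the-difference idea can be sketched in a few lines of Python. This is deliberately the simplest possible case, a single sigmoid neuron, so the delta only has to travel back one step; in a multi-layer network, backpropagation passes that delta back through every layer. The two made-up features, the tiny data set, the learning rate, and the epoch count are all assumptions for illustration.

```python
import math
import random

def predict(weights, bias, features):
    # Output near 1.0 means "cat", near 0.0 means "dog".
    total = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def train(examples, epochs=200, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    weights, bias = [0.0, 0.0], 0.0
    examples = list(examples)
    for _ in range(epochs):
        rng.shuffle(examples)  # "stochastic": visit the examples in random order
        for features, target in examples:
            output = predict(weights, bias, features)
            delta = target - output  # what it should have said minus what it said
            # Nudge each incoming weight a little in the direction that shrinks the error.
            weights = [w + learning_rate * delta * x for w, x in zip(weights, features)]
            bias += learning_rate * delta
    return weights, bias

# Hypothetical features per animal; label 1 = cat, 0 = dog.
training_set = [((0.9, 0.1), 1), ((0.8, 0.2), 1), ((1.0, 0.3), 1),
                ((0.1, 0.9), 0), ((0.2, 0.7), 0), ((0.3, 1.0), 0)]
test_set = [((0.85, 0.15), 1), ((0.15, 0.8), 0)]  # never shown during training

weights, bias = train(training_set)
for features, label in test_set:
    print(features, label, round(predict(weights, bias, features), 3))
```

The held-out examples at the end play the role of the test set: the neuron is only ever corrected on the training set, and the hope is that it still answers correctly on inputs it has not seen.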
Jim: Well done. So now we get to neuroevolution. Instead of using backprop, in the core case we’re using evolution to discover weights, and sometimes topology. Let me try to explain it, and you can correct me. So we basically have a linear string of numbers, let’s say, in some coded form. And because they’re a linear string, we can do crossover and mutation on them using the genetic algorithm to find the better ones and have them reproduce with the other better ones. And then we have some form of an interpreter which can interpret those strings and turn them into a neural net, essentially, and then we can run the neural nets, see how they do, find the neural nets that do best, and then do our crossover and mutation by one of the various rules, tournament rules, random with weights, et cetera, whatever method we use for reproduction. And that combination of having a linear version of the neural net, an interpreter into its network form, then running those forms, grading their performance, and then running evolution back on the original strings is essentially the simple-minded core of neuroevolution. Is that fair enough?
Ken: That is fair. There are a lot of subtleties here though, because the real question I think that it raises for people in machine learning is, okay, well, you can do that, but why would you want to do that? You know, we have gradient descent. And so ultimately this linear thing, which I would call the genome, has basically some kind of encoding of the weights. So as an alternative to backpropagation, what we’re talking about here is that the weights are now in the genome and we’re evolving those genomes. We’re basically evolving the weights of the network in the simplest case. And so you’d say, well, why even do that? We can just use gradient descent, and then we can make principled perturbations instead of just random perturbations, just hoping for the best. It’s still going to work.
Ken: Of course, because with selection it is not an entirely random process. That’s important actually to recognize, for people who think evolution is random: it’s definitely not a random process. It’s a stochastic process, and so there are non-random elements here, like the selection of the best stuff. I mean, that’s obviously not a random event. And so it’s not a random process in the end, and you can expect it to do things like optimize. But the question, I think, for people in machine learning is, is it as efficient as something like backpropagation? But then you have to sort of say, well, maybe that kind of perspective is missing the real point here, because evolution isn’t out there to be the best optimization algorithm in the world. I think it’s more interesting that evolution is more of a creative algorithm than an optimization algorithm. For one thing, it discovered all of life on earth in a single run, and nothing that backprop does is at all reminiscent of that.
Ken: And so what you can do with evolution is treat it as more of a kind of creative sandbox where you can play with ideas about creativity and also complexity, which is sort of the idea of, what if these neural networks are getting bigger and more complex over time? Which they did, of course, in real evolution. That’s not addressed by gradient descent either. And that’s not to say that you couldn’t imagine some fancy things that we do to try to augment networks while we’re running gradient descent. That’s conceivable, but it would be an extension, whereas in evolution it’s a natural thing to do, to look at increasing complexity. And then to try to understand how you get these kinds of divergent processes that produce all kinds of amazing stuff.
Ken: And now we’re moving outside of the gradient view of the world because often, it’s hard to actually compute a gradient for something like that. It’s not necessarily impossible though, so I think there’s some convergence now between deep learning and neuroevolution, where people are getting interested in gradients of novelty and things like that, that then move it closer to what evolution does. And the distinction is blurring in some ways. So actually the conceptual underpinnings become more important than the terminology. But nevertheless, at first there was a clear distinction, I think, where this kind of creative divergent process was far easier and more natural to explore outside of gradient descent. And that’s sort of what this kind of more evolutionary metaphor offered, and we certainly used it to discover some interesting principles.
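The simplest case discussed above, where a linear genome holds only the weights, an interpreter turns it into a network, and evolution works on the genomes, might look like the following Python sketch. The fixed 2-2-1 topology, the tanh activation, the XOR task, tournament selection with size 3, the mutation scale, and keeping one elite are all assumptions chosen for illustration, not details from the conversation.

```python
import math
import random

GENOME_LENGTH = 9  # 2-2-1 network: two hidden neurons (2 weights + 1 bias each), one output neuron

def decode(genome):
    # The "interpreter": slice the linear genome into a tiny fixed-topology network.
    hidden = [(genome[0:2], genome[2]), (genome[3:5], genome[5])]
    output = (genome[6:8], genome[8])
    return hidden, output

def activate(inputs, weights, bias):
    return math.tanh(sum(x * w for x, w in zip(inputs, weights)) + bias)

def run_network(genome, inputs):
    hidden, (out_weights, out_bias) = decode(genome)
    hidden_acts = [activate(inputs, w, b) for w, b in hidden]
    return activate(hidden_acts, out_weights, out_bias)

XOR_CASES = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

def fitness(genome):
    # Negative squared error on XOR: 0.0 would be a perfect score.
    return -sum((run_network(genome, x) - t) ** 2 for x, t in XOR_CASES)

def tournament(population, rng, size=3):
    # Pick a few genomes at random and let the fittest of them reproduce.
    return max(rng.sample(population, size), key=fitness)

def make_child(mom, dad, rng, sigma=0.3):
    point = rng.randrange(1, GENOME_LENGTH)          # one-point crossover on the string
    child = mom[:point] + dad[point:]
    return [g + rng.gauss(0, sigma) for g in child]  # small random perturbation of every weight

def neuroevolve(pop_size=50, generations=100, seed=1):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(GENOME_LENGTH)] for _ in range(pop_size)]
    initial_best = max(fitness(g) for g in population)
    for _ in range(generations):
        elite = max(population, key=fitness)         # assumed elitism: keep the current best
        population = [elite] + [
            make_child(tournament(population, rng), tournament(population, rng), rng)
            for _ in range(pop_size - 1)
        ]
    return max(population, key=fitness), initial_best

best, initial_best = neuroevolve()
print(round(initial_best, 3), round(fitness(best), 3))
```

No gradient is computed anywhere: the only feedback the algorithm uses is each network’s fitness score, which is the property that later makes it easy to mutate things a gradient cannot describe, such as topology.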
Jim: Yeah, very interesting. You mentioned weights. Back in the history of neuroevolution, there was quite a bit of exploration of evolving topologies as well. And in fact, in the work I did, I found that fully connected networks were actually not the best way to do it. At least not with the limited computation capability I had, and it turned out I got much faster learning when I evolved topologies as well as weights. Could you talk about that a little bit? Maybe where the current thinking is on whether it’s worth evolving topologies in addition to weights?
Ken: Yeah, so that has been a historically significant distinction between neuroevolution and gradient-based neural network optimization, because it’s easy to evolve topologies in neuroevolution. The problem in gradient-based methods is that it’s not clear what the gradient of the topology is, but in evolution, you don’t need a gradient, so you don’t even worry about it. So now you can do almost anything, because you can just introduce mutations that change topology. And there was a huge proliferation of neuroevolution algorithms that would evolve topologies, because there are so many different ways it could be done. It was very creative. It was kind of a wild west situation though, because there were basically no principles, and it was just like, all kinds of crazy ideas, let’s just see what happens.
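As one hypothetical example of a mutation that changes topology, the Python sketch below represents a network as a dictionary of connections and defines two structural mutations: adding a connection between two previously unconnected nodes, and adding a node by splitting an existing connection. The representation and both operators are assumptions for illustration; they stand in for the many different schemes of that era rather than reproducing any particular one.

```python
import random

def add_connection(connections, nodes, rng):
    # Structural mutation 1: wire up two nodes that were not connected before.
    # This sketch does not forbid loops, so it can create recurrent links.
    candidates = [(a, b) for a in nodes for b in nodes
                  if a != b and (a, b) not in connections]
    mutated = dict(connections)
    if candidates:
        mutated[rng.choice(candidates)] = rng.uniform(-1, 1)
    return mutated

def add_node(connections, nodes, rng):
    # Structural mutation 2: split an existing connection by inserting a new
    # node in the middle of it. Assumes at least one connection exists.
    source, target = rng.choice(sorted(connections))
    new_node = max(nodes) + 1
    mutated = dict(connections)
    old_weight = mutated.pop((source, target))
    mutated[(source, new_node)] = 1.0
    mutated[(new_node, target)] = old_weight
    return mutated, nodes + [new_node]

rng = random.Random(0)
nodes = [0, 1, 2]                          # two inputs (0, 1) and one output (2)
connections = {(0, 2): 0.5, (1, 2): -0.3}  # (source, target) -> weight
connections, nodes = add_node(connections, nodes, rng)
connections = add_connection(connections, nodes, rng)
print(nodes, connections)
```

Neither operator needs a gradient; each just produces a slightly different network that can then be scored like any other.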
Ken: But in any case, it’s appealing that it can be so easily done and experimented with, because there’s an intuition that I think is pretty strong that topology matters, that structure matters, and if we really genuinely want an autonomous algorithm, independent of humans, to discover intelligence, well, it’s appealing that here’s an algorithm that can allow it to discover both the appropriate topology and the weights simultaneously. And when I talked about complexity, that folds into that, because in some way what we’re really talking about is expanding the brain to become more complex over generations, in a way somewhat reminiscent of what we see in nature, as brains became more complex as well through our ancestry.
Ken: And so there’s always been some controversy about whether, though that intuition is appealing in principle, it actually makes sense in practice. And there’ve been times in the course of the field of neuroevolution where some people have argued that it doesn’t actually matter, just give me enough neurons and connections and just evolve the weights and everything will work out. And then other times where there was a strong inclination, or at least even evidence, saying that actually topology does matter. In the end I think topology matters. I believe that’s true. Even in the field of deep learning, we see in the modern day that clearly some of the most important innovations are innovations in architecture, which enable what’s being achieved now with the very vast neural networks of today. And so it’s clear that architecture is important. Architecture is basically just the same thing as topology. And so neuroevolution algorithms were very good candidates for just exploring topologies without a huge amount of friction or the intellectual challenge of how to convert a topology into a gradient.
Jim: Yeah, it’s interesting. And I must say I’ve always been a little depressed by the fact that what became deep learning is almost all structured on fully connected networks. And it seems to me like a coward’s way out, just because it’s easy, right? It’s easy, but computationally grossly more expensive than it needs to be, because every time you add a connection, you have an associated set of work for optimizing that connection. The boys are missing something there, God damn it, the boys and girls, I guess there’s some girls involved these days, which is a good thing. As an amateur kind of just watching from the outside, it seemed to me like they’re missing something when all they do is fully connected networks.
Jim: For the audience, think of a series of rows of nodes. The nodes at each level could be connected to all the nodes at the next level down or just to some of them, and there’s no reason that you can’t do one or the other, and there are actually much more interesting and complicated recurrent architectures, which we’ll talk about later in the story. So anyway, with that prelim, why don’t you tell us about your NEAT architecture and what it does and where it fits in this history of neuroevolution?
Ken: Sure. Yeah. So around the year 2000, I was in grad school and I was reading all these papers that were evolving neural network topologies. And I had a whole collection of them. I spent a semester reading them, basically in late 1999. And like I said, it was like a wild west. There were so many. It was a pretty interesting time because anything went and there were basically no principles. I was in my 20s and I was ambitious and I was thinking, “Here’s what I’m going to do. I’m going to take all these papers that have all these crazy ideas for evolving neural network topology and I’m just going to read all of them, take notes. And then I’ll just combine all the best ideas.” And that would be my dissertation for getting a PhD or something like that.
Ken: And that was what I was thinking. And I read all these papers over about four months. And at the end I just was dissatisfied. I was like, “None of this makes sense.” To me the problem was there were no principles at all. It was all totally ad hoc. And I was like, “There’s got to be something better.” So I came out having a different view than when I came in, where I was like, “There’s got to be something more systematic you can do here. Something that just makes sense.” And I kind of sat down and tried to figure this out, like is there some way… Basically the main thing that I was trying to think of was, if I had two different topologies, is there some principled way to understand what it would mean to combine those two things?
Ken: Because at the time, crossover was a big problem in neuroevolution, because if you have two brains with different topologies and they’re going to have a child, it’s totally unclear how to combine them. And there’s actually no known way of matching up graphs of arbitrary topology perfectly. And so I was just sitting down with a pencil and paper trying to line up graphs and see, is there some meaningful way to do this? And I had a series of kind of revelations, I guess, in going through that very quickly, where I saw that there are ways to do it that are principled if it’s in an evolutionary context. I guess that’s the inspiration for me. In general, if you just have two arbitrary graphs, there’s no clear principle for how to combine them. But if you know their history, like where they came from, and if you keep track of that history, then you know, for example, is this connection from the same history as that connection in this other potential parent?
Ken: And so you can line them up. And those came to be known as historical markings or you could mark the genes to see where they came from. And that became a fundamental part of NEAT. And then, so by cracking that problem, I guess you would say addressing the problem, there’s still an issue with arbitrary topologies, but it addresses it sufficiently that you can get around it now. The next thing I realized was, oh, this is really good for speciation if you do this because now not only do I know how things line up, I know how they’re related because I have these historical markings, so I can divide them into species really easily without any special computational overhead or expensive competition overhead. And so that then allows you to preserve diversity, like what evolution does, it’s like when you get a really cool new idea, you don’t want it to just die out immediately even if it’s not great right away and speciation accomplishes that.
Ken: And then because of that preservation of diversity, you can also do increasing complexity because one of the problems with increasing complexity is that it involves risk. Anytime I increased the complexity of a structure, I probably reduce its fitness at first. So I’ll basically throw it out if I’m just doing a standard algorithm, like a genetic algorithm, but if I have speciation, then I can protect it in its own niche where it’s going to be competing with like neural networks instead of with everybody, and so it allows complexification to happen. And when I put those ideas together, that became NEAT. So basically these three ideas, which was the historical markings, the protection of innovation through speciation and then the complexification or increasing complexity. And they really work together well. It’s like a really unified kind of elegant system when you put them all together because they all kind of make each other work or enable each other and ultimately enable complexification, which is what I was really interested in.
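To make the idea of lining up genes by their history concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the NEAT reference implementation: the names `ConnectionGene`, `crossover`, and `compatibility`, the coefficient values, and the rule that unmatched genes come from the fitter parent are choices made for this example, and the distance measure lumps all unmatched genes together rather than treating them separately.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionGene:
    innovation: int  # historical marking: a global id assigned when the gene first appeared
    src: int         # source node id
    dst: int         # destination node id
    weight: float


def crossover(fitter, weaker, rng):
    """Line up two genomes by innovation number and build a child.

    Genes with the same marking are inherited from either parent at random.
    Genes with no counterpart are taken from the fitter parent (an assumption
    of this sketch).
    """
    weaker_by_innovation = {g.innovation: g for g in weaker}
    child = []
    for gene in fitter:
        match = weaker_by_innovation.get(gene.innovation)
        if match is not None and rng.random() < 0.5:
            child.append(match)
        else:
            child.append(gene)
    return child


def compatibility(a, b, c_mismatch=1.0, c_weight=0.4):
    """Distance between two genomes, usable for grouping them into species.

    Counts genes whose marking appears in only one genome, plus the mean
    weight difference of the genes they share. Coefficients are illustrative.
    """
    a_by = {g.innovation: g for g in a}
    b_by = {g.innovation: g for g in b}
    shared = a_by.keys() & b_by.keys()
    mismatched = len(a_by.keys() ^ b_by.keys())
    longest = max(len(a), len(b), 1)
    if shared:
        mean_weight_diff = sum(abs(a_by[i].weight - b_by[i].weight) for i in shared) / len(shared)
    else:
        mean_weight_diff = 0.0
    return c_mismatch * mismatched / longest + c_weight * mean_weight_diff
```

A genome would join a species when its `compatibility` to that species' representative falls below some threshold, so new structures compete mainly with similar networks, which is the protection of innovation described above.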
Ken: The thing I really, really wanted to see was increasingly complex brains. I was less interested in optimization. I was more interested in seeing something that would become bigger and bigger and more and more complex like what we see in nature and NEAT sort of enabled that at a relatively small scale. So by small scale, we’re talking about adding one connection at a time here, not adding millions at a time or something. So there’s still a disconnect from nature. Nature, we got up to a hundred trillion connections in the human brain. That’s not going to happen through NEAT because NEAT is adding one connection at a time. Imagine if every second a new generation were spawned in NEAT, a hundred trillion seconds would have to pass and the universe would be over before we had a human brain.
Ken: So that’s still not going to get you to that level of complexity, but still, at least if you go back to the year 2000, it’s really intriguing and new that suddenly we have an algorithm that’s principled for evolving increasingly complex neural networks in species. And so that was NEAT and it came out, it got more publicized around 2002. And so it’s kind of getting old now unfortunately. It’s still fairly popular. Still people are using it.
Jim: Yeah. I did some research on the number of papers that have been published using NEAT as its tooling and it’s, I don’t know, thousands. It’s a bunch. It’s certainly the most popular tooling in neuroevolution that I could find at least.
Ken: Yeah, yeah, it did. It really became popular quickly to my surprise. I was a student, so I thought I was doing it for myself actually. I was like, “This is just interesting to me.” I had no idea if a single person in the world would ever care other than my advisor, but I started getting emails and lots of people contacting me and people had heard of it, and that shocked me because I didn’t even know these people. And so it started spreading, I guess it got viral, and I think it’s because complexification is just intuitively really compelling; that’s my belief about why it caught on. Like the supposed story or the headline story supposedly is because NEAT got the record in the pole balancing benchmark, and so it was like the best for optimizing neural networks in reinforcement learning problems.
Ken: And so that’s sort of what you have to do to get through the publishing bar is you have to beat some benchmark and that it did, and that was also surprising to me because that wasn’t really the motivation that I had. I wasn’t really trying to beat some benchmark, but then when it did beat the benchmark, I was kind of surprised and realized it might be a big deal so this could help get the word out. And there’s reasons that it did ultimately. But really, my thought is that I don’t think people really care about pole balancing benchmarks. I think people care about imagination and thinking about the universe. And the thing about NEAT that I think is appealing is that it creates a metaphor that’s kind of plausible about increasingly complex brains and how they might arise. And that’s something you’d like to play with. If you’ve got a toy that allows you to play with something like that, it sounds really exciting. And so I think people were just attracted to the fact that now there is a toy like that and they can play with it.
Jim: Interesting. And there’s a little bit of a sideline. I didn’t really intend to ask this question, but it seems like the natural point to ask it. At that time around 2000, you read all the papers and you came up with NEAT and it worked and it got widely adopted and it won some benchmark awards and all those good things, which help it propagate in academic space at least and even to some degree in the practical space. But there’s another approach that was floating around at the time, and actually this was kind of the core interest of John Holland, at least in his later years, which is how do things become emergently modular? How do you pop up layers and build building blocks? I forget the name of another technology out at UT that kind of ran at two levels, one where you were evolving building blocks and the other where you were orchestrating the building blocks. They never seemed to go anywhere.
Jim: And I know John Holland worked for the last 20 years of his life figuring out a principled way to produce emergence of building blocks and always failed. Did you think about that approach? Oh, and I should just mention the other field that I dabble in, not as a practitioner, but sort of as a follower, has been genetic programming and it has the same issue, how do you encapsulate code? Because obviously, code is really a network collapsed into tree structures. So very, very similar kinds of issues. And they put all kinds of hacks in, automatically defined functions and things that [inaudible 00:36:14]. So your mind must’ve gone and thought of and rejected the idea of emergent modularity or encapsulation.
Ken: Yeah, certainly it’s just an obvious, really important question and very related to evolutionary algorithms and neuroevolution and brains, of course, which have some degree of modularity, which is obviously, you have different parts of your brain. We have the cortex or the thalamus or all these kinds. They’re kind of divided into modules roughly. And so how does that happen and how can we facilitate it and make it more likely to happen? Or you might also ask, do we need to? Maybe it just emerges and we don’t have to worry about it. And so it’s true that this is a complex topic, and that I do think that explicit attempts to force things into a modular configuration haven’t been extremely compelling. They’re not that convincing and what comes out of them isn’t that exciting. And so I guess I could tend to be more drawn towards organic explanations, like where you’d say that, well as a result of some intrinsic property of a system, that it kind of organically emerges that something’s modular.
Ken: Rather than that I explicitly create a bi-level algorithm where there’s a top level and a bottom level and some hierarchy that’s enforced and then they all try to come together. I think that kind of thing, it’s too artificial to be as genuinely rich and imprecise as what you see in nature, where it’s not like these modules are explicitly divided from each other. Like sometimes the borders are a little fuzzy and that looks more like an organic type of a process, like a bottom up kind of a process. And so I guess I sort of veer towards, are there constraints that explain why things like that happen? Like for example, there’s a cost to having a long distance connection. It’s harder to get a long distance connection for all kinds of reasons from say, one end of your head to another.
Ken: Most connections are short and it’s pretty clear why, because there’s a lot of stuff in the way if you wanted to have a long connection. And so just having a constraint like that actually encourages a degree of modularity because most structure tends to be local and then there’s like a very small amount of more global types of connectivity and that will just lead to something that looks modular. And so by putting in something like a connection length constraint, let’s say which I played with things like that later in my career, like with HyperNEAT, you do start to see things that look modular or emerge naturally.
Ken: Jeff Clune’s also played with stuff like that too. He’s published on that. And so that’s sort of where I lean towards that kind of approach. And I think the kind of explicit enforcement of a rigid hierarchy doesn’t seem as promising to me, but yeah, I don’t want to completely dismiss it either.
Jim: Yeah, it certainly has never worked very well. I mean John Holland just never got there, right? That was his holy grail. It’s like Einstein’s attempts to disprove stochasticity in quantum mechanics, right? Mighty smart guys spent a long time thinking about it and never solved it so maybe it’s not solvable. So now let’s move on. We’ve talked about the early history and the basics of neuroevolution, NEAT, which was for quite a while, the standard, but we’re now in a brave new world of neural nets, billions of nodes. In fact, I follow the work in GPT-3 and one of the open source competitors called GPT-Neo, and we’re talking billions of parameters. Obviously something like NEAT or the other 2000-vintage forms of neuroevolution ain’t going to cut it in the world of billions of parameters. So maybe you could talk a little bit about this world and where does neuroevolution fit in, in the giga-size and eventually bigger than that networks?
Ken: Yeah, the world has changed a lot and the emergence of these billion plus parameter models, it does change the context for thinking about what do we need evolution for? And it basically means that we need to be a little more precise in the motivation for why we would be evolving something, because if you go 20 years ago, it’s just like there’s all kinds of things that we don’t know how to do, and backpropagation itself you could even argue, it just gets dragged and trapped in local optima. We need something different, maybe evolution is it. Now things look a bit different now that we can actually backpropagate through multi-billion parameter models and it’s working quite well. And so neuroevolution’s role is, I think it’s more nuanced at this point.
Ken: In one sense, it still offers an opportunity for architecture search. And so one thing that hasn’t changed is that architecture still matters. When we see things like convolutional units, now transformers, ResNets, these are all structural innovations that were invented by humans, but they are critical and allow the field to advance. And so it’s clear that architecture does matter, and there are probably new innovations that haven’t been discovered yet. And it is a possibility that evolution is one way that we could get to those future innovations. Although so far it’s been humans, the ones that I enumerated are more from humans, but it’s not out of the question that evolution is on the table for that.
Ken: But also that’s not the only place where this kind of evolutionary thinking, I think still has a role. There’s also this idea. I think that some of the most important ideas that come out of neuroevolution in kind of evolutionary algorithm thinking have to do with diversity now, because evolutionary thinking is ultimately about populations. And I think that’s really where the value going forward if you talk about like decades going forward has a lot of staying power is because the problem is that even if you have gradient descent with something that is extremely powerful and large, you still have an issue with all of your eggs in one basket type of thinking where a lot of problems, especially in reinforcement learning, which means where you don’t know what the answers are for specific inputs. You just get sometimes reward or what you might call sparse reward.
Ken: There’s a need to actually have a diversity of experience or to actually make a diversity of attempts to explore in order to understand the world around you so that you can become good at dealing with that world. And this diversity is probably essential to making sense of such a complex world as the one that we have. And it’s the obtaining of experience that is really important in a diverse sense. We need to seek out a diverse set of experience. And this is really closely related to evolutionary thinking where evolution is basically a divergent system, which is about how to actually gather a whole diversity of stepping stones that then you can branch off of in the future where you don’t know which of the stepping stones will be essential, but some probably are.
Ken: And so I think that there’s a convergence there and the words become a little more fuzzy because I can have a population based type of system that’s using gradient descent, and it’s not as clear whether it should be called evolutionary in the strictest sense anymore, but ultimately it’s drawing on the ideas from neuroevolution from more recent years. And I think that that’s going to continue to happen and to proliferate. And so I think that we just need to be a little more, basically a lot more aware of the intersection between what’s happening in deep learning and what’s happening in neuroevolution, but ultimately I think they’re complementary to each other. And so both sides really could gain from being more aware, not just the neuroevolution side.
Jim: Very interesting. Yeah, there have been some papers published in the last two or three years where neuroevolution seems to be at least as good and in some cases better than gradient descent, just on setting weights.
Ken: Mm-hmm (affirmative), I didn’t cover that, but that’s true that there’s even some room for that. It’s an uphill battle there if it’s just battling on the optimization of weights, like it’s an uphill battle for evolution because gradient descent does have the advantage of a principle behind it. But nevertheless, it’s true that in some cases it’s been shown to be competitive, like there’s these evolution strategies that actually can optimize very large networks that have probably similar magnitudes of weights, at least millions in reinforcement learning problems, and sometimes they do better than the analogous backpropagation-based reinforcement learning algorithms.
Ken: One problem, though, is hardware, I think. We’ve really converged on a hardware paradigm that’s decidedly geared towards backpropagation, like the way that we use GPUs and so forth. And evolution is sort of a different paradigm in a way, and so it’s not as clear that it’s as well-served, but if you had the right hardware set up, I think it may become competitive in some strange situation. Like if I had a million parallel CPUs or maybe GPUs, but basically just configured for activation and not optimization, then it might start to be that basically what you’re effectively doing is you’re starting to approximate the gradient through just getting so many samples in a generation that you can actually approximate the gradient, but you can do better than approximating the gradient because you understand the entire landscape around you, not just a single direction, which we call the direction towards the optimum.
Ken: And that actually might be useful for doing things like trying to get diversity because we can go out in simultaneous multiple directions that we’ve computed through something like an evolution strategy. And so, yes, I think while it’s getting much less attention, it’s not completely off the table that evolution has some theoretical…
Ken: Cool. The evolution has some theoretical opportunity here to do something interesting still, even in just weight space.
Jim: Yeah, it is interesting. The coevolution of hardware, software and approach. I mean, there was this interesting breakthrough where someone realized you could use a simple minded function, [inaudible 00:46:19], as a transfer function in a neural net, and you could implement that very easily on GPUs. Right? If you tried to do a sigmoid or something, you don’t get anywhere near as much gain as you do with a more simple minded transfer function. That one insight alone, to my mind, is what sucked the world into this, I argue, over-concentration on gradient descent based, fully connected, deep neural nets, which have indeed produced miraculous results. But, one could also hypothesize a different hardware world, as you’ve sort of alluded to, what some people call compute fabric, where there’s a million cores that have kind of hierarchical caches around them. And that just straight CPU type compute becomes very, very cheap.
Jim: And there’s no reason that it can’t be about as cheap as GPU if we had the silicon to do it, which we know. And yet it’s a chicken and egg problem because we don’t have this. The deep learning guys got the free ride, right? The GPU was designed for gaming. The idea of [inaudible 00:47:19] turned out to be the magic trick, at least initially, I’m sure they’ve got other transfer functions now, but that was one that let them GPU-ize gradient descent, and the rest was this very interesting coevolution of hardware, tools and approach. And should massively parallel compute fabric be developed, probably for some other reason, then perhaps general purpose neuroevolution for all aspects of the networks might indeed come back into fashion.
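The point about approximating the gradient from many samples in a generation can be illustrated with a small Python sketch. It assumes a plain Gaussian-perturbation evolution strategy, which is one common formulation and not necessarily the exact method Ken has in mind. The function names and the defaults for `pop_size`, `sigma`, and `lr` are hypothetical choices for this example.

```python
import random


def es_gradient_estimate(fitness, theta, rng, pop_size=200, sigma=0.1):
    """Estimate the fitness gradient at theta from one generation of random samples.

    Each individual is theta plus Gaussian noise. Weighting each noise vector
    by how much better than average that individual scored gives a
    finite-sample estimate of the gradient of the smoothed fitness.
    """
    samples = []
    for _ in range(pop_size):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        candidate = [t + sigma * e for t, e in zip(theta, eps)]
        samples.append((eps, fitness(candidate)))
    baseline = sum(score for _, score in samples) / pop_size
    grad = [0.0] * len(theta)
    for eps, score in samples:
        for j, e in enumerate(eps):
            grad[j] += e * (score - baseline)
    return [g / (pop_size * sigma) for g in grad]


def es_optimize(fitness, theta, steps=200, lr=0.05, seed=0):
    """Repeatedly move theta along the estimated gradient (ascent on fitness)."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad = es_gradient_estimate(fitness, theta, rng)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta
```

Every fitness evaluation within a generation is independent of the others, which is why this style of search maps naturally onto many parallel workers that only need to run the network forward.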
Ken: Yeah. I mean, you’re right. There’s been a coevolution, it’s quite interesting. It’s a historical fact that that’s the way things happened to develop. And the tightness of that coevolution is like the algorithms improve and then the hardware improves along with the algorithms. It’s just been very exclusionary. I mean, if you’re not in that paradigm, you’re in trouble because you can’t take advantage of what’s happening there. And so, but it’s interesting to think about alternate realities. One kind of, I think, turning point paper in some ways, in reinforcement learning at least, was this paper from DeepMind about Atari, where they showed that they could use deep reinforcement learning to solve these Atari games. And I think that it got a lot of people inspired, and it was very impressive to some very influential people like the leaders of Google, for example.
Ken: But one interesting thing in hindsight is that it turned out that we found out much later that actually you can reproduce that level of result. The quality of results they got there could have been achieved through evolution, and they were later. So, you can get competitive results through an evolutionary algorithm, but it’s just historically happened that it was first done with deep learning. And so, but what if it had been published first with an evolutionary algorithm? I mean, what would have been the course of history then? Would the whole world have been circling around evolution strategies? Like, I’m not sure, but it’s interesting to think about that. A lot of this stuff is contingent on what just happened to happen, in what order.
Jim: It’s trajectory dependent, as my friend Brian Arthur always likes to point out who, by the way, will be on the show in a couple of weeks. And we’re going to be talking about his book on the nature of technology. Let’s jump back a little bit to an issue that’s always been key in all kinds of evolutionary computing, not just neuroevolution, and that is the maintenance of diversity. I want you to talk about what that is, what it means, and some of the approaches that are being used today. You alluded to some of them, but go into that particular topic, which is absolutely core. I mean, you go to GECCO, at least back in the day when I used to go in the double aughts, and there’d be a zillion different approaches to how to preserve diversity in your GA ecosystem. So tell us what diversity is, why it’s important and how you get it.
Ken: Yeah. So diversity is a really deep topic. It’s quite profound if you really get into the weeds of it. But on the surface it’s basically just one of the worst things that can happen in evolution is if the diversity drains out of your population. If you think about it if you’ve got a population and everybody starts to be the same, which is easy to happen. If you think about if you’re just breeding, without being very smart about it, eventually, you find something you really like. And then you just let it have all the children and then the children look like it. And then you just keep on selecting things that look similar and you can converge pretty quickly and drain out your diversity. But diversity is the fuel from which you can make new kinds of innovations.
Ken: And we don’t know which particular type of configuration in some diverse set might lead, in the future, to something really important. And so it’s important, to some extent, to maintain a degree of diversity in a population, if you’re in a population-based algorithm, because of that, because if you drain out that diversity, then you really lose the power of the algorithm. And so, in the field, historically a lot of people worried about the maintenance of diversity for this reason, because of basically trying to prevent premature convergence. And in fact we’d sometimes advertise it as why it’s better than something like gradient descent, because gradient descent can get stuck, at least in theory, because it is all your eggs in one basket, because it’s basically one egg. But here we have a population so we can maintain diversity. So, in principle, we don’t get stuck.
Ken: But what’s interesting though, is that that is really not the deepest thing that’s interesting about diversity. That’s like a more superficial thing that most people were thinking about for a long time. But I think as we went on, what we realized is that there’s something a lot more profound, which is that actually diversification itself is very powerful, even putting aside optimization entirely. If you don’t even know where you’re going, we don’t care, we’re not trying to solve the problem. We’re just trying to push towards diversity and only that. That alone has powerful implications. And that’s what was sort of the basis of the novelty search algorithm that I introduced with Joel Lehman. And it’s just that’s something that we see in nature, is that actually, if you think about nature, nature isn’t trying to solve any particular long-term problem.
Ken: Like there’s not a long-term goal. Like let’s make a human, that wasn’t the goal. You just have a continual diversification or what I would call divergence, it keeps on diverging. And because it’s divergent, it’s inherently divergent, that’s intrinsic to the algorithm. It means that it keeps on inventing more and more amazing things. And so, just pushing for divergence itself is actually very powerful and important for creativity. And so, it’s beyond just like facilitating optimization or preventing premature convergence, it’s about being creative and continually open-endedly inventing forever. I mean, it requires a strong push towards a certain kind of diversity.
Jim: Yeah, very cool. In fact, it’s later in my topic list, but let’s hop to it now, since you introduced it and it’s so close to the idea of diversity. I want you to tell us about your novelty search. This is really interesting stuff.
Ken: Right, so yeah. Novelty search, which was introduced something like seven or eight years after NEAT, was a result of some observations that we made of an experiment we ran called Picbreeder, where we were letting people evolve images or breed images on the internet, basically. It was like a little toy almost. It meant people could go in and see some images and then choose one to be a parent and then it would have children and then they could choose one of those and it could have children, essentially breeding pictures. And we discovered this really weird phenomenon when we did that, which was that we found out that people who discovered really cool pictures or who bred really good pictures, like there was a picture of a skull, for example, or a car, or an alien face, and different things, that those always came from people who were not trying to discover them.
Ken: This is very paradoxical and sounds like something Zen or something like that. We said to achieve your highest goals, you have to be willing to abandon them. So, you can’t do things unless you’re not trying to do them. And this was a principle in Picbreeder, but it revealed kind of, I think, a really deep aspect or explanation for what happened in nature. Why is nature so powerful? Finding things like a human brain, or the flight of birds, or photosynthesis, or things like this, is because it isn’t trying to actually find them. If it was trying to find them, it wouldn’t be able to do that. And that’s because it’s deceptive. The things that lead to intelligence don’t look like intelligence. If you think about what is the predecessor? One of our predecessors that leads to humans, it’s actually a flatworm. And a flatworm was related to the discovery of bilateral symmetry, but you can’t give an IQ test to a flatworm and say we’re making progress towards Einstein.
Ken: That doesn’t make any sense. And so, if you were trying to just maximize IQ, you would just discard the flatworm. And so, it’s actually better to not be worrying about Einstein if you want to get to Einstein, if you’re way back in the age of flatworms, you just should be worried about whether you’re doing something interesting in its own right in its own time. And so, this is an explanation for why people couldn’t find things by looking for them in Picbreeder and when I saw this, because it was just a phenomenon that we observed, I was just completely blown away. It was so profound that when I saw that people don’t find things by looking for them. And I spent weeks trying to process that, I was like what does that actually mean? Because I was thinking I’ve been taught from more an engineering tradition that one of the most important things you do is you set your goals, you set your objectives before you go out and try to do something.
Ken: And that, that’s actually the way that we get progress is that we say, “okay, I’m going to build a car, so what do I need?”. But here, it was exactly the opposite. It’s like the only way to get to the car is you can’t be trying to get to the car and then you might get to a car or something else, but it would be something really, really interesting also. And so, as I tried to process this, I eventually realized that you can make it. It actually makes sense that the best way, in a global sense, like an aggregate sense, to get all kinds of interesting things is not to be trying to get any particular thing. And so, I thought well, We could actually formalize that as an algorithm. And Joel and I talked about this a lot. Like we can make an algorithm that isn’t trying to do anything specific intentionally, but rather is just trying to do new stuff all the time.
Ken: And that was what led to novelty search. And so, novelty search was this idea that let’s make an algorithm that only tries to evolve novelty and is rewarded for novelty. And the theory was that if we do that, then it will actually invent really cool and powerful things. And what was actually really amazing about it is that in some domains it was actually better than trying to solve the problem. So, usually in evolutionary algorithms, what you’re saying is I give you a fitness reward for how well you do on the problem that I’m trying to solve. In novelty search, we don’t care because we don’t know what we’re trying to solve. We just say, if you did something new, you get a reward. But actually even if you do it that way, it actually would get you a better solution than if you were rewarding it for solving the problem better, which is totally counterintuitive. But it makes sense, if you think about this deceptive property where the things that lead to what you want don’t necessarily look like what you want, which is a property of all complex problems.
Ken: So, it’s a deep lesson about actually solving problems in general, which is, sometimes you have to let go of them in order to eventually make progress. And novelty search is sort of a formal embodiment of that philosophy. It isn’t the key to everything, by the way. Because one thing, I just don’t want the audience to think, novelty search will solve all our problems. One thing that you have to keep in mind here is that what it’s basically saying is there’s a trade-off, if you want to do amazing things, you have to give up knowing which amazing thing you’re going to do. And so, if you have a specific problem in mind and you say, well I’m just going to run novelty search, it’s going to solve every problem. It’s going to do something interesting, but it might not actually solve the problem you have in mind.
Ken: And so, we’ve traded away that control for something very different, which is more like saying just do interesting stuff in general. But as such, to me novelty search is less like a solution to something than a philosophical algorithm. It’s basically making a point, which I think is really helpful to think about the way that the world really works. And then try to grapple with that rather than the kind of denial of how we wish the world works, which is that we just set an objective and move towards it and everything’s going to work out well.
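The idea of rewarding only newness can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the published implementation: behaviors are taken to be tuples of numbers, novelty is scored as the mean distance to the `k` nearest behaviors seen so far, and the archive and selection policies, along with the names `novelty`, `novelty_search`, `mutate`, and `behavior_of`, are hypothetical choices for this example.

```python
import math
import random


def novelty(behavior, others, k=5):
    """Mean distance from a behavior to its k nearest neighbours among `others`."""
    nearest = sorted(math.dist(behavior, o) for o in others)[:k]
    return sum(nearest) / len(nearest) if nearest else float("inf")


def novelty_search(initial_population, mutate, behavior_of, generations, rng,
                   k=5, archive_per_generation=2):
    """Evolve a population using novelty as the only reward.

    No task fitness is computed anywhere. Individuals are ranked purely by
    how different their behavior is from the current population and from an
    archive of previously seen behaviors.
    """
    population = list(initial_population)
    archive = []
    for _ in range(generations):
        behaviors = [behavior_of(genome) for genome in population]
        scored = []
        for i, (genome, behavior) in enumerate(zip(population, behaviors)):
            others = behaviors[:i] + behaviors[i + 1:] + archive
            scored.append((novelty(behavior, others, k), genome, behavior))
        scored.sort(key=lambda item: item[0], reverse=True)
        # Remember the most novel behaviors so they stop counting as new later.
        archive.extend(behavior for _, _, behavior in scored[:archive_per_generation])
        # The more novel half of the population becomes the parents.
        parents = [genome for _, genome, _ in scored[:max(1, len(scored) // 2)]]
        population = [mutate(rng.choice(parents), rng) for _ in range(len(population))]
    return archive
```

The returned archive is a record of distinct behaviors the search has visited. Whether any of them solves a particular problem is checked afterwards, if at all, which reflects the trade-off Ken describes: the search does interesting things but gives up control over which ones.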
Jim: Yeah. If the audience is interested in this idea the EP130 that we talked about earlier where Ken appeared to talk about his book, Why Greatness Cannot Be Planned, is essentially about this philosophical perspective of novelty. And I got to say, it opened my head. When I first started reading the book, I said this sounded like bullshit to me, but by the time I was done, I said “fuck, I’ve actually learned something important and new”. And I think to make it practical, you got to do something else because otherwise you just wander around all the time, but there is something really new here. This is really an important thought. If you find this idea interesting, go read his book. Let’s get back now to Neuroevolution. One area that, even in my own little work I fooled around with this, which was using evolution to evolve evolution. So evolving the hyperparameters of models. You may tell the audience a little bit what in deep learning might a hyperparameters be and how you might use evolution to search the space of hyperparameters.
Ken: Yeah. I mean, algorithm, machine learning algorithms in general, whether they’re evolutionary algorithms or deep learning algorithms, they always come with a set of parameters around them. And sadly, for the people in the field, the parameters matter a lot for whether something’s going to work. So it’s sometimes thought of like a dirty secret or something like that because it’s like under the hood, somebody has to tweak these things very carefully to get this thing to work. And that’s the part that the paper doesn’t mention. And so, it could be something like the learning rate or something like that. In evolution, it’s things like the mutation strength, the population size, there’s all kinds of parameters in an evolutionary algorithm. There are speciation coefficients that you might need to set. And so, it occurs to people that, Hey, we could actually search for those parameters, why not? Like to know where the meta-levels.
Ken: So it’s like the meta-learning level or a meta-evolution level. And it’s expensive to do things at a meta-level. But I think people do do it. They sometimes call it hyperparameter sweeps or things like that, because it’s so important to get the right hyperparameters to see the full potential of what an algorithm can do. And so, evolution is one of the options for that. I mean, one issue here is that it can be hard to actually compute a gradient of the hyperparameters. And evolution doesn’t have to worry about that. It can just evolve the hyperparameters, and there are population-based hyperparameter search algorithms that do that and that have been used, in practice, in the field. And so, it’s another potential opportunity for evolution. And, by the way, it also points to the fact that when we talk about applications of evolution and things like novelty search, it’s also not only about neural networks. So we talk about hyperparameter optimization, that’s one example, but there are lots of potentially interesting things here that aren’t neural that we can do also with these kinds of techniques.
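To make the idea of evolving a hyperparameter concrete, here is a minimal sketch, not any particular published algorithm. It assumes a toy "training run" (`train_and_score`, gradient descent on w squared) as a stand-in for a real model, and it evolves a single hyperparameter, the learning rate, with truncation selection and multiplicative mutation. All function names, population sizes and mutation strengths are illustrative assumptions.

```python
import math
import random

def train_and_score(learning_rate, steps=50):
    """Toy stand-in for a training run: gradient descent on f(w) = w**2.
    Returns the negative final loss, so higher scores are better."""
    w = 5.0
    for _ in range(steps):
        w -= learning_rate * 2.0 * w      # the gradient of w**2 is 2w
    loss = w * w
    # Diverged runs (inf or nan) get the worst possible score.
    return -loss if math.isfinite(loss) else float("-inf")

def evolve_learning_rate(pop_size=20, generations=15, seed=0):
    """Evolve the learning rate without ever computing a gradient of it."""
    rng = random.Random(seed)
    # Initial learning rates sampled log-uniformly from [1e-4, 1].
    population = [10 ** rng.uniform(-4, 0) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=train_and_score, reverse=True)
        parents = ranked[: pop_size // 4]             # truncation selection
        # Each child is a parent with a multiplicative mutation applied.
        children = [rng.choice(parents) * 10 ** rng.gauss(0, 0.2)
                    for _ in range(pop_size - len(parents))]
        population = parents + children               # parents survive (elitism)
    return max(population, key=train_and_score)

best = evolve_learning_rate()
print(best, train_and_score(best))
```

The outer loop only needs a score for each candidate, which is why the same pattern applies to hyperparameters that have no usable gradient, such as population size or mutation strength.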
Jim: Very cool. Well, let’s go move on to the next topic, which is, it seems to be at the cutting edge of some of your thinking and the people around you. And that’s the idea of indirect encoding. In biology, we might call it evo-devo. It’s actually the same thing as it turns out, some of the things we’ve kind of talked about sometimes at the Santa Fe Institute. So talk about what indirect encoding is.
Ken: So like I mentioned a little bit earlier in the show, I was aware from a very early time after NEAT that NEAT was not going to evolve a human brain simply because of the magnitude of the human brain, like a hundred trillion connections. NEAT, basically, adds one connection at best over a generation because there’s a mutation that can add a connection. You’re not going to get to a hundred trillion through that. So then, what exactly allows us to get to this astronomical magnitude of connectivity? And the reason that it’s possible is because the number of genes in your genome is not equal to the number of connections in your head. In other words, there’s a lot fewer genes in the genome than connections in your head.
Jim: And that’s an important point. This is one of the most staggering empirical facts that’s appeared recently, is the number of genes is only 30,000. It’s like, what the fuck? I mean, that’s a really small number. It’s not much more than the number of genes in a fruit fly and yet somehow, so continue.
Ken: So we’re talking about massive compression. And, by the way, 30,000 is about the number of genes. And then another way to think about it is that within those 30,000 genes there’s about three billion base pairs. Three billion sounds decent, but we’re talking about structures here with trillions and trillions of parts, just the brain alone with a hundred trillion connections, but that’s only the brain, you’ve got your whole body on top of that. And so, three billion is actually quite small compared to that. I think I’ve heard an estimate that it’s around the size of Microsoft Windows. Can you believe that? Like, a description of a human being is about the size of Microsoft Windows? So the compression is absolutely incredible and astronomical, and this is what indirect encoding is about.
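The scale of that compression can be checked with back-of-the-envelope arithmetic. The sketch below uses the two figures quoted in the conversation (three billion base pairs, a hundred trillion connections) and adds one assumption of its own: that each base pair carries two bits, since there are four possible bases.

```python
BASE_PAIRS = 3_000_000_000            # figure quoted in the conversation
BITS_PER_BASE_PAIR = 2                # assumption: 4 possible bases -> 2 bits
CONNECTIONS = 100_000_000_000_000     # "a hundred trillion" brain connections

# Raw information content of the genome, in megabytes.
genome_megabytes = BASE_PAIRS * BITS_PER_BASE_PAIR / 8 / 1_000_000

# How many brain connections each base pair would have to "account for".
connections_per_base_pair = CONNECTIONS / BASE_PAIRS

print(genome_megabytes)                  # 750.0
print(round(connections_per_base_pair))  # 33333
```

Under these assumptions the genome is well under a gigabyte, and there are tens of thousands of connections per base pair, so a one-gene-per-connection description is not possible.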
Ken: In a direct encoding like NEAT, you do have a gene for every connection. Like if you add a new connection, you add a new gene that describes that connection to the genome and it gets bigger and bigger. But that is not how it is in nature. You have far fewer genes or alleles, or however you want to characterize it, than you have connections in the brain, let alone cells in your body and so forth; astronomically different orders of magnitude. And so, I knew that this is important: eventually, neuroevolution has to be evolving indirect encodings if we want truly brain-size artifacts to emerge from it. And so, it turns out that there’s a whole set of reasons that such compression is really powerful.
Ken: It’s not just because it’s compressed. Like, that’s good that it’s small. I mean, for efficiency purposes, but there’s other things about it too. Like if you have a compressed representation of something like a brain, what it also means is that, in effect, what you have is sort of a procedure or you can think of almost like a program for how to build it, it’s like a recipe. And so, what it means is that it has to capture regularities because there’s information reuse. That’s a really important part of indirect encoding. Like clearly, if you don’t have as many units in the genome as you have in the actual body, then there must be a reuse of information in the generation of the body, which means, for example, that the instructions that generated your left arm are largely similar to the instructions that generated your right arm. Now this is actually advantageous evolutionarily, because what it means is that if there are innovations on the left, they’ll be reflected on the right, because it’s the same program mostly.
Ken: And so, in other words, you don’t have to discover a hand twice. You know, you don’t have an ancestor that had only a left hand. And then luckily there was a mutation where they realized it could be good on the right, that’d be crazy and very improbable. So the regularities that are overarching in the structure of bodies and brains can be preserved and elaborated in a way that respects them. So, like the symmetry that we have, if there’s elaborations, it will be elaborated symmetrically. It’s true that you can break the symmetry, but it’s just that the opportunity to exploit the symmetry is there and it’s very efficient. Obviously, we’re not perfectly symmetric, like the heart on the left side and so forth. But the symmetry is there and we can exploit it. And so, that makes it very efficient. So you don’t have to rediscover things like legs and arms or hands, because the symmetry is there.
Ken: Or take cortical columns, which are largely structured similarly. There’s many of them in your brain and your cortex. And so indirect encoding can encode them effectively in a very small, compact way, and just say, repeat, with perhaps some different parametrization as it moves across the surface of the brain. And so, what I started to do was to try to think about, well, what should the indirect encoding be for artificial neural networks? We’d need an indirect encoding, eventually. I believed even early on that that’s what’s going to be necessary. So I took an evo-devo class, actually. I loved evo-devo at that point in time, I was so into evo-devo. And, I mean, I was in the biology department taking evo-devo and it’s just so fascinating. Basically, evo-devo is the combination of evolution and development together and how they interact.
Ken: And I took a similar approach to the one I took with the neuroevolution algorithms, or the topology-evolving algorithms. I was just trying to look at all of the different indirect encodings and also at how indirect encoding works in biology, and to try to figure it out. And, well, through a long story that we can get into if you want, eventually, it led to the HyperNEAT algorithm, which was, basically, the indirectly encoded version of NEAT, which now can evolve very, very large neural net structures.
Jim: Perfect setup! Now tell us about HyperNEAT and also the idea of compositional pattern producing networks.
Ken: Yeah. So, I think this is an interesting story because it was similar to when I was looking at all of these topology-evolving neuroevolution algorithms. A few years later, I was looking at all these indirect encodings that people had invented, and similarly thought I’ll just look at all of them and get the best ideas, even though that didn’t work for me the last time. But I was thinking that there’s lots of people that thought about indirect encoding before because it’s obviously a really inspiring topic. And so, people have been thinking about it and I was reading all these indirect encodings and I was still kind of optimistic even after reading all these papers, that there’s some way to kind of combine all the good ideas. I actually wrote a paper in 2003 called A Taxonomy for Artificial Embryogeny.
Ken: It was a review paper where I tried to categorize all of these ideas in indirect encoding. I was really truly doing it for myself, to make sense of how all of these ideas interrelate with each other, because I was hoping to figure out what to do. What’s the best way to do indirect encoding with NEAT? And then, I noticed something totally weird, which was completely unexpected. And it was that people had started using NEAT to create these little things that were genetic art programs, basically. And what they were doing is they were saying, let’s have NEAT evolve neural networks that output images, and then let people breed them. This was before Picbreeder. They were like little standalone apps that you could run in the early 2000s. And I was playing with them.
Ken: I thought they were super fun and interesting. But I started to notice something really weird, which is that the images that I was evolving respected things like symmetry, when they discovered it. So, in other words, it was more like an intuition, but it felt to me like what was happening was sort of intuitively reminiscent of what we saw in nature: that it would establish regularities, like symmetry, on some structure, like a spaceship. And then it would elaborate it, respecting that symmetry as I went forward and continued to evolve the image or breed the image. And I was like, that’s what indirect encoding is supposed to do. It was totally weird because I didn’t think of what I was playing with as an indirect encoding. I mean, it was a neural network outputting an image. But I couldn’t help but notice this is exactly what I’ve been aiming for, after all this thinking. This is doing it better than anything that I’ve seen and it’s not even supposed to be doing that.
Ken: And it made me realize that neural networks actually are really good abstractions of DNA. This is a potentially confusing point because this now is sort of like mixing two different metaphors. A neural network is supposed to be an abstraction of the brain, not an abstraction of DNA. But actually, information encoding mechanisms have a lot in common if you think about it. Both DNA and brains are some way of encoding information, so it’s not necessarily that surprising. But when I realized this, I started to understand what the analogy is, the analogy between a neural network and DNA: that in this composition of functions, which is what a neural network is… When one neuron leads into another, it’s basically two functions being composed, one function taking another function’s output as input and then outputting something. And inside of this composition of functions, there’s actually a flow of information, which is similar to what happens when a body plan is built off of a strand of DNA.
Ken: And so, recognizing that got me really excited, because I thought, “Well actually, the solution to the problem of indirect encoding is just a neural network. We already have the technology. It’s all been invented.” But I realized that if I said a neural network is an indirect encoding, it would lead to endless confusion and controversy, I thought, for decades, because it’s mixing metaphors and it’s extremely confusing. It’s like saying your brain is DNA. And I didn’t want to convey that. And I wanted to emphasize that the compositional aspect of it is really the insight that makes this a good metaphor, or good abstraction of what DNA does. And that’s why I introduced the term “compositional pattern producing network”. So now, I’m not calling it a neural network. It looks a lot like a neural network. It functions like a neural network. There’s some differences, like it has different activation functions, like a symmetric one to capture symmetry.
Ken: But in a lot of ways, it’s like a neural network. But instead of being used to take an input and solve a problem or to output a control, it’s being used to input coordinates in space and output structure, which is basically phenotypic structure, or the structure of a brain and the structure of a body. And so, that’s where the compositional pattern producing networks came from, which are abbreviated CPPNs. And then, the next big question was how do you make that into a neuroevolution system? So, I’ve got these CPPNs, which can evolve imagery in two dimensions. And I want to evolve neural networks, not just imagery. And this was a little bit of a puzzle for a while. Because I really believed that there’s some way to do this, but I wasn’t sure how to convert an image into a brain.
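As a rough illustration of "input coordinates in space and output structure", here is a minimal hand-built sketch of a CPPN-like function. It is an assumption-laden toy: the names `gaussian`, `cppn` and `render`, the particular activation functions and all the constants are hypothetical, and in the real system the composition would be evolved by NEAT rather than written by hand. The point it shows is that a symmetric activation function on the x coordinate makes every rendered row left-right symmetric for free.

```python
import math

def gaussian(z):
    """Symmetric activation function: gaussian(z) == gaussian(-z)."""
    return math.exp(-z * z)

def cppn(x, y):
    """A tiny composition of functions mapping a coordinate to an intensity."""
    h1 = gaussian(2.0 * x)      # symmetric about the vertical axis x = 0
    h2 = math.sin(3.0 * y)      # repeating variation along y
    return math.tanh(1.5 * h1 + 0.8 * h2)

def render(size=9):
    """Query the CPPN at every pixel of a size x size grid over [-1, 1]^2."""
    coords = [-1.0 + 2.0 * i / (size - 1) for i in range(size)]
    return [[cppn(x, y) for x in coords] for y in coords]

image = render()
for row in image:
    print(" ".join(f"{value:+.2f}" for value in row))
```

Changing the constant inside `h2` changes every row at once while the left-right symmetry is preserved, which is the "elaborate while respecting the regularity" behavior described above.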
Ken: It’s not that straightforward. Eventually, we had a breakthrough where we figured it out. And we can go into that if you’re interested, but I won’t go into the details, just for time. And that’s what led to HyperNEAT. So, we figured out how do you get a CPPN, which is basically outputting a pattern in space, to be interpreted as a brain? How do I get that pattern interpreted as a brain? And then, we can evolve, using the CPPN as sort of the DNA, really, really big stuff, because we can have a small CPPN encode a very, very large brain. And so, it’s like a whole new paradigm where you can try new things now.
Jim: And then… So then, to make sure I get this clear in my head, that you can evolve in the space of the small CPPN, which then produces very large networks. And then, you can test the fitness of those networks, and then do the evolution back in CPPN space.
Ken: Exactly. Yeah. So, you might actually be effectively doing evolution in a low dimensional space, which is great computationally and for search, but the network itself that it represents, it could be gigantic. It could be very high dimensional.
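The small-genome, large-network relationship can be sketched as follows. This is a simplified illustration in the spirit of HyperNEAT, not its actual implementation: it assumes a fixed-topology CPPN with five evolvable numbers (`genome`), whereas the real algorithm evolves the CPPN topology with NEAT. The functions `cppn_weight` and `build_weights` and all constants are hypothetical. The CPPN is queried with the coordinates of a source neuron and a target neuron and returns the weight of the connection between them.

```python
import math

def cppn_weight(x1, y1, x2, y2, genome):
    """Hypothetical fixed-topology CPPN: two neuron positions in, one weight out."""
    a, b, c, d, scale = genome
    hidden = math.exp(-(a * (x2 - x1) + b * (y2 - y1)) ** 2)
    return scale * math.tanh(c * hidden + d * x1 * x2)

def build_weights(genome, side):
    """Query the CPPN for every (source, target) pair on a side x side grid."""
    coords = [-1.0 + 2.0 * i / (side - 1) for i in range(side)]
    nodes = [(x, y) for y in coords for x in coords]
    return [[cppn_weight(x1, y1, x2, y2, genome) for (x2, y2) in nodes]
            for (x1, y1) in nodes]

genome = (1.0, 1.0, 2.0, 0.5, 3.0)         # the search space: 5 numbers
weights = build_weights(genome, side=8)     # the phenotype: 64 x 64 weights

print(len(genome), sum(len(row) for row in weights))
```

Evolution would mutate and select on `genome`, while fitness is measured on the network built from `weights`. Raising `side` makes the network larger without adding a single number to the genome.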
Jim: And of course, that’s exactly evo-devo, which is that we are searching in this relatively small space of 30,000 genes, or a couple of billion base pairs, and the resulting machine is a shitload bigger. Right?
Ken: Exactly. Yeah.
Jim: And now, a key part… Now actually, there’s an amazing conversation on my previous episode with a guy named Dennis Waters about his new book, Behavior and Culture in One Dimension: Sequences, Affordances, and the Evolution of Complexity, which is a book I’d recommend you read by the way. And one of the key points he makes, because he chooses DNA and written language as two systems that have generated huge amounts of complexity… But he points out that, in both cases, they had to co-evolve their own machinery to interpret the small encoding, to do the genotype/phenotype mapping. And of course, that’s part of evo-devo.
Jim: And in fact, one of the great mysteries of life is, how the hell did we evolve all this very complicated machinery for error correction and near error-free duplication in the DNA/RNA protein world from a noisy world, like an RNA world? What the fuck? Right? So, what about the machinery? Do you evolve the machinery too, or did you just stipulate some form of machinery that goes from the genotype to the phenotype?
Ken: I think the right answer is that it’s stipulated in our case, or at least in the case of HyperNEAT. I mean, in some way, it’s a privilege that we can stipulate that. I mean, that’s one of the privileges of computation. So, we’re kind of sidestepping some issues that are like unavoidable in nature. There are certain physical constraints in the universe and so forth that make life pretty complicated. And that’s like… At least in… When I think about neuroevolution and machine learning it’s like… I feel like it’s something…
Ken: It is our prerogative to try to abstract away anything that is just sort of unnecessary in the way that it happens to be part of physical reality. It’s totally fine. And if we were actually neuroscientists or computational biologists, I think we’d want to be faithful to the way that the world actually works. But I don’t see myself as that. I see that I’m just an AI researcher. So, to me, it’s actually more interesting to strip away these properties of the world if we don’t need them. Now, you could argue though that maybe the machinery is important at some level of computation. Maybe it does something useful. It’s a valid thing to argue. But I guess… At least the appeal of stripping it away is that there’s less machinery to deal with computationally.
Jim: Yeah. That actually makes sense. We do know that an awful lot of Ma Nature’s workarounds are just because of the laws of physics, right? We can duplicate a string in a genetic algorithm trivially, but Ma Nature probably took half a billion years to figure that one out. Right?
Ken: Yeah. I mean, this is a fascinating topic actually, the fact that nature has to do these really complicated things because it’s in this kind of physical box that it can’t get out of. If you think about the development of an embryo from an egg, it’s so compelling, I mean, so fascinating to look at like the stages of the embryo development until it’s like a full baby form. And you look at that. And you’re like, “That is an extremely powerful process.” But then, something like a CPPN, it doesn’t even go through things like that. Basically, all it does is it goes directly to the final form. It skips all these developmental steps. And so, what I get from that is the things that sometimes are most compelling in nature are really just artifacts of the fact that this is a very inconvenient, computational substrate.
Ken: And so, it can be very distracting, because you think that is the point, the computational point of the process is embryogenesis. That’s the amazing achievement. But actually, that’s just a workaround for the fact that the whole way things are set up is totally inconvenient. There’s a much deeper thing going on than just that, which is basically this indirect encoding and how information is reused and stuff like that. It doesn’t have to go through a developmental process in order to get that benefit, but it does in nature because there’s no other choice, because this is a physical reality.
Ken: And so, I think that’s a sort of stumbling block for thinking algorithmically is that it’s hard to get your mind off these extremely salient things that you see in the physical world that are just amazing, but actually might not matter as much as they seem to, just because they are so amazing when you look at them.
Jim: Yeah. You might call that the art of being an evolutionary computation dude, which is, what is of the essence and what is an artifact of the substrate that we’re sort of modeling against? Now, one of the most interesting things… This can be difficult to communicate clearly to our audience, but I have confidence that you can do so. Something that really made my eyebrow go up is that you discussed how, using HyperNEAT, you were able to replicate some of the common tools in advanced third/fourth generation deep learning, that you were able to literally evolve a convolutional neural network, for instance.
Jim: Now, if you could explain… I know this is going to be difficult. We’re at the edge of explainability to our audience: what a convolutional neural network is, the kind of geometric patterning that HyperNEAT does, and how you combine some relatively simple patterning rules to produce complex patterning, and how you throw all that together with evolution, and you can end up with some stuff that people spent years developing as humans.
Ken: So, convolution is one of the insights that I think has driven the field of deep learning forward in recent years. And one of the reasons is because it really made practical being able to optimize these very large, high dimensional structures that we’ve seen do all these impressive things. And the idea is often, I mean, most attributed to Yann LeCun. That’s the origin of convolution. And so, what is it?
Ken: And convolution is basically an idea about reuse. And you can see [inaudible 01:18:05] immediately when I say that is [inaudible 01:18:07]. It’s very related to indirect encoding, because indirect encoding is also about reuse. And so, convolution is just a particular kind of reuse. And basically, what it’s saying is that there are certain neurons within the structure of a brain, or an artificial neural network in this case, that are responsible for looking at some local area and doing a kind of a local computation of that area. And so, you can think of it as like a little feature detector. It’s going to be looking for, is there a mouth here?
Ken: Maybe it’s part of something that will detect human faces, but it’s the mouth detector. And what’s interesting is you can take this mouth detector concept and you could spread it across a sheet, so that you could check in all kinds of locations whether there’s a mouth. And that’s basically what the convolution is. It’s like the re-instantiation of the idea all across the sheet, or dragging it across the sheet. And so, you’re basically repeating the same concept over and over again, or literally the same structure, like this is a connectivity structure, or a weight structure, which is being repeated over and over throughout the substrate. And because of that, what you can do is you can say, “Let’s have lots of convolutional feature detectors, for lots of different types of features, all in one big network. And then, we can reuse them all over the place for different locations.”
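The "same detector dragged across the sheet" idea can be shown in one dimension with a few lines of plain Python. This is a minimal sketch: `convolve_1d`, the two-weight `edge_detector` and the example signal are all illustrative choices, and, as is conventional in deep learning, the operation does not flip the kernel.

```python
def convolve_1d(signal, kernel):
    """Apply one small detector (kernel) at every position of the signal."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

edge_detector = [-1.0, 1.0]                 # the same two weights, reused everywhere
signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
response = convolve_1d(signal, edge_detector)

print(response)   # [0.0, 1.0, 0.0, 0.0, -1.0]
```

The detector fires positively where the signal steps up and negatively where it steps down, wherever that happens, and the whole layer is described by just two weights no matter how long the signal is.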
Ken: And that’s like an explicit reuse. I mean, it’s really got the same connectivity, but just moved around in the location within the network. And so, of course, it looks immediately like indirect encoding could just discover the idea of convolution. So that’s sort of the interesting underlying point here: we already know about convolution. So, in some way, we don’t really need to rediscover it, because it’s already there. But if you want to just make a point and say, “Well, could indirect encoding have done it? Did we really need LeCun? Eventually, an indirect encoding would have just invented that on its own.”
Ken: That is actually true. So like… Yeah, if you evolve HyperNEAT networks in a situation where convolution is advantageous, and you do a few preliminaries in order for it to be possible for there to be convolution in the system, then it’s very likely to discover convolution, or something reminiscent of, or a lot like, convolution. And so, not surprisingly, it did.
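One way to build intuition for why an indirect encoding can land on convolution (an illustrative sketch only, not the setup of the experiments Ken refers to): if a CPPN's output happens to depend only on the offset between the source and target positions, then every location gets the same local weight pattern, which is exactly weight sharing. The function `offset_cppn` and its constants are hypothetical.

```python
import math

def offset_cppn(x1, x2):
    """Hypothetical CPPN whose output depends only on the offset x2 - x1."""
    delta = x2 - x1
    return math.exp(-(4.0 * delta) ** 2) * math.cos(6.0 * delta)

def weight_matrix(n):
    """Weights from n input positions to n output positions along a line."""
    xs = [i / (n - 1) for i in range(n)]
    return [[offset_cppn(x1, x2) for x2 in xs] for x1 in xs]

W = weight_matrix(6)

# Moving one step along the line shifts the weight pattern without changing it,
# so each diagonal of W holds a single repeated value (a Toeplitz matrix).
shared = all(abs(W[i][j] - W[i + 1][j + 1]) < 1e-9
             for i in range(5) for j in range(5))
print(shared)
```

A CPPN that also used the absolute position would break this sharing, which is the sense in which convolution is one regularity among many that such an encoding can express.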
Jim: Very cool. And of course, the interesting thing is that there may be lots of other cool techniques that are pattern based that we don’t have to wait for the next LeCun for, if we turn HyperNEAT loose on these problems.
Ken: Exactly. Yeah, because that’s the intriguing thing about indirect encoding. I mean, LeCun deserves a lot of credit for being the first to have this very deep insight that’s very important, but it’s just one insight about regularity. Right? It’s about regularity. It’s a regular structure, a repeating structure. That’s certainly not going to be the only insight about regular structure in brains. I mean, there’s going to be others.
Ken: And they may have very different presentation than just like the repetition of a certain thing locally. But all of those can be captured mathematically. In an indirect encoding, you can describe any kind of regular structure whatsoever. You can describe fractal structures. I mean, there’s anything that you could imagine. And so, to the extent that anything like that is actually useful, from a neural processing point of view, at least theoretically, it’s possible that indirect encoding could reveal it.
Jim: Are you, or other people you’re aware of, actually now trying to apply indirect encoding to big real-world problems?
Ken: Let’s see. I am not. I just haven’t recently been trying to do that, although I think it’s still really interesting, so that’s sad. I just can’t do everything, but I wish I was still trying to do that because it still remains interesting and on the table. There may be others though that are, yeah. I mean, there are other people working on indirect encoding still. I still see dissertations coming out about it. So, it’s eligible, it’s out there, it’s still… But the challenge is basically the hardware issue, where a lot of the hardware support that facilitates doing something like researching indirect encoding is not directly aligned with the current paradigm that really supports deep learning.
Ken: It makes it more expensive. And that somewhat undermines the motivation, because what we’re saying is maybe there’s an efficient way to find ideas that we wouldn’t have otherwise found. But now, it becomes inefficient because it’s so expensive, from a hardware perspective, to perform the search. And it’s kind of a shame because, in principle, this could actually lead to some interesting discoveries.
Jim: Is there a modern implementation of HyperNEAT out there? The ones I found were basically hacked into the old NEAT code base.
Ken: Oh yeah. There are Python HyperNEATs available. And actually, I have a page which I maintain, which maybe you could provide the link to later if I sent it to you.
Jim: Yeah. Send an email to me. We’ll put it on the episode page. And people who want to play around with it can find that code. I fooled around some with the C# NEAT and HyperNEAT, and it’s easy enough code, actually. I’d be intrigued to look at the Python one.
Ken: Yeah. I mean, it was a really good platform. Yeah. It was a great platform. I mean, Colin Green made the most popular one and we used it. It was super good. But yeah, more recently, people have created these Python-based NEATs and HyperNEATs. And I have some links to them from a page that I was maintaining.
Jim: We’ll do that. So, let’s hit one of my favorite topics, which is Baldwinian learning, or the evolution of learning. I’d like to talk briefly about that one, the evolution of learning itself and of learning mechanisms.
Ken: Well, I think it’s a naturally appealing offshoot of just evolving brains, which is that, well, brains obviously can learn. They’re dynamic systems, or dynamical systems; they are not static in general. And so, it’s true that, a lot of the time, in neuroevolution, if you evolve a neural network, you find a set of weights that serves your purposes and can solve a task. And then, it’s just kind of frozen that way. And then, it just solves that task. And that’s not really a good analogy with nature where, when an organism is born, it’s going to go through a dynamic process over its lifetime where it’s learning. And what’s really cool about that is the mechanism of learning itself was evolved. And in that sense, evolution is a meta-learning algorithm, not just a learning algorithm, because it basically created systems that learn, including our own brain.
Ken: And so, if there’s a holy grail for neuro evolution, it can’t leave that out. I mean, that’s like a fundamental part of it. If we’re going to evolve brain-like things, those things are going to have to be learning systems themselves. And so, that means that neuro evolution has to find the learning algorithm in addition to the structure and weights of the network, or… And another way of saying is that it would find the learning mechanism instead of finding the weights, because the learning mechanism would ultimately determine the weights. And the weights would always be fluctuating because the system is always learning.
Ken: But you can do that. Why not? Evolution can encode rules. We call them rules, the rules of learning. And basically, they can be what are called local learning rules, which means, in a particular connection between two neurons, what decides how that weight should change? And there’s a lot of history of evolving local learning rules and trying to get what we call plastic neural networks, networks that change over their lifetime, to evolve. So, that’s a whole other fun area to explore.
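A local learning rule of the kind described can be sketched as a function of only the two neurons a connection joins. The sketch below assumes a generalized Hebbian form with five coefficients; which form a given study uses varies, and the names `hebbian_update` and `rule` and the example values are hypothetical. In an evolutionary setup, the coefficients in `rule`, rather than the weights themselves, would be what evolution searches over.

```python
def hebbian_update(weight, pre, post, rule):
    """Change one weight using only its own pre- and post-synaptic activity."""
    eta, a, b, c, d = rule
    return weight + eta * (a * pre * post + b * pre + c * post + d)

# A hypothetical "evolved" rule: plain Hebbian learning with rate 0.1.
rule = (0.1, 1.0, 0.0, 0.0, 0.0)

w = 0.0
for pre, post in [(1.0, 1.0), (1.0, 1.0), (0.0, 1.0)]:
    w = hebbian_update(w, pre, post, rule)   # weight changes during the "lifetime"

print(round(w, 3))   # 0.2
```

The weight grows only on the two steps where both neurons are active together, so the final weight is set by experience under the rule rather than written directly into the genome.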
Jim: Yeah. It’s an area that current deep learning gradient descent just ain’t very good at. Right? It’s not really appropriate for that architecture.
Ken: Yeah. So, to learn systems that… Yeah, right. To learn systems that learn… Well, they actually are… They’re getting interested in that in recent years. It’s true, historically, they weren’t. But yeah. The whole issue of meta-learning is now something that gets a lot of attention. And I did work, when I was at Uber, with Thomas Miconi, who published a number of papers that… I would encourage people to search for Thomas Miconi… that were very imaginative about using gradient descent to optimize learning rules, which are actually inspired by the neuroevolution work.
Ken: So, I think it’s an example that the cross-fertilization of these two things is really the right way to go. It’s not so great to think of these as like two competing paradigms, like neuroevolution versus deep learning, but they’re very related to each other. There are years of ideas that came out of evolving learning rules that then Miconi just transferred over to gradient descent. And that’s elegant, I think.
Jim: Yeah. Do send me those papers. I’d like to read them myself and we’ll post them on the episode page. Last topic. One of my pet peeves about deep learning, when we think about comparing artificial to human neural nets, is that most deep learning is mostly feed forward. There are obviously some exceptions to that. We’ve always had local recurrence, the old Elman networks, and then LSTM, et cetera, but the recurrence is small. Right?
Jim: And in the brain, we know there’s vast long range feedbacks and feed forwards. And it’s thought that 90% of the neurons in the visual system are actually feedback from considerably higher levels, some of them all the way up to the top. And yet, today’s neural net architectures tend not to be anything like that. And part of it is no one’s figured out how to do gradient descent on wildly recurrent networks. And yet, evolution doesn’t care. Right? You can put neuroevolution to work on the most tangled, crazy, long range, recurrent, circular network you want, and it will work just fine.
Jim: Has there been a bottle-necking, or a narrowing of the field of architectural work in neural nets, around the obsession with gradient descent, that neuroevolution might be able to open up into much, much wider thinking about long range feedback, feed forward, resonances, rhythms? We know the brain is substantially driven by rhythms. There’s a lot of work about phase signaling within the brain. Things like that just aren’t possible in feed forward networking. You really need fully recurrent networks, long range recurrent, in fact, probably something like small world recurrent networks to get at that, and maybe evolution is the obvious tool for exploring that kind of space.
Ken: Yeah. So, I mean, first, I would have to acknowledge that in very recent years, there’s been huge progress in deep learning on training recurrent networks. So, they’re making progress. But the point still stands that those networks still have a kind of a stereotyped structure, and that neuroevolution can really think out of the box in terms of recurrent structure. And it can offer you something you’ve never seen before, very easily. It is a possibility, I think, that there are exotic recurrent structures that are just not even being contemplated, that can be quite powerful, but they’re very difficult to think of just through human intuition. And so, neuroevolution is, like you say… It’s sort of unconstrained in this way: there’s no issue with how the optimization algorithm deals with the recurrent structures. It can do whatever it wants.
Ken: It can play with all kinds of crazy recurrent structure, total spaghetti, if it wants. And it can make it work. And in that sense, there’s an opportunity potentially to discover something. If you think about advances like transformer networks recently, which were used, for example, inside of GPT, that just shows that there’s room for substantial new ideas in structure and architecture, and to understand that there is a need for recurrence, in a sense. I mean, memory is obviously very important. And so, yeah. It’s possible that neuro evolution could offer something there.
Jim: Cool. Man, that’s one of my pet peeves about the… The world’s become so channelized of mostly feed forward, with minor bits of recurrence, when we know the brain ain’t that way. So, I want to thank Ken Stanley a tremendous amount for a very good deep dive into the history of, and the current state of play of, neuro evolution. So Ken, it’s great to have you.
Ken: It was great to be here again. Yeah, always great to be here.
Production services and audio editing by Jared Janes Consulting. Music by Tom Mahler at modernspacemusic.com.