The following is a rough transcript which has not been revised by The Jim Rutt Show or Ben Goertzel. Please check with us before using any quotations from this transcript. Thank you.
Jim: Today’s guest is Ben Goertzel. I believe this is Ben’s fourth appearance on the show. It’s always a very interesting and deep conversation. Ben is one of the world’s leading authorities on artificial general intelligence, which is also known as AGI. Indeed, Ben is the one who coined the phrase artificial general intelligence. He’s also the instigator in chief of the OpenCog project, an open source AGI software project, and SingularityNET, a decentralized network for developing and deploying AI services. Welcome back, Ben.
Ben: Hey, pleasure to be back Jim.
Jim: Yeah, this should be a good conversation. I recently saw an essay Ben published, I don’t know where, it was on your Substack maybe, that is called Three Viable Paths to True AGI. And much of our talk is going to be on that essay, but who knows where the hell we go from there. But that will be our starting point. And I will say it’s a pretty accessible paper. You don’t have to be an AI head to understand it. And as always, it’ll be posted in the show notes on the episode page at jimruttshow.com. So check it out. So let’s start, as we usually do with these kinds of discussions: what is it that you are actually talking about when you say artificial general intelligence?
Ben: Yeah. AGI is really an imprecise and informal term, which refers to computer systems that can do things that we consider intelligent when people do them, including things that they weren’t specifically programmed for and weren’t specifically trained for. So AI in general is also an imprecise term, which sort of means hardware or software doing stuff that we think is smart when we see people do it. But what we’ve seen in the AI field in the last couple decades is great practical success at narrow AI, the creation of hardware, software that will do highly particular things based on programming or data driven training at those particular tasks.
And people can do that, but people can also do something different. We can take a leap into some domain that’s only loosely connected with anything we’ve done before. You and I both learned to use the internet, without it being wired into our DNA or in our university curriculum; we sort of improvised and screwed around till we figured it out. And people are somewhat good at that. I mean, we’re not necessarily infinitely good at that. So there’s a whole discipline of mathematical analysis of the concept of general intelligence. And one of the conclusions you come to there is that people are not all that generally intelligent compared to the theoretical maximum. We’re bad at running mazes in 977 dimensions. We’re even bad at figuring out what each other are thinking or feeling half the time.
Jim: Yeah, we can’t follow.
Ben: Yeah, our generality is not maximal, but it’s way better than any current AI software program’s. Or look at AlphaFold as a very impressive example of narrow AI. I mean, AlphaFold predicts how a variety of proteins will fold based on training data that was fed to it. But it doesn’t do well at floppy proteins, proteins that are a little looser than normal proteins and may go into different configurations very sensitively depending on what chemicals they’re floating around in. And to make AlphaFold deal with these floppy proteins, someone’s got to either manually feed it more training data on floppy proteins or change the algorithm or something. It’s not going to, on its own, think a little longer and stew a little longer and make that leap of generalization to proteins that are a little bit too different from the ones in its training data set.
And if you devise a new class of molecule that isn’t a protein per se, but is some other big long molecule, like you discover something on an alien planet using a slightly different chemistry, again, you’ve got to retrain the model or retool the software. It’s not going to leap to that alien biology, whereas a human being could make that leap. If a human being was good at figuring out how proteins fold, by whatever combination of their brain and tools and software, then if you give them some alien molecules, they’re going to enjoy that challenge and they’re going to leap and improvise and figure things out.
Some people better than others, or at least they’ll make a stab at it. And that’s the sort of thing we’re after with general intelligence. And none of us in the AI field are there yet. The idea that computers will be able to be generally intelligent is no longer so far outside the mainstream as to make you a pariah for even proposing it’s a feasible thing to do in our lifetimes, but it’s still far enough off in a practical sense that there’s not a consensus about what’s going to be the most viable route to get there.
Jim: And another one you and I have talked about in the past as a simple-minded example of an AGI test is, who was it? The Apple guy who wasn’t Jobs, it was-
Jim: Wozniak’s test, which was: take a robot with an AGI in it, put it in a random kitchen in America, and have it make coffee, right? Or at least attempt to make coffee. Any human of average IQ or above could do it, but there is no robot, no AI that could even start to solve that problem today, I don’t believe, at least none I’ve heard of.
Ben: No, I mean, a Roomba can’t even deal with my living room. It’s not even that messy compared to how it was when I was in college or something. And I think telling which practical problems are what we call AGI-hard, and really need an AGI to solve rather than narrow AI, has been a challenge unto itself, and educated people have persistently been wrong about this. It’s currently not clear whether self-driving a car at the level of an average human is AGI-hard or not. I suspect it isn’t. I suspect you can address that using narrow AI, but it’s turning out to be a harder narrow AI problem than Elon Musk and many others in the field had envisioned. And that is because of issues with generalization, right? It’s mostly because weird things happen in the road. Say when you’re making a left turn into an intersection, you’ve got to sort of wait a bit until just the right moment when there’s not too many cars coming too soon.
The variety of different weird left turn situations that pop up is such that the training datasets that self-driving cars have don’t yet let them leap to deal with all the weird left turn situations that pop up. So that’s a case where current narrow AI doesn’t generalize well enough beyond its training data. And it’s not entirely clear, even to real experts, whether you can get that generalization by just beefing up the training data sets and refining the neural architectures a bit, or whether you really need a sort of leap to human level general intelligence to get there. And I think that the Turing test is another example there, where it’s pretty clear to me now you’re going to be able to make an AI that will pass what Alan Turing called the Turing test in the 1950s, which is to sort of trick an average person into thinking you’re a human in a casual conversation.
It’s pretty clear to me we’re going to be able to trick a random human into thinking the AI is human in a casual conversation, I would guess within a few years, with transformer neural net based chat technology. Now, whether you can make an AI that would trick me or you into thinking it’s a human in a two hour long conversation, that I think is AGI-hard, and you probably can’t do it without a genuine human level general intelligence. Whereas tricking an average person in a 10 minute conversation I think probably can be done just with chat bots beefed up a little beyond the current level. So yeah, telling what things need general intelligence has been an interesting challenge. And it could even be, I’m not saying I think this is the case, but it could even be that there’s no particular task commonly done by humans that really needs a general intelligence.
It could be that for any task, be it Wozniak’s make coffee in a random house, or true self-driving, or holding an intelligent-seeming conversation, if you take the trouble to cobble together training data and spend enough computing time, you could master that task. Even if that was the case, it wouldn’t obsolete the notion of general intelligence, which really has to do with how you deal with something that hasn’t been done a million times before, where you just don’t have that training data. And the more intriguing cases in that regard are things that people who are outliers have done, what a Feynman or an Albert Einstein did, a Jimi Hendrix or a Matisse or something. These people created stuff that in significant respects went beyond any details of what had been done before. There’s no way you’re going to make that level of innovation by just looking at surface level patterns in previous accomplishments.
There’s a significant leap into the unknown. Now most people don’t make that kind of leap in such an obviously observable way. Most two year old kids arguably make that kind of leap in just learning to do the stuff that they don’t know, but we know how to do. But yeah, so after the last AGI conference, which was here in the Seattle area a few months ago, Joscha Bach, who has also appeared on this podcast, visited me here on the island where I live near Seattle. And we went for some long walks on the beach and discussed life, the universe and everything in various permutations, which is what Joscha tends to do. And one of the questions Joscha asked me was, “Well, do you really not think current deep neural nets can just be beefed up to yield AGI?”
And he was sort of stewing over this in his own mind and wasn’t quite sure. He’s a little more inclined than me to think maybe they can be beefed up [inaudible 00:11:49] AGI, whereas I’m pretty sure they can’t be, unless the beefing up is so beefy as to include introducing whole other architectures and ideas. But anyway, that conversation with Joscha walking on the beach here, about an hour and a half of that conversation, is what seeded the blog post that you mentioned. I thought, well, this chat with Joscha went in some interesting directions. Maybe I should write it down before it slips away. And that’s what led to that blog post.
Jim: Yeah, Joscha, he’s been on the show a couple times, some amazingly deep conversations. That reminds me, I should invite him back on. We’re probably due for another appearance from him. You start off the essay with quite a bang. You say, “The deep neural nets and other ML algorithms that are absorbing most of the AI world’s attention today are in my own view, fundamentally unsuited for the creation of human level AGI.”
So you’re not quibbling here, you’re laying it down, boom. And of course for our audience, most of the things that you hear about in the world that are called AI or machine learning do fall into deep neural nets or one of their close relatives, and/or machine learning algorithms. People here on the show have heard me talk about the fact that I’ve been playing around with DALL·E 2, and GPT-3 and jasper.ai and all that sort of stuff. Those are all deep learning type architectures and ML approaches and massive data sets, et cetera. So maybe at the highest level, before we get down into the specifics of some of the alternatives you think might have more promise, what is it about this deep neural net approach that you think is not suitable for reaching human level AGI?
Ben: Yeah, that’s a great question. And to position my view on this within the AI field, actually, among the people working on artificial general intelligence, such as those of us at the AGI conference, I’m really sort of a moderate on this. So you have a bunch of folks in the AGI research world like Gary Marcus. For the folks who aren’t familiar with Gary’s writing and thinking on deep learning, he’s a very, very clear writer and he’s given talks on this; you can look it up.
Jim: And he’s been on the podcast about his recent book.
Ben: As well. Fantastic. And Douglas Hofstadter would be another guy with a similar perspective on these things. So there are some pretty deep and significant AI researchers who basically feel like deep neural nets in their current form are totally in the wrong direction and are a misdirection, and have pretty much nothing to teach us about how to get to AGI. And then of course you have what’s become, remarkably rapidly, the mainstream, which is the idea that we’re almost there, just a few tweaks on the sorts of neural nets that we’re seeing now rolled out for dealing with vision and language, just a few more tweaks and we’ll be able to deal with full human level cognition. So I’m, in a way, somewhat in the middle there. I think deep neural nets in some variation on their current form can serve as a non-trivial, significant component of an AGI architecture.
And I’ve actually made some progress in that direction since I wrote that blog post, which we can talk about. So I don’t think they’re useless as components of an AGI mind, actually. I think they are doing something interesting and AGI relevant. But yeah, they’re missing many key aspects of what you need for human level intelligence. And the way Gary Marcus frames it is, pretty much, current deep neural nets are very large lookup tables. They’re just kind of recording what they saw, indexing it in a clever way, and then when they see a new situation, they’re looking up the most relevant things in their history and just using them to supply a response. And that’s just true enough to be dangerous and funny, but not quite true. I mean, they’re too much like that to be viable as AGI architectures. They are recording, like, every damned detailed little pattern that they saw, but they’re storing them in a quite subtle way that takes into account the overlap of different patterns, and which patterns have been more useful in which contexts, and the weightings of different patterns in different contexts.
You could say they’re like very clever lookup tables that are taking into account the relevance weightings of different things in different contexts. And that’s not trivial. But ironically, given the label “deep”, I think what these networks are really doing is looking at what I would call shallow patterns in the data sets that they ingest. So for example, in dealing with natural language, they’re pretty much looking at sequences of words rather than trying to build a model of some conceived world underlying these words. So as one random example, which I think is from one of Gary Marcus’s articles, when a transformer neural net is given a puzzle about how to fit a big table through a small door, the neural net suggests using a table saw. And that’s amusing if you’ve ever done carpentry. A table saw in fact is not a saw for sawing tables in half. I mean, you could use it to saw a table in half, but a table saw is a big flat surface like a table with a saw blade that sticks up from under it, and then you pass something over that flat surface and it’s sawed by the blade. The neural net automatically assumed the table saw is a saw for sawing tables in half and then generated a whole bunch of bullshit based on that. So that’s an example of what I mean by looking at shallow, surface level patterns. It seems to make sense. And if you never had wood shop in middle school, because you’re younger than I am, you may not know what a table saw is if you’ve never had to saw things, right? But it’s not trying to understand what it is either. And the thing is, these GPT neural networks have read enough manuals of carpentry in their training data that they could know what a table saw is.
It’s not like they don’t have the basis to make that inference. There were sentences in their training data that explained exactly what a table saw is and even told you how to assemble one. So it’s not like they had to improvise because of a lack of knowledge. It’s just that the way they’re working is not to build a model of what reality underlies the text.
Now, philosophically you could say, well, building a model is the wrong way to think about it. The whole world is about pattern recognition, and building models is just a bullshit concept. But the bottom line is the neural nets can’t answer complex questions of a sort that would appear intuitively to require building such a model. And so that indicates the sort of limitation that seems to be there. But the thing is, that’s not some weird nitpicked example. I mean, every machine learning algorithm for every domain doing every sort of task, you could poke at it not too hard and you’d find endless varieties of examples of this nature, which suggests that these systems are leveraging the availability of huge amounts of data and processing power to recognize a huge number of highly particular patterns and then extrapolate from these to deal with new situations.
And this kind of approach on the face of it is not going to generalize to domains of reality that don’t demonstrate those highly particular patterns. This is what in the AI field you would call a knowledge representation issue. The knowledge about the world is being represented as a large catalog of weighted and interdependent and contextualized particulars. There’s no attempt to abstract. And if you go deep into learning theory, there’s a lot of math that tells you that finding concise abstractions of your experience is basically equivalent to having the raw ability to generalize to other domains that are a bit different than your experience. And that’s the crux of what’s missing really.
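The contrast Ben is drawing between memorizing particulars and extracting a concise abstraction can be caricatured in a few lines of code. This is a hypothetical toy, not any real system: a pure lookup-table "learner" answers perfectly on its training data but has nothing to say off-distribution, while a learner that compresses the data into a concise rule (here, a line) extrapolates.

```python
# Toy contrast between memorization and abstraction.
# The training data follows the hidden rule y = 2*x + 1.
train = {0: 1, 1: 3, 2: 5, 3: 7}

def lookup_table(x):
    """Memorize particulars: answer only what was literally seen."""
    return train.get(x)  # None for anything outside the training set

def fit_linear(data):
    """Extract a concise abstraction: slope and intercept from two points."""
    (x0, y0), (x1, y1) = sorted(data.items())[:2]
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: slope * (x - x0) + y0

model = fit_linear(train)

print(lookup_table(2))    # 5 (seen before)
print(lookup_table(100))  # None (never seen, no way to generalize)
print(model(100))         # 201.0 (the abstraction extrapolates)
```

Real deep nets sit somewhere between these two extremes, of course; the point of the caricature is just that generalization tracks how concisely the experience has been compressed, not how much of it has been stored.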
Jim: Yeah, exactly. I have the strong feeling, and I’ve played with GPT-3 and especially Jasper, which is actually jasper.ai, and DALL·E, you get the sense that this is an amazingly clever set of statistical relationships. It’s astounding what it does, but it doesn’t actually have a good grounding. And you know, you and I have talked about this, and in some of the work I’ve done in AI myself, I’m looking for the generalizations in computer war games, for instance. And the insight that I had was, you made the statement that these things use vast data sets, billions, hundreds of billions now, of examples to build them. And I calculated how many war game sessions have I played in my whole life? And the answer I came up with is no more than 5,000, somewhere between 2,500 and 5,000. So a quite small number.
And yet, I have pulled out a whole bunch of generalizations from these games, and it’s probably across 200 titles, maybe 300 titles, something like that. So I sit down and play a new war game. It could be a space war game, it could be about bugs, it could be about anything. And there are certain patterns that I can apply to these games, and I can get to playing them quite well quite quickly-
Ben: Yeah, totally.
Jim: Because I’m operating at this level of abstraction that is different than having played the game 300 million times the way, say, AlphaGo might do it.
Ben: No, a related analog. I think the novelist was Jane Austen, and she’d written a novel about some young child with a strict religious upbringing, and someone asked her once, “Well, was your own upbringing like that? How do you know so much about it? You’ve told the story of my childhood here, right?” She’s like, “No, I had no experience like that at all. I just saw a child and his mom talking on the staircase in a building once. And then I understood the essence of what that life must be like.” I mean, she also wrote very deep, impressive, understanding novels about marriage. She was never married. It’s from overhearing conversations here and there and extrapolating, and such extrapolation isn’t necessarily perfect, but it illustrates what little kids do all the time. A young child can see a situation once, and then suddenly all this knowledge about dealing with so many comparable situations pours in.
And it’s not that this was programmed into their DNA. Now, there’s certainly some inductive biasing, some deductive and abductive biasing. There’s some biasing propensity in our brain helping us figure out battle, helping us figure out human relationships, and helping us… We’re probably particularly good at filling in knowledge gaps in the particular domains that we’re talking about, because we did evolve for love relations, and parent child relations, and for battle. But still, given that priming, yeah, we’re incredibly good at filling in a bunch of gaps in an improvisatory way based on just a few clues. And the mainstream AI field understands this. And there’s the idea of one-shot learning and few-shot learning, and there’s a bunch of papers on this within the neural net field, but there’s no evidence so far that current architectures are really going to be capable of that in the sorts of ways that people are, or even a smart dog is, for that matter. I mean, a dog can do one-shot learning quite well in many contexts.
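The one-shot learning idea mentioned above can be sketched in miniature. This is purely illustrative: the labels and feature vectors below are made up, and in real few-shot systems the hard part is learning the embedding that produces such vectors. Given a single labeled example (the "shot") per class, new items are classified by nearest prototype.

```python
import math

# One labeled example ("shot") per class, as hand-made feature vectors.
# In a real system these vectors would come from a learned embedding.
prototypes = {
    "cat": (0.9, 0.1, 0.2),
    "dog": (0.8, 0.6, 0.1),
    "car": (0.1, 0.2, 0.9),
}

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(vector):
    """Nearest-prototype classification from a single example per class."""
    return min(prototypes, key=lambda label: distance(prototypes[label], vector))

print(classify((0.85, 0.15, 0.25)))  # cat
print(classify((0.15, 0.25, 0.8)))   # car
```

The sketch works only because the invented vectors already encode the right similarity structure; the debate Ben is describing is precisely about whether current architectures can acquire that structure from one example the way a person, or a dog, does.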
Jim: Yeah, indeed. So many of the wonderful things we hear about in AI, as I alluded to before, are the result of truly massive amounts of data. Either reinforcement learning, or the games that play themselves a hundred million times, or hundred billion character data sets for extracting language patterns. And that’s just not how humans work; humans are doing something different.
Ben: And I would be open to AGIs working very differently than people, because we’re not that super clever anyway, in many ways. And if an AGI was able to get to generalization ability by a different route than humans have, then more power to it, literally speaking, more power to it. I mean, that would be great. But what seems to be happening now is these architectures are bypassing the aspects of human intelligence that let us make creative imaginative leaps. And I think that’s-
Jim: Yeah, it feels like it’s missing a level. It’s missing a level, it feels like.
Ben: Yeah, these systems are just doing a different sort of thing. And another key point is the thing that they’re doing is perhaps a kind of thing that is easier to milk commercial value out of. To be a good employee of a company with a successful, well-defined business model, if you can just repeat some well understood operations in a predictable way to maximize well-defined metrics, that may be the ideal thing to do. I mean, I worked with US military and intelligence for a number of years in an earlier part of my life. They didn’t want AIs to create, imagine and improvise. They want AIs to obey doctrine. Right? And I mean, similarly, if your AI, if its job is to make people click on the webpage, to click on an ad in the webpage so that you’ll buy something, you know what the metrics are, you have a lot of data there. Improvising and imagining…
Ben: … Data there, I mean, improvising and imagining is a little bit beside the point that there’s… So I think that we’ve now seen the AI industry self-organize into a position where AI is being used in ways that extremely well leverage deep neural nets in what they’re good at and wouldn’t even necessarily be able to use a more creative, imaginative AI. And if you look at DALL·E and these graphics programs you’re mentioning, there’s some of that there too, right? These graphics programs are really, really good at re-permuting elements from existing images. Now, they’re not going to innovate like Matisse or Picasso or even Andy Warhol relative to what came before him, right? But on the other hand, the graphic arts industry doesn’t often want a graphic artist to rethink the foundations of art and come up with a whole new genre of visual representation, right? What they want as a graphic artist is to combine recognizable visual tropes that have a known measurable impact on human psychology, combine them in a contextually appropriate way.
So I mean, the economics of modern industry suits itself really, really well to AIs that are good at combining already existing elements to maximize known metrics, and what this means is that the industry of AI doesn’t have that much motivation to flail around doing AGI R&D. I mean, in the big picture, yeah, if we want AI to take over every industry and obsolete all human jobs and make people money in the process of making money obsolete, we certainly need AGI. But for incrementally maximizing the current uses of AI, very often just fine-tuning narrow AI is the trick.
Now, self-driving is a counterexample to that, right? So there are large counterexamples, but there are more examples than counterexamples right now. And the result of that, I think, is, well, the pursuit of AGI is better accepted now than it used to be, and there are skunkworks labs in large corporations working on AGI. But the commercial viability of recognizing shitloads of highly particular data patterns, particularly if you’re a company that has a huge amount of data, means that AGI research is still sort of on the margins of the AI field.
Jim: Yeah. Most people don’t figure it’ll have a three year payback, right? This is going to be more of the much longer term thing, so-
Ben: It’s a funny thing, Jim. The naysayers about AGI, now they’re saying, “Oh, but that’s 20 or 30 years off.” And I’m like, “Wow, people are finally admitting AGI might be decades off. This is insane progress compared to when I got my PhD in the 80s and people thought either it was impossible or it was thousands of years off.” But now the story is, “Well, this is 20 or 30 years off, so we’re not going to invest in it.” Right? I’m like, “What? What?”
Jim: Yeah, that’s like a weird fallacy of, frankly, financial discount rates, right? It’s amazing how financial discount rates skew our vision on how to apply human effort, but that’s a different story for a different day. So we’ve stipulated that the current DNN approach, deep neural nets and closely related technologies, may not be the right way. And in this essay, you lay out three ways that you believe may have merit. Let’s go into those.
First, you describe a cognitive level approach, which you characterize as hybrid neural-symbolic evolutionary metagraph-based AGI, inspired by cognitive science and optimized via funky math. I think I know what you’re talking about there. Try to avoid the funky math, as you know I’m no math whiz, but for the rest of it, dig in a little bit and tell us what you mean by that one. And if you could, distinguish it maybe a little from the pure DNN approaches.
Ben: Yeah. So I think this approach is more familiar to you than to most of your listeners, Jim, because you played around with the classic version, as we’re now calling it, of the OpenCog system some years ago. And we’re now rebuilding the OpenCog system in a totally new version called OpenCog Hyperon, which has been a lot of fun and is taking a bunch of my time now. But the basic concept here has been: let’s look at a high level at what the human mind is doing, but let’s try to realize this using best-of-breed computer science algorithms, rather than trying to get down to the biology level. So I don’t think a human-like mind is the only way or even the best way to make a general intelligence, but it’s the way we have the most experience with, the way we understand best. And of course you can make that same argument for just trying to emulate the human brain, which we’ll talk about in a moment and which I think is also an interesting approach.
But I think one can also say, “Okay, well, let’s look at flight.” Well, we did observe that birds have these wings and they’re lightweight, which is how they’re flying. And so we emulated birds to a certain level to make airplanes. But then when the engineering became too annoying to make the wings flap up and down, we’re like, “Well, hold on. We can intervene with some science here. Let’s make the wings fixed and let’s make a different propulsion mechanism.” So you can take natural things as inspirations at a little higher level of abstraction and then try to fill in the details using math, engineering, science, and whatever tools are at your disposal, right?
So if you’re taking that approach with respect to the human mind, I mean, this means you’re looking at the human mind and brain, and you’re saying, “Okay, we’re doing perception, we’re doing action, we’re doing planning. And there’s some overlap with how we do high level planning and how we do motor planning. We have working memory, which breaks down into several sub working memory stores for sensory memory, linguistic memory, spatial memory, blah blah. We have long-term memory, which seems to work a little differently for knowledge of facts and beliefs versus for knowledge of procedures and how to do stuff. We seem to have some inbuilt knowledge of how to model each other and do social reasoning.” So if you look at all these main sorts of things that human minds know how to do, one approach to AGI is to say, “Well, how can we make each of these sorts of things done effectively by some computer science algorithm?”
So say declarative knowledge, facts and beliefs, how can we store facts and beliefs in a computer? Perceptual pattern recognition, how can we recognize pattern and visual auditory data in a computer? So you figure out a good way to do each of these particular things that the human mind brain seems to know how to do and then figure out how to connect all those together into a coherent combined architecture. And like every approach to AGI, of course it’s easier said than done. And I think that the subtlety you hit there is really that there’s something I’ve called cognitive synergy in the way a human mind carries out these semi-discrete functions, which means that the neural sub-networks carrying out each of these somewhat distinct cognitive functions help each other out at a deep level and they have some transparency into each other’s internal processing.
So it’s not like working short-term memory and long-term memory are totally separate boxes with a very clean interface, right? The nitty-gritty of what’s happening inside working memory as it processes, and the nitty-gritty of what’s happening inside long-term memory as it processes, sometimes have a back and forth between each other that’s helpful. So then you’re looking at how do we use computer science algorithms to carry out each of these sorts of identifiable semi-separate functions that occur in a human mind, but in a way that each of these different computer science algorithms, serving their own functions, can have some transparency into each other’s internal operations and can help each other out a bit. And this becomes complicated, right?
And then the high level approach to that I’ve been taking for many decades now is centered on a sort of large distributed knowledge graph. We can call it a graph, it’s a hypergraph or a metagraph, which are different graph-like mathematical data structures. So you have this big graph of knowledge with different nodes and links connected to each other with different types and different weightings on them. Then you have a bunch of different AI algorithms acting on this common graph of knowledge doing different things like perception, action, short-term memory, long-term memory, reasoning, procedure learning. And I built together with others a series of software systems like this over many decades.
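The shared-knowledge-graph idea Ben describes can be sketched very roughly in code. The following is a hypothetical miniature for illustration only, not OpenCog's actual Atomspace API: typed, weighted links in one common store, with a separate "cognitive process" (here a tiny transitive reasoner) reading that shared graph.

```python
from collections import defaultdict

class KnowledgeGraph:
    """A miniature typed, weighted graph shared by multiple processes.
    Loosely inspired by, but far simpler than, OpenCog's Atomspace."""

    def __init__(self):
        self.links = defaultdict(list)  # node -> [(link_type, target, weight)]

    def add(self, source, link_type, target, weight=1.0):
        self.links[source].append((link_type, target, weight))

    def neighbors(self, node, link_type):
        return [(t, w) for lt, t, w in self.links[node] if lt == link_type]

# One shared graph of typed, weighted knowledge...
kg = KnowledgeGraph()
kg.add("cat", "isa", "mammal", 0.95)
kg.add("mammal", "isa", "animal", 0.99)
kg.add("cat", "likes", "fish", 0.7)

# ...acted on by a cognitive process, e.g. a tiny deductive reasoner.
def infer_isa(kg, node):
    """Follow 'isa' links transitively, multiplying weights as a crude confidence."""
    found = {}
    frontier = [(node, 1.0)]
    while frontier:
        current, conf = frontier.pop()
        for target, w in kg.neighbors(current, "isa"):
            if target not in found:
                found[target] = conf * w
                frontier.append((target, conf * w))
    return found

print(infer_isa(kg, "cat"))  # mammal and animal, with multiplied confidences
```

In the architecture Ben is describing, many such processes, perception, procedure learning, reasoning, and so on, would read and write the same store, which is where the "cognitive synergy" he mentions, and the complexity, comes in.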
The OpenCog system is built according to this methodology. We’re using it now in my own teams at SingularityNET, OpenCog, and some other companies. We’re using it to crunch genetics data from flies and humans; we’re using it behind Sophia, Grace, Desdemona, and other humanoid robots. So it’s an interesting complement right now to neural nets for doing various sorts of things. But we decided to reimplement that system and create a new version, OpenCog Hyperon, and we’re about a year into the reimplementation journey there. And this was partly based on some new funky math that I did with some colleagues, which was trying to figure out, basically, how to make the learning and reasoning algorithms underlying different cognitive processes acting on the graph all sub-cases of the same basic mathematical meta-algorithm. So the challenge with this sort of approach has been the heterogeneity and complexity, right?
So then what I’ve been plunging into as a mathematician is: how do we take kinds of learning and reasoning that seem different, like logical reasoning, visual perception, auditory perception, action learning, language understanding, procedure learning, and represent all these different algorithms as special cases of some underlying, more abstract math algorithm? Then how do you bake that abstract math algorithm into the plumbing of an AI programming language so that you can implement all these things efficiently? So that’s where I’m at with my own primary AGI research thrust right now. Although since I wrote that article I’ve been digging a bit into the biology simulation side as well, which we can talk about in a moment.
Jim: Yeah, we’ll talk about that one. A question for you about your approach: some might critique it and say, “Ah, this is just a new spin on good old-fashioned AI, as we call it, when we thought we could have computer science people write all the algorithms we need, and, just as in the self-driving car case, it turned out there are way too many peculiarities about the real world for knowledge engineers and experts and programmers to code up the answer.” What’s your response to that objection to your approach?
Ben: I would say a dislike for what’s now called good old-fashioned AI was why I got my PhD in math instead of AI. In the 80s, when I was going to graduate school, I knew I wanted to build thinking machines as a primary life goal. But if you went to an AI professor in a computer science department, all they wanted to do was rule-based production systems, what we now call GOFAI, good old-fashioned AI, and I was just like, “This has nothing to do with intelligent systems in any form that remotely interests me.” What was funny is at the time I was trying to get someone to supervise a PhD thesis on basically multidimensional time series prediction using recurrent neural networks. I couldn’t find any advisor to supervise that in any department because it was just too weird and freaky and out there, whereas now that would be an undergraduate course project and a very conformist, boring thing to be doing.
So I think what was called good old-fashioned AI then really had three aspects, if you want to dig into it. One of them was representing knowledge using logic-like formalisms: for all X, there exists Y, such that blah blah blah. Another was trying to represent knowledge crisply, not acknowledging uncertainty; they were using crisp logic. And the final one was they weren’t doing learning; they were trying to get the knowledge into the AI system by having people explicitly type things like, “People are smart, dogs are stupid, grass is green, people eat hamburgers, hamburgers are made of cows.” And that latter bit is what pissed me off and drove me away from the AI field initially. It seemed so fucking obvious that typing in all the knowledge you need to be a human is just… How anyone could think that was going to work baffled me.
I mean, okay, if the idea is we’re going to type in the first 100,000 things and that gives it the core to learn the rest, there’s something to argue about there, but that’s not even what they were doing, right? They were just like, “We’re going to type it all in and then do reasoning based on it.” And that is now almost universally recognized as ridiculous. I also didn’t like the ignoring of uncertainty, because it was obviously unnecessary. Probability theory and fuzzy logic had been around a long time and were prevalent in the engineering fields; already then there was probabilistic logic, there was fuzzy logic. There wasn’t a fundamental problem with incorporating probability; it was mostly ignored due to lack of compute resources. So I’d say what we’re doing now in OpenCog, we do have a logic-based knowledge representation, although it’s not exclusive.
We’re using fuzzy, probabilistic, paraconsistent, intuitionist logic. We’re using much weirder and more advanced forms of logic than were being used back then, which let the system be uncertain and let it have contradictions in its own mind and so forth. So we are using a logic representation. We’re using uncertainty up the wazoo, right? And we’re totally not dependent on hand-coding common sense knowledge, which I think is just rubbish. And I think there’s sort of a misconception that using logic in your system contradicts using learning in your system, or that using logic in your system contradicts doing low-level perception or something, right? Because you could make a logical term represent a pixel, or a Fourier coefficient of sound coming into your system, and you could do learning based purely on low-level sensory and motor data using a logical theorem prover if you want, right? Using pure unsupervised learning.
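To make the uncertainty point concrete, here is a sketch of uncertain deduction over logical relationships. The formula below is a simplified, independence-assuming version of the kind of strength formula used in probabilistic logic networks; the actual rules OpenCog uses are more elaborate:

```python
# Simplified, independence-assuming deduction strength formula; the real
# probabilistic logic rules are more elaborate, this just shows the flavor.

def deduce(s_ab, s_bc, s_b, s_c):
    """Estimate the strength of A->C given the strengths of A->B and B->C
    and the term probabilities of B and C, assuming independence where
    no other evidence is available."""
    if s_b >= 1.0:
        return s_c
    return s_ab * s_bc + (1.0 - s_ab) * (s_c - s_b * s_bc) / (1.0 - s_b)

# "cats are mammals" (0.98) and "mammals are animals" (0.99), with base
# rates P(mammal) = 0.1 and P(animal) = 0.15, yield a strong "cats are animals"
s_ac = deduce(s_ab=0.98, s_bc=0.99, s_b=0.10, s_c=0.15)
print(round(s_ac, 3))   # 0.971
```

The conclusion carries a graded strength rather than a crisp true/false, which is the basic difference from the GOFAI-era crisp logic being criticized here.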
So I think you’ve got to unbundle good old-fashioned AI. There were horribly stupid things in there and there were good things in there, and of course I have to mention that the neural net is the most good old-fashioned AI there is, right? I was teaching recurrent backprop in the 90s and it was already old. Backprop was invented in the 50s, neural nets were first played with in the 40s, right?
So I mean all these AI ideas have been around a long time. Neural nets worked like shit till we had enough data and big enough computers. And my bet is that almost every idea from the history of AI is going to be found to make some interesting contribution to AGI systems once it’s deployed in the right way, with the right kind of hardware and the right amount of data, and integrated into the right sort of system. So I think in hindsight it’s going to look like neural nets were just the first of the good old-fashioned AI methods to come into its own in the era of copious processing and data. And we’re going to see logic and evolutionary learning and agent-based systems and all these other things come out to play a role also.
Jim: Yeah. In your brief description you mentioned evolutionary, and as you know, that is my personal hobby horse. I continue to complain that there’s not enough evolutionary computing, neuroevolution and evolutionary methods in general, going on in the AI space. And there have been a few papers recently showing some interesting results, especially using neuroevolution. Where does evolution fit into your approach?
Ben: Yeah, so in two ways, really. I think you can look at evolution as a sort of high-level, abstract way to describe a lot of what’s happening in a system like OpenCog, even if it’s not explicitly an evolutionary algorithm. And then there’s also a role for explicitly deployed evolutionary algorithms. So if you look at OpenCog’s AtomSpace, a large distributed knowledge base, you have economic attention allocation, which is spreading around values that indicate the importance level of a node or link in the network, and it can be fine-grained, like long-term and short-term importance. Then you have processes like, say, logical reasoning that preferentially act on the more important things in the AtomSpace, the AtomSpace being this knowledge graph. So if logical reasoning picks knowledge chunk A and knowledge chunk B and does some inference to derive an uncertain conclusion from them, so A and B are combined to derive C, what do we have there?
We have it that the importance in the AtomSpace is like a fitness value, and then we’re selecting the parents, knowledge chunks A and B, and we’re crossing them over using logical inference to get knowledge chunk C. So really, with attention-driven premise selection for uncertain logical reasoning, the uncertainty is doing a kind of mutation thing, right? In essence you’re doing evolutionary learning even without a genetic algorithm, where you have an uncertain theorem prover and an activation-spreading thing for attention allocation, but you’re doing something that mathematically would be described by population genetics, right?
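The analogy can be made concrete in a few lines. This toy sketch (all names illustrative, not OpenCog code) treats importance as fitness, premise selection as parent selection, and uncertain inference as crossover plus mutation:

```python
# Toy sketch: attention-driven premise selection behaving like evolution.
import random

random.seed(0)

# knowledge chunks with short-term importance values (the "fitness")
atomspace = {"A": 0.9, "B": 0.7, "C": 0.2, "D": 0.1}

def select_premise(space):
    """Roulette-wheel selection weighted by importance, as in a GA."""
    chunks, weights = zip(*space.items())
    return random.choices(chunks, weights=weights, k=1)[0]

def infer(space, p1, p2):
    """Stand-in for uncertain inference: combine two premises into a
    conclusion. The random noise plays the role mutation plays in a GA."""
    noise = random.uniform(-0.05, 0.05)
    strength = min(1.0, max(0.0, space[p1] * space[p2] + noise))
    return f"({p1}&{p2})", strength

p1, p2 = select_premise(atomspace), select_premise(atomspace)
child, importance = infer(atomspace, p1, p2)
atomspace[child] = importance   # the conclusion re-enters the "soup"
print(p1, "+", p2, "->", child, round(importance, 2))
```

No genetic algorithm appears anywhere, yet the loop of fitness-weighted selection, recombination and noisy variation is exactly the population-genetics-style dynamic being described.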
So that’s one half of the answer; I just gave one way, but there are many other ways that evolution is implicitly there in having a distributed system with a whole bunch of different things going on in parallel, and then selecting the best results and throwing them back into the soup, right? And then in some cases I think there’s a role for explicit genetic algorithms also. So we’ve been using them for procedure learning, for learning little program codelets that do things, and also for creativity, like for evolving new logical predicates that combine existing ones and can then be validated or fleshed out by inference.
But I’d say even if the explicit use of genetic algorithms goes away because other algorithms are found to do better for the particular things we’re using it for, there’s still, at bottom, a fundamental evolutionary aspect to the whole sort of system that we’re building. Because if you want to get really philosophical, I would say evolution and autopoiesis, which is self-organizing self-reconstruction, are sort of the two high-level metadynamics underlying every complex system worth its salt, right?
I mean you have autopoiesis, which is a system maintaining itself and rebuilding itself, and you have evolution, which is basically the process of combining what’s there and varying what’s there to make what’s not there. So this is like being and becoming in Hegel. And if you go into postmodern philosophy, like my friend Weaver’s open-ended intelligence, this comes down to what he called individuation and self-transcendence as the two sorts of core drives underlying any open-ended dynamical system. So I’d say evolution is there; how explicitly you want it to be there is a detailed algorithm question that is unfolding.
Jim: Okay, cool. So that’s Ben’s current work approach, and there are other folks obviously working with him and working on similar things. Let’s go on to the next of your three, which is the brain-level approach: large-scale, nonlinear dynamical brain simulation.
Ben: Yeah, so this-
Jim: By the way, some people think this is what deep neural nets are, but they’re not, right? Maybe you can start with that.
Ben: They’re not, and there are interesting efforts to make them a little more like that, which we can talk about, right? So I think, yeah… I’ve done computational neuroscience at a few points in my career. Before I left academia for industry in the late 90s, I spent a few years in the psychology department at the University of Western Australia, working with PET and fMRI scans and trying to really model the nonlinear dynamics in the human brain. A little later than that I worked with magnetoencephalography data, which lets you measure time series of what happens in the brain. And it’s not so much that I thought that would be the only or even necessarily the best way to make AGI; it’s just pretty interesting to know what’s happening inside our heads. Just like in the airplane case: even if you’re not going to cover your airplane with feathers, certainly understanding how a hummingbird flies is pretty interesting, and probably there are some lessons there for building flying machines, right?
So each time I plunged deep into the computational neuroscience world, I came to the conclusion that we just don’t have the measuring instruments yet to really tell us the stuff I want to know. We’ve learned a lot about visual and auditory cortex. We haven’t really studied much about how abstractions are formed in the brain, which is the question that interested me most. It’s clear there’s some weird feedback between the hippocampus, which has localized concept neurons, and the distributed representations in cortex. There are feedback loops there that have to do with the hippocampus helping drive the formation of chaotic attractors in cortex, and we just don’t have the measuring instruments to get time series of neural activity across large swaths of cortex and reverse engineer what’s going on, which sucks, right?
I often thought maybe I should drop the other research I’m doing and just go into biophysics and figure out a better way to do brain imaging, because it seems like you need a revolution in brain imaging. Or else just saw off the top half of my head, put 100,000 electrodes in, and turn myself into the ultimate brain imaging guinea pig, right? So really there’s a data issue there, which means we don’t understand what’s going on.
And I mean, neurons are one kind of cell; there’s glia, there’s astrocytes, there’s charge diffusion through the extracellular matrix. There may even be some weird wet quantum biology going on in the brain, although I don’t think in ways that make us super intelligent quantum supercomputers or something. But there may be some interesting quantum biology we don’t know about helping to bind things together. So I just think there’s a lot that’s not understood there. But nevertheless, there’s no one really trying to make large-scale brain simulations do smart things even based on all the knowledge we have about the brain, incomplete as it is, and that’s a bit weird, right?
I mean, Markram with the Blue Brain Project was sort of aiming in that direction, and that turned into a weird EU-level shitshow and didn’t really go in the direction Markram wanted. It led to some interesting research being funded on modeling different aspects of the brain, but I was actually more interested in what Gerald Edelman and Eugene [inaudible 00:55:03] were doing a decade or two ago, where they were looking at chaos-theory-based models of neurons, simulating a large bunch of the brain using a more sophisticated chaos-theory-based neuron model that is much, much more biologically realistic than the neurons in today’s deep neural networks, and then looking at what high-level patterns emerge in these neural networks.
And there was interesting stuff coming out of that in terms of how disparate neurons in a network are bound together in a holistic reaction to a stimulus by sub-threshold leakage of charge between each other, even when the neurons are not firing. And there’s just not really any work on this that I know of going on with an AGI orientation. And computational neuroscientists, by and large, are trying to model one little…
Ben: Neuroscientists, by and large, are trying to model one little subsystem of the brain rather than make a holistic brain model, because that’s big, expensive and hard. And that’s one of many crazy examples of the human species not deploying resources in what seems like an incredibly obvious low-hanging-fruit direction of research. Why don’t we have what Markram was trying to do, but done right? Why don’t we have Manhattan Projects that do integrated, holistic brain modeling using the totality of knowledge from computational neuroscience? I mean, we all are brains, right? Even if it’s not the golden path to AGI, why the hell aren’t we doing that? So it’s weird.
So since I wrote that article and since we last talked, I’ve been chatting a bunch, and hopefully starting some joint research, with a guy named Alex Ororbia out of RIT, who would be a great podcast guest for you, actually. He has developed what seems to me like the only really credible backpropagation killer in the neural net field: a new way of doing learning in deep neural networks that seems to work better than backpropagation. He has a couple of papers in Nature and other journals on this, and I’ve been talking to him about how to extend and upgrade his approach by making the neurons more and more realistic. So he has what are called integrate-and-fire neurons in his simulation, which-
Jim: I was going to say, yeah, that’s what most of what we have today is: simple integrate-and-fire. And there was a breakthrough with a very simple transfer function, ReLU, which was actually part of what allowed deep neural networks to take off.
Ben: But you could take his networks and plug in an Izhikevich neuron or the Hodgkin-Huxley equations, and his form of learning will still work. Whereas backpropagation, which is the standard learning mechanism used in neural nets today, is just not viable with a chaotic neuron model. You could take his improved, more realistic and localized learning method and make it work with upgraded, more realistic neurons, and with glia stuck into the network and so forth. So I’m actually interested-
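For reference, the Izhikevich model mentioned here is a two-variable simplification of Hodgkin-Huxley-style dynamics. This sketch integrates the standard published equations with simple Euler steps:

```python
def izhikevich(I=10.0, a=0.02, b=0.2, c=-65.0, d=8.0, dt=0.5, steps=2000):
    """Simulate one Izhikevich neuron for steps*dt milliseconds; return spike count.
    Dynamics: v' = 0.04 v^2 + 5v + 140 - u + I,  u' = a(bv - u),
    with reset v <- c, u <- u + d whenever v reaches 30 mV."""
    v, u = c, b * c
    spikes = 0
    for _ in range(steps):
        v += dt * (0.04 * v * v + 5 * v + 140 - u + I)   # Euler step for voltage
        u += dt * a * (b * v - u)                        # Euler step for recovery
        if v >= 30.0:                                    # spike detected
            spikes += 1
            v, u = c, u + d                              # reset after the spike
    return spikes

print(izhikevich())       # regular-spiking parameters fire repeatedly under I = 10
print(izhikevich(I=0.0))  # with no input current the neuron settles and stays silent
```

Changing the four parameters a, b, c, d reproduces the different firing regimes (bursting, chattering, fast-spiking and so on), which is why the model is attractive for chip implementations: one cheap equation, many biologically observed behaviors.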
Jim: Sounds interesting.
Ben: It’s pretty cool. I’m actually interested in pursuing that as part of the OpenCog initiative, because I do have some respect for how good neural nets are at vision and audition, and even at linguistic bloviation, right? So I see no reason not to experiment with taking deep neural nets, upgrading them and then hybridizing them, making them work together with a dynamic knowledge graph approach like OpenCog. And in that way I’m a little more of a neural net booster than someone like Gary Marcus or, say, Pei Wang. So what Alex and I have been talking about is whether, by replacing backpropagation with this predictive coding based learning mechanism, we can do a better job of learning deep neural nets that automatically learn structured semantic representations.
Because if you can do that, then you’d be learning neural nets that can more cleanly interface with something like an OpenCog system that uses, among other things, a logic-based representation. There have been many attempts to make deep neural nets that automatically learn semantics. InfoGAN was one type of neural network along these lines that many people played with years ago. This sort of petered out because the backpropagation runs don’t converge; the learning algorithm just doesn’t converge to any answer when you run it on current deep neural net architectures. It might be that the reason it doesn’t converge is just that backpropagation sucks for architectures of that complexity. It’s not like the brain uses backpropagation as its learning algorithm.
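As a concrete illustration of what “predictive coding based learning” means in general (this is a generic Rao-Ballard-style scheme, not Ororbia’s specific algorithm), here both the activity and weight updates use only locally available prediction errors, with no backpropagated error chain:

```python
# Minimal predictive-coding sketch: generic Rao-Ballard-style scheme,
# not Ororbia's actual algorithm. Every update below is local.
import numpy as np

rng = np.random.default_rng(0)

W_true = rng.normal(size=(4, 2))
z_true = rng.normal(size=2)
x = W_true @ z_true                      # "sensory" input generated by a hidden cause

W = W_true + rng.normal(scale=0.3, size=(4, 2))   # imperfect generative model
z = np.zeros(2)                          # latent activity to be inferred
err0 = np.linalg.norm(x - W @ z)

lr_z, lr_w = 0.05, 0.01
for _ in range(300):
    err = x - W @ z                      # prediction error, available locally
    z += lr_z * (W.T @ err)              # inference: activities settle to reduce error
    W += lr_w * np.outer(err, z)         # learning: local pre/post update, no backprop chain

err1 = np.linalg.norm(x - W @ z)
print(err0, "->", err1)                  # the error shrinks substantially
```

The relevant contrast with backprop is that the weight update depends only on the error at that layer and the activity next to it, which is why schemes like this remain viable when the neurons themselves have complicated (even chaotic) dynamics.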
So I think there’s interesting research like that which in a way can cut across my three different approaches. So you can take current neural net architectures, you can try to upgrade the learning algorithm with stuff like Alex Ororbia’s Predictive Coding learning to make it more biologically realistic. And by making the neural net more biologically realistic, you may end up making it better able to slot into a role as a module in integrative architecture like OpenCog. And I do think that’s one interesting direction to take.
Jim: Now that one I’m going to look into; email me his name and a link to him.
Ben: Will do.
Jim: That sounds like something I’d like to learn more about. Now, another problem with highly detailed nonlinear-dynamics brain simulations has been that they’re a terrible fit for von Neumann architecture computing, essentially serial computing. Because what makes the brain, and I think it’s still probably barely the most powerful computer on the planet, is that it’s fully parallel, it’s inherently parallel. Well, all of our computers are basically serial, and indeed the thing that allowed the current crop of relatively simple-minded deep neural networks to take off was getting some degree of parallelization through hijacking GPU processors, basically game cards and that architecture. But that doesn’t really work very well for the funkier and more-
Jim: … complicated mathematics and timing-related stuff of full brain simulation. And so what you’d need to really take off there is something closer to inherently parallel hardware. And I remember there was a guy who was going to do that, someone named Hugo de Garis, who was going to use FPGAs to evolve parallel hardware to attempt to get around this problem. I never heard of anything much ever coming of that project. So what do you think about that? The issue that the real barrier, and why maybe it’s not worth spending a trillion dollars quite yet, is that we don’t have the appropriate parallel hardware infrastructure for attacking the full nonlinear dynamic brain simulation opportunity.
Ben: So Hugo was actually for a while a quite close personal friend of mine, and he introduced me to my wife, Rae Tsang, actually, when she was his PhD student at Xiamen University in China. I’ve known him since ’93 or so, and we haven’t chatted that much in recent years since I moved back from Hong Kong to the US. But what Hugo was doing was using genetic algorithms to evolve neural networks on an FPGA, and I was involved with that work with him to some extent. That was kind of obsoleted by GPUs, which are just so good at doing matrix multiplication. But I think, more broadly speaking, in the next five years, maybe three to five years, we’re going to see a bunch of specialized chip architectures come out for different AI algorithms.
Just like you’ve seen a bunch of deep neural net chips come out that are like GPUs specialized for AI rather than for rendering graphics. And I’ve been involved with that myself, for a different reason than the one you just mentioned. But I think certainly you could make a chip that’s optimized for Izhikevich neurons, and I don’t know that anyone is doing that. The neuromorphic chips people are working on are mostly specialized for spiking neural networks. But you could certainly make a chip specialized for chaotic neurons, Izhikevich neurons or various variations of that equation, and it wouldn’t be an incredibly hard thing to do given modern hardware manufacturing infrastructure. Who’s going to do it is an interesting question.
But the way I’ve gotten into this space recently is I’ve been working with some folks from the hardware space on designing a chip automating OpenCog pattern matching at the chip level. This is a MIMD parallel processor-in-RAM architecture for graph and hypergraph pattern matching, basically. So it’s sort of a 2D lattice of RAM units, each with embedded processors, where you would take a graph, splay it out among the different processors in the 2D lattice, and then do your pattern matching in parallel. And it would be applicable in cases where you had a chunk of your knowledge graph that wasn’t going to change much for a period of time, but you needed to do a bunch of pattern matching against it.
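Here is a toy illustration of the kind of query such a pattern matching chip would accelerate; on the hardware, each RAM cell would hold a shard of the edges and run this matching step in parallel, but the logic is the same:

```python
# Toy version of graph pattern matching over a static edge set.
edges = [
    ("cat", "isa", "mammal"),
    ("dog", "isa", "mammal"),
    ("mammal", "isa", "animal"),
]

def match(pattern, graph):
    """Match a triple pattern against the graph; strings starting with '?'
    are variables, and repeated variables must bind consistently."""
    results = []
    for edge in graph:
        binding = {}
        ok = True
        for p, e in zip(pattern, edge):
            if p.startswith("?"):
                if binding.setdefault(p, e) != e:   # inconsistent rebinding
                    ok = False
                    break
            elif p != e:                            # constant mismatch
                ok = False
                break
        if ok:
            results.append(binding)
    return results

print(match(("?x", "isa", "mammal"), edges))   # binds ?x to "cat" and to "dog"
```

Because each edge is checked independently, the work shards trivially across cells, which is exactly why a mostly-static graph is the sweet spot: the edges can stay put while queries stream through.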
If the graph is changing constantly because it’s the subject of highly intensive real-time inference, then moving stuff around on the chip among the different cells becomes expensive. And I got into this because my friend Rachel St. Clair was designing a chip for hypervector math, the math of very high-dimensional bit vectors, because that suited her own AGI work trying to implement Andrew Coward’s recommendation model of the brain, which is a more biologically realistic brain model. So for this specific biologically realistic AGI approach Rachel was taking, we figured out an efficient way to implement Coward’s brain model using a mathematical structure called hypervectors.
So she set about designing a hypervector chip, and then when I saw she’d assembled some brilliant hardware people, I’m like, “Well, let’s leverage the fact that you’re starting an AGI chip company to finally make the OpenCog pattern matching chip I’ve been thinking about for a while.” The cost of making weird new chips and developing them has come down enough that you could see it’s viable to make an AGI board that has a deep learning chip, an Izhikevich neuron chip, a hypervector chip, an OpenCog pattern matching chip, and a processor interconnect that’s fast among all of these. So I think, historically, the role of GPUs in accelerating neural net AI may just look like the first in a series of special chip architectures that accelerated this or that AI algorithm.
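For readers unfamiliar with hypervector math: the standard textbook operations on high-dimensional random bit vectors look like this (nothing here reflects the actual chip design, which isn’t described in the conversation):

```python
# Standard "hyperdimensional computing" operations on random bit vectors.
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                        # hypervectors are very high-dimensional

def rand_hv():
    return rng.integers(0, 2, D, dtype=np.uint8)

def bind(a, b):                   # XOR binding: invertible, result unlike either input
    return a ^ b

def bundle(*vs):                  # bitwise majority vote: result similar to each input
    return (np.sum(vs, axis=0) > len(vs) / 2).astype(np.uint8)

def similarity(a, b):             # 1.0 = identical, about 0.5 = unrelated
    return float(np.mean(a == b))

color, red = rand_hv(), rand_hv()
record = bind(color, red)         # store the association "color = red"
recovered = bind(record, color)   # XOR-ing with the same key recovers the value
print(similarity(recovered, red))             # exactly 1.0
print(round(similarity(record, red), 2))      # about 0.5: the record looks random
composite = bundle(color, red, rand_hv())
print(round(similarity(composite, red), 2))   # about 0.75: bundling preserves similarity
```

The appeal for hardware is that everything reduces to wide, embarrassingly parallel bitwise operations, which is a natural fit for a processor-in-RAM design.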
And this can all happen within the next three to five years, I would say. There’s a bunch of people working on this now. Probably the worst thing slowing it down is that collaborating with China has become more annoying due to bad politics and China’s bad COVID policies, because it’s just so fast to take a chip idea from simulation to FPGA to manufacturing in Shenzhen and Dongguan, and now working with China is getting more complicated, so people are looking at, “Can you do it in Taiwan or South Korea?” or maybe even eventually the US, right? So yeah, I think it’s increasingly viable to do this sort of stuff in hardware now. It’s close enough that it makes sense to be prototyping this stuff in software, with reasonable faith that in years rather than decades the hardware acceleration will come. But if you’re really talking about making a simulated brain with a hundred billion Izhikevich neurons and glia all operating in parallel, to do that at a reasonable cost you’re probably waiting for the upgraded hardware. That’s okay, though, because there are at least several years of research to do before you get to the point where scaling up is your main problem.
Jim: Yeah, and of course we’ll have to find a demand-pull problem that’s worth spending the hundreds of millions of dollars to spin custom hardware for the various approaches, which can then pull the researchers in. Because that was the interesting thing about the GPU, right? It was a found technology, an adjacent possible. The GPU already existed, and then once Hinton and friends figured out how to do backprop on it at scale, every graduate student started fooling with it, and very quickly there was a compounding of learning. And that’s what I’m-
Ben: I think every big tech company will have many, many uses for pattern matching on large stable graphs. Social networks are that, the NSA’s database is that, the Google Knowledge Graph they have internally is that, right? They’re now using graph neural nets and so forth on that, but that’s not necessarily the best tool. And you have, say, Graphcore already making graph-on-a-chip, but they’re focused on floating point operations on graphs, not discrete operations on graphs. So I think once you have a graph pattern matching chip with nice software APIs, big tech and big government, for better and worse, are going to find many, many, many uses for it. Now, in terms of biologically realistic simulations of neural nets:
So I have a hypothesis that if we take Alex Ororbia’s predictive coding based learning and put it in a network of Izhikevich neurons that have sub-threshold spreading of activation, not only spiking, this will lead to neural nets with better generalization than current neural models, because I think the coherence you get from synchronization via sub-threshold spreading of charge will let you make more compact neural networks modeling the same data. So if I’m right in that hypothesis, and Alex and I succeed with that research in the next couple of years, then you’d have a case where biologically realistic neural nets generalize better than current, less biologically realistic, backprop-based neural nets.
In that sort of case, you would have the use case you were looking for: basically every commercial application where there’s not quite enough data to feed a current deep neural net, but there is enough to feed one of these predictive coding based neural networks. So I’m fairly optimistic that’s going to come, but yeah, it’s not coming as fast as it could, because industry is focusing more on “well, we’ve got this amazing hammer, what can we bang on with this hammer?”, which is deep neural nets. It is an amazing hammer in a certain way; you can do a lot of really cool things with it. And as a user, I fall into that too.
So I’ve been playing a bunch with neural nets for music. We’ve got a rock band, we’ve got a robot singing in the rock band, and it’s really fun to train neural models to sing. It’s fun to train neural models to come up with weird lyrics. And if I put my musician hat on, it’s more fun to do that than to bang and bang on getting the AI to learn better. Right now with AI for music, we can compose really cool riffs, 10 or 15 seconds long. So as a musician, one thing you can do is just loop that into a rhythm track and jam against it; or you can put your head down and try to innovate new ways to make it compose whole songs that aren’t just 15 seconds long and aren’t repetitive and boring.
On the other hand, there’s a lot of mileage to be had as a musician by taking an AI-composed 15-second snippet and looping it. And it’s the same in biology research: there’s so much to discover about longevity using OpenCog as it is now to interpret the results of genomic data analysis. So I can see why researchers tend to use these amazing tools and milk them for as much as they can get, because that’s easy, it’s fun, and you’re getting real stuff. Whereas taking a step back to figure out how to make generalization work better is more difficult and uncertain work that doesn’t give you incremental goodies step by step.
Jim: Interesting. That’s always the issue: what does the socioeconomic and political fitness landscape look like?
Ben: And it’s the psychological landscape too, I think. We’re both crusty old people, so I can go there a bit. But I think these kids today don’t have any attention span. People who got into the AI field in recent years are addicted to running a learning algorithm on a data set and getting a cool result right away. And if you’re doing something that doesn’t give a cool result right away, you just shift to something else that does. And I think the result of this lack of attention span is that nobody has the patience to do something like AGI research, which may require you to not get feedback for days or weeks or even years. And that’s it.
Jim: It is true. If you can’t do it in Keras in two hours, it ain’t worth doing, right? There’s some of that running around. All right, let’s get on to the last one. We’ve been digging into these things, and this one is what you call the chemistry-level approach: a massively distributed, AI-optimized artificial chemistry simulation. Now I will confess, when I read your essay back, I don’t know, a couple of months ago when it was published, that’s what really triggered me to reach out to you again-
Jim: … because I have actually had that same idea multiple times over the years, and I’ve even programmed up little toy thingies, but I’ve never had the time to really see how far I could push it. It just seems intuitively like an interesting approach. So why don’t you lay out the approach from the beginning?
Ben: Yeah. I probably got into that idea about the same way you did. So artificial life was a thing decades ago; we were both into it. The idea there was that you’re not just doing a genetic algorithm that simulates mutation and crossover on artificial genomes; you’re making a whole artificial organism that buzzes around in some simulated world. The artificial organism is trying to eat, drink and be merry, reproduce, whatever, clobber other organisms, and then the artificial organisms that do well in the world reproduce and survive. When you get deeply into that, you start to see that how to really build these organisms, how to really build their artificial genomes and their artificial metabolisms, is quite tricky.
And you start to think, “Well, actually the human genome is a pretty subtle thing. The proteome is a pretty subtle thing.” We don’t actually just have some sequential genome that decodes in a trivial way into a computer program. The DNA is triggering all this RNA and all this methylation, all this epigenomics; it’s causing proteins to catalyze reactions with other proteins to build structures that depend on the chemicals the embryo is floating in. So you come to the conclusion that evolution is a bit impaired when it’s separated from self-organization. And the complex self-organizing dynamics within a biological organism, and in the organism’s interaction with its environment: in the end, for better or worse, it comes down to biochemistry.
So when you play with artificial life a bit, as you’re trying to make your A-Life models more and more fine grained, you end up starting to do artificial biochemistry. And that becomes very, very interesting. Back in the ’80s and ’90s even, people were experimenting with this. Work by Walter Fontana under the name Algorithmic Chemistry caught my eye way back in the day, probably the early ’90s. He wasn’t at that point trying to do detailed chemical simulation, although others have done that; he was trying to abstract the spirit of chemistry and say, “Let’s take little code lists, little programs, let them act on other programs to produce new programs in complex chains of reactions, and then see what emerges out of this sort of algorithmic chemical soup.” And yeah, that’s a very beautiful, intuitively appealing idea, which did not at that time lead to super amazing practical results, but nor did it totally flame out, right?
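Fontana’s core move, little programs acting on other programs to produce new programs, can be caricatured in a few lines. This is an illustrative toy of my own, not Fontana’s actual AlChemy (which used lambda calculus): here the “molecules” are tiny numeric functions and the reaction rule is plain composition.

```python
import random

# Toy "algorithmic chemistry": molecules are tiny unary functions,
# and a reaction composes two molecules into a new one.

PRIMITIVES = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: -x,
]

def react(f, g):
    # One program "acting on" another: here, plain functional composition.
    return lambda x: f(g(x))

def signature(f, probes=(0, 1, 2, 3)):
    # Crude behavioral fingerprint: what the molecule does to a few inputs.
    return tuple(f(p) for p in probes)

def run_soup(size=40, steps=300, seed=0):
    rng = random.Random(seed)
    soup = [rng.choice(PRIMITIVES) for _ in range(size)]
    for _ in range(steps):
        f, g = rng.sample(soup, 2)
        soup[rng.randrange(size)] = react(f, g)  # product displaces a random molecule
    # How many distinct behaviors emerged from the reaction chains?
    return len({signature(m) for m in soup})

print(run_soup())
```

The interesting question in soups like this is whether the variety of behaviors stabilizes, collapses, or keeps growing as the reaction chains deepen.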
Jim: Yeah. And the other part of the idea, again, part of my motivation, was I realized that everything we have in terms of life and cognition, information processing, et cetera, came out of the arbitrary chemistry that the earth happened to have. And Old Ma Nature spent 3.7 billion years stumbling its way through various experiments before the barely intelligent, barely generally intelligent humans got first across the line, probably approximately the stupidest possible general intelligence, and it worked with what it had. What happens if you do a dual evolution on the chemistry, or on something more like the genes and the development machinery-
Jim: … with what’s called in biology EvoDevo, and perhaps you could find a much more expressive representation. I got this idea from evolutionary computing, where how you represent a problem often has a big impact on how well your evolutionary tools work. So I said, what would happen if you evolved the underlying chemistry such that the gene expression machinery, shall we say, produced a phenotype that would give a more intelligent result? That was my deep motivation. Ma Nature did it the hard way; maybe we can do it the easy way, because we can control the underlying representation.
Ben: Yeah. I mean, I sort of vacillate on a number of points related to this. On the one hand, the algorithmic chemistry sort of approach is very appealing, and part of my mind is just like, “Well, the key thing isn’t the particulars of the periodic table of elements in this particular branch of the multiverse; the key thing is you have this self-organizing system of processes catalyzing other processes, combining with other processes to produce other processes, and then evolution happens on this sort of field of underlying self-organization rather than on static strings and so forth.” And on the other hand, the part of me that loves physics was just wondering, “Well, what is there that’s weird and special in proteins and the periodic table and so forth?”
Protein folding is pretty funky, the way the van der Waals forces work, and maybe some weird quantum stuff helping the folding of the protein, and exactly how the different forces on different time scales work in these molecules. We still don’t really understand that. So part of me was gravitating toward: let’s make a real chemistry simulator and just do that. And my friend Bruce Damer had a project called EvoGrid where he’s using grid computing to run an actual computational chemistry simulator on a big distributed compute fabric. He’s interested in solving the origin-of-life problem that way. So, I mean that’s-
Jim: Oh, by the way, Bruce Damer will be on the podcast. We recorded his episode last week and it’ll be published in two days.
Ben: Oh, cool, cool. Yeah. Yeah, I love Bruce. Yeah, his amazing estate-
Jim: We talked about it.
Ben: … out near Santa Cruz as well. He’s in the computer museum and-
Jim: Yeah. He showed me lots of videos of it and such. It looks pretty cool. So, yeah. Anyway, one approach is to actually simulate chemistry/biochemistry, and the other, the approach I was working towards, is to abstract that tremendously, in the same way neural nets have been abstracted in the DNN world. And is there some other way of thinking about simulated chemistry that might be a reasonable representation to find our way to intelligence?
Ben: Yeah, I mean both approaches are interesting. And there’s a parallel, of course, to approaches to AGI more broadly, where simulating the brain is interesting in itself and obviously has promise, and then there’s taking a cognitive approach and thinking about whether you can emulate the essential characteristics of what the brain is doing without trying to simulate all the machinery, right? On the algorithmic chemistry level, you also have both those approaches. Understanding how real chemistry works is really, really interesting, and certainly there’s a lot to be learned there. On the other hand, there’s an appeal to trying to abstract the conceptual essence of that in a way that might be easier to play with, easier to understand, take less compute resources, and maybe not hit the same restrictions that you hit with the peculiarities of real chemistry. Both of those approaches are interesting. All these things should be getting way more attention from the human species than is currently the case, right?
I mean, one trap I try not to fall into is being too either-or about research approaches that are all terrifyingly underfunded, right? When you get to the point of deploying something on the scale of the whole US or the whole world, then you really sometimes have got to make a hard choice. But at the level of which research approach, clearly humanity should be exploring all these research approaches with much more of its resources than it is now, as opposed to spending resources on things like blowing up innocent people and making more and more versions of junk food to poison people and so forth, right?
Jim: Yeah, 97 varieties of barbecue sauce. I mean, what the hell, right? 200 versions of shampoo. What the hell?
Ben: Well, yeah, I think these are all interesting. What has pushed me away from the chemistry simulation approach from an AGI view, although I think it’s very interesting from a Bruce Damer origin-of-life view, where you’re trying to solve an intellectual puzzle rather than to build a system, is just the likely compute resources, right? Because if simulating a brain may already need a bunch of custom hardware, then to simulate the whole prebiotic soup of the early earth and how that chemistry gave rise to life, which then gave rise to multicellular life, this would be an amazing simulation to run, but how much compute it would take is going to be quite a lot. So that’s really what has made me gravitate away from that approach.
Whereas if you take the more abstracted approach where you’re making an algorithmic chemistry that doesn’t aspire to be a real chemistry simulation, then we don’t know what the compute resources are. So as an optimist, at least you can hope they won’t be that much. I’ve gone fairly far down the road of trying to figure out how to sort of modernize what Fontana did with algorithmic chemistry using a more modern programming language. He was using snippets of Lisp code. But you can do something like that using a yet more abstract and more modern programming language than Lisp, one that incorporates dependent types and gradual typing introduced at runtime and so forth.
We have a new programming language we’ve built for OpenCog Hyperon called MeTTa, M-E-T-T-A, Meta Type Talk. This is probably the most abstract programming language ever created. I’m hoping some university will adopt it for first year programming so they can eliminate 99% of their students. And of course you can build other languages on top of it that are easier to approach and use for different purposes, which we’re in the midst of doing.
But I think choosing the right representation language is one important thing for this sort of work. And thinking about doing algorithmic chemistry type stuff in OpenCog Hyperon immediately led me, as I recounted in the blog post that prompted this discussion, to think about less pure approaches. Because if you think about running a chemical soup, be it a realistic chemical soup or a highly abstracted algorithmic chemical soup, right now with a bunch of computers, then it’s hard not to think, well, let’s not be so brute force about it. Let’s use some machine learning to study the evolving chemical soup and figure out which little cells in our chemical soup are evolving most promisingly.
Let’s say you had 10,000 little vats of chemicals brewing and each one is trying to brew some intelligence. Why not use a machine learning algorithm to study those 10,000 vats? Find the hundred vats that are looking most promising, then kill the least promising vats and fill them up with stuff gotten by mutation, crossover, or probabilistic analysis and instance generation on the things happening in those 100 most promising vats.
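The outer loop being described here can be sketched abstractly. Everything below is a hypothetical stand-in: the “vats” are just numeric vectors, and the promise score is a placeholder where a real system would pattern-mine the soup’s dynamics.

```python
import random

# Sketch of the directed-evolution outer loop: many vats evolve on their
# own, a scorer ranks them, and the weakest are refilled with variants of
# the strongest.

def evolve_step(vat, rng):
    # Placeholder for one epoch of internal soup dynamics.
    return [x + rng.gauss(0, 0.1) for x in vat]

def promise(vat):
    # Placeholder "how promising is this vat" score; a real system would
    # mine the soup for structure instead of summing numbers.
    return sum(vat)

def mutate(vat, rng):
    return [x + rng.gauss(0, 0.5) for x in vat]

def directed_evolution(n_vats=100, keep=10, dims=5, epochs=20, seed=0):
    rng = random.Random(seed)
    vats = [[rng.uniform(-1, 1) for _ in range(dims)] for _ in range(n_vats)]
    for _ in range(epochs):
        vats = [evolve_step(v, rng) for v in vats]
        vats.sort(key=promise, reverse=True)
        elite = vats[:keep]
        # Kill the least promising vats and refill them with mutants of the
        # elite (crossover or probabilistic generation would slot in here).
        vats[keep:] = [mutate(rng.choice(elite), rng) for _ in range(n_vats - keep)]
    return max(promise(v) for v in vats)

print(directed_evolution())
```

The point of the sketch is the division of labor: the vats do blind self-organization, while a cheap observer concentrates compute on the lineages that look promising.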
I’ve been talking to Bruce a long time about doing this in his computational chemistry simulation, so I’d run OpenCog’s MOSES algorithm to help generate new prebiotic chemical soups in his EvoGrid system. But then of course, if you’re doing a simulated chemical soup of little MeTTa programs in OpenCog Hyperon and Atomspaces, well, we have a pattern miner in OpenCog Hyperon. The whole thing is already instrumented for recognizing patterns in what works well and what doesn’t, and for generating new little snippets of MeTTa code from distributions.
So then that leads you into a weird sort of hybrid architecture where you’re saying, “I’m going to do an algorithmic chemistry soup, and I’m going to do directed chemical evolution on this soup using, well, pattern mining, using probabilistic reasoning for abductive inference about which kinds of algorithmic chemistry soups seem to look better.”
Then you could look at what you have in two ways. Either you’ve got an algorithmic chemistry soup whose evolution is being accelerated by some machine learning, and even some proto-AGI, studying what’s happening in the soup. Or you have an integrative hybrid AGI architecture where one of the lobes of your hybrid AGI is using guided algorithmic chemistry for sort of creative, brainstorming-like hunch generation. Which view gets the emphasis sort of depends on what’s working better for what reasons, which is hard to prefigure at the moment.
This excites me greatly; this will be incredibly fun to work on. If we get this into SingularityNET as a decentralized AI platform, or NuNet, which is a SingularityNET spinoff that decentralizes processing power, it’ll be fun to see. Instead of running Folding@home, which is pretty much obsolete by now, it’d be fun to have millions of people running a little virtual algorithmic chemistry soup of evolving OpenCog Hyperon codelets on their machines, with the progress of each soup analyzed by some OpenCog running in the SingularityNET cloud, and then periodically the soup is refreshed with some clever codelets. This has a beautifully decentralized aspect to it, which is kind of cool, in common with other artificial life and artificial chemistry approaches.
Jim: Yeah, I very much liked, in your essay, your adding that extra piece, which is to essentially have an AI observer system mining the soup, looking for useful dynamics. One could even think of coming up with some abstract measures of interestingness; I know you’ve thought about that many times, what is interesting, right? And bootstrapping it from that perspective. That’s actually a very interesting and useful addition.
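One cheap, hedged stand-in for such an interestingness measure (an illustrative construction, not anything from OpenCog) is to score a vat’s history trace by its compressibility, rewarding traces that are neither trivially repetitive nor pure noise.

```python
import hashlib
import zlib

# Illustrative "interestingness" score for a history trace: reward traces
# that are neither trivially repetitive nor incompressible noise.

def interestingness(history: bytes) -> float:
    # Compression ratio near 0 means boring repetition; near 1 means noise.
    r = min(len(zlib.compress(history)) / max(len(history), 1), 1.0)
    return 4 * r * (1 - r)  # simple hump peaking at a middling ratio

# Three synthetic traces: pure repetition, pseudorandom noise, and a mix.
ordered = b"ab" * 500
noisy = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(32))
mixed = b"ab" * 250 + noisy[:500]

for name, trace in [("ordered", ordered), ("noisy", noisy), ("mixed", mixed)]:
    print(name, round(interestingness(trace), 3))
```

An observer running something like this over thousands of soups could rank vats without understanding their chemistry at all; richer structural pattern mining would of course do better.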
However, I’d also comment that, just like the full brain simulation, chemistry is an inherently parallel process. That’s how you can get some of these amazing results using DNA computing on little trick problems that happen to be solvable in parallel by matching on DNA. And if you’re going to actually take an algorithmic approach that is at all analogous to physical chemistry, you still run into the problem that our computing isn’t very parallel yet. Architectures that could give you millions or billions of parallel operations, more sophisticated parallelism than simple matrix math, may be what we need to really be able to work this problem in a serious way.
Ben: It might be. I mean, I think adding in the pattern mining and the sort of AI direction of the chemical evolution, as I’ve suggested, works around that to an extent, because that leverages the fact that you’re on a modern computer network where everything’s happening in RAM and can be pattern mined over, which is an advantage that a physical chemical soup doesn’t have. But the quantitative advantage you get from that, and how it compares to the quantitative disadvantage of not having that massively parallel substrate, is not completely obvious at this moment. And we don’t have a good way to theoretically estimate it, right? To an extent we just have to try.
I mean, that’s another shame of human technological history, that massively parallel, more lifelike computing infrastructures just haven’t been resourced. Even MIMD parallel architectures built on silicon chips have not been significantly resourced, right? Let alone weird sorts of massively parallel chemical computing infrastructures. You may have run across [inaudible 01:34:59]. Do you know that guy?
Jim: No, do not.
Ben: He popped into my radar as a collaborator of Stuart Hameroff who works with Roger Penrose on-
Jim: Yes, on quantum consciousness, right.
Ben: Yeah. I’ve gotten to know some of the guys working with James Tagg on trying to make quantum gravity computing work, which is going kind of interestingly, even though I don’t ultimately think it will validate Penrose’s ideas. Anyway, [inaudible 01:35:32] is an Indian guy based in Japan. He’s worked with Stuart and Roger Penrose, but he also has his own research. He’s working on nano computing infrastructures that do massively parallel computing at the nano scale, and the Japanese government is funding some work on this.
It’s like you’re doing a sort of continuous-variable cellular automaton lattice at the nano scale in various molecular brews, and he’s trying to get that to do computing. There are some things in his work I don’t fully agree with, and I don’t know whether he’s going to get there or not; I’m not a nanotechnologist, it’s not fully my area of expertise. But it’s a shame there’s not more creative exploration in that direction, right? Because clearly there’s a lot that could be done there.
And I mean, DNA computing is nice, but that’s not really what DNA was evolved for, and you can build other nano structures that are far better architected for computing, which is what [inaudible 01:36:46] is trying to do. And that of course opens whole other avenues for algorithmic chemistry, right? Because then you could look at what sort of algorithmic codelet is most effectively natively implemented in [inaudible 01:37:06] molecular compute substrate, which may be different than what we would do in our [inaudible 01:37:15] computers, but also different than what biochemistry led to. So yeah, there’s a lot of-
Jim: Yeah, that’s very interesting.
Ben: … many, many cool directions to go in, and certainly more than any one researcher could chase down. I do try to keep an open mind about all the different avenues out there, because I think there are going to be many, many paths to AGI. On the other hand, I also think it may be that only the first path to AGI is pursued by humans, and all the rest will be pursued by the AGI itself, which will maybe be much better at AGI research than people are.
And the first approach we discussed today still has my bet as the most likely path to get there first. But as we’ve seen in this discussion, by the nature of that sort of hybrid approach, you’re able to ingest ideas that are inspired by other approaches. So within an OpenCog Hyperon framework, we could certainly use biologically realistic neural nets for visual perception and robot movement control, or even episodic memory, say, if they turn out to be really good at that; they don’t have to be good at everything. And we could certainly use algorithmic chemistry for coming up with weird new creative ideas if it turns out to be good at that. Some of the ideas that pop up in these other paradigms can be glommed into your hybrid AGI system. Certainly not all could; sometimes it would be too much of a stretch.
Jim: Well, a very interesting conversation. The last topic is one that was at the very end of your essay, which is your sense that society somehow needs to open up its portfolio of bets here and put more money on some of these approaches that are less mainstream than the popular deep neural nets. Any thoughts about that? I mean, how much? How?
Ben: Yeah. I mean, how much?
Jim: “As much as possible,” he says.
Ben: Yeah. I mean, well, there’s a lot of need for resources on the planet. I would say the best estimates I’ve seen indicate we could eradicate world hunger for maybe 100 to 300 billion dollars, which seems like a lot. But Obama magically synthesized four trillion dollars or more in response to the 2008 financial crisis. Trillions of dollars can come up when really needed, but the world can’t come up with a few hundred billion dollars to improve infrastructure in the developing world and abolish hunger.
I mean, there are a lot of needed things on the planet that we don’t seem able to pull together money for. Certainly funding on the level of a few hundred billion dollars could massively accelerate our R&D in artificial general intelligence, even though compute time is expensive and paying AI experts is expensive and so forth. I mean, suppose you wanted to fund a number of AGI R&D projects at $10,000,000 each. Then you’re funding a hundred of those projects for a billion dollars.
I mean, then for a hundred billion dollars, you’re funding 10,000 of those projects. Funding 10,000 cool projects at $10,000,000 each, spent over five years or something, that on the one hand is a very large chunk of money. On the other hand, if you believe there are decent odds that humanity is at the critical point where we’re almost there to create super AGI, it doesn’t seem so large. I mean, the US government could magically synthesize 200 billion dollars to solve world hunger, 200 billion dollars to vaccinate the developing world, 200 billion dollars for AGI, 200 billion dollars for longevity research. And the bottom line is, this would be absorbed into the vast corrupt chaos of the global financial system without leading to mass chaos in the streets and terrorist attacks and horrible world disruption. I mean-
Jim: Yeah. The numbers you just gave me, I just added them up real quick and it comes to one year of the American defense budget.
Ben: Yeah. Sure.
Jim: On the scale of things. Say spread over 10 years, if we spent a tenth of the defense budget to achieve all those things, wouldn’t that be money well spent? Certainly it seems so to me.
Ben: Yeah, that’s right. And it doesn’t have to be just the US either. And you could look at other things besides defense, like subsidies to industries that don’t need them. But looking at waste would be a much larger and longer conversation. I don’t even say that much in resources is needed; I’m just posing that as food for thought. Because right now there aren’t even 10,000 really credible distinct AGI projects on the planet to drop money into.
On the other hand, if that sort of funding were available to AGI, you would start seeing a lot more people think about spending their research careers on it, because it’s just more appealing. As for people like me, I worked on it when there was no money; I would work on it again if there was no money; I would pay for the chance to work on it. But if you’re not such a zealot, then of course you’re going to have some drive to work on something that will allow you to pay your mortgage and put your kids through school or something. Or not have to live in your parents’ basement.
I mean, the advent of the best and brightest minds into the AGI field would be quite rapid, I think, if you had more funding for this. Now, some people think there shouldn’t be more funding, because they think AGI is going to destroy the world, but that would be a whole other topic.
Jim: We’re summoning the demon, that’s what Elon Musk would say.
Ben: Yeah. I mean, my view is the fucking demon is already here; just look in the mirror. And if we don’t get some beneficial super AGI to help us abolish scarcity and do a bunch of other things, to my mind there are fairly high odds the human race wreaks mass destruction on itself, with methods that don’t need any AGI, within the next few decades. But this is a whole other topic.
Jim: Well, I think we’re going to wrap it up here.
Ben: Go ahead.
Jim: Well, let’s-
Ben: Final thoughts.
Ben: Final thoughts, yeah. So you asked how much money and then you asked how, and I think we only got halfway through how. I don’t want to take up too much more time, and I’ve got to go also. So, I think one way the how could happen is for government to step up and do research funding in a less crappy way, and that certainly is possible. I mean, the US NIH, while conservative in some ways, has transformed many fields of biology and medicine with its halfway decent dissemination of research funding, even though I have my many quibbles with what the NIH has done.
I mean, they’ve done a decent job and they’ve transformed genomics and many other fields of medicine. So governments are not universally incompetent at research funding by any means; DARPA has had its moments of brilliance also. That is a thing that could happen. We could get a US president, for example, who understood science and technology and wanted to make that a major feature. I still wish I’d convinced you to run for President against Trump, actually, but I failed. That would’ve been an amazing presidential debate to see.
Jim: That would’ve been funny. That’s what politics is about now, it’s a species of entertainment. Funny is what gets votes, so yeah.
Ben: I think the other route, apart from government waking up or becoming a little less idiotic, is a culture shift. We’ve seen open source software development produce a lot of pretty amazing things, and citizen science is now at least a thing. I think there are enough people in the developed world now who are working less than full time out of choice, are not starving, and do have disposable time to put into things with massive intellectual interest and massive potential to do good for the planet.
There’s the possibility of a cultural shift, similar to the cultural shift that made open source code a real thing, that leads to the same amount of AGI R&D happening without government funding it. This would have to do with more and more people recognizing that AGI is a viable thing to happen in our lifetimes, and more and more people recognizing that big government and big tech companies aren’t going to do it because they’re focusing on a few other particular approaches.
And we might see both of these things happen at the same time. What we might see is someone, could be my own group, could be someone else’s, comes up with a first palpable breakthrough that makes everyone look at it and say, “Oh shit, AGI really is coming.” And once that happens, you may see an upsurge of government funding and an upsurge of grassroots attention to trying all sorts of different things.
I think these are viable ways it could happen. Whether big tech is going to turn around and start putting huge funding into AGI, I’m more doubtful on, just because I think AGI by its nature is going to be imaginative, creative, and unpredictable, and not play according to the rules, which is what intelligence tends to do. And big tech will always see greater probability of near-term financial return from AI that’s more controllable in nature, would be my guess.
So yeah, that’s my thinking on the how question anyway. And of course the culture change connects with Game B and a whole bunch of other stuff that you’ve been thinking about. If we’re really lucky, we could see broader positive cultural changes, and a grassroots upsurge of R&D on AGI for the greater good could be part of this broader cultural change. It seems like a rosy-eyed, optimistic possibility, but it’s not impossible.
Jim: Well, that’s some hopeful thoughts there. Well, come back and check in 30 years, let’s see how she went, right? Ben, I want to thank you as always for an amazingly interesting tour of the horizon of some possible ways that AGI might happen. It’s been great.
Ben: Thanks a lot. This was good fun as always.