Transcript of EP 201 – Tobias Dengel on the Age of Voice Technology

The following is a rough transcript which has not been revised by The Jim Rutt Show or Tobias Dengel. Please check with us before using any quotations from this transcript. Thank you.

Jim: Today’s guest is Tobias Dengel. Tobias is president of WillowTree, a global leader in digital product design and development. And he and his company, or at least his headquarters, are located over in Charlottesville, Virginia, which is in my neck of the woods. In fact, he’s about an hour-and-45-minute drive directly west. So welcome, Tobias, to the Jim Rutt Show.

Tobias: Thanks, Jim. Thanks for having me.

Jim: Yeah, it looks like we’re going to have an interesting conversation today. We’re going to base our conversation around a new book that Tobias has just published. In fact, if you hear this podcast, the book is published, because our plan is to release it on October 16th, his pub date. The book is called The Sound of the Future: The Coming Age of Voice Technology. This should be fun.

Tobias: Yeah, looking forward to it. I mean, we believe that this is the biggest tech shift since mobile came on a decade and more ago, and really the third one since the internet when you think about digital waves.

Jim: Interesting. Now, I would push back on that one a little bit because I’ve been saying that the current big, big one is large language models and related AI. In fact, I’ve been on the record as saying I believe that they are bigger than the internet and bigger than smartphones, that they’re on par with the invention of the PC in the late 70s, because a smartphone ain’t nothing but a tiny little PC. And we were networking PCs as early as 1980. I went to work for a company called The Source, which was the first consumer online service, in 1980.

And so the internet is just a bigger version of that, and smartphones are tiny little PCs with networking. But LLMs are fundamental, a whole new way of exploring. And we’ll talk about that because obviously, LLMs are very important in the epoch of voice. And so, yeah, and as I told you beforehand, I was going to start off with a little bit of negative about voice, which of course, we can talk about. And in your book, you do acknowledge some of these, which is back in the days when I was CTO at Thomson Reuters, back then Thomson, I used to spend a day a year at the Watson Research Center, which was IBM’s number one R&D lab in upstate New York, up the Hudson from New York City. And my host was always John Patrick, chief scientist of Watson, and he’d show us all the new cool stuff.

And in those days, middle 90s, IBM had spent a billion dollars on their voice stuff. And he’d show me things, I’d look at him and go, no, John, have no use for that. No, John, have no use for that. And what I’d often critique, the problem I saw with using it in the business environment, is what I came to call public babble. You know, who the hell is going to sit there at their office and be talking to their computer? And particularly as, right at that point, we were moving towards many more open offices. And so, you know, here are 20 people working in their offices all babbling out loud. What the hell, you know?

On the other hand, when I was CEO of Network Solutions, I’d established a Skunkworks to work on various projects of the future. So this would have been 2000 approximately. And it was right around the time that hidden Markov models were really providing a big up-regulation in voice recognition. And we started a Skunkworks project to build a voice registry for domain names. If you hooked our little widget into your browser, you could speak a domain name and be routed to it. And this was particularly aimed at WAP, which was kind of a half-assed mobile internet during that time. So we had a little team working on it. Frankly, I don’t know if they ever built it or not. We sold the company and I left fairly soon thereafter, but I was able to do a kind of deep dive there. And then, you know, in terms of some of the negatives, back to the negative side, just last week, I tried to do something.

In fact, I ended up doing it, but it was mighty painful with the two largest pharmacy chains, where I want to do something that I’m sure needs to be done all the time, but is fairly low probability, which was to move a prescription from one pharmacy chain to the other. Talk to my wife. You know, I am known to my fans as Salty Jim, and I was saltier than usual that day as I was cussing these voice menu systems that were clearly designed to optimize a small number of trajectories through them, but were just totally broken when it came to do anything unexpected. And even my old tricks of, you know, pound zero zero zero zero zero, that didn’t get a person, or even my later method of, give me a live person, mother fuckers. That didn’t work. And I, you know, tried at various occasions, tested that. Generally requires the mother fuckers for it to work. But even that didn’t work.

So I was a rather unhappy camper after spending 45 minutes doing something that would have taken no more than a minute and a half if I’d actually gotten a hold of a person, right? Then I’d also say that, like probably most of my listeners here, we’ve tried out things like Alexa or the Google, what was that Google thing called? I can’t remember what it was called.

Tobias: The Google Assistant or the Google Home that.

Jim: Yeah, Google Home, I think is what it was. Yeah, we tried them. They’re annoying. You know, they were sort of okay for some things, but, you know, not for a lot. They were just more trouble than they were worth. But I did run into one good application: my mother, who was actually quite a good computer person. She got her first computer in 1982 and was online by 1984. But as she kind of sunk into dementia, her ability to manage her computers first diminished and then became untenable. Then even her apps became, she could barely operate her apps. Then she couldn’t do it at all. But she could use her Alexa to call up her beloved country music, you know, all the way to the end.

So she could say, Hey, play Vince Gill, and it would. So I was thankful for that application. So anyway, that’s some of my initial negatives. Oh yeah, then the other one, this is one of my old pissed-off-at things, you don’t see it so much anymore. But it used to be, you’d see these people walking around town talking to their phone. At first, I just thought they were crazy people talking to themselves. You know, you’d see it in grocery stores, on the street. You don’t see it as much as you used to, but you still do see it occasionally. So with that kind of, you know, mix of some positives, but a lot of negatives, that’s sort of my perception of where voice tech is today. Tell me some ways that that’s not quite the whole story.

Tobias: So Jim, I am excited for this conversation. It’s always a great sign of a podcast when the F bomb comes in the first five minutes. So let’s get rolling here. So a couple of things I would say. First of all, this problem of publicly speaking into devices is probably going to get worse, not better. And so we’re going to have to get comfortable with that. It will change the norms of some things like how an office is set up. Now a lot of us knowledge workers are working from home.

And so that problem is a little bit different maybe than it was 20 years ago. But I’m glad you mentioned WAP, right? Because I think we are in a very similar space to where WAP was. When you look at the internet and then mobile, those things in retrospect, they seem like they happened overnight, but they really took a long time. I mean, I started playing with the internet when I was at school in 92, 93. And then it really got popularized five, six years later, mainly through AOL, but other approaches as well. And what happened was it got better and better. It got more and more user friendly and finally got to breakthrough. And in the early days of AOL, you know, people called it the World Wide Wait, but we still used it because we understood that it was solving a major problem.

Same with mobile, right? We were messing around with WAP for five, six years or 10 years. It was kind of like Alexa is today. Like it’s kind of got some neat stuff to it, but it isn’t life changing. And then bam, the iPhone happened, smartphones, and the world changed. And I think we are at a moment like that with voice. And you said, I think it’s a very fair challenge to say, how do LLMs compare to voice? And why would you say that voice is a bigger tech? They’re deeply related, right? And I actually think what’s going to happen is the way most of us are going to experience the benefits of LLMs is through a voice interface, at least us speaking to machines. And that’s what I think is so important because when that starts to happen, it will change the way companies do customer service, the way we interact with technology, and we can spend a lot of time on the implications of that. But those things are deeply intertwined and they’re really just manifestations in some sense of the same change that’s happening.

Jim: Yeah, that’s very interesting. We’ll certainly dig into the intersection of the two. But let’s continue now on a more upbeat note. Early in your book you have a story about a student at the University of North Carolina named Anastasia Sol, who was stricken by a medical condition, and somebody at your company really helped her out with an innovative product. Why don’t you tell us about that story?

Tobias: Yeah, so she was the girlfriend actually of one of our employees and she was an extraordinary athlete. Still, again, is an extraordinary athlete. She had just recently run the Boston Marathon and on a trip she was suddenly stricken with Guillain-Barré syndrome, or GBS, and was completely incapacitated. She could only move her eyes. At the hospital, they were using literally cardboard printouts of letters and she would use her eyes to indicate yes or no as a nurse would point to different letters, which is a very, very cumbersome way to spell out words when you’re lying in a bed. At that time, the iPhone had just developed eye tracking.

Our developer said, wait a second, we can see where people are looking and we can basically see on a screen on an iPad as an example what letter someone is looking at, which is a much faster way to do this. That was breakthrough number one. That was about four or five years ago. Breakthrough number two is happening right now. We’re about to release it where it now interprets the language, what someone is saying around you as a caregiver. For example, Jim, if you were in a room with someone who’s incapacitated and you ask them, hey, I’m going out to grab some food. Do you want anything? You have a doctor’s appointment tomorrow. Who’s it with? It’s going to transcribe that listening to the voice tech and then propose an answer for them that they can then look at without having to spell things out. It takes these communication cycles that are just brutal for people that are incapacitated and gets them almost to the same speed that you and I are having a conversation.

Jim: Yeah, that could be a real help. I can see that’s a tremendous innovation for people in those difficult situations. Now, you have one of your own critiques of how the industry has unfolded so far, which you said the entire paradigm of voice technology is backwards right now. And you talk about the problem of describing things like Alexa and Siri as smart speakers. And you contrast that with the idea of a smart mic. Why don’t you tell us about that?

Tobias: Yeah, so when you go back to 2014, 15, Siri and Alexa kind of took the world by storm very, very quickly. And our company is always being asked by clients, what’s the next big tech thing? And the world, as you know, Jim, is full of false positives. People are always telling you there’s some new tech and mostly it doesn’t pan out. If you were at the Consumer Electronics Show in Vegas in 2013, you were being told that by 2015, half the households in the United States would have 3D TVs. I don’t know of a single person who has a 3D TV. Then three years later, we were all told we would be wearing AR goggles, Google Glass.

Three years after that, 2018, 19, we’re all told by 2023, half the cars in the United States would be autonomous vehicles. There’s still basically none. So there’s all these false positives. And you got to think, why do some of these things happen and some of them don’t?

You and I have talked about Crossing the Chasm as being one of both of our favorite books. And that laid it out 25, 30 years ago. There’s always a lunatic fringe that’s going to adopt a new technology because it’s new and cool. But how does it leap to be generally applicable?

And it’s usually because it’s solving a problem for human beings. Why do we like voice? Ultimately, it’s because we can speak three times as fast as we can type.

If you’re on a mobile device with your thumbs, it might be five or 10 times as fast. So it unleashes this information transfer velocity between human beings and machines. The problem with how it’s been implemented is that we always use paradigms that we’re used to. So it was implemented as a voice-to-voice paradigm, whereas we do not want to listen to machines. It’s super slow. It’s actually way slower than reading. It’s way slower than looking at graphics. So this idea of a voice assistant doesn’t solve anything because we can speak so fast, but then we get bogged down in the listening. And that’s why the whole thing is backwards. Right? What we really want to do is tell a device to do something. So I’ll give you an example. You don’t want to listen to Alexa tell you what movies are playing tonight and the showtimes. Jim, you and I will remember that existed 30 years ago. It was called Moviefone. We don’t need Moviefone again. What we want is to ask our Regal Cinemas app what’s playing tonight.

See it. And then in two seconds, say, all right, get me two tickets for Star Wars at 8 p.m., already pre-authenticated, etc. Now we’ve solved a problem because that takes two or three minutes to do today on an app. And now we can do it in 10 or 15 seconds. And by the way, exactly the same thing. You should be able to pull up your, I don’t know which chain you were saying, but let’s say it was CVS. You should be able to pull up your CVS app and say, change my delivery address, right? And the app should react to that in real time. And that’s, I think, the real breakthrough that people are now getting their heads around. This is a multimodal breakthrough. My point, then, that you started out with is we call these things smart speakers, but that’s backwards because we actually don’t want to listen to anything the computer is telling us. They should be called smart mics because it should be a command and control system for human beings to tell machines what to do.

Jim: Yeah. And I thought that was a very nice insight by you and your folks, this multimodal idea and the fact that voice out is often, but not always, the wrong answer. I could imagine it being useful. Let’s say I’m doing repair on a car, I’m an auto mechanic, right? You know, so, all right, the framus looks like it needs to be disconnected from the whatsamahammer, what tool do I need?

And if it would just speak it, that could be handy. But I’m with you. Dirty little secret: I don’t listen to too many podcasts, but I do read their transcripts. That’s one of the reasons I pay big dollars to have an almost perfect transcript done for the Jim Rutt Show, which people can find at our episode page at Actually, I forgot to mention, speaking of which, the link to Tobias’s book will also be there as always. So be sure you check that out if you find our conversation interesting. Anyway, back to multiple modality. I do think that that could make a big difference. Though it’s amazing, you know, with voice tech having been around for more than 30 years, that they’re still not doing it, right? You know, why doesn’t an Alexa have a screen, for instance?

Tobias: Yeah, I think the promise of Alexa, right, in theory is that you can speak to it wherever you are, etc. So they didn’t want to tie it. Now, there’s a version of the Echo that has the screen, but it just doesn’t have a lot of uptake because now you’ve got to stand in front of it. Our thesis is you’ve got a screen. It’s in your pocket all day long. Most of us sleep with it. It’s called your smartphone. And so that’s going to be the default screen. And then there are going to be microphones either on the smartphone or all around us that we use to command the system what to do. And, you know, in terms of what you just said about the transcript of the Jim Rutt Show, we do this every day. A lot of us are speaking our text messages or emails, but we never listen to text messages or emails, we read them. And so we’re already just on our own as human beings, because we’re always drawn to what’s easier and faster. We’ve already figured out how to use these systems multimodally, even though they weren’t necessarily designed that way.

Jim: Now, one of the things I have dabbled with is using voice technology for dictation, and for something longer than a text message, which it’s fine for. But if I’m doing, let’s say, an essay, yes, you can speak three times faster than you can type, which you say is one of the key cornerstone benefits of voice technology. But I’m a writer who is constantly editing, changing my mind, hopping around, cutting and pasting, and even with the really good voice interfaces like the Google dictation system and the newest Dragon dictation system, I’ve tried them both, but with my style of writing, I have yet to find anything that seems to work for me. Maybe I just need to invest more in learning how to speak about editing and cutting and pasting and rephrasing and all that sort of stuff. What do you see as that interface for doing longer form, somewhat more formal texts?

Tobias: Yeah, I think there’s two phases to that, right? I think one is there’s a certain art to it as a user is that you have to get used to dictating it and then sometimes not using voice during the editing phase. But I do think this is again, a place where LLMs are going to come into play, that they’re going to get really smart at being able to interpret whether what you’re saying is an editing command or whether what you’re saying is part of the actual content. And again, this understanding level, like if I were dictating something and then I said, oh, wait, no, go back and change the last sentence to X, Y, Z.

As a human being, you understand that intuitively. Historically, the transcription tools haven’t been able to understand that at all because they’re not programmed that way. They’re purely transcription tools. But when LLMs come in, that’s kind of what they’re good for. So that, again, this combination of the conversational AI and the LLMs is going to fix a lot of these problems that we’ve had. And to me, injecting the LLM into conversational AI is kind of analogous to injecting the smartphone into the mobile experience. It kind of all of a sudden opens it all up.

Jim: Yeah, indeed. I could see that as being quite useful, because LLMs are good for fuzzy things. They are amazingly clever. On the other hand, we know that they also hallucinate and are probabilistic, so that they’re not exactly precise in most cases. And you did make a point later in the book, though, which I thought was bang on, and anybody thinking about using these kinds of technologies in their own business might consider this, which is that one of the ways to make them more precise would be to take tunable, trainable models and fine-tune them for one’s own application. Maybe you could talk about the idea of fine-tuned LLMs in combination with voice tech.

Tobias: Yeah, so I think the problem with LLMs and this concept of general assistants is that it’s so complex. I mean, you’re in some sense trying to replicate the entire human experience, which goes without saying is very, very difficult. But if you start with much more straightforward LLMs that are designed for specific use cases, that’s where I think most of the applications will be for the next two to three years. And one of the pieces of data that I found the most mind blowing is Google has indicated, because they have all these assistants, they say that in English alone, there’s over 2000 ways that people set their alarm. 2000 different turns of phrase that people use to set an alarm. So if something that simple has 2000 approaches to it, you can understand the complexity of creating broad models. But if you have a narrow model and say, you know what, we’re just going to perfect how to shift your prescription, or we’re going to perfect how you interact with your pharmacy or how you order movie tickets, that’s a surmountable problem that we’ll get really good at, and we are really good at in 2023.

Jim: Yeah, that makes sense. Because one of the things I have found is that models are pretty good at detecting syntactically different, but semantically similar utterances. We typically typed in. It makes sense because they’re essentially depending on latent semantic vector spaces. So even if you use very different words, they end up in the same place in the vector space. Have you guys done work on fine tuning models for applications for clients or for yourselves?
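A minimal sketch of the point about differently worded requests landing in the same place in a vector space. The three-number vectors below are hand-made toy values, not the output of any real embedding model, and the phrases are only illustrative; a real system would get its vectors from a trained model. The sketch shows only the comparison step, cosine similarity.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings: made-up numbers chosen for illustration only.
toy_embeddings = {
    "set an alarm for 7": [0.9, 0.1, 0.0],
    "wake me up at seven": [0.8, 0.2, 0.1],
    "order two movie tickets": [0.1, 0.1, 0.9],
}

alarm = toy_embeddings["set an alarm for 7"]
wake = toy_embeddings["wake me up at seven"]
tickets = toy_embeddings["order two movie tickets"]

same_intent = cosine_similarity(alarm, wake)
different_intent = cosine_similarity(alarm, tickets)
print(f"alarm vs wake:    {same_intent:.3f}")
print(f"alarm vs tickets: {different_intent:.3f}")
```

With these toy numbers the two alarm phrasings score far closer to each other than either does to the ticket request, even though they share almost no words.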

Tobias: Yeah, so the big projects right now are creating these specific models that are, in most cases, hopefully doing a much better job of answering customer service issues than the experience you just had. But step one of that actually isn’t to expose the consumer directly. Step one is to support the reps, the human beings, and get to a point where the human beings can rely on what they’re getting from the model. And once that happens, then the model can get let loose on the consumer directly. So it’s kind of a two-step process. The other thing I would say is, aside from the training on very specific use cases, which is really important, is this concept of using two LLMs to check each other.

There’s no way to completely eliminate hallucinations, at least at this point, it’s a really difficult problem. But what you can do is use two different LLMs. So the first LLM gives a response, and then the second LLM checks it. In the near term, you might have a human being then triple checking it, but as the interplay gets perfected, you can walk away from the human being in the loop over time. So I think that’s a really interesting approach. It’s proven you can’t stop the hallucination, but you can detect the hallucination with another model.
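A rough sketch of the two-model check described above, under stated assumptions: `generate` and `verify` are hypothetical stand-ins for calls to two different LLMs (no real model API is used here), and the toy functions exist only so the example runs. The idea shown is that the first model drafts, the second approves or rejects, and anything rejected is flagged for a human.

```python
from typing import Callable


def answer_with_check(
    question: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], bool],
) -> tuple[str, bool]:
    """Return (draft_answer, needs_human_review).

    generate: first model, produces a draft answer.
    verify:   second model, returns True if it accepts the draft.
    """
    draft = generate(question)
    approved = verify(question, draft)
    return draft, not approved


# Toy stand-ins (assumptions, not real models) so the sketch is runnable.
def toy_generator(question: str) -> str:
    return "Placeholder answer about your prescription."


def toy_checker(question: str, draft: str) -> bool:
    # A real checker would be a second LLM; this one only looks for a keyword.
    return "prescription" in draft.lower()


draft, needs_review = answer_with_check(
    "How do I move my prescription?", toy_generator, toy_checker
)
print(draft, "| needs human review:", needs_review)
```

In the near-term setup Tobias mentions, the human reviewer would see every answer; over time only the flagged ones would need to go to a person.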

Jim: Interesting thing you mentioned. I’m going to give out one of my little secrets here. I do have a way to very substantially reduce the impact of hallucinations in mission critical work. I have a scientific paper I’m working on. I haven’t published it yet, but I’ll put it out to you. Maybe you guys can use it, which I call Paraquery. Now, unfortunately, it doesn’t work for real-time online stuff. But the idea, as I mentioned, models are good at mapping between syntax and semantics. And so if you ask a model, say you have a question and you suspect there’s a hallucination, you then ask the model to generate 50 paraphrases of your query.

And it’s very good. All 50 will be different, but a little program checks to make sure that they’re different. And if they’re not, it throws them out. Every once in a while, it makes a lot of difference. Then you give it these paraphrased queries. You then capture the results. And it turns out that in most cases, the entropy for the errors is different than it is for the correct answer. So even if the correct answer only comes out 30% of the time, the wrong answers are much more broadly spread, so that maybe the top wrong answer is only 10%. And so if you do 50 paraqueries, which is a number I just brute-forced, the right answer’s head stands way above all the many wrong answers. And you can go from, like, a 30% right answer to a 99% statistical probability of it being the right answer by using Paraquery. So maybe you guys will figure out a good use for that.
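A minimal sketch of the voting step as described here, not Jim’s actual (unpublished) method. It assumes the paraphrases and the model’s answers have already been collected; `dedupe` and `paraquery_vote` are hypothetical helper names, and the answer labels are made up to mirror the 30% versus 10% example.

```python
from collections import Counter


def dedupe(paraphrases: list[str]) -> list[str]:
    """Drop repeated paraphrases, ignoring case and surrounding whitespace."""
    seen: set[str] = set()
    unique: list[str] = []
    for p in paraphrases:
        key = p.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique


def paraquery_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the share of queries that gave it."""
    counts = Counter(a.strip().lower() for a in answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(answers)


# Made-up results from 50 paraphrased queries: the right answer appears
# 30% of the time, and the wrong answers are spread thinly.
answers = (
    ["right"] * 15
    + ["wrong-a"] * 5
    + ["wrong-b"] * 5
    + [f"wrong-{i}" for i in range(25)]
)
winner, share = paraquery_vote(answers)
print(winner, share)
```

The sketch only picks the plurality answer; turning that lead into a statistical confidence figure would need the fuller analysis Jim alludes to.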

Tobias: You know, that is super interesting. I can’t believe this. But my senior project in high school in 1989 was using the concept of entropy to analyze different authors and be able to recognize who an author is based on the entropy of their writing. You know, there’s some obvious examples like a Hemingway versus a Shakespeare, but being able to do that, maybe I should have pursued that. It would have been really valuable right now if I’d spent the next 30 years in that direction.

Jim: Yeah, indeed. Entropy is a surprisingly useful general-purpose tool. But you don’t even have to use the word entropy. You can just say that the statistics are distributed more broadly in the errors than they are in the ones that are closer to fact. It’s not always true, but it’s true an awful lot. This is quite interesting. I like the fact you mentioned using the LLMs to support the reps first. Another horror story happened about a year ago, but it was sufficiently bad that I still remember it, which was dealing with an American Express clerk, again, to do a relatively low probability, but not a zero probability, thing. And the person was rigidly imprisoned by their finite state prompting machine, obviously. They were just reading what was supposed to happen next. And I was like, I’m ready to pull my hair out.

I’ve been a substantial client of American Express for 45 years. And I couldn’t believe that they stuck me in with this moron with a finite state machine. And I was saying to my wife, ChatGPT could have done a better job than this moron, right? And it could have. And better would have been a foundational model that was actually trained, fine-tuned, or LoRA-ized or something on the specifics of American Express’s business. And you could relatively safely test that by having it as the tool for the service agents first, and then once you got that perfected by feedback, you could hook it up to the voice system. Presumably, people are working on that now, I would expect.

Tobias: I mean, that is the most important application or the most immediate application of these models in most business environments is exactly that. And our parent company has 75,000 agents. And so we are testing it with multiple clients right now. And that is the first use case, is how do you make the agents more performant because you’re able to use these super powerful tool sets to help them on a real-time basis.

Jim: Yeah, they ought to work better on the lower probability parts of the tail than trying to enumerate everything in this tree structure, which is what they used to do.

Tobias: Yeah, I mean, I think it’s analogous right to, Google said four or five years ago, every time we add a linguist to our conversational tool set, that it gets worse, right? Because the linguists are trying to hard code a million different things. And that’s the breakthrough of the LLMs, is they don’t need to do that anymore. And so in the American Express example, if you can train the system on all the use cases on, they’ve got millions of recorded calls, train the system on those recorded calls of what was the ideal outcome of each call. Now that still requires a lot of annotation, it requires some human annotations, some synthetic data. So that all comes together to train the model because you can’t just upload the calls, you got to tell the system, this was a good call, this was a nine out of 10, and here’s why, and this was a three out of 10. So there’s a lot of work and everything in tech, it doesn’t happen on its own, and there’s a big, big difference between doing it in a high quality way and a lower quality, faster way.

Jim: Yeah, that makes a lot of sense to me. So let’s switch a little bit to a little bit more of the tech part of it. What’s the current state of the art on voice to text? Just as a data point, as I mentioned, I pride myself on my very excellent transcriptions for the Jim Rutt Show, and I have tried pretty much every off-the-shelf, fully automated transcription service that’s out there, and none of them have quite made the cut. They get to like 96 or 97%, and 97% sounds good, but it means basically every 30th character is wrong. All right, it just looks like shit. You could read it, but it’s just not very professional. So I continue to pay one of the service providers that uses a human plus technology approach. When you get humans plus software, you get to 99.7 or eight, which means three characters in a thousand. That’s perfectly tolerable. So in your mind, what’s the state of the art on direct voice to text?
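The arithmetic behind those accuracy figures can be worked out directly. This is a small sketch that treats accuracy as a per-character rate, as Jim does; `error_stats` is a hypothetical helper name, and the percentages are the ones quoted in the conversation.

```python
def error_stats(accuracy_pct: float) -> tuple[float, float]:
    """Return (errors per 1,000 characters, average characters per error)."""
    error_pct = 100.0 - accuracy_pct
    errors_per_1000 = round(error_pct * 10, 1)
    chars_per_error = round(100.0 / error_pct, 1)
    return errors_per_1000, chars_per_error


for pct in (97.0, 99.7, 99.9):
    per_1000, spacing = error_stats(pct)
    print(f"{pct}% accurate: {per_1000} errors per 1,000 chars, "
          f"about one every {spacing} chars")
```

At 97% that is about one wrong character in every 33, which matches the "every 30th character" rule of thumb; at 99.7% it is three in a thousand.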

Tobias: I mean, that data that you shared there jibes with even what the industry’s saying. I think Google says 98 or 99% for its off-the-shelf tools, which is generally right if you’re accent free. If you start getting into accents, the systems really start to fall apart. But again, this is where generative AI, LLMs, play a huge role because they are so much better at this and they’re able to look at the output in real time and say, does this make sense or not? And then have all kinds of algorithms to kind of test it, et cetera. Because what they’re really looking at is, what’s the next most likely word? So on a real-time basis, they can actually help with that transcription problem as well. So I think again, none of this stuff happens overnight, but as we get into 2024 and especially 2025, you’re going to see 99.9 on the commercial tools because they’re going to be applying all this tech.

Jim: I would expect that, but I haven’t seen it yet. And to me, 99.9 is good enough. You know, I would cut my costs by a factor of 10 probably to go from the human plus computer to computer. I will say I haven’t tried Whisper yet, but I hear that it’s in a similar kind of range as the other guys.

Tobias: Yeah, we use Whisper a lot with a lot of clients. You asked me before what we kind of think is best in class. I would put that right there.

Jim: Okay.

Tobias: You know, we worked with Monticello here in Charlottesville to transcribe a lot of the tours that the guides were giving because we really wanted to understand what was being said on the tours vis-a-vis all sorts of, you know, social issues like, you know, slavery, et cetera, et cetera. And so it was like the perfect tool for that. And it got to well over 99% accuracy with just a little bit of training.

Jim: That’s cool. Okay, let’s go to a little bit more conceptual space, which our listeners tend to like a little bit, a little abstraction here. I was pleased to see you mentioned Baumol’s Cost Disease. This is a fairly important, but relatively little known analysis of why the hell certain things have gotten grossly more expensive in our society, like healthcare and college education, while other things have gotten grossly less expensive, like flat screen TVs, and at least until recently, and if you adjusted for quality, cars, and certainly things like kids’ toys and things like that. So talk to us about Baumol’s Cost Disease and its relevance to this conversation.

Tobias: Yeah, so the concept is that there are certain areas where there’s basically been minimal productivity gain in the last, you know, 50, 60 years. And those things tend to be white collar kinds of jobs. Healthcare is an example, education, legal is an example, and the combination again of Gen AI and conversational AI is going to, for the first time, have a massive impact on those professions. And legal is just an example, right? I think it intuitively makes sense. Now, it’s not gonna happen overnight. I think all of us are aware of the lawyer in the last couple months that did their whole case, basically, via ChatGPT, and a lot of it was factually incorrect. But over time there are gonna be models specifically for the legal profession, highly trained LLMs, combined with voice tech that allows the interface to be so much faster, that are gonna make those jobs way, way more efficient than they have been, where humans are going to be augmented, and in some cases replaced, by this technology. And that just hasn’t happened, right?

We replaced a lot of blue collar jobs and made those more efficient, and I think everyone says, all right, does that mean we’re gonna have massive unemployment? It has never meant that before, right? Those jobs are painful to lose, and the people impacted are painfully impacted, but as a society overall, the jobs tend to move into different places, and to move into solving higher level problems, which makes life more interesting for those folks, but again, there’s going to be displacement and a lot of retraining, et cetera, and I think we as a society have to get ready for that and understand that.

Jim: That’s a good point, but there’s another interesting odd economic theory that argues maybe we won’t have any unemployment increase at all, and that’s Jevons paradox, right? Jevons paradox, when you think about it hard enough, you know it has to be true, at least to some degree, which is as the cost of an input goes down, classic example, electricity. Electricity was hugely expensive, say in 1900. Now it’s very cheap in comparison. Gasoline was $200 or $300 a gallon equivalent in 1890, and it was $20 equivalent in 1910, before the big oil fields were hit. And yet as the prices fell, the consumption went up sufficiently that not only was there much, much more demanded, but the actual dollars spent increased. And this is what the paradox is about, because you could see why people would buy more, but would the actual dollars increase? And here’s why: because as the input becomes cheaper relative to other inputs, it becomes economically reasonable to use more of that input rather than some other input. And of course in the days of the American boatmobile of 1967 to 1990, the big input trade-off was, well, gas is relatively cheap, so we’ll make our cars crude and heavy, right? And American cars were just grossly poorly engineered relative to, say, German or Japanese cars. They burned more gas, but gas was cheap, so it made no sense to substitute better engineering and more expensive machining and lighter materials for gasoline, and so we actually kept spending more dollars per year on gasoline, in addition to consuming a lot more.

So anyway, when I use that example, talk about programmers, for instance. One of the things I’ve seen with programmers I’m working with is that if they know how to use these AI tools, they’re probably three times faster at least. For people learning new tech, it’s probably 10x, right? If you’re learning a new API you haven’t used before, man, when I do that, I have my ChatGPT-4 window open, I’m hammering away at it, right? And 10x probably faster coming up the curve. However, the world has a hell of a lot more need for software than it’s ever been able to fulfill. So if you reduce the cost of that input, the human in the software development curve, my prediction will be not only will there be more software done, let’s say it makes a programmer three times as efficient, that doesn’t mean we’ll have 3x the amount of programs, we may have 10x, because the incremental cost to automate something just fell to a third, therefore it makes more sense to invest in automating things. And that’s the implication of Jevons paradox, versus job offset from things like large language models. Does that make sense to you?
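The arithmetic behind the "3x more efficient, 10x more programs" point can be sketched as follows. This is only an illustration, not something from the conversation: it assumes a constant-elasticity demand curve, reads the threefold productivity gain as the cost per program falling to one third, and the function names are made up for the example.

```python
import math

def demand_ratio(cost_ratio: float, elasticity: float) -> float:
    """Change in quantity demanded under constant-elasticity demand (assumed model)."""
    return cost_ratio ** (-elasticity)

def spending_ratio(cost_ratio: float, elasticity: float) -> float:
    """Change in total spending: new unit cost times new quantity."""
    return cost_ratio * demand_ratio(cost_ratio, elasticity)

# Assumption: a programmer who is 3x as efficient makes each program cost 1/3 as much.
cost_ratio = 1 / 3

# Elasticity that would turn a 3x cost drop into 10x as many programs.
implied_elasticity = math.log(10) / math.log(3)

print(f"implied elasticity: {implied_elasticity:.2f}")
print(f"quantity multiple:  {demand_ratio(cost_ratio, implied_elasticity):.2f}")
print(f"spending multiple:  {spending_ratio(cost_ratio, implied_elasticity):.2f}")
```

Under these assumptions, total spending on software rises whenever the elasticity is above 1, which is the Jevons case, and falls whenever it is below 1.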

Tobias: I 100% agree, it resonates, and I think the software development one is one that’s relatively straightforward to understand, and we live it every day. Most of our clients, who are typically kind of Fortune 500 companies, but also a bunch of mid-market companies, they have years of backlog of things that they want to get done. So what it’s going to help us do is get through that backlog much faster, but I guarantee you that new ideas are going to come in, new features, et cetera, are going to just get developed faster. I mean, this is why we’re all talking about this being as important a change as mobile was, and as the internet was going back to the PC, because it’s going to change the world in terms of how fast these new technologies come to us, and how fast change happens in society, really, because technology is going to start evolving so quickly that the software we all use every day.

Jim: And there will be a new golden age of innovation as people start thinking through, what does it mean to be able to write software for a third of the cost or a tenth of the cost? What things that I wouldn’t even have considered doing can I now consider doing, in the same way that the golden age of PCs, the early 1980s, when suddenly you realize you could make a word processor for the masses, right? You could, I remember when PageMaker came out, you could do desktop publishing, right? It was like, whoa, there’s going to be those kinds of fundamental new kinds of things that are going to come from voice plus LLMs plus less expensive software development, I would suggest. A little aside here, when I was reading your book, I saw something, and at first I said, whoa, is that true? And I checked it, it actually turned out to be real. And that was that somewhere along the line, somebody invented a way to write software using voice called Talon. Could you tell us about that a little bit?

Tobias: Yeah, so when we talk to our software developers, they spend much too much of their time actually typing and getting the syntax just right. Now a lot of it gets auto corrected these days, but what they prefer to spend their time on is ideation. Like what should this program do and why, and kind of the structure and how to interface with the API and how to optimize the architecture, those kinds of things. And so Talon is an example of software that is really designed to free up the developers to do more thinking and less typing, because it’s just so much faster to speak it. Now again, as you mentioned earlier in your example of like writing a document, it has to be smart because it has to real time get the syntax right and real time do the editing, because there’s going to be misinterpretation of voice, etc. But these tools have gotten pretty good and they’re going to get much better over the next six or 12 months. And I don’t know if you’ve had a chance to test it, but it’s pretty neat stuff.

Jim: Yeah, I’ve not had a chance to test that. As I said, I tend to have a bit of an aversion to voice stuff. But if it could work on software and it worked in my workflows, hell yes, I would definitely use it. The company’s called Talon. Is that the name of the company?

Tobias: Yes.

Jim: And is that I think I see something. It’ll be on the episode page. So people want to do that. Now as we talked about earlier, I continue to see the problem of public babble as a potential barrier. And I was looking at some of your ideas about where these technologies would land. I started in my own mind to sort them out. Public babble problem or not, right? For programmers, especially as the world has changed to most programmers working from home these days, not a public babble problem. So I could see that working. But you also gave some examples and I said, Oh, this would be a big public babble problem. And that would be things like waiters and, you know, interacting with menus and things of that sort. And I heard earlier, you said, Well, we’re just going to have to get used to that. But, you know, you also talked about the fact that sometimes society just rejects technologies. You mentioned one of the maniac extreme ones, Google Glass. Of course, I signed up and bought a Google Glass. The instant they were available, because I’m a maniac extremer. But there was a huge social pushback against Google Glass and other things. Maybe there’ll be a huge public pushback against applications of voice tech that result in public babble.

Tobias: Yeah, I think it depends on the context, right? As you said, a lot of work is going to get done from home. Workplaces may change, like the concept of the complete open office may change. If you walk into a radiology office, they have pods, a lot of them, where the radiologists get in the pods, and then that’s where they do their voice dictation. Radiology is one of the earliest adapters of this technology or adopters, because they were doing dictation 30, 40 years ago by humans. Now it’s all real time transcription, because their time is so valuable, right? And so they recognized a long time ago that this is valuable.

Now, I think in restaurants it’s going to be accepted, because we’re already, think about it, in theory, when I look at a menu and I tell a waiter or I tell the quick serve, the person behind the counter at the McDonald’s, what I want, I’m already speaking, it’s already out there. I’ve already got the babble problem. And now I’ve got a human being whose only job it is today to listen to what I’m saying and then type it into a system. And still, in a lot of sit down restaurants, casual dining restaurants, they write it on paper, and then they go back and then they type it into a computer, right? It’s kind of a three step process. Instead of, once I say it as a consumer, it’s out there, the information is in the ecosystem, it should immediately be acted on. So I think in that kind of scenario, it’s going to be very, very acceptable. I’ll give you another example, right? When you go to a doctor, primary care doctor, general practitioner, my doctor sits in front of a computer and kind of types real time as I’m talking to them, right?

Jim: Yeah, mine does the same thing and it can be annoying at times.

Tobias: It’s pretty annoying and you don’t feel like you’re getting their full attention. I know I have it, but that’s the world that they’re in. Why on earth are doctors, who are some of the most highly educated people in our society, basically human transcribers? The system should be interpreting what I’m saying as a patient real time and probably analyzing it and probably giving the doctor feedback on what I’m saying based on comparing it to millions and millions of other patients and data points. We’re going to get there super quick, right? And that’s something that’s going to happen certainly in the next couple years. Those changes are going to happen in medicine, etc., both at the doctor’s office and in hospitals.

Jim: Well, this opens up another whole area to talk about and that’s the area of privacy. When you talk to your doctor, sometimes you talk about things you probably don’t want in the computer, right? And of course, with things like Alexa, there’s always the concern, were they monitoring everything we said in the house? Supposedly not, but maybe they were. And particularly if you have a vision of the smart mic versus the smart speaker, there are a bunch of privacy issues. What do you think about that?

Tobias: It’s a huge issue, right? I think the doctor’s office example, I mean, they’re typing while you’re speaking today and you’re not sure what you said is in the computer and what it’s not. So, I think in that sort of closed environment, we already have that issue to some extent. But regardless, this is one of the challenges of the industry, right? Is we have to establish trust with consumers and voice tech has not done a good job of that historically because we don’t know what’s happening with the data. We don’t know what Amazon’s doing.

It turns out that they were using some of that data for training purposes and people didn’t even understand it. It is critical that there are clear policies. It’s critical that those are clearly communicated. I think most of the tech these days has already figured out that the concept of listening all the time for a wake word isn’t a very safe way to do this stuff and is easily hacked, or that data is at least tracked in some way. So, trust is everything, right? When it comes to a new technology. And I can go on for a bit about trust.

Jim: Do it. This is important. Talk about trust from the perspective of this technology.

Tobias: We talk a lot about it in the book and how the voice industry has violated some of the trust concept. So, human trust comes in two forms. Affective trust, which is emotional, which is human beings generally decide whether they can trust each other on some sort of emotional level within seconds of meeting each other, right? And it’s a fight or flight instinct that did a lot of good for us. Also, it creates a lot of bias and a lot of bad things. But as part of that affective trust, there’s this concept of the uncanny valley, which was developed in the 1970s in Japan, where they started figuring out that the more human like you made robots or animation, the more people were freaked out about it. And there was this intuitive human concept that, wait, this isn’t right. I know it’s not human, but it’s trying to be human. It’s lying to me. And there’s theories that we think it’s a corpse. We think it’s a zombie. But anyway, humans don’t like it. And I think we saw it in a lot of animation over the last 20 years.

Human like animation generally has done a lot worse than animation of animals, where you love Bugs Bunny, but you might not have the same affinity towards a humanoid character. And so, voice tried to do that. And this concept of making all these assistants human like, in my opinion, is actually going in the wrong direction, because humans feel lied to. They know Alexa and Siri are not human beings. So let’s not pretend that they are. Let’s instead make the voice technology do what it says it’s going to do. And that goes into the second type of trust, right? Human beings long term, you develop cognitive trust, which is really just based on, does this person or this thing do what it says it’s going to do, right? That’s the line that has to get established over time. Voice failed on that too, because as you point out, a lot of voice experiences have been crappy. And so voice, as a technology, we have to solve the affective thing.

I think we solve that by not trying to fake human like emotions and human like existence. And then the second is, it’s just going to have to get better and better over time and narrow its focus and say, these are the things we’re doing well. And so again, if you’re a bank, I think you will be able to launch a voice assistant that’s really, really good at what it does. And you’ve talked several times today about these low probability things where you have to call into a call center, right? And a classic example for a bank is reordering checks. If I ask you to pick up your banking app and reorder checks, your heart rate’s immediately going to go up because you’re going to be like, this sounds like a giant pain in the ass, because how am I going to figure out where in the app they’ve buried the check reorder button? It’s the classic case where you should be able to pick your app up and say, order me checks, and then you can see it confirmed on the screen and say go. Now, you’re not going to do that with Alexa or Siri, because it’s a discoverability problem.

Are you going to really ask Alexa, say, hey, Alexa, go to Bank of America skill, and then you go there and then order me checks? Because you don’t know that it works or doesn’t work. There’s no feedback loop, et cetera. And again, there’s no trust there. So this trust thing is at the core of why voice has been slow to develop, but also, I think, if done right, why voice is going to be so powerful.

Jim: Yeah, two things there. One is technical trust in particular. Right. The people are very suspicious of big tech right now. And as it turns out, for good reason in many cases. And then the other is, you know, efficacy trust, you know, that it actually does what I want in a reasonable fashion. And those are going to be two big challenges. And in some ways, they’re kind of fighting against each other, because the more training data you might be able to get from an open mic is just exactly the kind of thing that people don’t trust big tech about. But on the flip side, if these low probability things could be handled that way, you’d even get a convert from me probably, right? Because as you say, oh my God, I got to deal with morons about a low probability event. There are few things that are more stressful in the customer service realm.

Tobias: Voice is particularly susceptible to this efficacy trust problem, right? Because when you’re speaking today on these devices, you don’t know if the device is listening. You don’t know if the device is properly transcribing what you’re saying. And then if you get a result that you weren’t expecting, you have no idea why, right? And so the example I use, and we’ve tested this pretty heavily, is if you have a pizza ordering app as an example, what people want and what gets the most positive feedback when we test it is that as you’re speaking into the app, it’s actually transcribing what you’re saying.

So you’re saying, Hey, order me two pizzas, one that’s thin crust and a second one that’s pepperoni and sausage, thick crust. If you see those words real time as you’re talking to the app, and then the app behind you, it is real time moving and transforming itself to the order. That gives you a lot of confidence as a user that the system is doing what it says it’s going to do. Whereas if you say something and you wait five seconds and then you get a response and it’s not the response you wanted, you have no idea what even went wrong.

Jim: Yeah, just want to throw the phone out the window.

Tobias: Then you’re done.

Jim: Yeah. Yeah.

Tobias: But what’s really interesting, Jim, about the example I just gave is most of our communication as humans is call and response. I say something, you say something, you say something, I say something. Now we’re going to get to concurrent communication where we’re saying things and the device is doing something real time, which even more shortens these communication loops. And so this concept of concurrent communication, I think is super, super interesting.

Jim: Actually, I love that example. It’s well thought out. And then you tie it into your multiple modality idea. I don’t want my damn phone reciting my order back to me. I just want to see on the phone that it’s put the right order up. And then I press confirm or say confirm and off it goes so that you can use the two technologies together. Now that example, so obviously good. Are people like DoorDash and things like that deploying this kind of stuff yet? I haven’t seen it.

Tobias: DoorDash launched something like this last month.

Jim: Okay.

Tobias: We’ve tested it. I think it’s good. It doesn’t do everything that I was just describing. So it’s got a little bit of work to do. We didn’t build that one, but if anyone from DoorDash is listening, give us a call. We’re keen to help you.

Jim: Of course, with DoorDash, there’s a big difference here between, say, Pizza Hut, where they’re the single back end, and DoorDash, where they have to deal with a whole bunch of different back ends. So it’s a qualitatively different problem. But doing it for Pizza Hut, I presume, would be vastly simpler than doing it for DoorDash.

Tobias: Yeah, it’s a much more limited data set. And they obviously are going to want to optimize everything for what they’re doing and make sure that they’re getting their upsells into it so that, hey, do you want a soda with that? Like there’s a lot that goes in and gets optimized in these systems as you kind of put them together. But again, going back to the Google example, 4,000 ways to kind of set your alarm, there are literally billions of combinations when it comes to ordering just pizza from one pizza place. I mean, that’s why these LLMs are so important, right? Because you can’t solve that the way they tried to do it, with these logic trees. It’s just impossible.

Jim: Yeah, essentially you flatten the information space, which is what you want. You know, the finite state tree machines, they don’t even work very well in games, right? Of course, a lot of the games are driven by them. And I would expect to see some LLMs be introduced in those categories. You actually did talk late in the book about the idea of this voice tech providing a new horizon for games. Let’s talk about that a little bit.

Tobias: Yeah, and so it kind of applies to games and applies to the metaverse, right? There is no metaverse without great voice tech. And there is no kind of immersive gaming without great voice tech, because especially the more complicated interactions, you’re not going to interact with the metaverse by a keyboard. You’re going to have to interact with the metaverse by a voice. So the voice, if the metaverse is going to happen, which I think is less clear to all of us than is voice going to happen. But if it is going to happen, it’s going to rely on voice first.

And so I think there’s going to be a whole new category of voice driven entertainment and gaming. Again, this is, we come back to this, we had a bunch of summer interns here this summer. And I said to them, I was like, I wish I were in your shoes, because the next five years are going to be just like the late 90s were on the internet or 2010 to ’12 were with the iPhone, where stuff is going to happen and it’s going to be mind blowing. It’s going to come out of left field. And the example I always use, and I used for them, is when Steve Jobs introduced the iPhone, no one in the audience, no one listening predicted that four years later, the taxi industry would be wiped out by Uber. But that’s what happens. These second and third order things happen. And you alluded to it earlier when you said this concept that once this becomes cheap, people are going to find uses for it that they’ve never found before, right?

Jim: Indeed. On the side of the things that seem to me pretty compelling, you described an example where like Singapore Airlines uses voice tech to optimize the structuring of the data for the after flight check of the plane. Oh, there’s a tear in seat number three, you know, some party of a Bachelorette’s puked all over row number nine, etc. And that’s again, struck me as really good. You get productivity and you don’t have the public babble problem. Do you see those kinds of examples of, you know, essentially at work and where there’s a data entry intensive or at least should be data entry intensive as being very good low hanging fruit for these kinds of technologies.

Tobias: Super low hanging fruit. We haven’t talked about that at all. But the industrial applications are going to be super early at-work applications. I think the category that they often fall into is this concept of heads up, hands on: work that has to be done with both your hands or at least one hand, and where you want your eyes on the work, not on a screen. And so there are so many inefficiencies that happen today because the technology in essence interrupts the work. Law enforcement’s an obvious example. Along assembly lines, there’s a lot of work being done, or in warehousing when you’re stacking stuff or taking stuff down. Those are all perfect examples where if you had a voice interface while you’re doing that, it just frees up a bunch of time that’s now interrupted. In the Singapore Airlines example, they have these teams come into planes that have, I mean, they’re trying to turn these planes around in 30 minutes, right?

From the time the last passenger leaves till the next plane leaves the gate. So they’ve got 10 or 15 minutes to have people like taking all the trash out, etc. And they’re working super hard and they have to be super fast. If they have to stop and pick up their phone and open the app and swipe and type in and say there’s a tear on seat 32 or seat 32 is in up for the next flight, that’s three or four minutes out of a process that’s supposed to take them six or seven minutes to do the entire plane. If they can do that while they’re working, and this is why Singapore Airlines was so interested in this, it makes the entire process of getting the plane ready for takeoff and not having it delayed so much more efficient. It’s just one really easy to understand example of why this stuff is so powerful.

Jim: As soon as I read that, I put a note down in my notepad. Another one that I came up with, this will open up a whole area of conversation. So imagine if you’re an auto mechanic and your head’s down inside the hood of the car and you need a different tool, I got to disconnect the Framus from the what’s a hopper, right? And you’re yelling over to your assistant robot and it knows or can query a database and figure out what the right tool is and bring it over to you. And so there’s two things. One, it’s as you say, heads down, hands engaged work where you want to interface with the world. But the broader category is the interface between humans and robots.

Tobias: And it takes on a couple different levels. I mean, one of the examples we have in the book is actually from my wife, who’s a surgeon. They have different stages in the operating process where they have to interface with the machine and hit a button. And if she’s in a sterile environment, she can’t hit that button without violating the sterile environment. And so today, what she actually does is kicks her shoes off and taps the button with her toe. So I mean, imagine, these examples are crazy, right? Like you’re in the highest tech operating environment and surgeons are using their toes to interact with machines.

Such an obvious voice use case. That’s a part of it. The other is safety. A lot of the accidents, whether it’s the Deepwater Horizon accident in the Gulf of Mexico or the Boeing 737 accidents, they’re ultimately human beings not knowing how to tell a machine what to do. The oil rig that blew up in the Gulf, they for hours knew that things were off kilter, but they literally couldn’t interact with it in a way that would stabilize the situation. The 737 MAX accidents, pilots knew that things were going wrong, but they didn’t know how to interact with the device, in this case the autopilot, to have it do what they wanted it to do. And so these aren’t like theoretical examples. Both the U.S. Air Force and the Russian Air Force now have voice based cockpits they’re testing for exactly this purpose, because it’s so much more intuitive than using buttons and levers, etc., to tell a machine what to do. You should be able to tell it. In certain instances, much more efficient.

Jim: Again, this is where it gets quite interesting, especially in these very mission critical things. Imagine you’re a fighter pilot, it’s about ready to engage the enemy. You don’t want to be making any mistakes. Right. So the, especially if you have things like large language models in the loop, which are not quite precision devices, there’s a real issue. And I suppose we’re going to see a period where, in fact, some of the work I’m doing, we have specifically chosen applications where less than perfect answers are OK. How do you see that playing out over time?

Tobias: It’s a progression, right? And I think people will very quickly over the next three to five years, I don’t mean by December, figure out how to optimize these systems and using some of the approaches we talked about today, having models check against each other, using highly trained models with proprietary data, only about that scenario, which has to happen anyway, just due to expense. You can’t have these broad models. It’s just too expensive to run versus very specific applications. This is the area where most of the research and the thinking and a lot of the startups are happening right now, which is around how do we test to make sure that these models are good? And then how do we continue to train them to make them better and better at scale?

That’s where the innovation is happening right now, because a lot of the work at places like OpenAI, and lots of others, et cetera, they’ve spent a lot of time getting the baseline models into a great place. But now it’s tuning them for certain applications and testing them, because I think it’s a policy thing, right? If you’re running American Express and you’re in legal or PR, whatever, you’re not going to want to let these voice based models or non voice based models out into the wild until you have a very, very high degree of confidence that the error rate is extremely low and that, more importantly even than that, the errors that happen are explainable and reasonable and not

Jim: It’s like what the hell right kind of things, right? You don’t want those.

Tobias: Yeah, like, Amex can’t have its model telling a user that it’s falling in love with them. Like that just cannot happen.

Jim: Interesting. Now, are you all actually doing this kind of work now? And if you are, do you have a perspective on which of the foundational models turn out to be good for this kind of specialized business training?

Tobias: A lot of the work that we do is actually selecting the model for the use case. And step one is, there’s a lot of models out there, so you start testing them for specific use cases. And obviously accuracy and quality is job one, but cost at scale is an important parameter as well. And so, you know, just like anything in tech, there’s not a one size fits all. It’s more, you got to look at the use case and then you got to have people that have seen a lot of these that can give you some really good advice for that specific use case.

Jim: Yeah, in our case, we have about eight models hooked up and we’ll have two more soon. And we’re now getting to the point where we need to start thinking about automated testing of the models, right? We have all these scenarios and workflows that have like 15 steps in them. And yes, we can have our testers and our QA people run through them, but the Ns start getting pretty small when you divide it by 10 different models. And so we’re giving some thought to automated QA on models, you know, which model should we use for which step in the value chain? And it turns out that, as you say, we have about 15 steps.

And while the state of the art OpenAI models are the best for about 11 of those steps, for four of the steps, including one of the most important ones, that’s not true. And being able to validate that and select what’s the right model for each step in the value chain, we believe could be a very important longterm competitive advantage. But as far as I know, there’s very little testing or very little in the literature about automated testing of foundational model selection for specific tasks. Do you have any thoughts on that?
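The per-step selection idea described here can be sketched roughly as below. This is a hypothetical illustration, not the tooling either speaker is building: the "models" are plain string functions standing in for real LLM calls, the test suites are toy cases, and exact-match accuracy is just one assumed scoring rule.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]   # stand-in for "send a prompt, get text back"
Case = Tuple[str, str]         # (prompt, expected answer)

def accuracy(model: Model, cases: List[Case]) -> float:
    """Fraction of cases where the model's output matches the expected answer exactly."""
    if not cases:
        return 0.0
    hits = sum(1 for prompt, expected in cases if model(prompt) == expected)
    return hits / len(cases)

def select_models(models: Dict[str, Model],
                  suites: Dict[str, List[Case]]) -> Dict[str, str]:
    """For each workflow step, return the name of the highest-scoring model."""
    return {
        step: max(models, key=lambda name: accuracy(models[name], cases))
        for step, cases in suites.items()
    }

# Toy stand-ins so the sketch runs; real models and test suites would go here.
toy_models: Dict[str, Model] = {"upper": str.upper, "title": str.title}
toy_suites: Dict[str, List[Case]] = {
    "shout": [("hello world", "HELLO WORLD")],
    "headline": [("hello world", "Hello World")],
}

print(select_models(toy_models, toy_suites))
```

In practice the scoring function would likely also weigh cost and latency per step, and ties here simply go to whichever model is listed first.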

Tobias: I mean, as we sit here, October 2023, that is the biggest opportunity right now within the ecosystem. And there’s a lot of people thinking about it. A lot of people working on it. I suspect within six or 12 months, we’re going to see some pretty sophisticated tooling come out to address these things. But, you know, if you talk to VCs, like this problem set is where the money is going right now. And I think for good reason.

Jim: That’s interesting because we’re going to have to build it ourselves. We’ll have something out in about two weeks, just for our own purposes. I hadn’t really thought about it as a product or a service.

Tobias: Maybe we’ll help you think through it because we have a lot of client examples that are working on this issue right now.

Jim: Huh. That’s interesting. As I’ve always said, most business ideas come from client problems, right? Rather than from, you know, sitting in the ivory tower thinking them up on your own. At least my own companies. The biggest, best ideas always came from our customers. So in this weird case, we are our own customer. We’ve stumbled into something maybe that’s more important than we thought. On the other hand, as you know, when you do the in-house solution, you do it as quick and sleazily and dirtily as you possibly can to get it to do a good job for you, you don’t necessarily generalize it and productize it. Anyway, something to think about.

Tobias: Yeah. I mean, as you know, too, in tech, right, it’s the product and having a good product is part of the game. And then it’s all the marketing, all the support, all the distribution.

Jim: And that gets harder every day as the ability to communicate gets less and less as our world is just full of bad noise, right? The noise level on our communications channels is just totally out of control.

Tobias: Well, I mean, that’s something we’ve been thinking about a lot, right? Because these voice-based LLMs are going to make it even easier to create content, right? A lot of the content creation is going to get automated.

Jim: Oh, yeah.

Tobias: So the noise is going to increase. And I think then we’re going to use LLMs to synthesize and curate the content coming into us. So we basically have this arms race between models that are creating content and models that are curating content.

Jim: Well, I talk about this on the show regularly. I call it the flood of sludge. And it’s already happening. The quality of alleged journalism has dropped precipitously in the last year. The number of fake sites that you get when you type in a Google search has gone up by a factor of two or three in the last year. And it’s all driven by LLMs. There’s no doubt about it. I call this the info agent opportunity. And I’m going to repeat it for probably the 10th time on the show. Young entrepreneurs, want to be a trillionaire? Solve this problem. This problem is real. It’s big. I’m advising three different projects that are working on this. But it is a hard problem. And LLMs are not a silver bullet. They’re probably a tin bullet. They’ll help. But there’s an awful lot that’s going to have to be done. And I’m absolutely with you. It’s an arms race. We’re going to have to build defensive mechanisms against the flood of sludge, or our ability to do collective sense-making as a society will just stop, probably, or at least be massively diminished.

Tobias: I agree with you. We all talk about the mental health crisis, and I think a contributor is the amount of noise and stress, nonstop. You’re out west of Charlottesville, so you in theory are getting a little less of it, but in practice, probably not, because every time you turn on a device, you’re getting the same exposure that someone in New York City is. And I think it just creates this baseline stress where there’s always something, always. A lot of it is auto-generated, and a lot of it is based purely on algorithms figuring out what people are clicking on. And what do people click on? Things that create an emotional response. And so we’re just getting fed this content nonstop that’s purely designed to have an emotional response, which I don’t think is super healthy.

Jim: We’re getting off topic here. This is one of my favorite topics. I’m going to go on it anyway. Hey, it’s my show. Do what the fuck I want. Right. I do try to very carefully curate my flows. For instance, I go on a social media sabbatical for six months every year from July 1st to the 2nd of January. So I’m just about at the halfway point of this year’s social media sabbatical. And that is a tremendous energizer. And I probably wouldn’t be able to get as far on the current projects that I’m working on if I wasn’t on my sabbatical. Second, this is a more general perspective.

And I think bang on to what you were pointing at. I do a lot of work in cognitive science and cognitive neuroscience. Those are two of my areas of some considerable depth. And I’m now starting to think that humans evolved to have a capacity to process no more than X number of interrupts per day, right? Whatever an interrupt is: your phone rings, a tiger jumps out of the woods and wants to eat your ass, right? Or your goddamn phone pings on you, right? And that one of the main drivers of reduced efficacy of people, anxiety, depression, suicidal ideation, you go down the list of horror shows of things that we’re seeing, may be something as simple as the fact that most people are now over the line on the number of interrupts per day. There are some relatively simple things you can do. You talk about people taking their phone to bed. Don’t do that.

You know, leave it out in the kitchen, right? Also, turn off all your notifications. Of all the communications channels that I have, there’s only one that I have notifications turned on for. And it’s one that only very few people have the address for. Because let’s think about this thing here now. We managed to survive 200,000 years without having to respond to Slack in 15 seconds, right? And the number of things on Slack that require an answer in 15 seconds is vanishingly close to zero. So decouple from interrupts and maybe you’ll become more sane. I think that’s going to be a really important part of this info agent sphere: it batches stuff and then it prioritizes stuff. In fact, that’s what I’ve been talking about with these people that I’ve been mentoring, trying to get them to do this.

And by the way, I have no equity interest in any of them. I just want this to happen. This is my contribution to the world, to make this happen. I would like to be able to say, let’s say for the equivalent of social media and the news, I’d like to have a slider which says I only want 10 things a day. You, the agent, figure out what are the 10 things that I am most likely to be interested in. And don’t give me anything more than that. Because I’ll bet the 11th thing I really shouldn’t be wasting my time on. On the other hand, if I’m in kind of browsing mode where I’m exploring more, I might open that up to 30. If I’m working really hard on a project, I might move the slider down to three. Give me three things a day from the outside world. You figure out what they ought to be. And I’ll give you feedback on whether you’re hitting it or not, as a way for us to be able to defend our most precious faculty, which is our attention.
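
The slider Jim describes can be sketched as a simple top-N cut over scored items. This is only an illustration under stated assumptions: the interest scores are assumed to come from some upstream model, and the function name, item titles, and scores are all hypothetical.

```python
import heapq


def daily_digest(items, limit):
    """Keep only the `limit` items with the highest predicted interest.

    `items` is a list of (title, interest_score) pairs. The scores are
    assumed to be produced elsewhere; this only applies the cap.
    """
    return heapq.nlargest(limit, items, key=lambda item: item[1])


# Hypothetical incoming items with made-up interest scores.
incoming = [
    ("local news", 0.2),
    ("research paper", 0.9),
    ("industry update", 0.5),
    ("friend's post", 0.7),
]

focused = daily_digest(incoming, limit=3)    # slider turned down
browsing = daily_digest(incoming, limit=30)  # slider opened up
print(focused)
```

Moving the slider only changes `limit`; the feedback loop Jim mentions would adjust the scores themselves and is not shown here.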

Tobias: I love it. And so one of my favorite books, we were talking about favorite books before we got on the show, is Flow. Basically, the head of psychology at the University of Chicago studied human happiness over a decade across the world, across socioeconomic groups, etc. And they found out that the highest correlation to human happiness is how much time per day you spend focused on something. It could be a hobby. It could be your work. It could be, etc., etc. And we adopted that as one of our core values here at the company: we want people to have three or four hours of uninterrupted flow a day. But it’s interesting that all this interruption is decidedly interfering with our flow as human beings, and we now have a body of work that proves that flow is one of the core drivers of happiness. So I 100% agree with you, and as a society, this is something we really have to tackle that kind of no one’s talking about except for you.

Jim: I can’t pronounce the name, but it was like Csikszentmihalyi.

Tobias: I know. And he just passed away, I think last year, but he was an icon in the space.

Jim: Yeah, he was the guy that really did some of the best work. A guy I listen to a lot on this is a guy named Jamie Wheal, W-H-E-A-L. I’ve had him on the podcast a few times. He’s written several books on how to maintain flow state and the importance of flow state to happiness. He does consulting to businesses on how to get them to have more flow state.

I think we’re in 100% agreement there. Even back in the days when I was a busy CEO, public company CEO, in a total radical change from my predecessor, I instructed my assistant: two-hour block a day, every afternoon, inviolate, unless the world’s on fucking fire, right? You know, of course, being a CEO of a publicly traded company, a very controversial place, about one day out of five, the world was on fire in some way. But four days out of five, I got my two-hour block to actually be able to focus and do constructive work. I strongly advise leaders to realize, one, they’re not as indispensable as they think. I like to say graveyards are full of indispensable men and women now. And try to find those two-hour blocks most days for yourself to work on the really deep, important things, because playing ping pong with everybody on your team is not really the best use of your time.

Tobias: Yeah. And as a corollary, I think for me, it’s spending 15 minutes every morning writing down what are the three things I can get done today that will really move the ball down the field.

Jim: I would use five, but yeah, a small single-digit number, no bigger than this. Put it on a three-by-five card, put it in your pocket and cross them off as you go, right? Now, let’s wrap up on our topic here. Will voice technology make the interrupt problem better or worse?

Tobias: I think in and of itself, it’s just a tool. I think left to its own devices, it will make it worse if we aren’t conscious in terms of how we use it as consumers. And the reason it makes it worse is because it makes it so much easier to create content. And so now there’s two times, five times, 10 times as much content going to be out there. So the volume is greater. But if we use it as a tool, and I think you were giving some examples, so that we really restrict the content that comes into us, and we use it as part of the barrier that we all have to create, the bubble around ourselves, and the important word is intentional, it will help reduce it. The technology in and of itself can do both. It’s up to us as users to make sure that we’re deploying it and we’re using it to restrict the content that’s coming into us. And right now I’ve got a 15-year-old son, and he is incredibly susceptible to the millions of things being sent to him at every minute.

Jim: Tell him, cut that number down, cut that number down, you don’t need 3,000 interrupts a day, right? You know, get it down to 100 and you will be much happier and more productive. This actually intersects with something I hadn’t thought about before the show, but this conversation has brought up: this issue of being over-interrupted may actually have some bearing on my dislike of public babble, right? To the degree people are babbling around you, at least unconsciously, that’s an interrupt, you know, because we do know, for instance, that you are unconsciously listening to any stream of sound to see if your own name is mentioned, right? So there is a certain amount of cognitive energy that has to go into processing verbal streams, even ones that you don’t think are relevant to you. So public babble is part of the problem here, I’m gonna assert.

Tobias: Yeah, I agree. We as a society have to figure that out. As we said earlier, some of this gets mitigated a little bit from, at least in the work environment by people working at home, but we’ve all been in the supermarket with someone, whether they’re talking on a phone or dictating something that’s just super obnoxious. And

Jim: We should tell them so, God damn it.

Tobias: There should be a publicly acceptable way to let someone know that that’s not okay.

Jim: What about pull out your .44 Magnum, pull the hammer back and put it against their forehead? That generally gets their attention in my experience. Anyway, I want to thank Tobias Dengel for a really interesting conversation out there on the cutting edge of what’s happening now.

Tobias: Thanks, Jim. It’s been wonderful.

Jim: Yeah, it’s been a whole lot of fun. I’m gonna wrap it right there.