The following is a rough transcript which has not been revised by The Jim Rutt Show or Joy Hirsch. Please check with us before using any quotations from this transcript. Thank you.
Jim: Today’s guest is Joy Hirsch. Joy is a neuroscientist and professor of psychiatry and neurobiology at Yale University. Her work aims to understand the neural circuitry and fundamental mechanisms of the brain that enable human cognition, language, emotion, decision making, and perception in both healthy individuals and in patients with neurological, developmental, and psychiatric disorders. Her lab is also expanding the experimental paradigm from a single-brain frame of reference to a multi-brain frame of reference using functional near-infrared spectroscopy. We’re gonna talk about that a little bit. So welcome, Joy.
Joy: Thank you, Jim. Nice to be here.
Jim: Yeah, I stumbled across your paper. I don’t quite remember how. I think it was in some email I get, you know, interesting papers of the day or something. And I looked at it and I said, this is very timely. This touches back on my past interest in brain scanning, et cetera. So I reached out, and Joy very graciously agreed to come on the show and talk about the paper. And the paper is titled, Separable Processes for Live in Person and Live Zoom-like Faces. Why don’t you give about 20 seconds on the highest level concept, and then we’re gonna dig down into some of the details.
Joy: Okay, very good. Thank you, Jim. The highest level concept is that we were able to do a two-person neuroscience study that wanted to look at the differences in how the brain processes faces that are viewed on live interaction during Zoom and faces that are encountered during in-person situations. So it’s the in-person versus the Zoom comparison that we were interested in studying in terms of how the brain responds to that information. Our hypothesis was that they were different, but we didn’t know how.
Jim: Yeah, and we will talk about this. Obviously, it’s of some social significance at this point, where so many of us have moved our lives to Zoom. In fact, I will confess to being a Zoom maniac years before COVID, but the rest of the world caught up with me because I live very deep in the mountains of Appalachia. When I wanna reach out to the world, I’ve long used Skype and then Zoom. But anyway, before we get into the details of the paper, I would like to have you tell us a bit about some of the technology and paradigms, I guess is the word you used. That’s as good a one as any. Many of our listeners know about things like fMRI and the BOLD signal, et cetera. Could you maybe tell us about near-infrared spectroscopy and contrast it with fMRI, and also maybe with EEG? Talk about things like, you know, temporality, locality, precision, all those kinds of things.
Joy: Perfect, okay, okay. So let’s start with the really, really big picture. Before we go to the technology, let’s start with the brain. The basic indication that we have that neural parts of the brain are active is that there’s this principle of neural activity where every little part of the brain that is performing a function, doing a task, is recruiting highly oxygenated blood. And so if we want to use blood flow to oxygenated areas of the brain or to active areas of the brain, we have to detect it one way or the other. And so in the case of functional MRI, we detect blood flow to active areas of the brain using magnetic resonance. And this is an old and well-established technique. It localizes extremely well where blood is being recruited to specific areas of the brain during neural activity. Similarly, that same signal can be detected by other methods. It can be detected in this case by functional near-infrared spectroscopy.
Now this method is an optical method. It detects the same signal that indicates the neural tissue is active. But when this neural tissue is active, the blood that is recruited is highly oxygenated. And so what this new technique does is inject little bits of laser light through the skull of the head into the brain. And the light is diffused and some of it comes out in nearby areas. And we can detect the amount of light that goes in and the amount of light that comes out. And the difference between them tells us how much light was absorbed. That indication gives us a measure of the neural activity. And it’s because of this endogenous signal that active neural tissue is recruiting highly oxygenated blood to the local area. So this optical method is used with little detectors that are fit on the head. It’s like wearing a swim cap. And the swim cap has little buttons on it.
And each of those little buttons is a detector or an emitter. So instead of going into a scanner and being in an encapsulated bore with a cage over your head to measure neural activity, we can do these measurements outside the scanner. And what this allows is for two people to be sitting across from each other and engaged in natural interactions while the neuroimaging is going on. So it turns out most of what we know about how the human brain functions in general is based on what we call solo activity, individuals alone. We put a single person in the scanner and we ask the single person to do something. And by imaging the brain, we can learn a great deal about how the brain does that task. But because of this, we know almost nothing about how the brain actually responds in live interpersonal interaction. Because that requires two people being engaged in live interpersonal interactions while the neuroimaging is going on. Because this cannot be done in fMRI, it must be done with another technology.
And so one of the reasons why we have developed this near-infrared spectroscopy is because we have a new question: we want to understand this whole new frontier of neurobehavior, which is what happens when individuals are actually relating to each other. So what we gain is the ability to look at these new functions. But what we lose, and Jim alluded to this, that there’s always a gain and a loss, a trade-off and an advantage. What we lose is the ability to image the brain in high resolution. Our resolution is quite compromised here. It’s about three centimeters as opposed to one and a half millimeters. So that’s substantial. And furthermore, we do not image regions of the brain that are deep within the center of the brain. We image what we call superficial cortex. Now there’s an awful lot of brain in the superficial cortex that gives us a very good idea of the circuitry that’s involved, but we’re not reaching emotional areas, for example, like the amygdala, the fusiform face area, and those regions that are buried underneath the surface. So Jim, does that sort of answer the first question that you had?
Jim: The one other point, as I did my own little preliminary research on this technology, frankly, by reading the Wikipedia page, it seems like that there is a benefit though versus fMRI and it’s relevant to the work that you’re doing is that the temporality is much crisper. Talk about that a little bit.
Joy: So the timing, I disagree a little bit with the Wikipedia page and let me tell you why.
Joy: And it comes from the basic signal that we are imaging. It is what we call the hemodynamic response function. Remember, go back to the brain. In the brain, the signal that neural tissue is active is this recruitment of blood to the local area. Now, this signal is slow. It takes about four seconds for it to rise to the top, maybe fuzzes around for another four seconds, and then falls. So it’s a four to eight second signal. Now, with functional MRI, we image that signal by hitting it about every one and a half seconds. So you’ve got a rise in signal of about four seconds. And so you measure it about three or four times until it reaches its peak. Now, in the case of near-infrared spectroscopy, we acquire images about every 30 milliseconds. So it is hugely faster. And that’s what the Wikipedia note was telling you. But it’s hugely faster, measuring the same slow signal.
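To make the sampling-rate point concrete, here is a small sketch, not from the conversation itself: a simplified gamma-shaped hemodynamic response (an assumption; real HRF models are more elaborate) sampled at an fMRI-like interval of 1.5 seconds and an fNIRS-like interval of 30 milliseconds. Both locate the same slow peak; the faster sampling just traces the curve more finely.

```python
import math

def hrf(t, shape=5.0, scale=1.0):
    """Simplified gamma-shaped hemodynamic response; peaks near (shape - 1) * scale seconds."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

def peak_time(dt, duration=20.0):
    """Sample the HRF every dt seconds and return the time of the largest sample."""
    samples = [(i * dt, hrf(i * dt)) for i in range(int(duration / dt) + 1)]
    return max(samples, key=lambda s: s[1])[0]

# fMRI-like sampling (every 1.5 s) vs fNIRS-like sampling (every 30 ms):
# both find the same slow ~4 s peak; the finer grid just resolves it better.
print(peak_time(1.5))   # nearest sample to the peak is 4.5 s
print(peak_time(0.03))  # lands within a few hundredths of 4.0 s
```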
Jim: Oh, I’m damn glad I asked this question then. Now I know, okay, because that makes sense.
Joy: So we just measure the slow signal better with near-infrared spectroscopy.
Jim: Well, that’s cool. Okay, let’s go on to the next kind of experimental paradigm question, which is that you emphasize that a lot of the work in the past has been single brains at a time. Though my friend Read Montague has done some work with linked brains going back 20 years or more, though in fMRI and obviously not walking around. I think he was experimenting with some walking-around stuff not that long ago, now that I think about it. But anyway, in general, most of these experiments are single brain. And what you are really pushing hard on here is brains together, interacting socially. Talk about that, how that differs.
Joy: So let’s go back to Read Montague. He deserves a huge amount of credit because he was the first to imagine that we could actually link scanners together and have individuals doing interactive tasks in a scanner. He imagined that if we could do that, we could actually come close to measuring some of the really important human behaviors, and that is how we interact with each other. So he imagined that we could have, like, people playing chess, say, one scanner to another and so on. He was successful at doing that about 20 years ago, but Read was far ahead of his time. The temporal delays made it hard to do this in real time. The high magnetic field made it very difficult to have additional measurements, like eye tracking and physiological measurements and so on, going on simultaneously. And furthermore, the method of connecting scanners was doomed from the very beginning for studying social interactions, because we do not have live face-to-face contact or live eye contact.
It’s not that those variables could not be overcome with some type of video. The reason they can’t be overcome is that head movement is absolutely contraindicated in the scanner. If you’ve ever had an MRI done, the person puts you in the bore and says, don’t move your head. It’s like moving around when your picture’s being taken, all the images get blurred. So in a scanner, you can’t move.
And that, in addition to the fact that there’s like 110 decibel noise that’s going on outside the scanner, you’re completely enclosed and alone. No matter what IT tricks you can employ, you cannot get the behavior of natural interactive communication to occur. And so if you can’t observe the behavior, you can’t image it. Just first principles of neuroimaging. So Read had the right idea and other people have followed him also with a similar idea, but they’ve been defeated by the technology that really hasn’t supported it.
Jim: So let’s then nail this off on what this new neuro-infrared spectra spectroscopy, God damn it, why does my tongue not want to say that today? What does that actually bring to the party? You alluded to it, but let’s just nail it down and be very specific about what you’re able to do that fMRI was not able to do.
Joy: Okay, first of all, let’s nail down the name of this technology. Instead of calling it functional near-infrared spectroscopy, since you and I are both having trouble with that, let’s just call it F-nears.
Jim: F-nears, okay, we’ll go with that. F-nears.
Joy: And so we have fMRI, the scanner technique, and F-nears, the head-mounted technique. Okay.
Jim: Because I have one of the world’s experts on this here, I’m gonna just have to ask a question for my own curiosity. How in the world does the light get through your hair and your skull?
Joy: So let me answer that question first, but hang on to your other question and we’ll nail down the differences, but let me just answer that briefly.
Joy: You know, like if you have a laser pointer and you’re standing at a whiteboard and you’re lecturing, your laser pointer just points to the board to identify what you’re talking about. If you take that laser pointer and you point it at your finger, then what happens is that you kind of see this red glow, the light diffuses. The little lasers that are connected to the cap, remember I said they’re like the little buttons on the swim cap that you put on, they’re little lasers. And these little lasers inject a small amount of light, and it’s very much like your laser pointer. It’s completely non-invasive. It is no more dangerous than your laser pointer, which isn’t dangerous at all. It’s called a minimal risk technology by our institutional review boards.
So these little emitters inject the light. Okay. So hair, hair is a problem. It doesn’t go through hair. And so these little caps that we have, have little holes in them. And to get the laser mounted so it’s perfectly on your scalp and not on top of a hair follicle or a hair, you have to kind of just move it a little bit. These are tiny little laser points that you just have to put right on the scalp in between the hair. So that’s really, really important. Now, the light actually goes through the skull and goes down to the neural tissue, and only goes down about three centimeters. And that’s why I said we measure only superficial cortex with this technique.
Jim: Three centimeters is pretty deep, actually.
Joy: It’s deep enough to do some very beautiful recordings of the functions that we’re very interested in. First of all, near-infrared spectroscopy, or F-nears, and fMRI are measuring the same signal. So let’s start with how they are alike and then go to how they are different. We’re measuring the same signal. Neural activity in the brain is indicated by recruitment of blood flow to locally active areas, and that is detected both by fMRI and by F-nears. The difference is that fMRI has higher resolution and you can measure more of the brain. F-nears allows you to image outside of the scanner in live, interactive, real social situations. So we have access to a whole new domain of human behaviors that we don’t have with fMRI.
Jim: Cool, that’s a great summary. Wonderful, thank you very much. All right, now let’s dig into the hypothesis you mentioned, and we’ll get into the technique and the setup a little bit later: that Zoom and in-person interactions that are essentially identical as stimuli will nonetheless produce different signals from the brain. That’s very short. What did previous research say about this? Obviously before you start to work on a project, you go back and look at the related work of others. What was the previous work telling you about this before you did it?
Joy: Okay, so let’s start with the way back machine here. Face processing is a well-established question in neuroscience. Vision scientists have been studying face processing for a long time, and this is because faces are some of the most salient, most important stimuli that the brain actually manages. Faces are extremely important to all of us, from the beginning of our life to the end of our life. So face processing has been a topic that neuroscientists have studied long before F-nears came on board. And the hypotheses that have been generated from these decades of studying face processing are that faces are encoded by the brain based on their features. So we have features of eyes, features of mouth, features of the shape of the face, and these features are used by the brain to identify who the speaker is, the associations with the person, and so on. So features are the primary stimuli that differentiate types of face processing. Now, in the study that we did, the features are identical because they are the same face, the same task, the same condition. Everything is identical.
And so the conventional wisdom is there should be no difference between Zoom and in-person processing of just the faces. So let’s start with that background. But why would we suspect that there might be a difference? The reason is because it feels different. The reason is, if you ask your grandmother, is it different talking on Zoom than in real person? She’ll say, yeah, stupid, it is different. So you don’t need to be a neuroscientist: horse sense says that this should be different. So then the question is, if it’s different, then why, how? Because if it’s not features, what is it? So that was sort of the background that led us to this study. There was a conflict of predictions. In one case, it seems like it should be different, but yet the models say that it would not.
Jim: So great, you’re challenging the accepted dogma. That’s what we like in science, right? That’s what the great science is. So now talk about the setup. How did you control, and I read all the details, and I go, whoa, you guys really thought this through. So why don’t you tell our audience your setup for the experiment?
Joy: Okay, yeah, and it’s true. I want to start with the credit to my people. I have an engineer that is masterful at setting up the IT, Adam Noah, he’s one of the authors, and a computational neuroscientist that is just amazing at helping process these really complicated data pipelines, and that’s Chen Zhang. And then, of course, while I’m giving credits, the first author of this study was a graduate student of mine from China, who was very interested in remote communication as well as in-person communication, again, with a second-language issue and so on. So there was quite a team of expertise that went into setting up this technology. As I like to say, this is not plug and play. You really have to pay attention to getting it set up right. So let me tell you what happened. Again, let’s go back to the human brain. We always have to start with the brain. And the way we do functional MRI, and remember, we’re dealing with the hemodynamic response function.
That’s the same response function, whether we’re doing fMRI or whether we’re doing near-infrared spectroscopy. What we do is that we have blocks of time periods, like, say, 15 seconds, where we do the task, and then we turn off the task. This is a rest period. We let the signals fall back down to their baseline. It takes about 15 seconds. And then we hit the brain again with the task. So we have 15 seconds on, 15 seconds off, 15 seconds on, 15 seconds off, and so on. That’s how we conventionally do these neuroimaging studies. It doesn’t matter what technique we’re using, the brain is the brain. The brain still has the hemodynamic response function, and we still have to manage it. So the way we manage that hemodynamic response function, needing on and off blocks, is that we established what we call a smart glass. And so we have a glass window in between two participants sitting across the table from each other, with a window in front of them.
It’s sort of like being at the bank, where you have the teller with the window in front of you. Okay. So this window is smart because we run a current through it, and it becomes black. And we take the current away, and it becomes open like a window. So, voila, we have perfect control of our experimental paradigm. We have the two people sitting across from each other, and they can see each other for 15 seconds, and then they don’t. They can see each other for 15 seconds, and then they don’t. And so we have the best of all possible worlds, in that we have essential control of the timing of our task. And with that time series, then we can operate the task and the computational analysis that goes with it. Okay. That was for the in-person condition.
Jim: And then also, let’s talk about some other things about the in-person condition that I thought you guys were clever at. This is that you tightly controlled the distance that people were apart, for instance. And from that, you were able to calculate the visual angle subtended by their faces, which got rid of another potentially confounding variable there.
Joy: Yeah, indeed. Indeed. It’s a simple calculation to calculate the degrees of visual angle that are subtended on the back of the retina of the eye by the face in front of you. Those are well-known, quite easy calculations that are based on the size of the face and how far you are. We calibrated our distances between the real face condition and the video face condition so that the face was exactly the same size. That’s just being a good scientist. It’s just calibrating the stimulus conditions. And that was done very, very carefully, actually, so that at the end of the day, there was no question that our results were not due to some artifact like the faces being a different size.
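The visual-angle calibration Joy describes can be sketched in a few lines of Python. The 20 cm face size and 140 cm viewing distance below are purely illustrative numbers, not figures from the study.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Degrees of visual angle subtended by an object of a given size at a given distance."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

def size_for_angle(angle_deg, distance_cm):
    """On-screen size needed at a given distance to subtend the same visual angle."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

# A hypothetical 20 cm face viewed at 140 cm subtends roughly 8 degrees;
# the on-screen Zoom face is then scaled so it subtends exactly that angle.
angle = visual_angle_deg(20, 140)
matched_size = size_for_angle(angle, 140)  # same distance, so back to ~20 cm
```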
Jim: Okay. Now, then the second, then describe the zoom and then describe the sequence that an individual went through when they were doing each of the sessions.
Joy: The zoom condition, we actually put a zoom screen in the middle of the table. So instead of the window, there was a monitor. And so we had cameras on each person so that the camera could project the person across from you onto the proper screen. We set it up as a zoom situation with a monitor in front of you.
Jim: Let me ask a clarifying question. It wasn’t entirely clear. Were the people still sitting the same physical distance apart?
Joy: Yes, indeed they were.
Jim: And I’ll tell you why I thought that might be important is for pheromones or smell or something like that. So you eliminated that variable also.
Joy: We did, and we did that quite purposefully, because one of the things we thought could actually be a confound here was embodiment, that is, that the person wasn’t in the room, and the results might be due to that. But the person was in the room, and they were in the room exactly the same way.
Jim: Got it. Okay, good. I just wanted to clarify that point.
Joy: And I’m glad you did. Okay. So now, now we have the two experimental setups. Now I want to go on. Your next question was the timing, right?
Jim: And then what the tasks were? I don’t think it even said in the paper what the tasks were, but if it did, I missed it. I’ll confess to poor reading, but take us through what a person experienced in each one of these sessions.
Joy: You know what? The task was so simple that, of course, it was explained in the paper, but you missed it because it was so simple. And probably, as the lead author on this, I should have made it clearer. It is the simplest task in the world. We ask people just simply to look at the face in front of them. No talking.
Jim: Oh, you did say that.
Joy: I did say that.
Jim: You did say that, but it wasn’t clear to me that that was the task. I somehow thought there was count the number of freckles on their nose or something.
Joy: No, we did not do that, because we wanted to keep the task as ecologically valid as possible. You don’t count the freckles on the nose of the person sitting in front of you, at least not consciously; you’re just looking at the face. I think if we were going to do this again, knowing what we know now, one always looks back at an experiment and says, I could have done it better if. And this is one of those moments. I could have done it better if we had a little bit better measure of behavior. That is, I would have liked to perhaps have people rate how connected they felt with their partner. We’ve done some experiments subsequently where we have used a rating, the kind of rating that I wish I had used here. We did not do that. What I had for behavior was the eye tracking. And so we were measuring pupil size, which is a measure of arousal, and visual sensing variables, like how long the fixations were, how many there were, and so on. So we had a physiological measure of task performance, but I didn’t have a rating.
Jim: Got it. Now let’s take us through the task periods and the rest period just quickly so people get a sense of how long it lasted, etc.
Joy: So each experimental run was three minutes, and we did four of them, two runs in person and two runs on Zoom, for a total of 12 minutes: three-minute-long runs, four of them. This is quite interesting because remember, I said that we set up blocks. We have 15 seconds of active blocks, 15 seconds of rest blocks, active blocks, rest blocks, and so on. But you can’t stare at somebody for 15 seconds. It just doesn’t work. I was using mostly Yale undergraduates as my participants. They end up laughing at each other. It’s very uncomfortable. So what we devised was an intermittent presentation, so that the window opens for three seconds, and you can look at somebody for three seconds, and then it closes for three seconds. It opens for three seconds, closes for three seconds, opens for three seconds, and closes for three seconds. That takes 18 seconds total, and then there was a 15-second rest period.
The task period of just looking at the live face or the Zoom face in front of you was performed with these intermittent three seconds on, off, on, off, on, off. So that was the paradigm. And it’s a paradigm that we’ve developed previously. This wasn’t the first time we’ve used it. It’s a very successful paradigm for this kind of interpersonal interaction study.
Jim: Go through a little bit how the task and the rest periods interact with the hemodynamic signal. You expected it to build during the 18 and then subside during the 15? Link those two together for us.
Joy: Okay, okay. That’s very perceptive of you, Jim. Good thinking. Remember I said that that signal was a slow, dumb signal, and sometimes it doesn’t even get all the way to the top until about eight seconds, which is why we have a 15-second block of activity. So what we did for the analysis is that we didn’t separate out the three seconds on and three seconds off. That’s silly. We’ve got a signal that’s not even acquired in three seconds. So we lumped it all together. So we have one block of action, but it’s acquired in three-second little epochs. Three seconds on, off, on, off, on, off. But computationally, we treated the whole thing as one signal.
Jim: Though of course you had the sub-signal so you could verify indeed that that was a valid thing to do.
Joy: Indeed, and we did. And one can see the evidence of the three epochs. It goes bump, bump, bump, but it’s only a partial thing. That’s exactly right.
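As a rough illustration of why those three-second epochs can be lumped into one block, here is a sketch that convolves the 3-seconds-on, 3-seconds-off stimulus pattern with a slow gamma-shaped hemodynamic response. The HRF shape and the 0.1-second sampling grid are assumptions for illustration, not details from the study.

```python
import math

SPS = 10  # samples per second (0.1 s resolution)

def hrf(t):
    """Simplified gamma-shaped hemodynamic response, peaking near 4 s (an assumption)."""
    return 0.0 if t <= 0 else t ** 4 * math.exp(-t) / 24.0

def boxcar():
    """One cycle: three 3-s-on / 3-s-off viewing epochs (18 s), then 15 s of rest."""
    stim = ([1.0] * 3 * SPS + [0.0] * 3 * SPS) * 3
    return stim + [0.0] * 15 * SPS

def predicted_signal(stim):
    """Stimulus convolved with the slow HRF: the modeled fNIRS/fMRI time course."""
    kernel = [hrf(k / SPS) for k in range(20 * SPS)]
    return [sum(stim[n - k] * kernel[k] for k in range(min(n + 1, len(kernel)))) / SPS
            for n in range(len(stim))]

signal = predicted_signal(boxcar())
# The slow response smears the three short epochs into one broad hump with small
# ripples (the "bump, bump, bump") on top, which is why the whole 18-s task period
# is modeled as one block, and the signal is back near baseline by the end of
# the 15-s rest.
```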
Jim: And then the 15-second, that’s enough time for the signals to subside?
Joy: Yes, it is. It is. And that’s very important. 15 seconds is about the minimum. And so many times my students and people have jumped up and down and said, why do we have to wait 15 seconds? Let’s give in. And then we do a 10. And the signal isn’t nearly as good as it would be if we did it for 15. And we always go back to 15. So finally, after all this time, we’re rigid about that rule. 15 seconds of baseline.
Jim: Yeah, that’s the importance of being a real practitioner. You know these things. This is the implicit knowledge that working in a field for many years you pick up. They want to argue about it, say let’s go to 10, and you go, I’ve done this a million times, guys. You don’t want to go to 10, right?
Joy: If you go to 10, you compromise the signal. It’s not that you get it wrong. You don’t get it right as well.
Jim: All right. Now next thing before we move on to the data, how you collected the data: what was the N? How many people? I think it was 28.
Joy: That’s what my memory says, 20. And that’s about what we always do. We determine how large our samples have to be by what’s called a power analysis, which is an over-fancy term for a calculation of how many samples we need to acquire to detect a change in the condition, if we know what the effect size should be.
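For the curious, the kind of power analysis Joy describes can be sketched with a normal approximation. The effect sizes below are illustrative, not the study’s actual numbers.

```python
from math import ceil
from statistics import NormalDist

def n_for_effect(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sided test at a given Cohen's d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # value needed to reach the power
    return ceil(((z_alpha + z_beta) / effect_size) ** 2)

# Illustrative: a large effect (d = 0.8) needs about 13 subjects at 80% power,
# a medium effect (d = 0.5) about 32 (exact t-based calculations give a few more).
print(n_for_effect(0.8), n_for_effect(0.5))  # 13 32
```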
Jim: Okay, that’s about right. Now one last question before we get into the data. I believe you said it, but the wording was sufficiently odd. It may have been a term of art that wasn’t clear to me, but I presume you randomized whether you did face to face first or zoom first?
Joy: Absolutely. We randomized across our participants so that our effects were not due to an order effect.
Jim: That just seems so obvious. I assume you guys of your quality would do that.
Joy: I’m sure we said that somewhere in the paper.
Jim: Let’s now move on to the data. And as you said in the paper, basically you had four channels of data. You had your eye tracking, your pupil measurements, EEG, and F-nears.
Joy: Well, actually, there are really only three data streams.
Joy: Because the eye tracking gave us pupil size as well. So we had eye tracking F-nears and EEG. So there were three modes of data that we were tracking.
Jim: And why don’t you tell us why you chose those three and what you expected? What insights you think each of the three bring to the question?
Joy: Okay. Well, the first one, the obvious one, is F-nears. We’re interested in the neural circuitry. So this is the one that’s the closest to fMRI. We’re neuroimagers. We want to know how the neural information is encoded. So we use the neuroimaging technology. The eye tracking is because of our prior studies that have shown us so convincingly that the visual system is very involved in our social behaviors. And so the processing of a face, for example, calls upon very specific algorithms of the eye-movement system. And we wanted to be able to capture that, and also pupil size, which is a very good indicator of arousal. So the eye tracking gave us a view of a completely different system than the neural processing. Those two were to obtain as big a picture as we could of what was going on. Similarly with the EEG, the electroencephalogram: electroencephalography is a measure of electrical activity in the brain. And let me just take a little sidebar here, because we haven’t talked about that before.
Let me just tell the audience what that is. This is global, large scale, not specialized, not localized, but global, large scale electrical activity that can be detected in the brain during neural activity. And what we did to get those measurements: remember the swim cap that has the little buttons, what we call optodes. Optodes are the emitters and detectors for the near-infrared spectroscopy. Well, in between those optodes, remember they’re spaced at about three centimeters apart, we can actually put little holes in the cap. And in those little holes, we can put a full complement of electrodes, these little surface detectors that detect electrical changes in the brain. So we can simultaneously detect electrical changes, as well as the hemodynamic changes, that’s from the F-nears, as well as eye movements. So this was a full approach to imaging everything about the brain that we could image with the technology that we have at hand.
Jim: That’s good. So now we have the data. All right, now we go to the next step. You process the data in various ways. And what popped out from your analysis?
Joy: Okay, so things don’t just pop out from an analysis. Generally, we start with a hypothesis. What do we expect? And how do we go about asking our data to answer those questions for us? So let’s just start with the eye tracking, because that’s kind of the closest thing we have to behavior. And what we wanted to know was how the eye, in its mission to gather information about the face, the face of the partner, was performing differently, or whether it was performing differently. And what we discovered was that for the in-person condition, the eye contact, that is, the fixation durations, the amount of time that the eye would land after a saccade and stay in a local place, was longer. Suggesting that the eye is gathering more information in each fixation in person than on Zoom. That is an interpretation. I don’t know that that is exactly right. But the fact is that fixation durations were longer, for whatever reason. And it could be that the eye is gathering more information in each fixation.
Jim: A couple questions for you there. What was the quantitative, what was the ratio of difference approximately?
Joy: It was statistically significant at p < 0.0001.
Jim: So big significance.
Joy: This was huge.
Jim: Got it.
Joy: This was not minor. This was not a barely there. This was a biggy.
Jim: Okay. Good. So on to the next one.
Joy: On to the next one. And the other thing that was also a very, very large effect was that the pupil size was larger in person than for the Zoom face, again suggesting that the arousal level was different. There are other factors when a person is standing in front of you, having to do with just attention and general attentiveness. So the impact of the person was greater, based on the evidence that the pupil size was larger. I want to go next to the neuroimaging data and the EEG, because here, in order to test the hypothesis that the neural systems were different or the same, what we do is analyze the data for each condition separately, and we ask what parts of the brain are involved in face processing.
We get our answer, and it turns out that this answer was as expected. We’ve done face processing in this paradigm previously; we knew what to expect, and we were assured of the validity of our data by the fact that what we knew had to be true — what we have observed before — we observed again. So we first did our validity checks.
Then we did the comparison. We compared the results from one condition — the real-person condition — with the results for the Zoom condition. That difference in neural activity is the answer to the question: what is different in how the brain processes information for a real face versus a Zoom face? What we observed is that that difference is both substantial and profound — meeting the highest level of statistical significance — and localized in the dorsal aspect of the brain, bilaterally. Now what does that mean? It means that it’s the part of face processing that goes on in the upper parts of the parietal lobe, in this case. Those parts of the brain have been previously associated with attention mechanisms — eye movement mechanisms associated with attention. That is, when there’s something important going on and the eyes switch to that important thing, it’s the parietal lobe that is involved, associated with that seeking of salient information. So this dorsal-stream activity in the parietal lobe can be interpreted, based on what we know about that area, as eye-movement-related salience activity, meaning the real face was more important to the brain than the Zoom face. That’s a very big result, actually. Those patches of activity that were different between the in-person and the Zoom conditions are very large.
Jim: And they were different regions that were firing up or were they different strengths within regions or both?
Joy: They were different regions. The answer to your question is a little fuzzy, because it could be that, yes, the activity is still there but very minimally active in the Zoom condition — and that’s probably the case — but the difference was very great.
Jim: I remember some work I did years ago, the dorsal parietal is also the part of the brain that has to do with object recognition as I recall. Is that correct?
Joy: Yeah, we oftentimes think of it as that we have the what and the where. This ventral stream is more the what and the dorsal stream the where.
Jim: Okay, so it’s the sort of relationship amongst things. Makes sense. Okay, so next.
Joy: Yes, yes. And it’s also consistent with the interpretation that there’s more information about the live face coming in. There’s more stuff coming in. It’s being processed by more brain machinery.
Jim: I’m going to ask another experimental design question. You know, at a naive level you might say the resolution is higher face to face. How high was the resolution on your monitors? Was it super high for that reason?
Joy: It was not.

Jim: Okay.
Joy: No, no, it’s in the paper. I don’t recall the exact resolution, but these are normal monitors.
Jim: Okay, so it wasn’t like 4K monitors exactly.
Joy: We didn’t want to do anything special.
Jim: Oh, I see. Yeah, because you want to compare it to field use of zoom. Got it.
Joy: That’s a way of saying it. It’s sort of like the normal use of Zoom.
Jim: Got it. Okay, that makes perfect sense.
Joy: Yeah, so we didn’t do anything special. And in fact, the results might down the line be a kind of call to action: perhaps improvements in the technology — particularly an eye-to-eye contact mechanism, or a dimensionality mechanism, even resolution — might make a difference in the quality of the interaction, at least based on these measurements.
Jim: Okay, we’ve talked about the fNIRS. Now the EEG — what did we find there?
Joy: EEG does not give us spatial information, in general. EEG’s strength is giving us information about temporal oscillations of large clusters of neurons, and generally the sources of these large clusters of neurons are deep in the brain; this neural activity is propagated up through the brain, and we measure it on the surface. The information is really based in the temporal aspects of the signal, and we measure the temporal aspects of the signal in two ways. We can just take the whole complex signal and ask: is there anything about this whole complex signal that validates the presence of a face? Now remember, the current model of face processing in the brain, based on fMRI studies, is that it is a feature-based system, and it’s primarily a ventral stream.
It’s what? The what stream? The ventral stream, and it’s based on features. Well, as it turns out, EEG has a hallmark signal for faces. If you want to study faces with EEG and you want to know you got it right, the first thing you need to do is show that what we call the event-related potential — that is, the whole signal mushed together — gives you the hallmark of faces, and that is what’s called the N170. The N170 means that 170 milliseconds after the stimulus onset you see a profound drop in the signal. The N, the negative — the signal goes down at 170 milliseconds — and that just means that your EEG system is recording the brain looking at a face. It’s a standard sort of calibration check.
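[Editor’s aside: the N170 check Joy describes amounts to finding the most negative deflection of the trial-averaged ERP in a window around 170 ms after stimulus onset. A minimal sketch — the window bounds and function name here are illustrative assumptions, not the paper’s pipeline:]

```python
import numpy as np

def n170_peak(erp, times, window=(0.13, 0.20)):
    """Return (latency, amplitude) of the most negative deflection of a
    trial-averaged ERP inside a post-stimulus window around 170 ms.
    `erp` is the averaged voltage trace; `times` is in seconds."""
    mask = (times >= window[0]) & (times <= window[1])
    idx = np.where(mask)[0]
    peak = idx[np.argmin(erp[idx])]  # most negative sample in the window
    return times[peak], erp[peak]
```

A clear negative peak near 0.17 s at occipito-temporal electrodes is the “fingerprint” that a face was being viewed.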
Jim: It’s kind of like a fingerprint for a face right? It’s a signal.
Joy: That’s right. It’s a fingerprint for a face so
Jim: That could be even a cartoon face of a smiley face you know stickers or something?
Joy: Particularly the cartoon faces, because remember, in fMRI the only faces that have ever been studied are either video faces, picture faces, cartoon faces, emoji faces — faces that aren’t real. They all produce the N170. So of course we looked at the N170. Our expectation was that since all the features of both conditions were the same — the same faces, the same everything — we would expect the same N170, and we did. That is what we found: the N170, the neural activity, said, okay guys, the brain knows you’ve got a face there, and it doesn’t care whether it’s live or not. It’s a face, it’s a face, it’s a face. And we were very happy to see that, because in a way it linked these studies in a profound way to prior studies. But then we took the signal apart, because like good scientists, you know, we never stop. So we said, okay, we’ve got a complex signal here.
We’ll decompose it, and people who do EEG have standard ways of taking the signals apart. So we sorted it into frequency bands: say, the 4 to 8 hertz band is called the theta band, the 8 to 12 hertz band is called the alpha band, and so on. We took the theta band, the 4 to 8 hertz band, because it has been implicated in face processing, particularly of moving faces. And again, if I could do this study again, we would have taken other bands, but we took the theta band, and what we found was that the theta band had profoundly more power in the real-face condition than in the Zoom condition. So again, it was supportive evidence that the real face had a greater impact on the brain than the Zoom face.
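[Editor’s aside: band power of the kind Joy describes is commonly computed by band-pass filtering the signal and averaging its squared amplitude. A minimal sketch using a zero-phase Butterworth filter — the filter order and exact band edges are illustrative, not the paper’s actual pipeline:]

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_power(sig, fs, lo=4.0, hi=8.0, order=4):
    """Mean power of `sig` within a frequency band (default: theta, 4-8 Hz),
    via a zero-phase Butterworth band-pass filter. `fs` is the sampling
    rate in Hz."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, sig)  # filter forward and backward: no phase lag
    return np.mean(filtered ** 2)
```

Comparing this quantity across conditions (real face vs. Zoom face) is one simple way to quantify “more theta power.”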
Jim: And theta is a relatively slow wave so it presumably has long range coordination attributes. That’s at least historically been the hypothesis.
Joy: That historically is the hypothesis. That’s exactly right and these data are consistent with that.
Jim: Yeah, so we’d say that that would mean that whatever is happening is happening perhaps more globally when there’s strong theta. Is that a fair statement?
Joy: It’s a very fair statement consistent with long cherished data collections over lots of years.
Jim: Okay, cool. All right, so we have found one place where the two conditions are the same, which is good. So you say, okay, these are both faces — we haven’t somehow messed things up by doing something that the brain doesn’t recognize as a face. But we have found differences everywhere else: in pupils, in saccades and eye tracking, in blood flow by region — particularly the dorsal parietal area — and you have found a significant difference in theta power. Is that a good summary?
Joy: That’s exactly a good summary. There’s one other finding.
Joy: And this is like one of my favorite findings of all. One of the hallmarks of live interpersonal interaction is what’s called neural synchrony. That is when parts of the two brains actually become organized synchronously, so they start to work together. And there’s been a lot of thought about this new parameter of interpersonal interaction, and some evidence that it represents a sharing of information between the two brains. That’s the hypothesis that Uri Hasson at Princeton has promoted, actually, and there’s quite a bit of evidence for it. There’s been some other evidence suggesting that learning content that is shared between two individuals is associated with increased coherence between the brain areas.
So it is an emerging observation with an emerging theoretical framework — not totally well understood, but of great interest to people who want to study interpersonal interaction. So of course we compared neural synchrony between the in-person condition and the Zoom condition. We compared this within the regions in the dorsal parietal area, because those are the regions that were uniquely observed during face processing in person relative to Zoom. And indeed, the neural synchrony during the active time period of the hemodynamic response function — just what you would expect — was much greater in the live condition than in the virtual face condition. And then we did the control experiment, which computationally scrambled the partners. So it’s the same task, same timing, same everything — blah blah blah — except not the same micro-interactions that go on between actual partners, and then, of course, the synchrony all went away.
Jim: Ah that’s great that’s a very wonderful test to rule out.
Joy: All the other possible explanations yeah.
Jim: Could you maybe explain a little bit more what neural synchrony is in this case?
Joy: Okay, the neural signals — the brain responses — of two people actually become synchronized. That is, when brain activity in one area of one brain goes up, the other goes up too; they’re completely coordinated. They go up and down in exactly the same way. And that’s a very interesting phenomenon, thought to be associated with some of the properties of people linking together during a live interaction. Let me tell you exactly what it is, because the linking together is something we don’t quite understand, and I feel a little uncomfortable about explaining it that way — but computationally, that’s what it is. We take the brain signals of each person and we decompose them, like we do in a Fourier analysis, but in this case we use wavelets.
So it’s like we take the signal and break it apart — it has big pieces and little pieces — and we take each of those wavelets from the complex signal, from each person, from each area of the brain, and then we ask how they are correlated across the two people. So we take the high frequencies, mid frequencies, low frequencies — we can do this at very high resolution because we have a very fast, big computer. And so the y-axis on the graph is the correlation between the two partners, the two brains, and the x-axis is the wavelets — the frequencies — so that each wavelet, each frequency, gets correlated across the two individuals for each area of the brain. So it’s a big computation, but it reveals something very special about what goes on in two brains when they are performing as a dyad, as opposed to single individual brains. So it’s a hallmark of dyadic behavior.
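[Editor’s aside: a simplified stand-in for the computation Joy describes — decompose each partner’s signal with complex Morlet wavelets, then correlate the two partners’ amplitude envelopes frequency by frequency. All parameters and function names here are illustrative assumptions; the paper’s actual wavelet-coherence pipeline may differ:]

```python
import numpy as np

def morlet_amplitude(sig, fs, freqs, n_cycles=6.0):
    """Amplitude envelope of `sig` at each frequency, from convolution with
    complex Morlet wavelets. Returns an array of shape (len(freqs), len(sig))."""
    out = np.empty((len(freqs), len(sig)))
    for i, f in enumerate(freqs):
        sd = n_cycles / (2 * np.pi * f)              # Gaussian width in seconds
        t = np.arange(-4 * sd, 4 * sd, 1 / fs)
        wav = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sd**2))
        wav /= np.sum(np.abs(wav))                   # normalize wavelet energy
        out[i] = np.abs(np.convolve(sig, wav, mode="same"))
    return out

def dyad_coherence(sig_a, sig_b, fs, freqs):
    """Per-frequency Pearson correlation between the two partners' amplitude
    envelopes -- a simplified proxy for wavelet coherence across a dyad."""
    amp_a = morlet_amplitude(sig_a, fs, freqs)
    amp_b = morlet_amplitude(sig_b, fs, freqs)
    return np.array([np.corrcoef(a, b)[0, 1] for a, b in zip(amp_a, amp_b)])
```

The scrambled-partner control corresponds to running `dyad_coherence` on signals from people who never actually interacted: the per-frequency correlations should then collapse toward zero.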
Jim: Then let’s bring it down to the actual experiment here in the zoom case versus the face-to-face case how much did that neural synchrony vary?
Joy: Well, it varied a lot, and that’s the thing I think is very significant: the in-person condition was associated with a very high level of neural synchrony relative to the virtual one. The way we show it in the graph, it’s simply not there — that doesn’t mean it’s not there at all, but relatively speaking, it was very minimal compared to what’s observed in the live condition. And this neural synchrony under live face-to-face is something we have observed previously in other studies; we expected it, we know how to get it, it’s vetted, it’s a well-established technique. And when we did not see it for the Zoom condition, we thought that this was a significant hallmark of the difference between the two. Exactly what that means in terms of perception, I don’t know.
Jim: That’s my next point. We’ve been playing real scientists up till now, right? Carefully looking at an experiment, the historical theories, et cetera — beautifully designed instrumentation, data collection, analysis — and now a full set of results has been presented. So I ask you now to speculate: what does this all mean? This is something we’re all interested in now. For instance, in one of my other roles I’m a mentor to CEOs of companies and not-for-profits, all the way from tiny startups up to quite large companies, and everybody’s wrestling with the question of remote work, for instance. It’s a huge question.
Joy: Right. I like to think of myself as a pretty hardcore scientist, very aware when an interpretation goes beyond the data. And here, the question that you ask is the really important question, and yet it goes beyond the data. It’s such a perfect example of how, so many times in science, we answer questions that aren’t really answers to the questions that you and the real world have — and this is one of those. So I can tell you, as I have, that the brain is different in person than on Zoom, and it’s different specifically in encoding faces, and it’s not a small difference — it is a big difference. What does that mean in terms of how it feels? The truth is, that’s kind of an example of the hard problem: we don’t know how neural activity actually influences how we feel, how we react. I can just tell you that it’s different.
Jim: Okay. We were about to transition from the good detailed examination of a scientific experiment to some scientifically informed speculation on what it all might mean with respect to those of us who have to spend a lot of our time on Zoom and those of us who are designing companies and not-for-profits and scientific collaborations that use things like Zoom. So, Joy, I’d love to get your informed speculation on what your results might indicate.
Joy: Okay. First of all, a full disclosure here. I don’t think that my speculations are any better than anybody else’s speculations here. I don’t have more information than what I have provided here. And so everybody’s thoughts are as good as my thoughts, and these are just thoughts. So, first of all, these results are profound, meaning that this is not subtle. These results tell us something that our grandmother would not have told us, and that is this feeling that Zoom is different than real person is very real. It’s very real in terms of how the brain codes information. Now, what does that mean in terms of our real life?
From the philosophers’ point of view — those who have developed the interactive brain hypothesis — if the brain is functioning differently, its cognitions and perceptions are also different. That is an assumption. It’s a reasonable assumption. I don’t know of any data that supports it, but it’s reasonable to assume that if the brain is functioning differently, its perceptions are also different. If that is true, then it means we need to take this seriously.
And if we take this seriously, how? One of the things the data suggest is that there’s less information coming to the brain about faces on Zoom — that’s based on the fixation lengths. And we also know that this whole dorsal aspect of the brain is more active in the real-face condition. Does that mean that the whole visual sensing system is more productive, that there’s more information that contributes to associations, that there’s a richer appreciation of the face in person?
It could mean that. If it means that, then we might want to think about the limitations of Zoom. Would we limit our thinking about telemedicine, for example? Would we consider that there might be limits in how information between a caregiver and a patient might be transferred? Would we think about it in terms of, maybe, a legal arena? Is face processing different? Could you tell whether someone is not telling the truth as well as you might in person? That’s a question. I don’t have an answer, but I think my results suggest that that question should be addressed.
Jim: That makes a lot of sense, and I would say that’s a very modest way to put it, which follows naturally from your research without necessarily overstating.
Joy: I feel very strongly about not overstating the research. It says we have information now that gives us a call to action. The call to action is that we need to know more, and we need to know more in a hurry, I think, because Zoom is so prevalent and it’s here to stay. We need it, we use it. Look at you and me — we wouldn’t be talking, because you want to be in Appalachia and I’m here in the middle of academia. You’re never coming to visit me.
Jim: At least not very often, right? So yeah, it’s a huge, for me, it’s been wonderful. I can deal with the world and not have to deal with masses of people and noise and traffic, right?
Joy: Well, exactly. And airplanes and trains and all of that — schedules, time changes, and so on. So Zoom is an amazing gift. It seems like a silver lining from the pandemic.
Jim: My wife and I talk about this fairly regularly. Imagine how nasty the pandemic would have been without Zoom or Skype. You know, just if it happened 20 years earlier, we wouldn’t have had that.
Joy: Oh my goodness. Absolutely. Absolutely.
Jim: We were lucky in some sense.
Joy: And we didn’t really have it. I mean, maybe techies like you had it, but when we here in my lab anticipated that we would be shut down, we didn’t have our communications of this kind set up and we practiced them. When we were still in the lab, we practiced being able to communicate with Zoom so that when we weren’t together, we knew we could still communicate and carry on. We weren’t as advanced as you.
Jim: Yeah, I’ve been doing this stuff since the 90s, early 90s actually.
Joy: Well, and thank heavens. You know, and I think it must be because of people like you and the development of this technology that allowed it to become mainstream so rapidly. It seemed like almost overnight. The whole world was on Zoom.
Jim: Yeah, fortunately Zoom and Skype had been around for, you know, 20 years — well, at least Skype was, and the technologies underneath it. They were pretty rock solid, and fortunately the companies executed well and were able to scale, which was quite amazing. You know, Zoom went up 50X or something, so give them credit for having dealt with that under the gun.
Joy: I give them a lot of credit.
Jim: Okay, a couple more questions for you about possible implications. First, I’m going to stipulate that even though your research doesn’t say that the signals and the data from in person are “better” than from Zoom, I’m going to impute it anyway, because we have enough personal subjective experience to make a plausible hypothesis that in person is better than Zoom. Right. In fact, in my own work I talk about weak links and strong links — weak links often being online things, Twitter, email, et cetera, and strong links being in person. I always say that, at least to me — and this is totally qualitative — Zoom feels like it’s in the middle: it’s neither a weak link nor a strong link, but it’s not as strong as a strong link. So let’s make the assumption that the signals you have found mean the in-person experience is “better.” What are some things we might do to make Zoom better and yet still keep the benefits of no travel and less CO2 and better use of our time and all that stuff? One that I alluded to earlier is things you might test: hyper-high resolution is a possibility. What do you think?
Joy: I wouldn’t make that my first priority for improvement. I think, first of all, your question is the right question: what could we do to improve the medium so that it’s more like how the brain likes to see faces, for example? And remember, I’m just talking about faces — I’m not talking about human language here, which is a whole different thing as well. But what could we do to accommodate the brain? A sidebar here: I like to think of the face and the brain as fitting together like a lock and key — they belong to each other; the human brain has evolved to accommodate the human face. And the beautifully articulated muscles we have in our face, which are so expressive, have evolved to tweak the brain in a very specific way that builds on the sociality between us. It builds our ability to connect to each other. So in terms of developing technology to accommodate that, I think we need to focus on ways of providing eye-to-eye contact.
You could stick with a lower resolution, as far as I’m concerned. You know, people with glasses, people who don’t see so well — they still connect perfectly well. And I think it has to do with that ability; it’s even subliminal. You’re not even aware that you’re looking at people’s eyes and reacting to it, but it has been shown experimentally that eye-to-eye contact is very salient. So I would encourage the technology to develop ways in which the camera angles can be straight on. That would be my top priority. My second priority would be one that perhaps — I don’t know if this is a good idea — improves the dimensionality. That is, instead of the two-dimensional screen, if we were wearing something like 3D glasses, like at a movie, there could be more of a three-dimensionality to the experience. That might also improve the sense of connectedness. I think what we want to do is develop technology that interfaces better with the human brain.
The human brain has not been considered in the development of this technology. I like to think of it as sort of neuromorphic thinking. Say, okay, what does the brain need?
Jim: This is a brain appliance in some sense. So how do we make it brain friendly?
Joy: That’s right. What does the brain need in order to promote connections, emotional connections between individuals? And when you have this idea of strong connection, weak connection, you’re talking about emotion, right?
Jim: I’m not really sure what it is, but it’s certainly noticeable, and it’s efficacious in different ways, as different affordances. You have a much higher level of trust, for instance, in people with whom you have positive strong links. You can have negative strong links too, which means you trust them even less than some random bloviator on Twitter, because you’ve met them face to face and they are bad, right?
Joy: Yeah. You know, there’s something else — we’re talking about sort of very subjective experiences here. I find that an in-person encounter is one that I remember and value more than one on Zoom. I don’t code it as quite as significant an interaction as I would if it were in person. And I think that’s probably the lack of eye-to-eye contact. You know, there’s something so amazing about eye-to-eye contact. It’s a call to action. It’s a connection that’s very significant to us as social beings.
Jim: That makes a lot of sense. I really like that. I’m going to ask a final, this kind of goes in the opposite direction. That is huge amounts of money are being spent on VR and AR, right? Where we have avatars rather than our actual streaming selves. What do you think the issues are going to be if we try to do things like having meetings in virtual reality? I actually did one podcast in virtual reality. It was kind of crazy, but I would love to get your thoughts on how VR might touch on these issues.
Joy: In general, my position about these new technologies, AI, chatbot, VR is go for it. Just go for it. Develop it, make it fail, make it live, do as much as you can. If we can figure something out new, go for it. Do it. It won’t be perfect. There’ll be a lot of things that we have to work out. But if there’s an advantage to our technology, if we are smart enough to do the technology, we’re smart enough to figure out how to use it. So that’s probably not the answer that you expected.
Jim: Well, actually, I was expecting maybe that, but also that the people developing VR for things like group meetings ought to hire you to do that work — to measure what the brain’s response is and make the VR-brain fit as good as possible. I don’t know what the techniques will be.
Joy: You know, I wouldn’t have said hire me to help you do that. But I do cosign the wisdom in the notion of including a good neuroscientist who knows something about this to help design the technology. Because I think the technology has gone about as far as it can go without being constrained by the needs of the human brain. So it’s time to enter the human brain into the equation.
Jim: That’s a wonderful place and nice tagline for the whole episode. Joy Hirsch from Yale University. This was a really fantastic conversation.
Joy: Thank you, Jim. I enjoyed it too. This was a lot of fun.
Jim: It really was.