Transcript of Episode 12 – Brian Nosek: Open Science and Reproducibility

The following is a rough transcript which has not been revised by The Jim Rutt Show or by Brian Nosek. Please check with us before using any quotations from this transcript. Thank you.

Jim Rutt: Howdy! This is Jim Rutt, and this is The Jim Rutt Show.

Jim Rutt: We’re on location today in Charlottesville, Virginia, at the offices of the Center for Open Science. Our guest today is Brian Nosek, who is the executive director of the Center for Open Science. Brian is also a professor of psychology at the University of Virginia. He’s perhaps best known in the wider world as the leader of the Reproducibility Project: Psychology, in which a team of researchers attempted to replicate 100 classic psychology experiments.

Brian Nosek: Yes, well, thanks for coming to the office.

Jim Rutt: It’s great to be here for a second or third time. I don’t quite remember. Why don’t we start off with a description of the Center for Open Science? What do you do, and why is it important?

Brian Nosek: Yeah. Well, the Center for Open Science is a non-profit technology and culture change organization. Our mission is to increase openness, integrity, and reproducibility of research. The problem that we’re trying to solve is that there are lots of barriers to the pace of discovery. Some of those barriers are because of the culture, and the systems of how science operates. A central challenge, as we perceive it, is that the values for science are not incentivized in the daily practice of science.

Brian Nosek: That can be summarized, perhaps, with a single statement, which is: as an academic researcher, my rewards for success are focused on me getting my research published, not on me getting it right. The things the system uses to decide how to give me a job, advance me in my career, give me tenure, all of those things that I need to succeed in academic life, are about getting exciting and wonderful publications, rather than promoting rigor, reproducibility, and actual evidence that can be built upon. That’s sort of the base problem for what we do.

Jim Rutt: Yeah, I see that a lot. I’ve been involved in science, and the governance of science, et cetera, for the last 18 years. We all know that most scientists are working in good faith trying to do their best, right? But as you point out, if the incentives are wrong, it forces you, basically, if you want to be competitive, to take it right up to the line, right? All right, “What’s your p-value?” “Well, on a good day with the wind it’s okay.” Right? Nobody gets promoted for falsifying a hypothesis, right? Or very seldom, right? And so the incentives are fairly perverse when it comes to doing really solid work.

Brian Nosek: Right, and they can sort of manifest themselves in insidious ways, right? I don’t think there are many, if any, researchers that get into science saying, “I’m just going to make up stuff.” We’re all, we’ll say generously, we’re all motivated by the search for truth. Alongside that, nevertheless, is a need to have a job, to have a career, to have stability, so when I’m confronted with lots of decisions to make about my work that potentially have implications for the reward, then I might go down the dark side and intentionally do bad things. More likely, though, I might be implicitly influenced by those incentives, right? Motivated reasoning is pretty powerful for helping me to rationalize decisions that would benefit my career as the right decisions to make.

Jim Rutt: And before long, if you’re a senior researcher, it’s not just your career. Now it’s your lab.

Brian Nosek: Oh yeah.

Jim Rutt: It’s your graduate students. It’s your researchers, right?

Brian Nosek: And my responsibility to them. And my legacy, right? This is what I have found. These are my claims.

Jim Rutt: What are you guys doing to help push people back onto the bright side, away from the dark side?

Brian Nosek: Well, the great advantage that we have is that there’s near consensus on the core values of science, right? People agree that science should be transparent. People agree that science should be reproducible. The barriers that we have, in trying to change the culture to embrace more behaviors for openness and reproducibility, are practical. They’re cultural. They’re people saying, “Yeah, yeah. Those are great ideals. That’s just not how we succeed.” The problem we’re trying to solve is how do we change the policies, the incentives, and the norms in science so that people can adopt behaviors that will increase transparency, openness, and reproducibility.

Jim Rutt: Why don’t you go down all three of those for me? I think all three of them would be interesting.

Brian Nosek: Sure. I should actually start at the bottom. We frame this as a pyramid. At the base of the pyramid, in order to advance behavior change, culture change, you have to have infrastructure to do it. You have to have a way to do the behavior. Most of our organization is actually building technology, the Open Science Framework, so that people can do the behaviors of being more transparent, share their data, share their materials, open up their research, and register their designs, right? To make those pre-commitments to what they’re trying to study, how they’re going to analyze the data, what they’re going to record versus not. There’s that base infrastructure of just making it possible to do the behaviors.
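
[Editor’s note: a minimal sketch, in Python, of the kind of pre-commitment record a registration captures, as Nosek describes it. The field names and example values are illustrative, not the OSF’s actual schema.]

    # Illustrative preregistration record: a pre-commitment, made before data
    # collection, to the question, the analysis plan, and what gets recorded.
    # Field names are hypothetical, not the OSF's actual schema.
    from dataclasses import dataclass
    from datetime import date

    @dataclass(frozen=True)  # frozen: a registration can't be edited after the fact
    class Preregistration:
        question: str                 # what the study is trying to find out
        hypotheses: list[str]         # predictions committed to in advance
        sample_plan: str              # who/what will be measured, and how many
        analysis_plan: str            # how the data will be analyzed
        measures_recorded: list[str]  # what will be recorded versus not
        registered_on: date

    reg = Preregistration(
        question="Does intervention X raise self-esteem?",
        hypotheses=["Treated group scores higher than control"],
        sample_plan="1,000 adult participants, random assignment",
        analysis_plan="Two-sample t-test, alpha = .05, two-sided",
        measures_recorded=["self-esteem scale", "age"],
        registered_on=date(2019, 1, 1),
    )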

Brian Nosek: The next layer of that, which is still in the technology world, is how do we integrate that with your workflow, the researcher’s workflow? You’re busy. You’ve got plenty of stuff to do. You’ve got lots of distractions. Saying, “Oh, here’s this new set of behaviors that we want you to do to make your research more rigorous,” is a great way to make sure that those behaviors don’t get done, because there’s too much. All those lab members, all these people, personnel, blah, blah, blah, blah, blah. The next layer is: make it easy, right? Have that technology integrate into how people do their research now. Those two layers are the base of the pyramid for behavior change.

Brian Nosek: Now, norms, incentives, policies. At the norms layer, it’s: I need to see that other people in my field are doing these behaviors, are sharing their data, are preregistering their research. If I don’t see anyone else doing that, why would I do it, right? We take information about how we’re supposed to do our work from what others are doing. One of our initiatives is a very simple intervention to try to elevate these behaviors to be visible, so that others can see that open behaviors are occurring in science, to promote their adoption. This simple intervention is to give these badges to journals.

Brian Nosek: There’s a badge for open data, a badge for open materials, and a badge for preregistration. If a journal decides to adopt those badges, then when you, as the author, have your article accepted, the journal says, “We’re going to publish it, and you could earn these badges. If you want to share the data behind your paper, then put it in a repository, and we’ll put this badge on your paper and put a link to the data.” Now, badges, that’s silly, but it is a very small, very easy thing to do. There are idealists that want to share their data anyway. The key of this as an intervention is the visibility, right?

Brian Nosek: Now me, as the reader, I see your article with open data in the journal that I publish in, that I read. I say, “Oh, someone shared their data.” As more people do it, more people see, “Oh, more people are sharing their data.” As norms go, this is a very easy intervention, because the behavior is already valued. We don’t have to convince people that seat belts are a good idea, or whatever other normative change you’re trying to pursue. Instead, there are people already doing the thing that you value, that you just don’t do because you don’t see anybody else doing it.

Jim Rutt: I like that, a lot. Because essentially you’re nudging people with things like your badges to move in the direction they know is the right direction already.

Brian Nosek: Exactly. That’s right. That is a key part. Most social interventions that try to leverage norms are trying to get people to do things that they don’t want to do, right? “You shouldn’t drink so much. Let’s try to show you normatively that other people aren’t drinking as much as you.”

Jim Rutt: But drinking is fun! [crosstalk 00:08:17]

Brian Nosek: But here we have a perfect situation, which is people agree. That’s a good thing. The next one is incentives, right? The badge has a little bit of incentive. I get this little credit. The next layer up from the badges, as norm changers, is incentives. Incentives, we can think of in terms of what is it that I’m rewarded for now. Okay. Publication is a key. Grants are a key. Getting a job is key. Can we change how it is people are rewarded for those things that they’re already rewarded for? We’re not going to change the publication system overnight, but what we can do is change how people are evaluated for what they get published.

Brian Nosek: The best example of an incentive change that we promote now is called Registered Reports. The idea of Registered Reports is a slight change, but a fundamental one to how peer review happens at a journal to decide whether you get published. The system now, I do all my research. I decide what of it is interesting enough to try to publish. I put that into a paper. Then I submit the paper with all my findings for peer review. Then, the journal decides to publish or not. With Registered Reports, we, instead, have the authors submit for publication before they’ve completed their research.

Brian Nosek: What you propose to the journal is, “Here’s the research question I have. Here’s the reason why it’s important. Here’s some background about it. Here’s some preliminary studies we did just to get a handle on it. Here’s the methodology that I’m going to use in order to ask that question as rigorously as I can.” No one knows what the results are, but that’s when peer review happens. If the editor and the reviewers agree that this is important and this is a great methodology to test that question, then I, as an author, get in-principle acceptance at that point. I’ve got the journal article as long as I follow through with all the steps.

Jim Rutt: Whether you confirm your hypothesis or not.

Brian Nosek: Exactly.

Jim Rutt: That’s the key.

Brian Nosek: This is absolutely the key, because in the regular system, everything is about the results, right? I need exciting results, novel results, positive results, a neat and tidy story, put it all together. That’s what gets reviewed. Is this interesting? Is it not interesting? Did you find the new solution to the problem? With Registered Reports, it’s all about the methodology. You need to ask the most important questions you can and test them as rigorously as you can. In the traditional system, 90 percent of published articles are positive results.

Jim Rutt: At least. In some fields it’s higher than that.

Brian Nosek: Yeah. It’s stunning. Either we already know all the answers, so why do we need to bother publishing at all, or there’s severe publication bias.

Jim Rutt: Obviously gigantic publication bias in every aspect of the system.

Brian Nosek: Every aspect, right? That’s because we think negative results are less interesting. Positive results are more desirable. It is more interesting to find that this actually solves that problem or this intervention works rather than, “Nope, that doesn’t either.”

Jim Rutt: There’s another one that doesn’t work.

Brian Nosek: Yeah, right? [crosstalk 00:11:25]

Jim Rutt: Truthfully as a pragmatist, a businessman, most of my expertise comes from things I tried that didn’t work.

Brian Nosek: That’s right. You showing that it didn’t work would have a big impact on me, if I’m in a similar business trying to figure out a similar problem and needing to get to similar answers. If I knew that you already tried that and it didn’t work, that would be helpful.

Jim Rutt: Let me drill in on this one a little bit, because there’s been some talk about preregistration for a long time. It never much happened. To what degree have you been able to get journals to buy into this? To the degree they’ve bought in, do they actually follow through and publish, let’s say, the worst case, which is the result is murky? It’s just a little [inaudible 00:12:08] below the threshold that you would normally use, .05 or whatever you choose as your criterion. It doesn’t exactly say no, but it doesn’t exactly say yes. Can you talk to me a little bit about which journals you’ve gotten to participate in this, and do they really follow through on their commitments to publish murky results?

Brian Nosek: Great questions. Adoption is really good. More than 200 journals now offer Registered Reports as an option for publishing. There’s one journal that only does Registered Reports.

Jim Rutt: And which one’s that?

Brian Nosek: It’s called Comprehensive Results in Social Psychology. They launched the journal in order to do it this way, saying this is the future. That’s great. It’s very cool. For all the other journals, it’s a complement to the regular system, and I think that’s totally reasonable. There are a lot of journals now. It started mostly in the behavioral sciences, where we have our origins. There’s a fair number of neuroscience journals, but it is now being adopted in more life sciences and extending beyond that. It’s expanding to see how well this model works across [inaudible 00:13:12].

Brian Nosek: We have done an initial analysis of the outcomes, and other groups are doing similar work, right? As far as we know, no journal has yet declined to publish the outcomes because of the murkiness of the findings. There have been cases where it didn’t get accepted ultimately, but those were because the researchers didn’t follow through with the plan.

Jim Rutt: That’s fair enough.

Brian Nosek: Right. You said you were going to collect a thousand people-

Jim Rutt: And you got 50.

Brian Nosek: Yeah, right.

Jim Rutt: Sorry.

Brian Nosek: That isn’t what you said you were going to do. That is an absolutely reasonable and appropriate basis for declining. Another appropriate reason for declining ultimately is: I told you that the intervention we have could, say, raise people’s self-esteem. The question was, when we raise people’s self-esteem, does it have this impact on what they do next? But it turns out that I wasn’t able to increase self-esteem. The manipulation didn’t work, right? That could be a reason that people would say, “No, we’re just not going to accept that.”

Jim Rutt: A significant number of papers have been published under this form?

Brian Nosek: Yes, more than 100. It might be more than 200 now. There’s many, many more in the pipeline. The rate of publication is increasing fast. We have two outcomes that we’ve observed so far with this. The first is that 60 percent of the published results, the primary outcomes, are negative results in this model.

Jim Rutt: Aha.

Brian Nosek: Right?

Jim Rutt: No surprise.

Brian Nosek: No surprise.

Jim Rutt: However, less than 10 percent of results are negative in traditional journal publishing.

Brian Nosek: Yeah. Immediately we can see that it is addressing publication bias. A lot of the stuff we do doesn’t turn out the way we planned. That’s the reality. That’s a positive effect, I would say, in the immediate. Simultaneously, a lot of journal editors would say, “That’s the reason I’m not going to adopt Registered Reports for my journal, because I’m going to have all of this junk of negative results that no one cares about and no one will ever cite in my journal. I’ll be the one that ruined my journal’s impact factor.”
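
[Editor’s note: a minimal simulation sketch of the publication-bias arithmetic discussed here. All parameters, the effect size, the share of true hypotheses, and the sample size, are assumptions chosen for illustration; none of the numbers come from the episode.]

    # Simulate many small studies, then compare a publish-only-positives filter
    # (the traditional system) with publishing everything (Registered Reports).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_studies, n = 10_000, 20      # 20 subjects per group: low power (assumed)
    effect, p_real = 0.4, 0.3      # Cohen's d when real; 30% of hypotheses true (assumed)

    positive = []
    for _ in range(n_studies):
        d = effect if rng.random() < p_real else 0.0
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(d, 1.0, n)
        positive.append(stats.ttest_ind(a, b).pvalue < 0.05)

    rate = np.mean(positive)
    print(f"Studies reaching p < .05: {rate:.0%}")  # roughly 10% under these assumptions
    print("Publish-only-positives literature: 100% positive results")
    print(f"Publish-everything literature: {1 - rate:.0%} negative results")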

Brian Nosek: The second finding is that we’ve looked at how often papers are cited, and at Altmetrics, right? We can agree or disagree in principle about whether you should make decisions based on how flashy findings are, but it’s a reasonable question to ask: does it make a difference? In the initial evidence, right, these are still early days given the number of publications, but what we have so far suggests that Registered Reports are not cited less; if anything, they’re slightly more cited than comparable articles in the same journals, published at the same time.

Jim Rutt: That would be huge if you get a big enough [inaudible 00:15:52] and long enough time depth.

Brian Nosek: Yeah, exactly.

Jim Rutt: That would be huge, because it would then answer the self-serving objection of the flashier journals who try to live on high impact factors.

Brian Nosek: Exactly, right. We don’t know why, right? Why is it that these are being cited as much or more at this rate?

Jim Rutt: Could be some selection bias, could be better scientists are adopting your mechanism.

Brian Nosek: Yeah, could be better … yeah that is a possibility, certainly a possibility. Another option that at least is my speculative favorite is that the science itself ends up being better, because of peer review in advance, right? The actual mechanism itself.

Jim Rutt: That’s very … if that turns out to be true, then you will have reformed science in a fairly major way. [crosstalk 00:16:39] Let’s get the thoughtful analysis about what you’re doing before you do it.

Brian Nosek: Before you do it, right. The logic is easy. Afterwards, what do the reviewers do with my paper?

Jim Rutt: Nitpick.

Brian Nosek: They tell me everything I did wrong. You should have done it this way. You should have done it this way.

Jim Rutt: Too late now.

Brian Nosek: Exactly. All I do is feel bad. Those jerk reviewers point out what’s terrible about my research. With Registered Reports, what happens when the reviewers say, “This is what’s wrong with your design?”

Jim Rutt: Sometimes you’ll go, “Yeah, you’re right,” right?

Brian Nosek: And you fix it.

Jim Rutt: Then maybe half the time you say, “That guy’s full of shit. I’m going to ignore him,” but half the time he was right. That’s huge.

Brian Nosek: I can actually take that into account, change my design, and have a better design.

Jim Rutt: Had not thought about that, but that … if that turns out to be a significant factor in what’s going on here, this thing could really have a huge impact on our practice of science.

Brian Nosek: Yeah, that to me is the most exciting, still speculative, but most exciting part of this. The other piece of this that is still incentives-aligned, that to me is exciting, is that most people hear about this process and they say, “That sounds like applying for a grant.” Exactly, right? It sounds just like applying for a grant. In fact, it sounds so much like applying for a grant that why don’t we make that a single process? We have done some matchmaking between journals and funders to test that out.

Jim Rutt: Wow.

Brian Nosek: There’s a few pilots with a single review process. You submit your proposal, what I’m planning to do, and if it gets accepted, you get the money and you get the in-principle acceptance from the journal. Everybody likes this, right? The author, instead of applying for all that grant money over there and then applying to get the results published over here, goes one place.

Jim Rutt: And the grant maker has a guaranteed publication.

Brian Nosek: Right.

Jim Rutt: Which, again, I occasionally do grant making in the sciences. One of the things you analyze is: is this going to be a publishable result?

Brian Nosek: And how much of the time is the work that I fund going to be reported at all?

Jim Rutt: Right.

Brian Nosek: Right? So many funders report deep frustration with, we give out a hundred grants and we end up with half of them producing any science that others can read, right? What a waste, right? No one gains from that.

Jim Rutt: I like this. This is really interesting systematics that you’re discovering here as you get into this.

Brian Nosek: Right. There’s just boundless opportunity with, really, what is functionally a very simple change, right? That’s what’s great about it: we don’t have to reform the whole system. There’s a lot that’s good in the system. We just need to pay attention to where those incentives are enacted and make just enough change, getting things like preregistration into the journal workflow, so that the values we have align with the daily behaviors.

Jim Rutt: Who are you finding adopting this? You mentioned people in the behavioral sciences, your home field. Is it early career people? Is it later career people? Some of both? I mean, interestingly, I’ve run businesses that had some surprisingly parallel aspects. In fact, two of them were computer chip design software companies. I shouldn’t say I ran them; I was the chairman of one, and I was on the board and an early investor in the other.

Brian Nosek: Yeah.

Jim Rutt: We basically solved one hard aspect of chip design, but that was useless until we integrated it into their workflows. [crosstalk 00:19:58] took us an extra year. Then second, who adopted it was really important. If we could get Qualcomm or Intel to adopt it, other people would follow, but getting those first customers, what we called lead bull customers, was another year or two. Talk to me a little bit about who’s adopting it, and have you got any real lead bulls, famous people using this?

Brian Nosek: You are describing the exact scenarios, right, that we confront. We actually have, as our description of culture change, the classic diffusion of innovation curve, right? The early adopters through the laggards. We talk about the strategies at each phase in this. For us, we have a two-prong strategy on this. Go for big dogs. With Registered Reports, just as one example, Nature Human Behaviour, eLife, and PLOS Biology, three very high profile, well-regarded journals, are adopters of Registered Reports. Their adoption has been very useful for legitimizing-

Jim Rutt: Credibility. Huge. You get a Nature journal to sign on and people can’t reject it as absurd.

Brian Nosek: Yeah, right. Well, yes.

Jim Rutt: At least they should not, right?

Brian Nosek: Yeah. That, at that level, is very useful. Then, speaking of researchers, we love having highly influential researchers try these out and then broadcast that they’re doing it, because that has the same credibility indicator. In general, these reforms are largely early career driven. Again, you have to just speculate on the reasons, but the most obvious one to me is that all of these behaviors are things that people going into science think is how science operates.

Jim Rutt: Yeah, when you’re a 15-year-old nerd, that’s what you think is going on, right?

Brian Nosek: Yeah, right. You go and you study something. You tinker with it.

Jim Rutt: It’s honest, right?

Brian Nosek: Share what you find. Other people say, “Oh, that’s interesting. Let me try this.” Then, you get in and you feel like, “Oh, oh no. That’s not how it works.”

Jim Rutt: It’s a job like any other.

Brian Nosek: Right. You’ve got to drive for those publications. You’ve got to pursue grants.

Jim Rutt: You’ve got to make tenure, seven years.

Brian Nosek: [crosstalk 00:22:15] all of these other things. It is great to talk to early career students, graduate students coming in about these things, and they’re like, “Yeah, of course I’m going to do that. Why wouldn’t I do that?” That is where it’s very easy.

Jim Rutt: An analogy is GitHub.

Brian Nosek: Yeah.

Jim Rutt: That changed how a lot of people do software, right?

Brian Nosek: That’s right.

Jim Rutt: I do all my development projects on GitHub. Now, sometimes they’re private. Sometimes they’re not. Again, on old projects, it’s never worth the effort to retrofit it to GitHub, but when you’re starting a new project or bringing new people, a new team together, then if you can it makes sense to put it on GitHub.

Brian Nosek: Yeah, exactly. Right. There is a viral element there as well, right? You get the early adopters who recognize the value of the service, can see how it can improve what they’re trying to do, or are just excited to try the service for the service’s sake. Then the extension into the early and late majorities is really about, “Oh, this actually can help me get work done, with people that I’m already working with. Oh, okay.” We see that a lot with the OSF, our core infrastructure: a lot of the early adopters are excited about open science as a concept. They’re looking to use this tool for open science.

Brian Nosek: As we’re moving into the early and late majorities, the people’s motivations are really not about open science, they’re about the questions that they study in the lab. That’s why they’re in science. They’re not interested in open science, per se. But if it can solve problems that they have, if it can make them more efficient at sharing their data within their team, at making what they can available to others, at registering their research, once they’ve got it as something that can help them functionally do better science, then these behaviors are easier to adopt. We have to be responsive to the different motivations across that adoption cycle and leverage the early ones for the later [crosstalk 00:24:11].

Jim Rutt: Makes perfect sense. Another example I can give from my own career that you guys might find useful, you may already have figured this out and be working on it, which is diffusion by young people. When I was at Thomson Reuters we owned Westlaw; actually we bought it. I was part of the team that bought it. A huge multi-billion dollar business that sells online information services, research services, mostly to lawyers, plus a bunch of other stuff. That’s the main business. We spent 40 percent of our computing time giving free access to law school students, a fairly significant cost. People say, “Why do you do that?” In fact, when we were looking to buy the company we said, “Why the hell do you guys do that?” Right? They said, “Well here’s why, because…”

Brian Nosek: Once they’re users…

Jim Rutt: This is in the ’90s, before it was completely ubiquitous. People get completely trained on Westlaw, and when they go to work for a law firm, they’re likely to flip the partners, right?

Brian Nosek: Yeah.

Jim Rutt: Because the partners totally depended on their associates.

Brian Nosek: Yeah, exactly.

Jim Rutt: In the same sense, a full professor at Yale or something is very dependent on his post-docs and his graduate students. If they come and say, “Hey, we really prefer doing it this way,” you may flip a lot of those more senior people from the bottom up.

Brian Nosek: Yeah. No, that’s exactly right, because obviously the jobs change. The real engine of science is the people who are graduate students and post-docs. The PIs of labs are there to help facilitate it, but they’re not on these software tools doing the nitty gritty work.

Jim Rutt: At least not [inaudible 00:25:43], sometimes … some fields they are.

Brian Nosek: And some yeah, with some variations.

Jim Rutt: But mostly not.

Brian Nosek: Right.

Jim Rutt: Especially when the labs get big, they’re definitely not.

Brian Nosek: Yeah, my own lab, my whole lab has to use the OSF, because they’re in my lab, but I myself have never done the registration on the OSF. Everybody in the lab has to register their research. I’ve never done it.

Jim Rutt: Because why would you.

Brian Nosek: Why would I?

Jim Rutt: Yeah, it’s not your job, right?

Brian Nosek: Yeah. We collaborate on the design, but they’re the ones that do the operational work.

Jim Rutt: Very cool. Let me move onto another … unless you have some more things on your big picture?

Brian Nosek: Oh, well the last, the top of the pyramid is policy, right?

Jim Rutt: Okay, great.

Brian Nosek: We did infrastructure, make it possible. Then, user experience to make it easy. Then, norms to make it community based. Then, the incentives to make it rewarding. At the top is policy, make it required. An example of a policy shift is that we’ve developed what are called the Transparency and Openness Promotion guidelines, the TOP guidelines. It’s a policy framework: eight distinct behaviors, open data, preregistration of studies, replication, et cetera, that characterize how you could think about open, reproducible science, and three levels of stringency that a stakeholder could expect of their grantees if they’re a funder, of their authors if they’re a publisher, or of their employees if they’re an institution.

Jim Rutt: It reminds me of the LEED standards for buildings.

Brian Nosek: Yes, right. Get a common language, right? And a common set of standards that different stakeholders that drive those incentives can adopt. Then, with these three levels of stringency, disclosure being the first one, right? For open data, you don’t have to share your data for my grants, but you have to say whether you shared it or not. You have to write it down [crosstalk 00:27:31].

Jim Rutt: And justify it [crosstalk 00:27:31].

Brian Nosek: Right. One level up is required: you have to share your data unless there are IP issues or whatever else. Level three is, not only do you have to share it, we’re going to check, and we’re going to see if what you said is reproducible. We have that as an overall framework. So that’s how we organize it. How do we get this decentralized community, there is no driver of science, it’s totally decentralized, how do we get them aligned and moving together to change behaviors? The TOP guidelines are adopted now by more than a thousand journals, and all of the major publishers have signed onto them in principle.
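
[Editor’s note: a compact sketch of the TOP framework as a policy matrix. The eight standard names follow the published TOP guidelines; the level wording paraphrases the conversation, and the example policy is hypothetical.]

    # TOP as a matrix: eight standards, each adoptable at one of three
    # stringency levels (plus level 0, not implemented).
    TOP_STANDARDS = [
        "citation (data, code, materials)",
        "data transparency",
        "analytic code transparency",
        "research materials transparency",
        "design and analysis transparency",
        "preregistration of studies",
        "preregistration of analysis plans",
        "replication",
    ]

    TOP_LEVELS = {
        0: "not implemented",
        1: "disclose: state whether and where the artifact is shared",
        2: "require: sharing is mandatory unless exempted (e.g., IP, privacy)",
        3: "verify: shared artifacts are checked for reproducibility",
    }

    def describe_policy(levels: dict[str, int]) -> None:
        """Print a stakeholder's (hypothetical) TOP policy, one level per standard."""
        for standard in TOP_STANDARDS:
            level = levels.get(standard, 0)
            print(f"{standard}: level {level} ({TOP_LEVELS[level]})")

    describe_policy({"data transparency": 2, "preregistration of studies": 1})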

Jim Rutt: Wow.

Brian Nosek: Funders are starting to adjust their policies in line with the TOP guidelines. The big lift will be trying to get institutions to do the same. We’re really just at the onset of that idea.

Jim Rutt: When you say institutions, you mean universities?

Brian Nosek: Universities mostly, but it can be other kinds of research institutions.

Jim Rutt: How about the big gorillas, NIH and National Science Foundation?

Brian Nosek: Yeah, they are motivated to make these changes. We have lots of discussions with them. They have their own discussions with lots of others in the community. They’re hard to change.

Jim Rutt: They’re slow.

Brian Nosek: They’re slow, but when they make change, it changes everything. Those are the big, at least in the US, right? Those are the means of getting the entire community to shift.

Jim Rutt: Those are unfortunately fairly opaque organizations in some sense. Getting them to make a change at a true policy level.

Brian Nosek: Yes, right. Especially policies that impact how the researchers do their work. “Oh, we can’t tell a researcher how to do …” You can’t tell them to share their data? Wait a second.

Jim Rutt: You sure can, just say, “I’m not going to give you money if you don’t.” I guarantee you they’ll do what you … you’ve got the biggest stick in town, dude. Use the stick.

Brian Nosek: [crosstalk 00:29:24] I thought that the funders would be the easy part of the problem, and it isn’t, because there is some degree of deference and worrying about what impact, [inaudible 00:29:34] ways, right? Worrying about the impact of how we change [crosstalk 00:29:37]-

Jim Rutt: They don’t want to screw up what they think, at least, is science that works well. I would argue that they’re fooling themselves, particularly at NIH. I mean, biomedical research is a complete shit show, irreproducibility-wise. I’ve heard numbers from knowledgeable people that say 90 percent of biomedical research fails on just the most elementary statistical significance analysis, because the Ns are too small.

Brian Nosek: Yeah, there are significant power problems. The Bayer and Amgen studies, in 2011 and 2012, were good examples of big efforts and failures to replicate core findings. We have our reproducibility project in cancer biology, which is the same thing we did in psychology, but now in cancer biology. We finished all of the data collection, so now we’re writing that final report.
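
[Editor’s note: a hedged illustration of the statistical power problem raised here, using statsmodels. The effect size of d = 0.4 is an assumption chosen for illustration, not a figure from the episode.]

    # How much power does a two-group study have at various sample sizes,
    # and how big a sample does 80% power actually require?
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    for n in (10, 20, 50, 100):
        power = analysis.power(effect_size=0.4, nobs1=n, alpha=0.05)
        print(f"n = {n:>3} per group -> power = {power:.2f}")

    n_needed = analysis.solve_power(effect_size=0.4, power=0.8, alpha=0.05)
    print(f"~{n_needed:.0f} per group needed for 80% power")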

Jim Rutt: Don’t tell me the answer and I’ll bet you lunch that it has a lower reproducibility rate than psychology.

Brian Nosek: Okay, well a lunch has been bet and hands have been shaken. You heard it here first.

Jim Rutt: What I’ve seen of biomedical researchers-

Brian Nosek: But wait, what’s [crosstalk 00:30:38]-

Jim Rutt: I’m saying that the reproducibility will be lower in cancer research or any biomedical field you can name than it was in psychology.

Brian Nosek: I shook hands before I actually knew what I was committing to and I already know the answer. This is a problem for me now.

Jim Rutt: Oh well, it’s just a lunch. It’s just a lunch. Great. Well that’s actually huge. Again, this was a long battle to sell new policy, but a journey of a thousand miles starts with a single step.

Brian Nosek: That’s right. Every indicator we have shows non-linear growth. It is all moving in the right direction. It’s still a very small step on a very large issue, but there’s plenty of community can-do spirit.

Jim Rutt: What I love about it, as you pointed out from the very beginning, is while this is a huge lift, it’s not impossible because it’s aligned with the normative values of science itself.

Brian Nosek: That’s right.

Jim Rutt: You don’t want to make your results transparent?

Brian Nosek: Wait a second.

Jim Rutt: Why’s that? There may be a reason. It’s got personal data in it, or IP issues, but the default case will have to be open data. Let’s switch to that. This is an area that I’ve been talking to Google and other people about for years. At the Santa Fe Institute, where I’m still an affiliate, we try very, very hard to adhere to open data and particularly open code. In some of the fields I’m interested in, which are at the intersection of cognitive science, cognitive neuroscience, and artificial intelligence, most of the interesting work is actually in the form of code, or code run across training sets. Unfortunately, many of the most interesting researchers don’t release their best code. They’ll release it a year or two later. What are you seeing in the area of open data and particularly open code?

Brian Nosek: This is a big issue. It is broadly recognized that these challenges extend to code, right? There were some initial attempts to reproduce findings from computer science conference proceedings, and they ran into many of the exact same barriers that have been run into in other disciplines. Even with access to the code, they couldn’t reproduce the findings in the conference proceeding, because of a missing library or whatever else had changed. There were problems even when the code was available. Oftentimes the code wasn’t available at all.

Brian Nosek: There is that change occurring. Data’s being shared more. Code is being shared more. Other materials, depending on the research domain, are being shared more, but there is still a long way to go. For us there are multiple phases to think about in this change. The first phase is just getting people to share at all. I don’t care that I don’t understand it. The fact that you’ve made it available-

Jim Rutt: Good start.

Brian Nosek: Okay, let’s start there, right? But the second step is some quality control, right? Maybe it should be commented, I don’t know, right? Maybe it should be in a more accessible format? I don’t know. Once people are on board with the behavior of being open, there’s going to be work to do: how do I anticipate that I’m going to open up this code, open up this data eventually, and change my research process so that I know that other people are going to need to look at this?
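
[Editor’s note: one small, concrete version of that quality-control step, a minimal sketch that records the exact software environment alongside shared analysis code, so the “missing library or whatever changed” problem mentioned above is at least diagnosable later. The output file name is arbitrary.]

    # Snapshot the Python version, platform, and installed package versions
    # next to the code and data being shared.
    import importlib.metadata
    import json
    import platform
    import sys

    env = {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": {
            dist.metadata["Name"]: dist.version
            for dist in importlib.metadata.distributions()
        },
    }

    with open("environment.json", "w") as f:  # arbitrary file name
        json.dump(env, f, indent=2, sort_keys=True)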

Jim Rutt: That’s interesting because as a programmer myself, I’ve actually got a new product about to come out that I wrote myself.

Brian Nosek: Is that right? Great.

Jim Rutt: Truthfully, it’s embarrassing as shit, some of the code, because I wrote it for myself, right? If I had known that I was going to open source it, I would have put more comments in, and wouldn’t have been quite so terse in my variable names and what have you. Again, it’s a long cultural road.

Brian Nosek: It is a long road, but the good thing is that we have a community, the open source software community-

Jim Rutt: That’s used to that.

Brian Nosek: [crosstalk 00:34:37] demonstrated how it is that you can get to an effective and efficient process for documentation along the way, for sharing at the outset, and even for generating reward systems that benefit that, right? That senior person who doesn’t want to share their code now, maybe they’ll share it in three years; a lot of times the concern is priority, right? I want credit. This is mine. I’m not in here for the public good. I’m in here for me. Of course no one says it or thinks it quite like that.

Jim Rutt: The back of the brain’s always saying that.

Brian Nosek: There’s a conceptual shift along with the practical shift which is by opening early, I can actually get more credit. I can get benefits from other people being able to see and reuse and credit me for the work that I’m doing and sharing, rather than thinking about it in the closed model of I get credit by holding it close.

Jim Rutt: If there were the equivalent citation for the use of code, right?

Brian Nosek: Yes. Exactly.

Jim Rutt: Does that exist?

Brian Nosek: Yeah. One of the TOP standards is citation of data, code, materials, the components of research.

Jim Rutt: I love that.

Brian Nosek: Rather than just the paper. Since it’s all about the paper it really constrains how we think about rewards.

Jim Rutt: [inaudible 00:35:51] because if you can break that bottleneck, then the idea of hoarding the code or the data, so I can crank out more papers before anyone else does, may not be overwhelming if you also got credit for citations of your datasets and your code.

Brian Nosek: Right. And it can start to diversify what being a scholarly contributor means.

Jim Rutt: Well, you could be a data specialist, for instance.

Brian Nosek: Right. I am someone who is amazing at generating data. I hate writing papers, but I generate lots of useful data. Why shouldn’t that be a viable path for productive contribution to science?

Jim Rutt: A machine learning guru. I can really come up with some amazing machine learning algorithms, but I’m bored to tears with data and I’m bored to tears with writing papers. Why can’t I create some great algorithms that do some amazing things that people can use in science and get credit for it?

Brian Nosek: Right. Of course in the business world, everyone understands this intuitively, right? The specialization and coordination across different specialties is valued.

Jim Rutt: That’s an interesting … actually I love that because one of my critiques of academic science is it’s too decentralized, right? I’m on the Board of Visitors of the Brain and Cognitive Science Department at MIT, an amazingly good department.

Brian Nosek: Yeah, Rebecca Saxe is on our board.

Jim Rutt: Yeah, I think I saw that. She’s now associate head of the department at BCS, a really good person. Anyway, even at that level, I’m saying, “You know, no business would be so decentralized, right?” There would be more specialization and more cooperation across the specialties. What’s a company but an artificial organism to lower the barriers of internal competition within the various components, because they’re all aligned on a single goal? Well, that’s not the way academic science works.

Brian Nosek: Not at all.

Jim Rutt: Not at all, right? A series of little independent [inaudible 00:37:41], which are also limited by the amount of work you can get out of one graduate student in one graduate student career, because the quantum of work has to be one dissertation’s worth, pretty much.

Brian Nosek: That’s right. And we leave so much talent on the table, right? There are many, many people who have lots of training and lots of skill who are in environments where they have very little access to resources. They’re at teaching institutions. They teach four classes a semester, right? They could be great contributors to science, but in the vertically integrated model, where they have to come up with the idea, get the resources, run the studies, write the paper, analyze all of the parts, they don’t have time to do that. If they could contribute just this part, just this part, just this part, make it a horizontal model where they are part of a system-

Jim Rutt: Like a business. It has marketing, sales, IT, operations, janitorial, right? I mean, they’re all working for a common goal.

Brian Nosek: Yeah, so their contribution is focused on what time, resources, and skills they have available to contribute to it.

Jim Rutt: Interesting. I can give you a story from our BCS at MIT. The Visiting Committee is a very interesting process MIT’s been doing since 1875, approximately. It runs like clockwork. It’s a thing of great beauty. Someone should really write it up. Anyway, part of it is we go out and interview every level of participant in the department, from undergraduates through post-docs, early faculty, late faculty. We get their uninhibited, we hope, critiques. A few years back, one of the critiques was that the level of programming and systems management necessary to do the work we’re doing today is really over the heads of many graduate students and post-docs.

Jim Rutt: We really need to build a departmental level, horizontal software, data management, network operations resource. We had some technologists and tech business people on the Board of Visitors. I will also say the department head was very sympathetic and they actually did allocate some fairly serious dollars to change how it was thought access to computer technology would be available within the department. Now someone doesn’t have to be fully vertically integrated. That has certainly empowered some of the labs in ways that they couldn’t have ever gotten to on their own, realistically.

Brian Nosek: Yeah, oh that’s a great example. Another one that I think of, that is a positive step I think in all of this, is in the life sciences the emergence of core facilities, where there is a lab that does this kind of technique-

Jim Rutt: Does imaging.

Brian Nosek: And other labs can make use of that resource. That collective sharing is a benefit for everybody.

Jim Rutt: We’re certainly seeing a lot of that in the life sciences.

Brian Nosek: Yeah, right. There’s certainly recognition that the problems they’re trying to solve are bigger than any individual lab can do. It’s forcing people into figuring out, “Oh, okay, maybe we need to actually reorganize this. And oh it turns out other people have solved this problem.”

Jim Rutt: Yeah, some of the techniques are so hard, just being an expert in the technique-

Brian Nosek: Technique, oh yeah.

Jim Rutt: Is a gigantic contribution to the science, but doesn’t necessarily leave room for a person to also be a PI, right? You could be intellectually at the same level of a big lab PI and be the master of a cutting-edge technique.

Brian Nosek: Yeah, it’s a bizarre thing that we’ve fetishized in science, academic science, that everybody essentially has to be the CEO, right? That’s it. That’s the only-

Jim Rutt: Yeah, the PI job is essentially a CEO.

Brian Nosek: Right, and that’s the only job that is what matters, but then obviously that’s not how-

Jim Rutt: Another model I saw that was very interesting: I recently stopped in and chatted with senior folks at the Broad Institute up in Cambridge. They have taken a different approach. They now have very senior research scientists, right? They have seven teams, right? That are much bigger than a PI lab. They have a PI, essentially as the CEO, but the PIs are typically part-time from elsewhere, Harvard or the big hospitals in Boston or MIT or whatever. They may be there a third of their time, half time at the most. One of the things that they have insisted upon is that there be research science people at almost that level, I mean, paid really big dollars, who are going to be there for the long term. Very different model.

Brian Nosek: Yeah, that’s great. I love that there’s experimentation like this happening.

Jim Rutt: Yeah, of course it helps that somebody gave them 600 million dollars, right?

Brian Nosek: That doesn’t hurt.

Jim Rutt: Eli Broad, right? As a matter of fact, his name’s on the building. Again, it’s another model of doing science at a different scale, where it’s not one little thing. What are the issues around data, particularly now that we’re moving toward bigger and bigger datasets?

Brian Nosek: There are a lot of issues. Part of it is the related but separate concept from open, which is FAIR. FAIR data means findable, accessible, interoperable, reusable. All of those could be true without it being actually open, right? It could be that I’ve made it possible to use, but you need to get permission from me to access the data. Data being FAIR means that you can actually make productive use of it.

Jim Rutt: For instance, maybe as long as you use this API, you can write an analysis routine which I will run for you and give you the result, but you never actually see my data.
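[Editor’s note: a minimal sketch of the pattern Jim describes, where the data owner runs a caller-supplied analysis and returns only the result, never the raw data. Everything here is a hypothetical illustration, not any particular API.]

    # The analysis routine travels to the data; only the answer travels back.
    from statistics import mean
    from typing import Callable

    _PRIVATE_DATA = [2.3, 3.1, 2.8, 3.5]  # stays on the owner's side

    def run_analysis(routine: Callable[[list[float]], float]) -> float:
        """Run a caller-supplied routine on the private data and return
        only the scalar result, never the rows themselves."""
        return routine(_PRIVATE_DATA)

    print(run_analysis(mean))  # caller sees the answer, not the data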

Brian Nosek: Right. Or the privacy issues, as you mentioned before, or IP issues. I’m willing to have people analyze it, I just can’t release the whole thing to you. FAIRness is also important for data that is totally open, which is just, how can we make it actually usable? One of the promises of open data is that we’ll be able to integrate lots of different datasets productively, to ask questions that we couldn’t ask of a single dataset. That can only really happen if the data is FAIR, so that you can actually put it together in some productive way.

Jim Rutt: Though it’s harder than you think.

Brian Nosek: It’s very hard.

Jim Rutt: We dealt with this in business a lot, especially at Thomson Reuters. We had thousands of commercial databases, and trying to build intersections of them, normalized, was hard because of the way people divided things up conceptually, the ontologies.

Brian Nosek: Oh, it’s a mess.

Jim Rutt: Yeah, the ontologies. They’re not subsets of some one grand ontology. They’re all separate ontologies developed for different reasons. They don’t overlap. There’s no rhyme or reason to them, really.

Brian Nosek: Half of the ontologies have been developed to solve the problems of the other ontologies being a mess.

Jim Rutt: Exactly.

Brian Nosek: Then each one, “Oh wait. That one doesn’t work either. Okay, build a new one.”

Jim Rutt: It’s one of my easy ways to dismiss somebody, someone who tells me they’ve solved the ontology problem, right? I go, “No you haven’t, right? I guarantee you.” It’s like someone said, “Oh I’ve beaten the second law of thermodynamics.” And I go, “Oh, see you later.” I think that’s one of those fundamental things you can count on that the ontology problem will always be with us.

Brian Nosek: Yeah, that’s right. That’s right. Early on, in building the OSF, we were confronted with this, because everybody that builds some tool like this, to try to make common tools or make it easier for people to work in concert, has to make some decision about ontology. We decided nothing. We’re not going to even try. Instead, what we’ll do is we’ll try to facilitate many different example metadata standards, where people can say, “This is my metadata standard.” Then, we’ll just cultivate those that become popular. Those will be the ones we emphasize. Rather than try to impose one, let the community have some [crosstalk 00:45:15]-
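
[Editor’s note: a minimal sketch of the design choice Nosek describes, letting each project declare which community metadata standard it follows rather than imposing one ontology. The class is hypothetical, not the OSF’s data model; Dublin Core is a real metadata standard used here as an example.]

    from dataclasses import dataclass, field

    @dataclass
    class ProjectMetadata:
        standard: str  # e.g., "Dublin Core", or a community's own standard
        fields: dict[str, str] = field(default_factory=dict)

        def missing(self, required: list[str]) -> list[str]:
            """Return the required field names this record doesn't supply."""
            return [k for k in required if k not in self.fields]

    record = ProjectMetadata(standard="Dublin Core",
                             fields={"title": "Study 1", "creator": "Lab X"})
    print(record.missing(["title", "creator", "date"]))  # -> ['date']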

Jim Rutt: That’s the only way you could do it. You were very wise not to try to solve that problem. You’re already boiling the ocean. This would be boiling five oceans, right? By itself. I would strongly commend you for making that decision. Let’s move on a little bit. Another one of my pet peeves, the insane costs of for-profit journals.

Brian Nosek: Yeah.

Jim Rutt: What do you think about that? Are you doing anything to help much lower cost journals, or other lower cost or zero cost models of publishing, come into existence?

Brian Nosek: Yeah. This is a problem that we care about a lot, although we’re involved in a small way compared to some of the others in this space. Open access is the tip of the spear of the open science movement, because it is mature. It has the longest history of the research community saying, “Really, the research we produce should be openly available to anybody.” It’s weird, given the internet age, that we have this subscription model that closes access to publicly funded work for the public to consume and use in some productive way. The rationale for open access is a very easy case to make. Of course, there’s lots of vested interest in the business models of subscription, because they’re enormously profitable. Elsevier, Taylor & Francis, Wiley make lots and lots of money based on this [crosstalk 00:46:37].

Jim Rutt: Obscene profit margins. If only you knew.

Brian Nosek: Oh my gosh, yeah. There are a couple of people on the Twittersphere that publish each year their rates of return. It’s just so amazing, right? Who knew that Elsevier is consistently more profitable than Apple? Those are big challenges. The way that we’re contributing is less on the advocacy side, because there are others that are doing that well. We’d just be one more voice in work that is happening to move the community to open access. The gap that we’re trying to fill is technology: how can we make it easy for communities to make their research more open? We have a service called OSF Preprints, or OSF Papers, where any community can start their own paper sharing service. Usually this is framed in terms of pre-prints or post-prints, right? Papers before they’ve been peer-reviewed, shared for advance discussion, and arXiv is the granddaddy [crosstalk 00:47:39].

Jim Rutt: Of course, I use it every day.

Brian Nosek: Right. That has worked so well for physics and allied communities.

Jim Rutt: And now machine learning.

Brian Nosek: Yeah, since, what, 1991 or something, when it was released? That model is now … we’re trying to help facilitate extending that to all disciplines that want to launch one.

Jim Rutt: We found it, culturally, really difficult in biology in particular. I don’t know why.

Brian Nosek: Yeah, it’s changing fast.

Jim Rutt: Well that’s good.

Brian Nosek: Finally, bioRxiv is the most popular service for the life sciences. Its growth rates are like ours, which is just this massive growth in sharing of pre-prints in the life sciences. We host, now, 29 different pre-print services across a variety of different communities. The latest that launched, last week, was EdArXiv, for education research. The week before, IndiaRxiv launched, for Indian researchers to have more open access. Our largest pre-print repository is INA-Rxiv, which is the Indonesian research community’s pre-print sharing service.

Jim Rutt: So it’s not just disciplinary. They can [crosstalk 00:48:45] any which way.

Brian Nosek: [crosstalk 00:48:46] Yeah, exactly. There are regional ones: Africa, Arab language, French language, and Indonesian. Then, there are a lot of disciplinary ones: PsyArXiv for psychology, SocArXiv for the social sciences, EarthArXiv for earth sciences. That is, for us, the way that we can fill a gap in trying to promote the openness of the outcomes of research: just allow communities to launch their own services and make their papers more accessible. That is a complement to whatever happens in the publishing world.

Jim Rutt: In the journal world. Okay, which gets me to something that sounds a little outside your own sphere, but if we’re going to talk about science and the process of science, we have to talk about peer-review. At one level it just seems like a really, really weird way to do something. On the other hand, I’ve never heard anybody come up with anything better. What’s your thought on where we are with peer-review today? Are there any alternatives? Do technologies or platforms like your open science platform provide opportunities to do peer-review in a way that’s less slow, expensive, opaque, et cetera?

Brian Nosek: This is an area ripe for a lot of innovation. The positive thing is that a lot of that innovation is happening. Over the last six or seven years, a number of different technology groups and innovative editors have said, “We’re going to change things. We’re going to see what happens.” What we want to try to promote organizationally is experimentation with this. We arrived at this system of peer-review in a really ad hoc manner. It is a really ad hoc system, right?

Jim Rutt: It’s newer than people think, right?

Brian Nosek: It is newer, yeah, exactly.

Jim Rutt: The current form of peer-review didn’t really solidify until the late 1950s.

Brian Nosek: Right. Which is surprising to people. “Oh, I thought it was always like this.” “Nope, Newton did not have reviewer two to deal with.” Right? It was different then. The experimentation, I think, is very useful here, because the existing system, serial submission to different journals, a handling editor that has complete authority over whether it gets published or not, an ad hoc selection of individual reviewers, is a very inefficient system. We have potential integrations with peer-review services on these pre-print services that we host. There are a number of different peer-review services that, if we attach them to the pre-print services, could start to innovate on trying it. “Let’s just try out a different way of doing peer-review. Let’s do it totally open. Let’s do it where we invite some people and then don’t invite other people,” or whatever.

Jim Rutt: What would you say are some examples of good experimental peer-review systems?

Brian Nosek: A really interesting model is one that eLife has been using. I mentioned them earlier as adopting Registered Reports. They’re willing to try lots of different things. Their peer-review approach, which they’ve been using for the last four or five years, has a consultative element among the peer-reviewers. You submit the paper just like normal. There’s an editor just like normal. The editor selects peer-reviewers. They do their initial evaluation. Then, the peer-reviewers talk to each other and come up with a summary evaluation. Rather than, reviewer one said X and reviewer two said Y and you’ve just got to deal with it, it comes with a summary saying, “This is what our feedback is.” And all of that is open. They share the entire process, the outcomes of that, and then the response letters from the authors. When you see the paper in eLife, you can see that entire history.

Jim Rutt: Wow, I love it.

Brian Nosek: It’s great.

Jim Rutt: That’s great.

Brian Nosek: Right? In good peer-review, and I’ve been an observer of it for a long time and a contributor, but mostly observing other reviewers, sometimes the scholarship’s amazing. Amazing. Yet, in this standard system, it’s totally closed off from the world. When I read the paper, having been a reviewer, I say, “Oh my gosh. The paper does not capture some of the real big issues that came up in the review process. It would be so useful for people to be able to see that.” That’s, I think, a great illustration of just one simple thing: opening that up can really change how peer-review could be used.

Jim Rutt: That’s really, really interesting. I’ll have to take a look at that. It’s something a number of us have talked about, but [crosstalk 00:53:16] in any group, at least among the people I know, no one’s gotten past the level of talk. I’m glad to hear there are people actually out trying different things.

Brian Nosek: Yeah, there is a real nice test bed. The real challenge, I think, is to get some of those innovations into the traditional journals to really scale it up.

Jim Rutt: Great. Well, thanks for this amazing detailed discussion about open science and your organization here. Let’s move on to another topic.

Brian Nosek: Sure.

Jim Rutt: Probably, in the wider world, you’re best known for the Reproducibility Project in psychology. Type your name in, and a bunch of that comes up, right?

Brian Nosek: Yeah, right.

Jim Rutt: For our audience, in fact, I would say it’s really, I think, your project that started this slow motion avalanche of increasing concern. That’s when I first saw it see the light of day, when people said, “Wait a minute. How could this be?” Right? You really started something there: increasing concern about research and the various biases that enter into what gets published, et cetera. I want you to talk a little bit about the Reproducibility Project with respect to psychology. How’d you come up with the idea? How did you do it? Broadly outline the results that you found.

Brian Nosek: Yeah. This project started in late 2011. It was a time when, in psychology, my field, there was increasing concern about reproducibility. People had been saying, "We have a problem," for forty years, but it was becoming a more active discussion, even among the regular crowd, rather than just the methodologists who were worrying about this. There were debates: there is a problem, there isn't a problem. And people were giving great theoretical reasons, but very little data, all right? It was just anecdotes, right? "We couldn't replicate this finding." "Well, of course you couldn't replicate that finding. No one believes that finding, but it doesn't mean there's a problem." It's that kind of back and forth. We just thought, we really should have some data. This is a problem that we are prepared to solve, because we know how to do science on science.

Jim Rutt: Yeah. We’ll do a-

Brian Nosek: We can study it.

Jim Rutt: Yeah.

Brian Nosek: It was like, “Okay, well if we want to study it, what do we need? Well, we actually need to try to replicate a sample of studies and see how well we do. Well, we don’t have enough resources in one lab to do that and the incentives are bad,” right? Doing replications is not the exciting thing to do in science. I said, “The only way that we could actually get this done is to make it a community project. Let’s see if other people are interested in the same problem and then we’ll distribute the resources. Each of us will run a replication. We’ll put them all together.” We made an announcement saying, “Here’s the idea. We’re going to do this replication project. Anybody else want to join?” Within a week it was something like 50 people had joined the project. We’re like, “Ah-”

Jim Rutt: Pretty amazing.

Brian Nosek: "There are people that care about this. We have a real opportunity here." We devoted a lot of 2012 just to designing this project and doing it, starting the initial studies, getting a sample of studies. We picked the year 2008.

Jim Rutt: Yeah, I was going to ask you, one of the things I don’t know is, how did you avoid systematic bias in the selection of the studies?

Brian Nosek: Yeah. We tried to minimize it. We can't remove it entirely, because randomly sampling, well, what would we sample from? [crosstalk 00:56:45]

Jim Rutt: [crosstalk 00:56:45]

Brian Nosek: What we ended up deciding was, let's pick 2008. It's one year. Let's pick three journals. There were three prominent journals in psychology. We will start from the first issue published that year and the first papers in that issue, and try to source as many of them as we can, try to see if we can get people to [inaudible 00:57:08], right? The idea is we're going to have a systematic way of identifying which studies to try to replicate, then replicate as many of them as we can. In the end, we had about 160 possible ones that we could have selected. We were able to actually complete about 100 of those. It doesn't remove selection biases entirely, at all, but it is the most systematic approach that-

Jim Rutt: That was feasible.

Brian Nosek: That's feasible to do. We had that, and by the end of the project, 270 people had contributed enough to earn authorship. Another 83 or so contributed without authorship. It was a really wide community project. We completed 100 replications. We published that as a summary of the collection in 2015 in Science. The top-line outcome was that, across a variety of different criteria for deciding whether we successfully replicated or not, because it turns out that's a hard question to … less than 40 percent of the studies that we tried to replicate, we did successfully replicate. For most people, that was a number lower than they would have expected [crosstalk 00:58:23]-
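
(A note for readers who want to see the mechanics: "a variety of different criteria" can be made concrete. Below is a minimal, hypothetical Python sketch, not the project's actual analysis code and with invented numbers, showing two common ways to score a correlation-type replication: is the replication significant in the same direction as the original, and does the original effect size fall inside the replication's 95% confidence interval?)

# Illustrative sketch (not the project's actual analysis code): two of the
# kinds of criteria one can use to call a replication "successful".
import numpy as np
from scipy import stats

def replication_outcomes(orig_r, orig_n, rep_r, rep_n, alpha=0.05):
    """Evaluate a replication of a correlation-based effect two ways."""
    # Criterion 1: replication is significant and in the same direction.
    t = rep_r * np.sqrt((rep_n - 2) / (1 - rep_r**2))
    rep_p = 2 * stats.t.sf(abs(t), df=rep_n - 2)
    sig_same_dir = (rep_p < alpha) and (np.sign(rep_r) == np.sign(orig_r))

    # Criterion 2: original effect lies inside the replication's 95% CI
    # (computed on Fisher's z scale, the usual transform for correlations).
    z_rep = np.arctanh(rep_r)
    se = 1 / np.sqrt(rep_n - 3)
    lo, hi = np.tanh(z_rep - 1.96 * se), np.tanh(z_rep + 1.96 * se)
    orig_in_ci = lo <= orig_r <= hi

    return sig_same_dir, orig_in_ci

# A hypothetical original (r = .40, n = 30) and a weaker replication:
print(replication_outcomes(orig_r=0.40, orig_n=30, rep_r=0.15, rep_n=80))

With these invented numbers the sketch returns (False, False): the weaker replication fails both criteria. Effects that pass one criterion and fail another are exactly where the "did it replicate?" debate lives.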

Jim Rutt: Shocking, I'm going to say. When I saw that in the Science paper, when it came out, I went, "Holy moly." Right? I might have guessed 75 percent would have been reproducible, but 40 percent?

Brian Nosek: 40 percent.

Jim Rutt: That’s like, “Whoa, what’s going on here?”

Brian Nosek: Yeah, what it spurred is a lot of very good debates about what it means. Is that the right number or not?

Jim Rutt: Yeah, were you fair to people? I saw some of that discussion.

Brian Nosek: Oh yeah. Right, did we … there are multiple reasons why we might have succeeded or failed. Failing to replicate doesn't mean that the original finding is un-replicable. There are a few general explanations, right? It could be that the original was a false positive. It wasn't really there. They observed it by chance [crosstalk 00:59:07]-

Jim Rutt: Statistical. Whatever your P value is, that leaves room for random error.

Brian Nosek: Right. [crosstalk 00:59:12].

Jim Rutt: But it should only be a small percent.

Brian Nosek: A small percent of the time would be the default assumption. Second is that the replication was a false negative. We screwed it up. We did a terrible job.

Jim Rutt: The technique was hard and you never mastered it, right?

Brian Nosek: Right, whatever it was. We can't rule that out. We have lots of processes to try to minimize the likelihood of that, but we can't rule out that it happened. Then, the third is the most interesting area, which is that both are, quote-unquote, true, right? The original team did observe it, and it was a real effect. The replication failed to observe it, and that was real too. And the reason they're different is that something fundamental is different between them that no one, yet, really has a full handle on, right? The theory doesn't specify those conditions. There are some conditions that moderate when you see that effect or not.

Brian Nosek: That's where lots of the debate was. It was like, "Of course you didn't replicate it. You ran your study in Indiana. Ours was done in Illinois. Totally different kinds of people," right? Or whatever. It could have been lots of different things. That is the debate on the substance of why it succeeded or failed. What has happened since that project is that a lot of questions have been raised about why studies succeeded or not, so a lot more replication studies have been done and are being done.

Brian Nosek: We've published four more big replication studies after that, trying to look at some of the different questions that have been raised by the initial one. One that we just finished up takes on one of the fundamental critiques of the original Reproducibility Project in psychology, which was that we screwed it up. There were 11 findings, of the 100, where the original authors didn't approve of the design. We always engaged the original authors, got the original materials if we could, and then asked for their critique. In 11 of those, they had not endorsed-

Jim Rutt: Your approach.

Brian Nosek: Our approach.

Jim Rutt: The team’s approach, right?

Brian Nosek: We proceeded anyway because the team thought it was reasonable to do it. One of the critiques was, “Well, that’s why it failed. If you had met their criteria, it would have succeeded.” That is quite possible.

Jim Rutt: It’s testable?

Brian Nosek: Yeah.

Jim Rutt: It's also a testable proposition.

Brian Nosek: It’s plausible and testable. What we had organized was a replication of the replication project.

Jim Rutt: Oh, I like it.

Brian Nosek: We took 10 of those 11 and got teams together for them. The teams ran two versions of the experiment. One was the one that we ran in the Reproducibility Project: Psychology. The second was one that went through peer-review as a Registered Report, peer-reviewed in advance by experts, ideally including the original authors, until it was accepted as a reasonable protocol, the right test. Then they ran both of these protocols.

Jim Rutt: What was the result? What’s the bottom line?

Brian Nosek: We don’t have an answer yet.

Jim Rutt: Oh.

Brian Nosek: I don’t even know the answer. The data collection is done.

Jim Rutt: Okay.

Brian Nosek: But I am blind, as one of the lead authors, to the outcomes of the individual studies. Charlie Ebersole is the lead author, and what he and I have worked on is writing the summary report, the entire paper, including the results section, with just Xs in the parts where it says, "And this is what we found."

Jim Rutt: I love this.

Brian Nosek: The whole paper is written not knowing what the results are.

Jim Rutt: Another way to eliminate bias, right?

Brian Nosek: Yeah. In fact, in this model, and Dan Simons, the editor of the journal it's published in, has really refined this model, he requires the authors to write: what will you say if it comes out this way, and what will you say if it comes out that way? You have to put both phrasings in, in brackets. Basically, the whole paper's written before you know what the results are.

Jim Rutt: That's like what the news networks do on presidential elections. This is what we're going to say if X wins.

Brian Nosek: Yeah, that’s right.

Jim Rutt: This is what we’re going to say if Y wins.

Brian Nosek: Everybody gets to know it, because it’s registered right there. It’s awesome.

Jim Rutt: I love it. That’s really meta actually.

Brian Nosek: Totally.

Jim Rutt: This is taking openness and anti-bias to a new, higher level.

Brian Nosek: Right. We're always pushing the envelope there. Our work, which is evaluating the rigor of other research, is a perfect test bed for trying to improve rigor, right? The critics are so useful to us.

Jim Rutt: You’re also critiquing yourself?

Brian Nosek: Because, yeah, that criticism is like, “Okay well, yeah. Let’s try to do it better. Let’s try to do that better. Let’s try to do that and see what survives.”

Jim Rutt: I would be so interested in the results of this one in particular. This is brilliant.

Brian Nosek: I’m super excited.

Jim Rutt: I just love this. Now I'm going to ask you for a little speculation. This is outside of what you can prove, but what are your views on what's behind the replication crisis? Why do so many studies, across so many disciplines, fail to reproduce?

Brian Nosek: Yeah. I do think the core is the incentives problem. The way the incentive problem plays out that is most consequential for replication is in selective reporting, and that selective reporting plays out in two ways. First, I run many studies and I only report a subset of them. How do I decide which subset to report? The ones with negative results, that don't quite work out, are less likely to be reported than the ones that do. The second is selective reporting of analyses. Once I've looked at the data, the data inform my subsequent judgments of how I continue to look at the data. That iterative process is super useful for exploration, and it's super bad for the diagnosticity of statistical inferences. To the extent that I have that flexibility, doing many analyses and reporting a subset, I am necessarily going to be introducing noise into the reports.
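
(To see the second mechanism in numbers: the following is a small illustrative Python simulation, not anything from the conversation. Every effect is truly null, yet reporting whichever of ten outcome measures looks best produces "p < .05" findings far more often than the nominal 5 percent.)

# A minimal simulation of the selective-reporting problem described above.
# All effects here are truly null, yet choosing the best of several looks
# at the data makes "p < .05" findings common. Numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, n_outcomes = 2000, 30, 10

hits_preregistered, hits_selective = 0, 0
for _ in range(n_sims):
    # Two groups, ten outcome measures, no true difference on any of them.
    a = rng.normal(size=(n_per_group, n_outcomes))
    b = rng.normal(size=(n_per_group, n_outcomes))
    pvals = stats.ttest_ind(a, b).pvalue

    hits_preregistered += pvals[0] < 0.05   # one pre-specified outcome
    hits_selective += pvals.min() < 0.05    # report the "best" outcome

print(f"false-positive rate, preregistered: {hits_preregistered / n_sims:.2%}")
print(f"false-positive rate, selective:     {hits_selective / n_sims:.2%}")

With ten independent null outcomes, the chance that at least one clears .05 is about 1 - 0.95^10, roughly 40 percent, and that is what the simulation shows against the honest 5 percent of the pre-specified test.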

Jim Rutt: We know there’s all kinds of spurious correlations in any dataset.

Brian Nosek: There have to be.

Jim Rutt: Yeah, there have to be, especially in any high-dimensional space, I guarantee it, right? If you give yourself too much freedom to go on a hunting expedition for correlations, you're going to find some.

Brian Nosek: Yeah. We are against freedom. Oh wait, no, no, no. It’s not that- [inaudible 01:05:23].

Jim Rutt: [crosstalk 01:05:26] point clear, because this is really important for me.

Brian Nosek: Yeah, what we want to do is make it very clear when we are freely exploring the data versus when we are not. We'll simply characterize these as confirmatory and exploratory approaches, right? A confirmatory approach is: I had an idea in advance, I had a way I was going to look at the data, and I laid out that way.

Jim Rutt: I preregistered it.

Brian Nosek: I preregistered it. Committed to it.

Jim Rutt: So I can’t accidentally drift.

Brian Nosek: Then, I open the data, look at it the way I said I was going to, and report everything I said I was going to report, right? That maximizes the diagnosticity of the statistical inferences, because I don't know the data when I make all my plans. The data are actually testing my hypothesis, with my analysis plans. Exploration is: I'm in the data, looking at it and trying to discover things that I didn't anticipate, right? All my plans beforehand are naïve, right? I have a terrible understanding of how the world works. Yeah, I have a plan. I analyze it that way and then I realize I was wrong. Then, I go into the data and I say, "Well, what might actually be happening?" That exploration is super generative. It's super useful.

Jim Rutt: To lead you to your next test, right?

Brian Nosek: Yeah. Right. It informs what I’m going to try next. The problem is when I mix up when I’m doing exploration and I think I’m doing confirmation.

Jim Rutt: I love this. This is a very beautiful distinction.

Brian Nosek: It’s actually pretty easy to implement in a molar way, which is with preregistration, right? The function of preregistration is not to say confirmatory research is more important than exploratory research-

Jim Rutt: It’s a different thing.

Brian Nosek: It's a different thing, right? It shows you when you're doing one versus the other. Of course, there is some gray area in there, and that's unpacking the complexity, but the basics are pretty simple: commit to what you know you're going to look at before you look at the data. Then, once you have looked at the data [inaudible 01:07:21], acknowledge that you looked at the data, and adjust confidence in the claims accordingly.
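
(One concrete way to implement this separation, offered as a hypothetical sketch rather than anything Nosek prescribes here, is split-sample validation: explore freely in one half of the data, commit to the single most promising test, then run exactly that test once on the held-out half.)

# Hypothetical sketch of split-sample exploration vs. confirmation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_outcomes = 200, 10
group = rng.integers(0, 2, size=n)            # two experimental groups
outcomes = rng.normal(size=(n, n_outcomes))   # all truly null here

# Exploration half: look at everything, pick the most promising outcome.
explore = slice(0, n // 2)
p_explore = stats.ttest_ind(outcomes[explore][group[explore] == 0],
                            outcomes[explore][group[explore] == 1]).pvalue
chosen = int(np.argmin(p_explore))  # this choice is the "preregistration"

# Confirmation half: one test, specified before looking at these data.
confirm = slice(n // 2, n)
p_confirm = stats.ttest_ind(outcomes[confirm][group[confirm] == 0, chosen],
                            outcomes[confirm][group[confirm] == 1, chosen]).pvalue
print(f"exploratory p = {p_explore[chosen]:.3f}, confirmatory p = {p_confirm:.3f}")

The exploratory p value is biased by the act of selection; the confirmatory one is an honest test of the committed hypothesis, which is exactly the distinction being drawn in the conversation.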

Jim Rutt: That’s brilliant, I must say. Simple, but brilliant, which are the best kinds of ideas.

Brian Nosek: Exactly, right.

Jim Rutt: This is not a hard one, but it has huge implications.

Brian Nosek: Big implications. Just like with Registered Reports, right? Where you draw that line and make commitments makes a big difference.

Jim Rutt: Really important. Then this goes to maybe the biggest question before the field of science. This issue of data exploration versus data confirmation is, frankly, statistics 102 for anybody who has a background in experimental design. Why have most practicing scientists made these fairly obvious statistical errors? Is there insufficient education in experimental design? What's your thought on why these patterns, when you stop and think about it for a little bit, are so pervasive?

Brian Nosek: There are multiple reasons, I think. One is that a lot of the tools of statistical inference aren't actually so intuitive. What we intuitively think a P value, for example, means is not what it means. That's hard. Even people who have been doing it for a long time, even people writing statistics textbooks, often write it wrong, because it isn't aligned with intuition. We have that to overcome, which is a hard one. The second level, as you're saying, is training, right? In order to overcome those intuitions, we need to improve our training. A third level of challenge is that the system actually rewards some of the wrong ways of thinking-

Jim Rutt: Back to the incentives problem, right? [crosstalk 01:08:57] if you have good systems, people are going to-

Brian Nosek: Yeah, right.

Jim Rutt: Round the edges. They’re going to be …

Brian Nosek: I have to get less than .05, okay. I’m going to get less than .05.

Jim Rutt: Crank, crank, crank, right?

Brian Nosek: Yeah. A lot of that plays into the oversimplification of it, of thinking dichotomously, right? This magical barrier of less than .05, where below it means I found something and above it means I didn't. Everybody knows that's wrong.

Jim Rutt: That’s obviously wrong.

Brian Nosek: Obviously wrong.

Jim Rutt: We obviously know that there's not much difference between .05 and .0499, right? Yet, magically, one of .0499 or .051 is the right answer and the other is the wrong answer, despite the fact that the difference between them is down in the noise with respect to any actual difference in significance.

Brian Nosek: Right, and despite knowing that it's not right, it's hard to resist, right? That's even true, in very subtle ways, in how we write: "We found this. We didn't find it there."
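
(A tiny illustrative computation, again not from the conversation, makes the arbitrariness numeric: z-statistics that are statistically indistinguishable from one another land on opposite sides of the threshold.)

# The "magical barrier" in numbers: nearly identical test statistics
# straddle the .05 line. Purely illustrative.
from scipy import stats

for z in (1.950, 1.960, 1.970):
    p = 2 * stats.norm.sf(z)   # two-sided p for a z-statistic
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"z = {z:.3f}  ->  p = {p:.4f}  ({verdict})")

Here z = 1.950 gives p of about .0512 (not significant) while z = 1.970 gives about .0488 (significant), though no experiment could tell those two estimates apart.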

Jim Rutt: Even though what we really mean is that there were differences in how strongly our data spoke to us, right? Rather than black and white, yes or no. What about training for young scientists in experimental design? Does much of that happen, or is it done more as on-the-job training, or are there mandatory courses in experimental design now for graduate students?

Brian Nosek: Yeah, it's highly variable. It depends on the institution, on the department within the institution, and even on the lab within the department within the institution. The problem is that it's highly heterogeneous. Going into a field as a grad student, you don't usually think to ask, "What kind of training in experimental design and statistics am I going to get on this path?" Some do, but many don't. And it's hard to assess what my training's going to be. There was, I don't remember the source, but I recall a review of life-science training programs that looked at what the training for experimental design and statistics was in each program. For some ridiculous percentage, like 70 percent, there was no class on experimental design-

Jim Rutt: At all.

Brian Nosek: At all, right, in the program. I'm making up the exact number, but [inaudible 01:11:16] it was a ridiculous percentage, right? It was ridiculously high. A lot of it is deferred to the lab, and if we're deferring to the lab, then it could be extremely technique-focused or it could be general, and you just never know.

Jim Rutt: In fact, it could just be a replication of the PI's or another senior person's perspective, and that may or may not be rigorously correct. It may not be at all. You're basically just propagating bad patterns from the past. Let me see, what else have we got here. There's been a little bit of writing recently about some other psychology replication failures, I think most prominently the Zimbardo Stanford prison experiment. Do you know anything about that?

Brian Nosek: Yeah, that one is an interesting debate, in that there have been few replication attempts, and one found something much milder than the original. The recent debate about the Zimbardo study is that it didn't actually happen the way it was described. This is a fascinating, very different kind of issue, which is that the findings were presented one way, but the reality of how the experiment was done is totally different. There are some fascinating papers that have just come out, which got access to all of the transcripts, original materials, the full videos, all this background information, and did a historical analysis of it.

Jim Rutt: Sort of a forensic analysis almost, right?

Brian Nosek: Yeah. This latest paper that came out basically said, and this is oversimplifying what is a much more nuanced paper, but the top-line result from my read was basically that the PI, Phil Zimbardo, and the lead warden told the guards how to act. Now, the whole point of this was-

Jim Rutt: Was that there was an emergent-

Brian Nosek: Yeah, people adopted these social roles.

Jim Rutt: Yeah that would totally change the read that most people put on that experiment.

Brian Nosek: Certainly it's interesting enough to say that when people are told to act a certain way, they do, but that was demonstrated in a much more compelling way in the Milgram experiments.

Jim Rutt: Exactly. Just for fun, I went to see what replication was going on with Milgram, and there actually have been some recent replications. There was a recent study in Poland, with pretty good Ns, that looked like it confirmed Milgram.

Brian Nosek: Yeah, everything that I’ve seen has shown that those Milgram findings hold up to the extent that you could still do them within ethical boundaries now-

Jim Rutt: And with everybody knowing about Milgram, that’s the other problem.

Brian Nosek: Right. There is … people know a lot about it.

Jim Rutt: This recent Polish study actually had one interesting datum. They did warn that the N was not large enough to take this to the bank, but women were three times less likely to go all the way to the limit than men were.

Brian Nosek: No kidding, wow.

Jim Rutt: Yeah. I mean huge-

Brian Nosek: That wasn't observed in the originals.

Jim Rutt: They did warn that the N wasn't large enough to be conclusive, but they said that it was interesting.

Brian Nosek: Intriguing, yeah.

Jim Rutt: Someone should ramp that up.

Brian Nosek: That’s the right way to frame it, which is this interesting, someone should follow it up.

Jim Rutt: With a bigger sample focused just on this question. Let's have a balanced sample, make it easy. All right, last thing here, let's move on to your own research, Project Implicit. Well, why don't you put it in your words. What is Project Implicit, and what is it you're trying to get at?

Brian Nosek: This is the work that I did for the lion's share of my career before this, where our core interest is in thoughts and feelings that exist outside of conscious awareness or conscious control, and how they might still impact our perception, our judgment, our action in everyday life. The popular term that has emerged is implicit bias. This is implicit bias research. Project Implicit is actually a nonprofit, the nonprofit I ran before starting COS, that is a collaboration between multiple universities that have been responsible for doing a portion of this research. It's a large research area with lots of different contributors. My laboratory, Mahzarin Banaji's laboratory at Harvard, she was my graduate mentor, and Tony Greenwald's laboratory at the University of Washington, he was Mahzarin's graduate mentor, are the three originating labs. Now there are many other labs involved in it, but we developed and advanced the initial evidence about one tool for trying to measure implicit bias, which is called the Implicit Association Test. We developed a number of other methodologies related to it.

Jim Rutt: Is that the common one you see on the internet for looking for racial bias with word reactions and things like that?

Brian Nosek: That is Project Implicit.

Jim Rutt: Okay, cool. All right.

Brian Nosek: In graduate school I built the first version of that website. That is the work that we did. Then the running of that website came with me to the University of Virginia. The website is housed at Harvard, where my collaborator is, but we ran it at the University of Virginia until I started the Center for Open Science. Then, my former graduate student, this is an academic [crosstalk 01:16:12]-

Jim Rutt: Inheritance [crosstalk 01:16:12].

Brian Nosek: Yeah. The collection of my former graduate students from my lab now runs Project Implicit. It's been a super useful research enterprise. Millions of people come to our website every year and complete these tests. We get lots and lots of interesting data about how you actually try to measure feelings or thoughts that might be different from people's conscious values, and then study how they are relevant in everyday life. Do they actually predict what people do? How do they change? Is changing them related to changes in people's behavior? It's a very active research enterprise, with lots of very interesting debates that would take another entire hour and a half to unpack.

Brian Nosek: It's a fascinating area, but for me, the great part, looking from what I do now back to what I started with, is that a lot of our current, active work to change research practices is rooted in that work on unintended bias, right? Looking at my own career, I don't feel like I've changed direction. What I've really done is say, "Take all of this work we learned about how people's motivations and reasoning are biased without them recognizing it, and how people need help to live according to their values." I don't intend to be racially biased, but I can be, without intending to, because of these things that are clicking away in my mind that might drive me in different directions. Likewise, I don't intend to find out false things, but I might, because of all these things clicking away.

Jim Rutt: I can see how the two are very parallel and how moving from one to the other was really not unnatural at all.

Brian Nosek: No, it was a very … we draw on that work constantly. To me, that’s very gratifying. I feel like what we’re really doing is applied work on some of those basic questions.

Jim Rutt: Well, Brian, I'd really like to thank you for this discussion. This has been even richer than I was hoping it would be. We dug into some of the very deep issues on why science doesn't work as well as it could. It still works great, so don't get me wrong, people, I love science and support science, but work like Brian's is absolutely critical for society to get a greater return on the fairly massive investment we make in science. Just between NSF and NIH, the last numbers I pulled up were 46 billion dollars.

Brian Nosek: It’s a lot of money.

Jim Rutt: That's a lot of money. If we could make that even 20 percent more efficient, that's a very significant thing. I talked to Brian early in the Center for Open Science's life, and he had some of these ideas. He didn't even have all these ideas yet, but this thing has really grown into a very impressive operation.

Brian Nosek: It really is moving. What’s most gratifying about it is that there is an incredible grassroots element to it, right? There’s so many different people in the community and so many different areas that are working on these problems. Really our role, we feel, is just to try to help connect all of these different players that are really changing the system.

Jim Rutt: I'd recommend that anybody working in the sciences, or in the funding or governance of science, check out the Center for Open Science. What's the URL?

Brian Nosek: C-O-S dot I-O.

Jim Rutt: C-O-S dot I-O. Thank you Brian.

Brian Nosek: Great, thank you Jim, appreciate it.