Dan Housman: Hi everyone, I'm here with the Novel Cohorts Podcast, speaking with Melissa Haendel and Christopher Chute. We're here to talk about a really exciting project, the National COVID Cohort Collaborative (N3C) Consortium, which is focused on COVID-19 research, a very hot topic these days for obvious reasons. To get started, I'll ask Melissa and Chris to give brief backgrounds on themselves, and then we can get into discussing some of the opportunities for doing research with this group.
Melissa Haendel: Sure. Thanks so much, Dan. It's really nice to be here. I am the Director of the Center for Data to Health (CD2H), a consortium program that aims to coordinate the clinical and translational science centers across the US. The consortium is made up of 60 different clinical institutions. The goal of CD2H is to help coordinate collaboration and the sharing of resources and technologies for informatics in the clinical and translational space. I'm very excited to be here today to talk about the work we've been doing as a consortium to combat the pandemic: bringing people and their data together to help reveal new discoveries and treatment strategies that reduce the impact and severity of COVID-19.
Christopher Chute: I am Christopher Chute, the Chief Research Information Officer at Johns Hopkins, and an internist, epidemiologist, and informatician by training and background. I'm also co-lead, with Melissa, of the CD2H and the N3C, and also happy to be here.
Dan Housman: Great. Melissa, could you tell us a bit about the N3C? How did it come about? What's different about it? And what have you already accomplished?
Melissa Haendel: So, the National COVID Cohort Collaborative (N3C) is exactly that: an effort to create a national resource comprised of patient information from many different clinical sources, built on a collaborative partnership model. The community is very large; we have over 1,000 members after just a few months, and there were almost 200 authors on our first manuscript. A very large number of people have been working very hard together, so that's the collaborative part of the N3C. What makes it really unique is that we are partnered with the different clinical data model communities and sharing networks. They have done a lot to help institutions get their data into a common model, and there are multiple common models; Chris can speak more to this in a few minutes, since that's his area of expertise. The goal of the N3C is to harmonize those models and get all the data into one model, not just for distributed querying like the common data model research networks do, but with all the data together in one secure enclave. That provides the foundation for machine learning and statistics over a very large data set that no one would really have access to otherwise. So it's complementary to, synergistic with, and built on the prior work done by those common data model communities. The other thing that's very different about it is the broad access. We've been able to create a regulatory structure that, in combination with a fairly significant degree of security, validation, and compliance, allows a broad diversity of users, including commercial entities, citizen scientists, and students, as well as traditional researchers, to request access to different tiers of data within the system.
We believe that this broad access will bring together experts in machine learning and statistics, who might not normally work on sensitive clinical data, with clinicians and clinical data experts. So we're really trying to embrace the philosophy that it takes a village: creating an environment where people can work together across disciplinary boundaries on these data.
Dan Housman: How much data, and what kind of data, has already been pulled into this initiative, and what do you expect in the next few months?
Christopher Chute: We already have 57 organizations that have signed the data transfer agreement; most of the major academic medical centers in the US are participating. Twenty of them have physically deposited their data, and the rest are in various stages of regulatory review. At present, we have close to half a million persons in the database, and we anticipate over time having on the order of a million cases of COVID-19, with matched controls, across this community. We're early in the process; we literally opened for business yesterday, and we've only been in existence for four and a half months. In that time, I think we've accomplished about as much as we could, and the trajectory is clearly very positive. We anticipate this will be the largest collection of COVID-19 cases in the world with associated electronic medical record data, going back two years for each person in the repository.
Dan Housman: Chris, what kinds of research are you seeing, at least in early requests for data access or in discussions about using the data today?
Christopher Chute: Yeah, that runs the spectrum. The fun part is that we've established self-organizing clinical domain task teams, including acute kidney injury, critical care, diabetes, and social determinants of health; there are about 20 of them at present. These are communities that have chosen to rally together, usually comprising scores of persons, an admixture of clinicians, data scientists, statisticians, and informatics people, working together to address the subtleties of their specific domains as they relate to COVID-19. What's happening in parallel, and is really quite gratifying, is the commitment of many organizations, persons, and team members to collaborate on common data cleaning and information organization so that the data can support these focused analytics effectively. For example, among LOINC codes and other associated information about laboratory tests, we probably have a score or more of serum creatinine measures, when a clinician really just wants to know whether the creatinine is elevated or not. Harmonizing these kinds of elements becomes the initial spadework we're doing, and I'm pleased to say that we have literally hundreds of people contributing to these data cleanups, organized through the N3C mechanism and process. In terms of commercial organizations and pharma companies participating, many of the people contributing to this data cleanup come from the OHDSI community and are employees of pharma organizations. They've recognized the importance of contributing to this common work, and the philosophy of N3C is that we are a community. This is pre-competitive work, if you will: all boats rise higher if we all have higher quality data for our specific analytics. All the data that is normalized and harmonized in this way remains open and accessible to all investigators. So this is building the common weal.
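The lab-harmonization step Chris describes, collapsing many source-specific lab codes into one clinically meaningful variable, can be sketched roughly like this. This is a hypothetical illustration, not N3C's actual pipeline; the code list and the elevation threshold are assumptions for the example.

```python
# Minimal sketch of lab harmonization: map several LOINC codes that all
# report serum creatinine in mg/dL onto one harmonized variable, so an
# analyst can ask "is creatinine elevated?" without knowing every source
# code. The codes and threshold below are illustrative assumptions.
CREATININE_CODES = {"2160-0", "38483-4", "21232-4"}

def harmonize_lab(row):
    """Collapse a source-specific lab row into a harmonized record."""
    if row["loinc"] in CREATININE_CODES:
        value = float(row["value"])
        return {
            "variable": "serum_creatinine_mg_dl",
            "value": value,
            # Simple illustrative cutoff; a real pipeline would account
            # for sex, age, and site-to-site calibration differences.
            "elevated": value > 1.3,
        }
    return None  # a code this sketch does not handle

rows = [
    {"loinc": "2160-0", "value": "0.9"},
    {"loinc": "38483-4", "value": "2.1"},
]
harmonized = [harmonize_lab(r) for r in rows]
```

Both input rows, despite carrying different source codes, end up under the single `serum_creatinine_mg_dl` variable, which is the essence of the spadework described above.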
Even in the domain task teams, such as pediatrics, pharmacoepidemiology, and other domain-intensive groups, we have very large numbers of people rallying together. We're striving to have maybe a handful of papers in each domain, with scores of authors, that would be deep, thoughtful, highly impactful publications, rather than a thousand publications done by one or two people outside of a team context. We're not prohibiting independent research, but all the data remains open and viewable by everybody else. So it lends itself to team engagement and participation, which we're persuaded, and are already seeing evidence, will lead to better science.
Dan Housman: You're both very involved in the governance. What's the process for life sciences, med device, and other commercial organizations to do research with N3C?
Melissa Haendel: The process is basically the same for everyone; we really wanted to level the playing field and have everyone be welcome. We have three different tiers of data. There is synthetic data, which is still being validated but will hopefully be broadly accessible. There is de-identified data, which you can request access to where justified; that requires use of the central IRB managed by Johns Hopkins through Chris, although you can also use a local one if your institution or organization prefers. And then, if you need access to the limited data set, which has geocodes and dates of service that are really relevant for pandemic analytics, you would request access with a project-specific IRB. These rules apply to anyone who is participating, whether you're a commercial entity, a citizen scientist, or at an institution. The organization that you're affiliated with signs a data use agreement, which stipulates that all investigators making requests attest to a set of common principles and a data user code of conduct. This includes things like: you will not take screenshots or download data, and no data egress is allowed. There is a publication and attribution policy, and there are community guiding principles for behavior and collaboration best practices. All of these things are in place for all parties, whichever tier you request. If you do request the de-identified or limited data set, you need documented human subjects training, and all users of the system have to take security training so they understand the sensitivity of these data. The balancing act here is that everyone can have access, but there are some hoops to jump through in order to get it. Those hoops try to best balance broad access to these sensitive data with regulatory control, security, and compliance. It's really a unique opportunity to provide this kind of broad, pre-competitive access.
But the hoops are that each individual who participates does have to make sure they have the proper training and attest to the rules of the road, so to speak.
Dan Housman: I think a lot of people are curious about what kinds of challenges you've already overcome to get here. What have been the roadblocks, and how did you solve them? Because I know it's been hard for any group to do something like this in the past.
Christopher Chute: Yes, how do we count the ways? I think all parties who have participated in this have been enormously gratified by the enthusiasm, energy, and positive attitude. We've had only a handful of instances where people have raised serious barriers or questions. The attitude on the part of the participating legal teams, the administrators involved in administrative review and approvals, and the scientists has not been "what's the problem?" but much more "how do we make this work?", because this is important work. This is an important problem. You can contrast this with certain databases that contributed to the Lancet and New England Journal articles that had to be withdrawn because of a lack of data provenance and openness. We see the N3C process as the counterpoint to those episodes. Here, we've embraced transparency, reproducibility, and openness, so that any individual should be able to reproduce the analysis of another party or group, assuming they've gone through the administrative and regulatory approvals to get access to the data. Everything is completely transparent. And given that kind of philosophy, the barriers have fallen away. We did have a little kerfuffle just before opening about the definition of safe harbor, but we went to HHS counsel and the Office for Civil Rights and got a determination that we were in full compliance with the legal expectations of safe harbor de-identification. That was our most serious hurdle. Otherwise, I think it's been a celebration of how to do it, rather than of overcoming obstacles.
Dan Housman: Our group, Graticule, is really interested in how to bring together the commercial teams from life sciences companies and others. Beyond the curation work with the OHDSI working group, what are you hoping and imagining might come about on the research side from bringing in these groups? Do you have thoughts on how they can really move the project forward, in particular these commercial organizations?
Melissa Haendel: There are lots of different ideas, but the ones we've been discussing thus far, especially for pharma companies, are those where they have drug development in targeted areas: making sure that the variable definitions relevant to those particular domains are well curated, so that they can use these data to determine whether there are any confounding factors or influences of drugs, either ones they themselves produce or ones more generally in their targeted domain. I feel it could actually help the pharma companies better understand the effects of their drugs in a COVID-19 population. That might in turn lead to better care recommendations, either for being on those drugs or for using them in ways that reduce overall COVID-19 impacts. There are so many systems involved in the body's response to COVID-19 that it's not going to come as any surprise that many drugs influence those trajectories and tissues. So I think the pharma companies, with their really deep pre-clinical data, combined with the ability to evaluate the impacts of their drugs or related drugs in this cohort, are going to find this really foundational for their own drug development trajectories.
Christopher Chute: And I would add that the philosophy of N3C in terms of engaging the broader scientific community spans more than just academics. We recognize that there is enormous scientific talent in commercial organizations, be they IT companies, pharma companies, or other entities. The aspiration of N3C is to bring together all of this talent across academia and industry, and even citizen scientists where relevant, to maximize the benefit and learning we can achieve from this precious resource. Our ask is that the data be treated respectfully, that people adhere to the code of conduct, and that they recognize this is a collaborative environment. It is open and transparent; proprietary analytics are not why we established this environment. But participants from industry can partner, as individuals and organizations and in whatever roles make sense, to address the questions of interest to those organizations, and do so in a way that contributes variables, elements, content, understanding, or analytic pathways that can be shared by everybody. And vice versa: commercial organizations are welcome to leverage the previously established work on variables, elements, and algorithms that has been deposited there by ongoing work. So we hope this is a synergistic process, and that all boats will rise higher.
Dan Housman: I know that in the structure of all these workgroups, there is now a task group for commercial participants. It's emergent and just getting started, so this is a mention to everybody in the audience that it's something they can join and participate in, and it can still be influenced in terms of which key topics to tackle. I think we don't know what might occur once a lot of research gets initiated, or what kinds of barriers might come up because of commercial concerns or other IP issues. So I'll ask Chris and Melissa: how do you envision this team working with the other workgroups?
Melissa Haendel: We put an organizational structure in place very quickly once we realized this was the way we needed to go. The task teams that formed can be situated within an individual work stream. For example, the groups looking at particular clinical elements are situated in the clinical scenarios group of the collaborative analytics work stream. But some teams are cross-cutting. For example, we have a task team focused on data linkage and hashing strategies that spans both the technological work streams and the governance work streams. We recognize that there's a lot of need for cross-cutting work, and I think that's exactly where the pharma group lies. It's really a cross-cutting group to help coordinate the needs of pharma and commercial interests, both in terms of governance, where we are making sure that we create a space for investigators from commercial entities that is welcoming and participatory but also meets their specific needs, and in terms of the analytics, as I mentioned earlier, really identifying the key partners in the thematic areas. It's really matchmaking: pairing a commercial partner with specific expertise, whether in machine learning and AI or in a particular clinical area such as acute kidney injury, with the experts and analytics already at work in those areas, so that we bring together a suite of experts from different areas to tackle common problems. We know that no clinician can come in alone without the assistance of experts in machine learning, statistics, and bioinformatics workflows. Similarly, no bioinformatician is going to come in knowing the right clinical questions to ask. So we feel that the pharma task team is really key to bringing those common interests in from commercial partners.
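The data linkage and hashing strategies Melissa mentions can be illustrated with a minimal sketch of keyed-hash record linkage. This is one generic approach of the kind such a task team might consider, not N3C's actual design; the shared key, field choices, and normalization are all assumptions for the example.

```python
# Sketch of privacy-preserving record linkage: each site derives a keyed
# hash token from normalized patient identifiers. The same person then
# yields the same token across sites, enabling linkage without ever
# exchanging raw identifiers. Illustrative only; real deployments add
# stronger normalization, key management, and governance controls.
import hashlib
import hmac

SHARED_KEY = b"consortium-secret"  # hypothetical secret shared by sites

def linkage_token(first: str, last: str, dob: str) -> str:
    """HMAC-SHA256 over normalized name and date of birth."""
    normalized = f"{first.strip().lower()}|{last.strip().lower()}|{dob}"
    return hmac.new(SHARED_KEY, normalized.encode(), hashlib.sha256).hexdigest()

# The same person recorded slightly differently at two sites still
# produces an identical token after normalization:
site_a = linkage_token("Ada", "Lovelace", "1815-12-10")
site_b = linkage_token(" ADA ", "lovelace", "1815-12-10")
```

The keyed construction (HMAC rather than a bare hash) matters: without the secret key, an outsider could precompute hashes of common name and birthdate combinations and reverse the tokens.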
And to bring their perspective and expertise, their particular commercial forte, to the different groups where that expertise is very much needed.
Dan Housman: Well, thanks so much for joining today. I recommend everyone in the audience check out N3C, start participating at whatever level you're capable of within the organization, and spread the word. At least from what I've heard this week, which is really exciting, the data is open for business, and once you get through the organizational governance steps, it's time to start doing some research. I'll leave it to Melissa and Chris for any last words they'd like to share with the audience as a takeaway.
Christopher Chute: I was gonna say N3C is an unprecedented activity in terms of its scope and magnitude and access to clinical data. At some level it’s a social experiment, and we’re quite excited about its potential.
Melissa Haendel: Yeah, and I was gonna say something similar. But I’ll add to that, the social experiment really needs to be by everyone and for everyone. We want to have the cohort be as representative of all Americans as possible. We want to have as many Americans as well as others in the world, participate in the analysis of the data, so that we can expedite the best possible care for COVID-19 patients. We can identify treatments and treatment strategies and together reduce the impact of COVID-19. It takes a village and it’s a new kind of village that’s never been built before. And we’re grateful for everyone’s participation.
Dan Housman: Great. Well, thanks so much for your time today and looking forward to hearing more great things in the future.
Melissa Haendel: Thanks so much.
Christopher Chute: Thank you.