The partnership between UCSF, Fortanix, Microsoft, and Intel aims to overcome the data privacy issues that inhibit the development of the broad data sets necessary for reliable clinical algorithms.
Artificial intelligence (AI) in clinical healthcare has met its greatest obstacle: the need for data privacy. To develop reliable clinical algorithms, AI needs data from a wide array of organizations, including hospitals and health systems, to create data sets that are representative of a broad spectrum of patients. The problem? Healthcare organizations are hesitant to share this information due to growing security concerns and the vital need to protect patient privacy.
To address this dilemma, UC San Francisco (UCSF) has joined forces with a powerhouse trio of technology partners: Fortanix, Intel, and Microsoft. The academic institution's new BeeKeeperAI, a privacy-preserving analytics platform, will feature novel technology that will enable disparate parties to share data from their own protected enclaves in the cloud without ever having to move it.
The technology behind this approach is complicated, but the potential benefits are tremendous, with the opportunity to more rapidly deploy regulated AI solutions to improve clinical outcomes.
The approach involves:
- The confidential computing capabilities of the Fortanix Confidential Computing Enclave Manager
- Intel’s Software Guard Extensions (SGX) hardware-based security capabilities
- Microsoft Azure’s confidential computing infrastructure
- UCSF’s BeeKeeperAI privacy-preserving analytics to calibrate a proven clinical algorithm against a simulated data set
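To make the workflow above concrete, here is a minimal, hypothetical sketch of the enclave pattern it describes: the data owner's records never leave its control, and a key check modeled on enclave attestation ensures only an approved algorithm can compute over them. All class names, the measurement scheme, and the sample records are invented for illustration; real SGX attestation uses hardware-signed quotes, not a bare hash.

```python
import hashlib

# Simplified simulation of the confidential-computing flow described above.
# Real systems (Intel SGX, Azure confidential computing) enforce these
# guarantees in hardware; this only models the control flow.

# The data owner pre-approves one algorithm by its code "measurement."
EXPECTED_ENCLAVE_MEASUREMENT = hashlib.sha256(b"approved-algorithm-v1").hexdigest()

class Enclave:
    """Stands in for a hardware enclave: its identity is a hash of its code."""
    def __init__(self, algorithm_code: bytes):
        self.measurement = hashlib.sha256(algorithm_code).hexdigest()

    def attest(self) -> str:
        # In SGX this would be a signed attestation quote; here, just the hash.
        return self.measurement

    def run(self, records) -> float:
        # Plaintext is processed only inside the enclave;
        # only an aggregate result ever leaves it.
        return sum(r["risk_score"] for r in records) / len(records)

class DataOwner:
    """A hospital keeps its records and releases them only to an attested enclave."""
    def __init__(self, records):
        self._records = records

    def share_with(self, enclave: Enclave) -> float:
        if enclave.attest() != EXPECTED_ENCLAVE_MEASUREMENT:
            raise PermissionError("enclave is not running the approved algorithm")
        return enclave.run(self._records)

hospital = DataOwner([{"risk_score": 0.2}, {"risk_score": 0.4}])
approved = Enclave(b"approved-algorithm-v1")
result = hospital.share_with(approved)  # only the aggregate leaves

tampered = Enclave(b"tampered-algorithm")
# hospital.share_with(tampered) would raise PermissionError
```

The key point the sketch captures: the raw records stay with the data owner, and the attestation check, not mutual trust between organizations, is what gates access.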
HealthLeaders spoke with Michael Blum, MD, UCSF associate vice chancellor for informatics, executive director of the Center for Digital Health Innovation at UCSF, and professor of medicine (cardiology). Following are highlights from this discussion, edited for space and clarity.
HealthLeaders: How did this AI initiative evolve?
Blum: We've had some large collaborations in artificial intelligence and machine learning. We learned that we could develop proof of concept algorithms that worked very well off of a limited data set, but to develop clinical algorithms that worked across a broad population of patients … that were clinically reliable, required much larger and much more representative data sets.
Acquiring those data sets was a time-consuming and expensive undertaking. It hindered our ability to develop those algorithms. That led to the understanding that the privacy that we all cherish in healthcare was an obvious barrier to this field advancing as it should.
We needed to find a way that allowed data owners and algorithm developers to share data so we could develop bigger, more representative data sets, while allowing [data owners] to get exposed to algorithm developers without risking the privacy of the data.
HL: It's come to light that certain populations aren't being studied or included in some research initiatives. Does your initiative provide a way to make sure that AI solutions are more inclusive?
Blum: That's exactly right. It's unusual for a single organization to have broadly representative populations. If you have a community that has a broadly represented set of ethnic populations, they might not have rare disease populations sufficiently represented. A population that has a lot of primary care patients might not have a lot of very sick patients in it. Those [circumstances] tend to generate inequities because not everyone is represented in the data set. Then the algorithms that get developed reflect those inequities and disparities in care, because if an algorithm is trained on those narrow data sets, it develops exactly what it sees. This technology allows us to bring in data that has broadly represented populations. As the algorithms are trained, they can be effective on the whole population, rather than just the narrow slice that they were trained on.
HL: Can you further describe the issues that hospitals and healthcare systems face as owners of data?
Blum: The move into electronic health records and imaging centers has created an environment where we have a tremendous amount of data available in our organizations. The organizations feel a couple of ways about that. One, they have a tremendous sense of responsibility to protect all of that patient data. Breaches not only violate patient privacy, but they're terribly damaging to the reputation of the organizations, and they come with some significant financial penalties. Healthcare organizations have many reasons to hold this data as tightly as possible and do everything they can to prevent any data exposures, which makes complete sense.
The problem with that is it directly impacts our ability to develop AI algorithms. It's very unusual for one organization to have a sufficient amount of data to develop a really good clinically useful algorithm. We need data from multiple organizations. That could be multiple healthcare organizations, academics, big community healthcare organizations, biopharmas, drug companies, the government—all kinds of players who have very important pieces of the healthcare data. And again, all of them are incentivized to keep that data very private and very secure. Yet they all want to participate in the development of these very new, very powerful tools.
HL: How did your partnerships with outside companies develop?
Blum: UCSF has developed great capabilities in artificial intelligence, machine learning, and deep learning in healthcare. But as we started running into the problem I was describing with confidential computing and privacy-preserving computing, it was obvious that our ability to develop algorithms was only a small part of the problem.
One of our philosophical underpinnings with innovation is we develop new technologies, and we prove that they work in a clinical environment, but we don't scale them out. We rely on our technology partners to do that. To [identify] the experts in this privacy preserving and confidential computing space, some of our friends at Intel introduced us to our friends at Fortanix. And Microsoft is interested in this space as well. So quickly the groups came together and said, "This is a huge problem for the industry. We each have a piece of this; let's get together and see how it goes forward."
HL: How does the approach you're developing overcome the challenges you've described?
Blum: BeeKeeperAI and the Fortanix technology allow any healthcare organization to keep their data in their own private controlled cloud so they never have to send that data anywhere. It stays where it is, and it's protected where it is. The data can still participate in the validation of AI algorithms—or eventual development of those algorithms. It allows you to have the best of both worlds. The data can participate in this very important technology development, and the data owners benefit, whether by participating in the project or financially. They can use the data to help with development of the AI technologies, yet they can also feel completely comfortable that their data isn't going anywhere they don't want it to go.
On the flip side, the people who have developed the algorithms don't want their intellectual property stolen; they want to keep their secret sauce secret. With the BeeKeeperAI and Fortanix technology, it's a zero-trust environment. Neither party has to trust the other party. The technology takes care of all of that. The algorithm is protected, and the data is protected.
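The mutual protection described here can be sketched as a simple exchange of ciphertexts: each party hands over only encrypted material, and the keys are released to the enclave alone, so neither party ever sees the other's plaintext. This is a hypothetical illustration; the `toy_encrypt` XOR cipher is a stand-in for real authenticated encryption and is not secure, and the class and field names are invented.

```python
from dataclasses import dataclass
import hashlib

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    # NOT secure: XOR stream stands in for real authenticated encryption.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

toy_decrypt = toy_encrypt  # XOR is its own inverse

@dataclass
class ZeroTrustExchange:
    """Both parties submit only ciphertext. Keys go to the enclave alone,
    so the data owner never sees the algorithm, and vice versa."""
    algo_ciphertext: bytes   # from the algorithm developer
    data_ciphertext: bytes   # from the data owner

    def run_in_enclave(self, algo_key: bytes, data_key: bytes) -> str:
        # Decryption happens only inside the (attested) enclave.
        algorithm = toy_decrypt(self.algo_ciphertext, algo_key)
        data = toy_decrypt(self.data_ciphertext, data_key)
        # Only a summary report leaves; neither plaintext is ever exported.
        digest = hashlib.sha256(algorithm + data).hexdigest()[:8]
        return f"validation-report:{digest}"

algo_key, data_key = b"developer-key", b"hospital-key"
exchange = ZeroTrustExchange(
    toy_encrypt(b"model-weights", algo_key),
    toy_encrypt(b"patient-records", data_key),
)
report = exchange.run_in_enclave(algo_key, data_key)
```

The design choice the sketch mirrors: because the host and the counterparty only ever handle ciphertext, neither can steal the algorithm's "secret sauce" nor exfiltrate the patient data, which is what "zero trust" means in this context.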
HL: What are the biggest challenges that you have to overcome?
Blum: Time is one of the challenges for sure. There are a lot of other folks who are trying to develop privacy preserving technologies. There's an effort to get this into the marketplace. More importantly, the underlying goal here is to develop technologies that really help patients and help healthcare organizations. So far, AI has a lot of promise in that space, but because of the issues we talked about, there've been some significant disappointments. We want to drive that [process] much faster … to get to the benefits of these powerful new technologies.
AI technologies have the opportunity to affect everything from frontline patient care, to improvement in administrative functions, patient experience, and outcomes, and to decrease the cost of care. All that takes time. We want to accelerate the ability of companies and academic organizations to develop these technologies. In the current world, it takes years to create those data sets and then do the AI development on it. We want to reduce that to months—not years.
HL: What will be possible once you've achieved your goal?
Blum: We're building towards a future world where I, as a leader in a healthcare organization, can leverage the data that we have to help develop algorithms and technologies that can propel healthcare forward much faster and solve some of our pressing healthcare problems … in a way that my data set is completely protected. I can do it using the same technologies that feel very straightforward and comfortable to me, like cloud computing. I'm now able to participate in that development without having to worry that my data is at risk.
As an innovator and as an algorithm developer, I now have access to a large world of data that would have taken me years and cost millions of dollars to accumulate—if I ever could. I don't have to worry any longer that the IP [intellectual property] that I've struggled to develop is at risk of being exploited. I now have access to the data I need to develop and validate my algorithms, and I know I can do that in a safe way. That's a much better world to be in from a healthcare and technology perspective than where we are now.
“The privacy that we all cherish in healthcare was an obvious barrier to this field advancing as it should.”
Michael Blum, MD, UCSF associate vice chancellor for informatics and executive director of the Center for Digital Health Innovation at UCSF
Mandy Roth is the innovations editor at HealthLeaders.
- Acquiring broad data sets for AI clinical algorithms is time-consuming and expensive, driving up costs and delaying innovations that could improve patient outcomes.
- UCSF's BeeKeeperAI platform will enable organizations to share data without ever losing control of it, creating a "zero-trust" environment.
- The ability to create broad data sets from a variety of organizations not only improves the AI capabilities, but it also helps address inequities and disparities in care.