
Probe Uncovers Hospitals' Inability to Protect Patient Privacy

By smace@healthleadersmedia.com | June 25, 2013

Researchers and a journalist were able to re-identify, without much fuss, the de-identified medical records of scores of patients, thought to have been protected by HIPAA. Here's how they did it.



theDataMap

Patients concerned about privacy have more than flimsy hospital gowns to worry about. Their medical data may be showing.

First, a visual aid. Click on the map and take a look at where much medical information is flowing today.

This map, constructed by some of the nation's leading privacy experts, is an apt illustration for a big problem. In theory, all the healthcare providers on this map are complying with HIPAA, the Health Insurance Portability and Accountability Act of 1996 and its subsequent amendments.

So how come, in a year-long investigation, a few researchers and a journalist were able to re-identify, without much fuss, the de-identified medical records of 81 patients treated in Washington state in 2011?

The answer to that question is a big challenge to HIPAA and to providers in 33 states, and perhaps beyond.

The story resulting from this investigation, "States' Hospital Data for Sale Puts Privacy in Jeopardy," hit Washington, D.C., like a health privacy bombshell earlier this month. In researching a HealthLeaders magazine story on HIPAA, I've found that the revelations are troubling to healthcare CIOs and other executives as well.

Here's how the investigation worked: State public health departments, eager to expand medical research, collect de-identified discharge data from hospitals. HIPAA permits this disclosure, in part because the privacy advocates who helped write HIPAA made it easy for states to pass tougher versions of the federal HIPAA law. The problem is, however, that most never did. So in 33 states, this discharge data gets sold for little or no money to all takers.

Through a Freedom of Information Act request process, the story's author, Bloomberg Businessweek writer Jordan Robertson, discovered that the primary buyers of this data turn out to be public and private corporations not primarily known as public health researchers: Truven Health Analytics, OptumInsight/Ingenix, and WebMD, among others.

"Hospital records are very useful in enriching prescription data databases, because a prescription record will only show you what medication you're on," Robertson told an audience at the 3rd Annual Summit on the Future of Health Privacy in Washington, D.C., which I attended.

"If you can link that with a hospital record, you can also learn what your original diagnosis was, which physician recommended you that particular drug, as well as all these ancillary conditions, so it turns out, and I had no idea about this, but hospital discharge data is one of the most valuable pieces of data in the medical data ecosystem," Robertson says.

No one is yet saying that these companies are the ones re-identifying patient data, but Robertson's investigation shows how easily it can be done.

Assisting Robertson in his discovery was computer scientist Latanya Sweeney, professor of government and technology in residence at Harvard University and the mastermind of the Data Map, first envisioned in 1997 and used to discover former Massachusetts governor William Weld's medical records in a redacted data set.

"Suppose you know someone went to the hospital," Sweeney said. You might know the name of the hospital, the date of admission, and maybe something about why they were there. "It's also the same kind of information that a financial institution would know about someone who said they were going to be late making credit card payments, because of a hospitalization," she said. "It's also the same kind of thing that a data mining company would know [or] could extract from purchasing pharmacy data."

Another source of such data is news stories. Sweeney and Robertson searched the LexisNexis online research service and, tapping only three news sources in the state of Washington, looked for mentions of hospitalizations. They identified 81 subjects, many of whom the news stories described by name, age, and what happened to them.

Through some other searches on the public Internet, the investigation exactly identified 35 of the subjects, a 43 percent success rate. Sweeney then hired a temp who could use computers but didn't have a specific background in computer science, statistics, or medicine. The temp had two days to match the remaining 46 subjects using ordinary Internet searches, and was able to match every one of them.
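The matching step described above can be illustrated with a short sketch. It is not the investigators' actual method or data: every record, field name, and the `link` function below are hypothetical, chosen only to show how a handful of facts from a news story (hospital, admission month, age, sex) can single out one row in a data set that carries no names.

```python
from collections import defaultdict

# Hypothetical "de-identified" discharge rows: no names, but quasi-identifiers remain.
DISCHARGES = [
    {"record_id": "D1", "hospital": "Hospital A", "admit_month": "2011-03",
     "age": 61, "sex": "M", "diagnoses": ["arm fracture", "diabetes"]},
    {"record_id": "D2", "hospital": "Hospital B", "admit_month": "2011-07",
     "age": 34, "sex": "F", "diagnoses": ["concussion"]},
    {"record_id": "D3", "hospital": "Hospital B", "admit_month": "2011-07",
     "age": 34, "sex": "F", "diagnoses": ["burn"]},
]

# Hypothetical facts of the kind a news story might report about a hospitalization.
NEWS_SUBJECTS = [
    {"name": "Person One", "hospital": "Hospital A", "admit_month": "2011-03",
     "age": 61, "sex": "M"},
    {"name": "Person Two", "hospital": "Hospital B", "admit_month": "2011-07",
     "age": 34, "sex": "F"},
    {"name": "Person Three", "hospital": "Hospital C", "admit_month": "2011-09",
     "age": 50, "sex": "M"},
]

# Assumed linking fields; the real data sets may expose different ones.
KEYS = ("hospital", "admit_month", "age", "sex")


def link(subjects, discharges, keys=KEYS):
    """Return {name: record_id} for subjects that match exactly one record."""
    index = defaultdict(list)
    for rec in discharges:
        index[tuple(rec[k] for k in keys)].append(rec)

    matches = {}
    for subject in subjects:
        candidates = index.get(tuple(subject[k] for k in keys), [])
        # Only a unique candidate counts as a re-identification;
        # zero or several candidates leave the subject unmatched.
        if len(candidates) == 1:
            matches[subject["name"]] = candidates[0]["record_id"]
    return matches


matches = link(NEWS_SUBJECTS, DISCHARGES)
match_rate = len(matches) / len(NEWS_SUBJECTS)

# The article's own figure: 35 of 81 subjects, rounded to a whole percent.
reported_percent = round(100 * 35 / 81)
```

In this toy data, only the first subject maps to a single record; the second is ambiguous because two rows share the same fields, and the third has no candidate at all. Once a unique match is found, every diagnosis in that row is attached to a name, which is the exposure the investigation demonstrated.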

When Robertson contacted subjects for comments for his story, they were astounded, and sometimes angry, that he knew their medical diagnoses and treatments from those hospitalizations, and that the system could be used in such a fashion.

"The data has a lot of value in the wrong hands, and we've chosen to publicize this, because we're trying to draw attention to it," Robertson says. "This could have been done just as easily by a private investigator or by a short-seller, if they had the wherewithal and the means to do it."

It dawned on me that while providers may do everything they're supposed to do to abide by HIPAA, loopholes like this state public health exemption render patient information accessible. And once it's on the Internet, data can live forever.

States may be jolted out of their inaction by Robertson's story. Already, Washington state has told Robertson it intends to tighten its data standards. Unless the entity requesting the information is truly a public health agency, the state will likely charge steeper fees to access the data. The state of Pennsylvania, seeing increased demand from commercial data companies, has already increased the cost of its data sets.

But Robertson noted that the use of secondary health data, including for marketing purposes, is projected to be a $10 billion industry by 2020. So how likely is it that commercial interests will let higher fees slow them down?

Meanwhile, it's worth pondering just how injurious the release of this data can be to patients.  

In his investigation, Robertson discovered that the records included diagnoses and procedures, diagnoses mostly, that many patients didn't even know that they had. And even worse, some of these diagnoses can really impact a patient's public reputation.  

"Somebody who goes in for a broken arm from a car crash might show an addiction to heroin, or methamphetamines, or cancer, or all these other things that show up in a hospital intake," Robertson says. "There's some really sensitive stuff in there. Many of these states are releasing more data than even some of them realize."

Technology almost always is a two-edged sword. No one wants to deny researchers the data they need. The promise of big data and analytics is to do an end-run around this country's archaic clinical trial process to discover new statistical correlations between genetics, environment and disease. None of it is possible without electronic medical records and the aggregation that big data makes possible.

But we must be careful. The genie is out of the bottle, and adequate safeguards must be in place, or our health privacy will be endangered. At the end of Sweeney and Robertson's talk, one audience member asked, probably rhetorically, if her medical records could return exclusively to paper.  

No one really expects that to happen. But this bombshell of a story is a warning that we must deal with all the implications of the electronic systems we are deploying.

Scott Mace is the former senior technology editor for HealthLeaders Media. He is now the senior editor, custom content at H3.Group.
