
$3 Million Health Puzzler Draws to a Close

By cclark@healthleadersmedia.com
April 04, 2013

Nearly 2,000 math wonks from around the world who entered a health claims data hacking marathon with a $3 million purse can shut down their computers and kick back.

Last night was the deadline by which teams had to submit entries to win the coveted Heritage Health Prize.

More than 1,600 teams of stats wizards, physicians, engineers and even rocket scientists submitted nearly 36,000 ideas of how to solve one of the greatest prediction puzzles in healthcare:

Based on claims history, which individuals in a health network population are likely to be admitted to a hospital in one year, and for how long?

What great knowledge to have. With prescience like that, payers would happily pull out all the stops to target precious resources and keep high-risk enrollees from needing costly hospital care and gobbling up $30 billion a year that might not have to be spent.

But winning the big Heritage pot, some contestants tell me, is probably impossible because the requirements for winning are so stringent.

The contest's rules require the winning formula to identify the actual days of hospitalization that were required for 115,000 anonymized Southern California health plan enrollees from about 70,000 of their patient records for each year. The accuracy threshold score must be .40.

Believe me, some of the contestants told me, this is a really tough thing to do. And as of the deadline, partial scores showed that no one had gotten there.

"They set the bar too high," says Chip Lynch, a data consultant in Kentucky whose team, "ChipMonkey," is in 84th place.

There are some hints at who might have an edge, though. A leaderboard shows partial accuracy scores for each team, based on a 30% portion of the test claims database. But there is no clear winner, and there may not be one.

If no one wins, the team in first place gets only $500,000, and there will be no $3 million prize.

The contest was announced in December, 2010 by Richard Merkin, MD, Heritage's president and CEO, who officially launched it two years ago today. Entrants had to qualify for the big prize by getting a certain percentage right by last October 4.

For privacy reasons, the three years of real claims data were downloaded in a de-identified, sanitized version of the actual claims database provided by the contest's sponsor, the Heritage Provider Network. HPN is an ACO-like physician group based near Los Angeles with 700,000 covered lives in eight Southern California counties.

Jonathan Gluck, spokesman for the prize, explained the reason his boss is going to all this trouble: "You never know when you look at a problem how it's going to be solved. And having these different perspectives to think about it in different ways, we thought would lead to novel solutions."

The "getting people to think" part has succeeded, Gluck says, beyond their expectations. "We are ecstatic with the wide scope of individuals participating from around the world, from all walks of life." Even a hedge fund manager is part of a team.

Gluck says the winning algorithm will become the property of the Heritage Provider Network.

"We'd originally intended the algorithm would be publicly available," he explains. "But… we had people raise the question of whether insurance companies might use this dataset to raise insurance premiums.

"So to avoid that issue—we're doctors, not an insurance company, and it was never our intention to do that—we'll keep the algorithm proprietary so we can make sure it's shared only with entities that are going to use it for appropriate purposes."

It's hard to get people to understand, Gluck says, that a true predictive model would actually lower, not raise, insurance rates, because it helps providers know how preventive care would benefit certain patients "to keep them from going to the hospital."

"When you think about it, a three-day stay in Southern California can cost $12,000. And I can give you tons of preventive care, avoid the hospitalization, and the system will still come out way ahead," Gluck says.

Entrants had access to much, but not all, of the information in the Heritage claims database. They could see, for example, how many times a patient had been to a primary care doctor or specialist. They could also see the enrollee's sex, ranges for age and zip codes, how many days the patients had spent in a hospital in recent years, and their comorbidities, as well as some information on prescription usage.

Their weight, race, socioeconomic status, educational level, occupation and employment data were not provided, nor were the names of their providers.

Some teams tried to increase their odds by submitting multiple entries, in one case as many as 670, with only minor variations among them. Tuning entries to the partial leaderboard in this way risks what statisticians call "overfitting."

Mansour Sharabiani, MD, is a member of a two-person team called "Almata," which shows up as the top contender on the accuracy leaderboard. He e-mailed and called me from London, where he is a research epidemiologist and biomedical scientist at Imperial College, specializing in predictions of mortality from heart attacks.

He submitted 352 entries, thinking his contributions might "lead to improvement of quality of care," but also because of the "fascinating aspect… to participate in a world class competition that had drawn top scientists and Nobel Prize winners as well as important players in the industry."

But even though his score is the highest, it's a few points short of the .40 required. Does he think he has a chance of getting the $3 million, or merely the $500,000?

"We have a reasonable chance," he replies, "although nothing can be said with certainty." That's because how each contestant's score would fare against the remaining 70% of the database is anyone's guess. "We do not know whether the subsamples were split on a random basis, or there were other considerations," Sharabiani says.

As the website warns, "The final standings may be different." 
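For readers who want to see why a lead on the 30% leaderboard portion may not hold on the remaining 70%, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, the "entries" are random variations of one guess rather than real models, and the error formula (a root mean squared log error, where lower is better) is a stand-in, since the contest's scoring formula is not described in this article.

```python
import math
import random


def rmsle(predicted, actual):
    """Root mean squared logarithmic error; lower is better (assumed metric)."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def split_public_private(n, public_fraction, rng):
    """Randomly split member indices into a public and a private part."""
    indices = list(range(n))
    rng.shuffle(indices)
    cut = round(n * public_fraction)
    return indices[:cut], indices[cut:]


def score_on(indices, predicted, actual):
    """Score an entry using only the members listed in `indices`."""
    return rmsle([predicted[i] for i in indices], [actual[i] for i in indices])


def simulate(n_members=2000, n_entries=50, seed=0):
    rng = random.Random(seed)
    # Synthetic "days in hospital": most members have none.
    actual = [rng.choice([0, 0, 0, 0, 0, 0, 0, 1, 2, 5]) for _ in range(n_members)]
    public, private = split_public_private(n_members, 0.30, rng)
    results = []
    for _ in range(n_entries):
        # Each entry is the same constant guess plus small random noise,
        # mimicking many near-identical submissions.
        predicted = [max(0.0, 0.4 + rng.gauss(0, 0.1)) for _ in range(n_members)]
        results.append((score_on(public, predicted, actual),
                        score_on(private, predicted, actual)))
    return results


results = simulate()
best_public, its_private = min(results)  # the entry that tops the public board
print(f"best public score: {best_public:.4f}")
print(f"same entry on the private portion: {its_private:.4f}")
```

Running the sketch with different seeds moves the two numbers around independently. The entry that looks best on the public portion is partly just lucky on that sample, which is one reason the final standings can differ.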

How would he spend the prize money? I asked Sharabiani. "I don't know," he replies. "I hadn't planned to have the money. This was more about being a challenge for me and the money really didn't count."

On April 8, contest organizers will reveal all but the top 10 place holders, and those 10 will be informed privately. During the following eight weeks, their algorithms will be verified.

The winning entry and its $3 million prize, or its $500,000 one, will be announced at Health Datapalooza IV on June 3 or 4 in Washington, DC.
