
Peer Review of Surgeons' Skills Carries 'Threatening Undertones'

By cclark@healthleadersmedia.com | October 17, 2013

A study in which experienced surgeons submitted samples of their work for evaluation by an anonymous panel finds a large variation in technical skill. Now come questions about what is to be done with the information.

When experienced bariatric surgeons in Michigan were asked to select videos of their best gastric bypass operations and submit them for anonymous review so that their technical skills could be evaluated, judges found huge variations.

But most important, when their scores were linked with the state's risk-adjusted bariatric outcomes registry, patients of the most poorly rated surgeons were twice as likely to die and to have post-operative complications as patients of surgeons with the best scores. And surgeons with the worst scores took 40% longer to complete their procedures than the best surgeons.

The study, published in the Oct. 10 New England Journal of Medicine, is loaded with implications. It suggests that hospitals, accreditation organizations, and physicians have some deep soul-searching ahead. Though the findings should be replicated by others, there is now a demonstrated peer-review process for objectively critiquing the skill of surgeons who completed their training long ago.

Most important, and perhaps most unsettling, is what doctors and hospitals will do with the information they gain, however career-limiting and economically devastating it may be, and whether patients have the right to know.

I asked the study's lead author, John Birkmeyer, MD, director of the Michigan Surgical Collaborative for Outcomes Research and Evaluation (M-SCORE), to discuss this fascinating and apparently first-of-its-kind study.

HLM: Has any other research project rated surgeons' skill and then paired it with patient outcomes like yours?

JB: To my knowledge, no. There is a large body of work attempting to rate surgical residents' skill, but I'm not aware of any systematic attempt to peer-rate the technical skill of mature practicing surgeons, much less to link those ratings to objective measures of surgeons' outcomes.

HLM: Might this prompt surgeons to be defensive and to object to such a process?

JB: Yes, there certainly is a threatening undertone to this.

HLM: The obvious question is whether this technique might be used to score surgical skill for procedures other than gastric bypass. Is it ready for prime time?

JB: There's no doubt in my mind we have a reproducible, reliable, and highly informative way of judging the comparative quality of practicing bariatric surgeons…although it will have to be confirmed by others.

And I believe it would likely extrapolate to other procedures such as robotic radical prostatectomy and robotic hysterectomy. And eventually, not just complex videoscopic surgery but other types of complex, complicated surgery where how well the procedure is done bears on the outcome: neurosurgery, spine, cardiac, and vascular surgery, and major orthopedics. I think scientific research will bear out this approach for a broad swath of surgeries.

HLM: What should you do with the poor performers? You'll now have knowledge that those surgeons aren't up to par, and you've actually linked this to patient harm in higher complication rates. That's real.

JB: You have to ask who the "you" is in that sentence. Is it hospitals credentialing their own surgeons? The American Board of Surgery, charged with certification of those surgeons? Regardless of who the decision maker is, it's important to recognize there's a bell-shaped distribution. There will always be variation.

The challenge is where to draw that dotted line. What types of surgeons need to be targeted for remediation or redirection? The bottom 1%, 10%, or the bottom quarter? I expect there will be a lot of difficult discussions.

HLM: Yes, because you now have evidence of harm that you didn't have before. Surgeons submitted what they thought was their best work.

JB: That's a really important point, and I hadn't thought of it that way. The American Board of Surgery fails about 10% to 15% of surgeons every year, and they have to retake the test and perhaps go through other steps.

But you're right, the difference between those tests and what we're talking about here is that nobody has empirically linked how well you do on those tests to patients' risk of dying or other types of bad outcomes, as we have with this study.

The stakes are much higher. That's why there's a threatening undercurrent to these findings as they get extrapolated to other specialties.

HLM: If I were a patient who had had surgery by a surgeon judged to have a lower level of technical skill, I would want to know. Did you tell the patients whose videos were used in this study?

JB: The study was monitored by the University of Michigan Institutional Review Board, and these surgeons submitted videos that were completely stripped of patient-identifying information. So we don't know which patients were attached to those videos. And furthermore, there was a blood oath of confidentiality, so even I don't know the identity of the surgeon who scored the 2.6 or who scored the 4.8. [Scores ranged from 1.0 to 5.0.]

And I am completely sympathetic to the argument that this has such fundamental implications from the perspective of patients in Michigan that they might ask why we aren't telling them. I get that.

But at the same time, these Michigan surgeons really took a significant risk in subjecting themselves to this type of study. They deserve credit for doing a very difficult thing, pushing the state of the science forward. I couldn't in good conscience feel these surgeons should be penalized for the inevitable findings that placed surgeons at both ends of this range.

HLM: But if this were to become mainstream, and you had this knowledge, wouldn't you have to tell patients prospectively?

JB: Do you think so? We don't currently. Patients don't have access to other measures of surgeon performance, even when surgeons are systematically tracked and get feedback on their outcomes against their peers. The profession has not made it standard of care for those data to be made available to patients.

I appreciate the distinction… that technical skill has a much more powerful association with a patient's true risk, but at the same time, it falls along a continuum of a broader range of information we have about the knowledge and skills of a physician.

HLM: Not to belabor the point, but in California and some other states, the public has access to mortality rates of named surgeons performing coronary artery bypass procedures. I personally know of physicians whose poor rates prompted their medical executive committees to reroute them to pursue another field of medicine.

JB: Cardiac surgery is the only specialty where that happens, and the public health effect of making that information available has been fairly small.  Sometimes, by chance, surgeons can wind up in a given year on the wrong side of the line.

The political challenges are real as well, because surgeons are very sensitive about their outcome rates, and at the end of the day, attribute those to factors bigger than just them. There was real palpable apprehension among the participating surgeons. They appreciated just how threatening this was.

HLM: I want to come back to the idea that, in the view of these participating surgeons, this was the best work they submitted, right?

JB: The advantage of having surgeons select videos of their best work is that it makes this even more reproducible. If you didn't control for case difficulty, you'd introduce bias and measurement noise.

HLM: Were you surprised that fellowship completion or teaching hospital practice weren't linked to better outcomes?

JB: No. Keep in mind that we were focusing on surgeons who had been practicing for a mean of a decade. They had done thousands of bariatric operations.

HLM: How can you protect against bias among the peer reviewers? Perhaps some might favor a technique or sequence similar to their own.

JB: We found very little evidence of that. I worried we'd have hard graders and easy graders who would contaminate the findings. We re-rated all these videos with a second expert panel, with no attachment to the Michigan collaborative, and they were slightly harsher. But all the best surgeons in the original rating were still the best, and the worst surgeons were still the worst.

And we had the best and worst surgeons submit a second video, just to make sure there wasn't something weird about what they submitted on the first pass. And again, when we re-rated those, the best and worst got put in exactly the same place.

And when I showed these videos to non-surgeons, including lay people, with no labels or prompts, every single lay person said, "Oh my God, I definitely would want the first guy," and they were exactly right, of course. In the other video, everything wasn't quite as good.
