Monday, May 23, 2016

Operative risk and surgeon decision-making

Should it surprise you that surgeons might have differences of opinion about whether or not a patient should have an operation?

It doesn't surprise me, but apparently a lot of people were taken aback by an Annals of Surgery paper published online last week stating just that.

The authors gave 767 surgeons four brief but complex clinical scenarios and asked whether they would operate on each patient. The vignettes were purposely designed not to have "correct" answers.

In response to the question would you recommend an operation, the surgeons could choose one of the following responses: very unlikely, unlikely, neutral, likely, or very likely.

If you were in the emergency department with mesenteric ischemia, would you want a surgeon who responded "neutral"?

Why the authors selected five possible choices is puzzling. In real life when you are faced with a difficult decision in the middle of the night, you don't have five options. You have only two—operation or no operation.

More about that later.

The surgeons' estimation of the risks of each procedure varied widely, and most of them agreed about recommending surgery only for the patient with a small bowel obstruction.

In a Vox story about the paper, a Harvard health policy expert, Dr. Ashish Jha, said the findings were "disturbing."

I would call the findings "expected." These were difficult cases with no right answers.

A second paper in the same journal by the same investigators came up with somewhat different results. It randomized 779 surgeons into two groups: one had access to the American College of Surgeons operative risk calculator score and the other did not. For the same four clinical scenarios, surgeons who were given the risk calculator score estimated risks significantly closer to the calculator's values—another non-surprise.

The difference in estimated risk between the two groups was statistically significant but probably not clinically significant. For example, surgeons who used the risk calculator score estimated the operative risk for the small bowel obstruction patient at 13.6%, compared to 17.5% for surgeons who didn't see the calculator's score (p < 0.001). Would the 3.9% difference between the two estimates really change a surgeon's mind about operating? I doubt it.
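A quick back-of-the-envelope calculation shows why this gap between statistical and clinical significance is so easy to produce. The sketch below uses the paper's reported means but assumes the group sizes (roughly half of the 779 randomized surgeons per arm) and a plausible within-group standard deviation, since those details aren't in the post; with samples that large, even a modest absolute difference yields a tiny p-value.

```python
# Illustrative numbers only, not the study's raw data.
# Group sizes and SD are assumptions; only the means are from the paper.
import math

n1 = n2 = 390               # assumed surgeons per arm (779 randomized)
mean1, mean2 = 13.6, 17.5   # reported mean risk estimates (%)
sd = 10.0                   # assumed within-group standard deviation (%)

# Standard error of the difference between two independent means
se = math.sqrt(sd**2 / n1 + sd**2 / n2)
# Two-sample t statistic; anything above ~3.3 gives p < 0.001
t = (mean2 - mean1) / se
print(round(se, 2), round(t, 1))  # prints 0.72 5.4
```

With hundreds of respondents per arm, a 3.9-point difference is many standard errors wide—"significant" in the statistical sense regardless of whether it would ever change an operative decision.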

The effort to quantify risk so precisely may not only be misguided; it may be impossible.

Radiologist Saurabh Jha blogged about this two years ago. He wrote, “Numbers are continuous. Decision-making is dichotomous. One can be at 15.1%, 30.2% or 45.3% risk of sudden cardiac death. But one either receives an implantable cardioverter defibrillator (ICD) or does not. Not a 15.1% ICD.”

Jha concluded, “You can remove the burden of judgment from a physician but then you will no longer have a physician.”

As the authors of both papers pointed out, another consideration is that the risks of not operating are unknown. The topic has not been studied and probably never will be.

The most important finding of the second paper was that "averaged across the four vignettes, the two groups did not differ in their reported likelihood of recommending an operation, p = 0.76."

Since the first paper portrayed surgeons as wildly erratic at estimating risk, it of course received all the attention.


Arnon Krongrad, MD said...

Imagine how inconsistent surgeon assessments are when clinical outcomes become apparent not minutes later but years later.

We once presented clinical vignettes and asked surgeons (and radiation oncologists and geriatricians) to state whether Connecticut prostate cancer patients would live: 1) <5 years, 2) 5-10 years, or 3) >10 years after diagnosis. Among the details they received were the Charlson index abstracted from charts and the cancer grade re-read by Donald Gleason, who invented the grading system. The real outcome was determined from medical examiners' reports of time of death.

The experts got 34% right. They were essentially flipping three-sided quarters.

It is no surprise at all to see surgeons varying in their assessments. With time, if we gather and analyze data better, their assessments will become more uniform. Until then, what we have is their judgment.

Skeptical Scalpel said...

Arnon, thank you for the comment. The prostate cancer prediction outcome is very interesting. Was that ever published anywhere? If so, I'd like to have the citation.

Unknown said...

The project was winding up as I was winding up life at the university, so it sits archived in a box in my attic, never published. That said, I have pulled the box and will share one related image by Twitter in just a bit. More perspective ...

In those days, we were modeling all kinds of stuff related to prostate cancer outcomes. Among the analyses was one that looked at proportion of variation explained (PVE) by certain factors. One data set was the Connecticut watchful waiting dataset, which had been unusually groomed by UConn chairman Peter Albertsen: chart abstractions, pulling biopsy to be re-read ...

The model we developed, also not published, got to a total adjusted PVE of 29% for survival, which was a substantial improvement over a previous effort using SEER data with different variables. Using that Connecticut dataset, we next set out to test whether experts could outperform mathematicians in predicting individual outcomes, defined as:

1) time to death (3-part multiple choice)
2) cause of death (2-part; prostate CA vs. other)

We asked 12 experts to participate, 3 each from 4 disciplines: urology, medical oncology, radiation oncology, and internal medicine/geriatrics. Every participant had a demonstrated lifelong interest in prostate cancer.

The experts did badly, as referenced above. In the second test, the 2-part multiple choice, they scored 60%. For a 2-part multiple choice getting at the heart of their daily work -- will a man with prostate cancer who is not treated for prostate cancer die of prostate cancer -- doing a hair better than flipping quarters doesn't seem too impressive.

Incidentally, there was no real observed difference among the specialties. The urologists did as badly as the geriatricians, etc.

We then also submitted a learning set to a mathematician skilled in neural networks and biostatistics. He trained his computer, which, given a validation set, outperformed the experts, but minimally. So in short, in our surgical "Deep Blue vs. Garry Kasparovs" the machine won, but everyone did minimally better than flipping coins.

This gets to the notion of PVE. When the individually informative variables explain little of the variation in the distribution of the outcome, no model can precisely predict individual outcome: not human, not machine. In the grand scheme, I think we under-appreciate this point and get distracted by "statistically significant" but inadequate input variables. It ain't enough to look at Charlson Index and Gleason score. Something else is influential in explaining variation in outcomes. We just don't know what that is yet.
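The PVE point above can be made concrete with a toy simulation (entirely synthetic—none of this is the Connecticut data). If a predictor explains only ~29% of the variance in survival, then even the *true* model, with no estimation error at all, classifies outcomes into terciles (the <5 / 5-10 / >10 year analogue) only modestly better than chance:

```python
# Toy illustration of low PVE capping individual prediction accuracy.
# All numbers are simulated; nothing here comes from the actual dataset.
import random
import statistics

random.seed(42)
n = 10_000

# Build an outcome where the predictor explains ~29% of the variance:
# var(y) = b^2 * var(x) + var(noise), with var(x) = 1 and var(noise) = 0.71.
b = 0.29 ** 0.5
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [b * x + random.gauss(0, 0.71 ** 0.5) for x in xs]

# Tercile boundaries of the outcome distribution
cut_lo, cut_hi = statistics.quantiles(ys, n=3)

def tercile(v):
    return 0 if v < cut_lo else (1 if v < cut_hi else 2)

# The best possible point prediction of y given x is b*x.
# Score how often the predicted tercile matches the actual one.
hits = sum(tercile(b * x) == tercile(y) for x, y in zip(xs, ys))
acc = hits / n
print(acc)  # roughly 0.45-0.50: better than 1/3, far from certainty
```

Even with the correct model in hand, accuracy lands well short of reliable individual prediction—which is the commenter's point: when the available variables explain little of the outcome variation, neither human nor machine can do much better.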

This ties to your post about variation in hospital readmission. If surgeons account for 2.8% of the observed variation ... etc.

Unknown said...

Studies like this, wherein the opinions of various surgeons are compared in a clinical scenario, are inherently flawed because the scenario is artificial. Why? Because no such scenario can impart the human side of the story, or the art of medical practice.
Every patient comes to the physician with their own unique baggage, be it age, medical history, or personal situation. What might be appropriate for one individual with a bowel obstruction or perforated ulcer might be less so for another, differently situated patient. Add to this what the surgeon brings to the table: his or her experience with a given procedure, skill set, and available resources.
The decision to operate or not is only the first of many decisions. And every one of them must be colored with an eye toward what is right for that particular patient, that particular person, that particular family. The choices are often not clear cut, usually being some shade of gray on a black/white scale.
It is not at all surprising that physicians in general and surgeons in particular so often disagree.

artiger said...

"...a Harvard health policy expert, Dr. Ashish Jha, said the findings were 'disturbing.'"

How many health policy experts actually operate?

Anonymous said...

Needed a study for that? I could have told them. I think what would have helped in my case is a gut check by the surgeon. Doing a little bit of EBM would have helped, too. They didn't base their decisions on what was in front of them, and that is what hurt more than anything.

Skeptical Scalpel said...

Arnon, thanks for the explanation of your work. It is interesting and confirms my opinion about the two studies I wrote about in the post. I still think you should publish it.

Edison, I agree that no matter how many people insist that vignettes are a valid way to assess surgeon judgment, there is no substitute for talking to and examining the patient in person.

Artiger, some health policy experts operate, but not the one who commented on the paper.

Anonymous, I'm not sure what the surgeons in the studies based their responses on. That was not addressed.
