Wednesday, March 19, 2014

A study says you can trust online physician ratings

This abstract comes from the Social Science Research Network:

Despite heated debate about the pros and cons of online physician ratings, very little systematic work examines the correlation between physicians’ online ratings and their actual medical performance. Using patients’ ratings of physicians at RateMDs website and the Florida Hospital Discharge data, we investigate whether online ratings reflect physicians’ medical skill by means of a two-stage model that takes into account patients’ ratings-based selection of cardiac surgeons. Estimation results suggest that five-star surgeons perform significantly better and are more likely to be selected by sicker patients than lower-rated surgeons. Our findings suggest that we can trust online physician reviews, at least of cardiac surgeons.

You won't be surprised to learn that I don't believe it. As is my custom, I decided to read the entire paper, the full text of which can be found here. At 37 pages, the raw manuscript is rather lengthy. As a public service, I waded through it.

The authors, non-MD faculty from the William E. Simon Graduate School of Business Administration at the University of Rochester, in New York, combed the ratings for Florida cardiac surgeons on the website and classified surgeons into three categories—five-star surgeons, non-five-star surgeons, and those with no ratings at all.

They looked at 799 quarterly opportunities for ratings over a 9-year period and found that 21% of surgeons had an average of 1.9 online ratings. The 79% of surgeons who did not have an online rating performed 79% of the total surgeries in 2012, the year that the authors analyzed for patient results.

The five-star surgeons had a mean of 1.8 reviews each, and only 10% had more than 2 reviews.

The average mortality rate for coronary artery bypass grafting (CABG) among the Florida cardiac surgeons was 1.8% in 2012. The five-star surgeons with multiple reviews had the highest mortality rates at 3.3%.

I could find no evidence that patient mortality rates were adjusted for risk. But a lot of statistical manipulations took place. It's all explained by this simple equation—one of many.

The authors say, "For a representative patient who is severely ill, being treated by a five-star surgeon can reduce the in-hospital mortality by 55% compared with being treated by a non-five-star surgeon. [I have no idea how they determined that figure.] Moreover, the negative and significant coefficient of no-ratings suggests that patients treated by surgeons without ratings also have a lower mortality rate than those treated by non-five-star surgeons, all else being equal." Huh?

And this, "Patients with private insurance are less likely to select the surgeons without ratings than patients with Medicare. We suspect that patients with private insurance have to use search engines to figure out whether a surgeon is within the network that an insurance plan covers, while government patients enjoy a large physician network." I question that assumption. My experience is that patients with Medicare sometimes have problems finding anyone to care for them, let alone the best surgeons.

It turns out that half of the five-star surgeons had only one review. In one iteration of the study model, five-star surgeons with multiple reviews had higher mortality rates than those with only one review, but then they also say, "One surprising finding is that five-star surgeons with a single review show no statistical difference in performance from those with multiple reviews."

Are you as confused as I?

The paper makes no mention of the possibility that some of the online ratings could be fake. Recent articles [here and here] suggest that one-fifth to one-third of such reviews are phony.

You can manipulate the statistics all you want, but you won't convince me that one or two or even 20 online ratings are valid or useful in choosing a surgeon.
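To put a rough number on how little one or two ratings can tell you, here is a minimal sketch that is not from the paper. It computes a 95% Wilson score interval for the share of a surgeon's patients who would give five stars, based on the reviews actually posted. The function name and the example counts (2 of 2 and 20 of 20) are my own illustrative assumptions, and it generously assumes the reviews are genuine and come from a random sample of patients.

```python
import math


def wilson_interval(five_star_reviews, total_reviews, z=1.96):
    """Return the (lower, upper) Wilson score interval for a proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if total_reviews <= 0:
        raise ValueError("total_reviews must be positive")
    if not 0 <= five_star_reviews <= total_reviews:
        raise ValueError("five_star_reviews must be between 0 and total_reviews")

    p = five_star_reviews / total_reviews
    z2 = z * z
    denom = 1 + z2 / total_reviews
    center = (p + z2 / (2 * total_reviews)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / total_reviews + z2 / (4 * total_reviews**2))
        / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical surgeons whose reviews are all five-star.
    for n in (2, 20):
        low, high = wilson_interval(n, n)
        print(f"{n} of {n} five-star reviews: {low:.2f} to {high:.2f}")
```

With two five-star reviews out of two, the lower end of the interval is only about 0.34, so the data are consistent with barely a third of patients rating the surgeon that highly. Even 20 perfect reviews only raise the lower bound to about 0.84, and none of this says anything about mortality.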


Anonymous said...

I agree - this is a really lame paper. What's the statistical basis for binning 5 stars, all other star surgeons, and surgeons with no ranking? You have a 1-5 scale: Use it! They should have modeled the hazard for each additional star or half-star, except there probably simply wasn't enough data to reach the conclusion they started out wanting to reach.

Does anyone really believe that a surgeon who got two 5-star ratings is appreciably better than a surgeon who got one 4-star and one 5-star rating?

P.S. It looks like this is (basically) just a logistic regression.

Skeptical Scalpel said...

Thanks for commenting. You and I agree.

Anonymous said...

I agree that user ratings for physicians are useless. But, how should one choose a surgeon, or any other physician? Office friendliness and bedside manners play a part, of course, but there is no real metric on outcome adjusted for patient morbidity and hospital resources.

So, are we just throwing up our hands and saying every doctor is as good as any other? How does a patient choose a "good" doctor, cost no object?

Are surgeons (and other doctors) fungible service providers?

Skeptical Scalpel said...

Anon, excellent question. I would ask my primary care doctor "Who would you have operate on you or a family member?" Another option would be to ask nurses at a hospital who they would choose.

Here's something I wrote sort of on the subject.

GOTS said...

Objectively rating any type of physician is hard and in some cases would seem to approach the impossible (how does one rate a psychiatrist or allergist or ID doctor?). However, cardiac surgery is one of the few specialties for which objective information does actually exist. The Society of Thoracic Surgeons (STS) instituted an Adult Cardiac Surgery Database over 20 years ago. It now contains records of approximately 5 million operations. The data are objective (age, gender, comorbidities, anatomy of coronary lesions, valvular pressure gradients, bypass times, blood loss, complications, length of ventilator support and ICU stay, hospital inpatient duration, etc.). These data are collected by independent data managers and submitted electronically to the Duke Clinical Research Institute, where they are independently analyzed. Every participating institution and each physician is provided with specific risk-adjusted outcomes (mortality, morbidity, LOS) on a yearly basis with comparisons to the norms of other programs within the participant's own region and to the nation as a whole. Results that are outside of statistical norms (both bad and good) are highlighted so that each program can assess its own strengths and weaknesses. The database is a voluntary one, but programs are audited by an independent agency at certain intervals and the data accuracy has been excellent. The level of penetrance is remarkable in that currently over 93% of the cardiac surgical procedures done in the US are being submitted to the Database. Similar Databases involving General Thoracic surgical procedures and Congenital Heart surgical procedures have been instituted but have less penetrance as they have been in existence for less than a decade.

The Database was begun for internal quality improvement reasons and the feedback given to each program allowed them to identify and subsequently address their own weaknesses. Over the past 20 years, this has led to a progressive decline not only in absolute mortality statistics, but also improvements in the observed to expected mortality, morbidity and length of stay nationwide.

In the last two years, The Society collaborated with Consumer Reports to publicly provide quality ratings for cardiac surgery programs using a one star, two star or three star rating system. Programs have to volunteer to be included in this Public Rating System and, although there is understandable reticence to provide one's own ratings publicly given the medicolegal climate, about 40% of all the programs are currently participating.

I am unaware of any other data-driven, risk-adjusted, independently audited quality assessment program that allows for objective evaluation of the vast majority of practitioners within any given specialty. It has been a costly effort both in terms of dollars and manpower, but it likely represents the most accurate and thorough rating system currently in existence for ANY specialty.

So the answer is that it is possible to rate surgeons' results in an objective fashion. What this system does NOT rate are the interpersonal skills, compassion and patience demonstrated by cardiac surgical practitioners. Also, as it covers only the postop stay, it addresses only the operative mortality and morbidity and NOT the long-term results (survival, freedom from infarction or CHF hospitalizations, etc.). The STS is attempting to remedy this latter situation by linking the Database directly to the CMS administrative database, thus allowing identification of hard endpoints such as death, number and cause of hospitalizations and medication costs. It's their hope that in the future, such combined databases will allow for short-term and long-term quality improvement initiatives that will fundamentally improve healthcare in the specialty.

Skeptical Scalpel said...

GOTS, thanks for the detailed comment. The cardiac surgery rating system is not the type of online rating I was blogging about. I was referring to the ratings by patients and their lack of credibility as well as the paper that claimed online patient ratings could be trusted.

The cardiac surgeon ratings you describe are useful, but if 60% aren't participating, it's not clear how useful the ratings are. What about things like indications for surgery and refusing to operate on the sickest patients, which was a problem (and maybe still is) with the New York state outcomes reporting?

As you mentioned, things like interpersonal skills, compassion and patience, which are important to many patients, are not a feature of your system. Of course, it boils down to this: do you want a compassionate surgeon with poor technical skills to operate on your heart, or would you rather have a jerk with excellent technical skills do it?

Anonymous said...

Thanks for the cardiac surgery post.

There are 2 things that strike me.

The long-term results (e.g. survival, morbidity 3 months post-op) are not known with the current database. I think that most patients would consider that important.

There is an opportunity to game the rating system, in the way that colleges and law schools do. Getting a patient off the ventilator, out of the ICU, and home are admirable goals. But, isn't there the temptation to do all three just to bump up rankings?

These cardiac surgery annual reports are now mainstream, and are trumpeted in local newspapers.

StaphofAesclepius said...

Like a certain Texas neurosurgeon's high ratings? No thank you. I like your idea of rating patients more.

Skeptical Scalpel said...

Staph, thanks. Despite a few obstacles, it might work.

Skeptical Scalpel said...

Anon, good points. The system can be gamed, that's for sure.

Anonymous said...

@GOTS: Is it possible for the public to view these outcomes?

Skeptical Scalpel said...

I found the link by googling Society of Thoracic Surgeons. Here it is

Anonymous said...

Hmmm I've rated doctors. I've put the specifics into good and bad, both listing technical competence, knowledge of subject, ability to handle simple/complex problems (problem solving), while at the same time rating bedside manner, his office staff, etc. I'm probably one of the few who does that, generally why I have people pick up on most of what I list. I tend to be more factual and give examples of good/bad. I was once called by an "MD's MD" 'very perceptive' and detailed.

If you take all the ratings together and use common sense, you can generally figure it out. I usually can - especially with websites that only print good stuff.

Skeptical Scalpel said...

My question is, "How do you know that a website is printing 'good stuff' when fraudulent reviews are so common?"
