First you need to know something about HealthGrades and its rating system. Using a large Medicare administrative database (that is, data submitted by hospitals for billing purposes), HealthGrades compares hospitals on an observed vs. expected outcomes basis. For some reason, hospitals are rated as five-star (best), three-star (as expected, or average) or one-star (poor). There is no mention of four- or two-star. And according to their methodology, “…70% to 80% of hospitals in each procedure/diagnosis were classified as three stars, with actual results not significantly different from predicted results. Approximately 10% to 15% were 1-star hospitals and 10% to 15% were 5-star hospitals.” For non-statisticians, that would be classified as a normal distribution.
Now what would happen if every hospital in the U.S. performed at the level of a five-star hospital? Well, the observed rate of complications and deaths would go down, but as long as one compares observed vs. expected outcomes, the distribution of hospital ratings would still be normal, with 10%-15% above average, 70%-80% average and 10%-15% below average.
Therefore, with the possible exception of hospitals in Lake Wobegon (“Welcome to Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average.” [Garrison Keillor]), all hospitals cannot be above average.
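The relative-grading arithmetic above can be sketched in a few lines of code. This is a hypothetical illustration, not HealthGrades' actual methodology: it assumes 1,000 hospitals with invented complication rates, computes each hospital's "expected" rate from the pool itself, and grades on a z-score with cutoffs chosen to approximate the 10-15% / 70-80% / 10-15% split. The point is that if every hospital's rate is cut in half, the star distribution does not move at all.

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: 1,000 hospitals, each with a complication rate.
# "Expected" is recomputed from the pool, so the grading is purely relative.
def grade(rates):
    expected = statistics.mean(rates)
    sd = statistics.stdev(rates)
    stars = []
    for r in rates:
        z = (r - expected) / sd
        if z < -1.28:            # roughly the best ~10%
            stars.append(5)
        elif z > 1.28:           # roughly the worst ~10%
            stars.append(1)
        else:
            stars.append(3)      # the ~80% "as expected" middle
    return stars

rates = [random.gauss(0.10, 0.02) for _ in range(1000)]
before = grade(rates)

# Every hospital improves: complication rates cut in half across the board.
after = grade([r / 2 for r in rates])

# Same star distribution before and after, because the yardstick moved too.
print(before.count(5), before.count(3), before.count(1))
print(after.count(5), after.count(3), after.count(1))
```

Halving every rate also halves the pooled mean and standard deviation, so every hospital's z-score, and therefore its star rating, is unchanged; that is the Lake Wobegon problem in miniature.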
Then there is the problem of using administrative databases to judge clinical outcomes. HealthGrades’ own description of its methodology includes the following disclaimers:
“Limitations of the Data Models
It must be understood that while these models may be valuable in identifying hospitals that perform better than others, one should not use this information alone to determine the quality of care provided at each hospital. The models are limited by the following factors:
“Cases may have been coded incorrectly or incompletely by the hospital.
The models can only account for risk factors that are coded into the billing data–if a particular risk factor was not coded into the billing data, such as a patient’s socioeconomic status and health behavior, then it was not accounted for with these models.
Although Health Grades, Inc. has taken steps to carefully compile these data using its methodology, no techniques are infallible, and therefore some information may be missing, outdated or incorrect.”
There are a number of peer-reviewed articles questioning the validity of using administrative databases in clinical outcomes research. A study of patients with cerebral aneurysms, from the Bloomberg School of Public Health at Johns Hopkins University, found many large discrepancies between the Maryland state administrative database and the clinical records of the patients at their institution. A paper from Harvard and Tufts concluded “Cardiac surgery report cards using administrative data are problematic compared with those derived from audited and validated clinical data, primarily because of case misclassification and non-standardized end points.” A systematic review of papers on infectious diseases found that administrative databases have “limited validity” for the evaluation of co-morbidities, a key factor in risk adjustment.
Try this for some hospitals that you might be familiar with. Compare HealthGrades ratings with “Medicare Hospital Compare,” which one must assume uses the same outcome data, since HealthGrades uses Medicare’s data for its ratings. Here are the results for heart attack outcomes for three hospitals in New York City. (See Table.) The rating scales are the same: three possible grades.
I don’t know which one to believe. Do you?
Note: A previous blog post of mine pointed out a few other issues with HealthGrades that everyone should be aware of.