Analyzing CAHPS Survey Data
Users of CAHPS surveys may take a few different approaches to analyzing their data to generate results that can be reported to health care providers, administrators, and consumers. The CAHPS team supports users in conducting their analyses of the closed-ended surveys by developing and releasing the CAHPS Analysis Program in SAS. In addition, the CAHPS team has been studying the potential of natural language processing (NLP) as a way to analyze patient comments.
First Step: Get Data Ready for Analysis
Consult Preparing Data from CAHPS Surveys for Analysis (PDF, 369 KB) to learn:
- How to transform the raw data from the CAHPS surveys into data that can be analyzed using either the CAHPS Analysis Program in SAS or other statistical software.
- How to compute frequencies, top box scores, and other proportional scores (e.g., % Always, % Usually, and % Never and Sometimes).
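As a rough illustration of the proportional scores mentioned above, the following Python sketch computes a top box score (% Always) along with % Usually and % Never and Sometimes for a single item. The response list, the use of `None` for a skipped answer, and the function name `proportional_scores` are assumptions made for this example; they are not taken from the CAHPS preparation guide, which remains the authority on how to clean and score real data.

```python
from collections import Counter

# Hypothetical answers to one item on a Never/Sometimes/Usually/Always scale.
# None marks a missing or skipped answer and is left out of the denominator.
RESPONSES = ["Always", "Usually", "Always", None,
             "Sometimes", "Never", "Always", "Usually"]


def proportional_scores(responses):
    """Return the percentage of valid answers in each reporting category."""
    valid = [r for r in responses if r is not None]
    if not valid:
        raise ValueError("no valid responses to score")
    counts = Counter(valid)
    n = len(valid)
    return {
        "n": n,
        "pct_always": 100.0 * counts["Always"] / n,  # top box score
        "pct_usually": 100.0 * counts["Usually"] / n,
        "pct_never_sometimes": 100.0 * (counts["Never"] + counts["Sometimes"]) / n,
    }


scores = proportional_scores(RESPONSES)
for name, value in scores.items():
    print(name, round(value, 1))
```

The three percentages sum to 100 because every valid answer falls into exactly one category. The sketch does no case-mix adjustment or weighting.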
CAHPS Analysis Program in SAS
The CAHPS Analysis Program—often referred to as the CAHPS macro—is a free program written for SAS (version 6.0 or later) that enables survey users to conduct the analyses needed to produce valid comparisons of performance across similar health care organizations. The CAHPS macro:
- Adjusts the data for case mix.
- Generates a distribution of survey results for each of the measures.
- Calculates the average score (the mean across all response categories) for both individual survey items and composite measures.
- Indicates whether an entity’s scores are statistically different from the average.
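To make the idea of an average score concrete, here is a minimal Python sketch of an unadjusted item mean and a composite mean. It is not the CAHPS macro and does not reproduce its case-mix adjustment or significance testing. The 1 to 4 numeric coding, the item names, and the choice to weight each item equally in the composite are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical numeric coding: 1 = Never, 2 = Sometimes, 3 = Usually, 4 = Always.
# None marks a missing answer. Item names are placeholders, not real CAHPS items.
ITEM_RESPONSES = {
    "item_a": [4, 3, 4, None, 2],
    "item_b": [3, 3, None, 4, 4],
}


def item_mean(values):
    """Unadjusted mean across all valid response categories for one item."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("item has no valid responses")
    return mean(valid)


def composite_mean(items):
    """Composite score as the equally weighted average of the item means
    (an assumption of this sketch)."""
    return mean(item_mean(values) for values in items.values())


item_means = {name: item_mean(values) for name, values in ITEM_RESPONSES.items()}
composite = composite_mean(ITEM_RESPONSES)
print(item_means, composite)
```

Averaging item means, rather than pooling all answers, keeps an item with many respondents from dominating the composite.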
AHRQ’s CAHPS Consortium developed the CAHPS Analysis Program to work with all CAHPS surveys. It is updated periodically to add functionality, produce additional types of output, and correct or debug issues with previous versions.
Current version: 5.0
Requirements: Base SAS and the SAS/STAT module.
- CAHPS Analysis Program with test modules (ZIP, 68 KB)—updated June 2020
- Instructions for Analyzing Data from CAHPS Surveys (PDF, 965.5 KB)—updated August 2020
Using Natural Language Processing to Analyze Patient Comments
In 2019 and 2020, the CAHPS team conducted a study to explore and assess natural language processing (NLP) approaches for the purpose of analyzing narrative comments. In this study, the team used various NLP and machine-learning methods to recognize the presence or absence of a specified list of concepts in a set of real-world patient narratives.
To do this, the team used four fairly simple machine learning approaches to predict the codes that a team of human coders had assigned to patient narratives. In general, these approaches performed better than chance in predicting the human-generated codes. This result is encouraging because it supports the promise of these approaches for analyzing patient narratives at a larger scale. However, the machine learning approaches performed far better at predicting common codes than at predicting rare codes. Their performance could be improved by increasing the size of the data set used to train and build the machine learning algorithms, but creating these data sets is labor-intensive.
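One way to see the gap between common and rare codes is to score machine predictions against human codes one code at a time. The Python sketch below computes precision and recall per code for a tiny, made-up set of narratives. The code names, the data, and the function `per_code_metrics` are hypothetical and are not drawn from the team's study.

```python
# Each position is one narrative; each set holds the codes assigned to it.
# All labels and values are invented for illustration.
HUMAN = [{"communication"}, {"communication", "access"}, {"communication"},
         set(), {"billing"}, {"communication"}]
MACHINE = [{"communication"}, {"communication"}, {"communication"},
           {"communication"}, set(), {"communication", "access"}]


def per_code_metrics(human, machine, codes):
    """Compare machine codes to human codes separately for each code."""
    results = {}
    for code in codes:
        pairs = list(zip(human, machine))
        tp = sum(1 for h, m in pairs if code in h and code in m)
        fp = sum(1 for h, m in pairs if code not in h and code in m)
        fn = sum(1 for h, m in pairs if code in h and code not in m)
        results[code] = {
            "support": tp + fn,  # narratives the human coders tagged with this code
            # Report 0.0 when a ratio is undefined (no predictions or no cases).
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return results


metrics = per_code_metrics(HUMAN, MACHINE, ["communication", "access", "billing"])
for code, values in metrics.items():
    print(code, values)
```

In this toy data the common code is recovered well while the two codes with a single human-coded example are missed entirely. An overall accuracy figure would hide that pattern, which is why per-code reporting is useful.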
One important conclusion is that there may be labor-saving potential in leveraging the strengths of both machine and human coders, potentially in creative ways. For example, machines could first be used to optimize the sample of narratives supplied to human coders, such that rare-but-important content is oversampled. Efficiency may also be gained by contracting model building to specialized companies.
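The oversampling idea can be sketched in a few lines of Python. Assume a model has already produced, for each narrative, a score for how likely it is to contain a rare-but-important code. Every narrative at or above a threshold goes to human coders, along with a small random sample of the rest. The scores, narrative IDs, threshold, and the function `select_for_human_coding` are hypothetical. This is one possible design, not the procedure the team used.

```python
import random

# Hypothetical model scores: estimated chance that a narrative contains a rare code.
SCORES = {"n01": 0.91, "n02": 0.05, "n03": 0.12, "n04": 0.78,
          "n05": 0.02, "n06": 0.33, "n07": 0.08, "n08": 0.64}


def select_for_human_coding(scores, threshold, n_background, seed=0):
    """Pick all high-scoring narratives plus a random background sample."""
    flagged = sorted(nid for nid, s in scores.items() if s >= threshold)
    rest = sorted(nid for nid, s in scores.items() if s < threshold)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    background = sorted(rng.sample(rest, min(n_background, len(rest))))
    return flagged, background


flagged, background = select_for_human_coding(SCORES, threshold=0.6, n_background=2)
print("flagged:", flagged)
print("background:", background)
```

Keeping a random background sample lets human coders catch rare content the model missed. Because flagged narratives are oversampled, any prevalence estimate drawn from the coded set would need to account for the different selection rates.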
Read more about the team’s methods and findings: Using Natural Language Processing to Code Patient Experience Narratives: Capabilities and Challenges.