Synthetic Healthcare Database for Research (SyH-DR)
The Synthetic Healthcare Database for Research (SyH-DR) is an all-payer, nationally representative claims database. The database consists of a sample of inpatient, outpatient, and prescription drug claims, including utilization, payment, and enrollment data, for people insured by Medicare, Medicaid, or commercial health insurance in 2016. AHRQ created SyH-DR, in part, as a resource to facilitate improvements to price and quality transparency in healthcare.
SyH-DR is a synthetic database that replicates the structure and statistical properties of the original claims data while protecting privacy and confidentiality of people and institutions. Synthetic data are created by statistically modeling or changing original data so that new values or data elements are generated while maintaining the original data's statistical properties. Additional steps, such as masking, are taken to reduce the risk of identifying people and institutions so that the data may be made publicly available to a broad community of researchers.
An approved application and data use agreement are required for access to SyH-DR.
Overview of SyH-DR
- The Agency for Healthcare Research and Quality (AHRQ) created SyH-DR from eligibility and claims files for Medicare, Medicaid, and commercial insurance plans in calendar year 2016.
- SyH-DR contains data from a nationally representative sample of insured individuals for the 2016 calendar year.
- Claim-level data elements in SyH-DR are synthetic; they are generated to resemble the marginal distributions of the original data elements.
- SyH-DR person-level data elements are not synthetic, but identifying information is aggregated or masked. Please see [the data documentation] for a complete listing of synthetic, masked, and retained data elements.
- Although SyH-DR was designed to be analytically valid, researchers should be aware of the recommendations and limitations described in [the data documentation].
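As a rough illustration of what "marginal distribution" means in the overview above, the following sketch builds toy data and applies a deliberately naive synthesis step (shuffling one column). This is not AHRQ's synthetization method, and the variable names and values are invented for the example. It only shows, in general terms, that a column can keep exactly the same marginal distribution while its relationship to another column changes, which is one reason to consult the documented recommendations and limitations before choosing an analysis.

```python
import math
import random
import statistics

random.seed(0)

# Toy "original" records (hypothetical): length of stay and a payment
# that rises with length of stay, plus noise.
original = []
for _ in range(5000):
    los = random.randint(1, 10)
    payment = 1000 * los + random.gauss(0, 500)
    original.append((los, payment))

# Naive stand-in for synthesis (NOT the SyH-DR method): shuffle the
# payment column on its own. Its set of values is unchanged, so its
# marginal distribution is identical to the original.
los_col = [row[0] for row in original]
pay_col = [row[1] for row in original]
random.shuffle(pay_col)
synthetic = list(zip(los_col, pay_col))

def pearson(pairs):
    """Pearson correlation between the two fields of each pair."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Same marginal for payment in both data sets...
same_marginal = sorted(p for _, p in original) == sorted(p for _, p in synthetic)
# ...but the joint relationship with length of stay differs.
print(same_marginal, round(pearson(original), 2), round(pearson(synthetic), 2))
```

In this toy case the payment values are identical before and after shuffling, while the correlation with length of stay drops from strong to near zero. How well SyH-DR preserves relationships between data elements is described in its own documentation, not by this sketch.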
Data Files in SyH-DR
SyH-DR consists of 14 files. Each of the three insurance categories (Medicare, Medicaid, and Commercial) has three claims files (inpatient, outpatient, and pharmacy) and one person-level file, for 12 files in total. In addition, two provider files (one for Medicaid and one for Medicare) provide a limited set of hospital characteristics and are linkable to the claims files by the facility ID. All variables were harmonized across payers, so the files share the same structure, variable names, and definitions, which simplifies analysis across payers.
- Commercial Inpatient File
- Commercial Outpatient File
- Commercial Person-Level File
- Commercial Pharmacy File
- Medicaid Inpatient File
- Medicaid Outpatient File
- Medicaid Person-Level File
- Medicaid Pharmacy File
- Medicare Inpatient File
- Medicare Outpatient File
- Medicare Person-Level File
- Medicare Pharmacy File
- Medicaid Provider File
- Medicare Provider File
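The link between claims files and provider files by facility ID could be sketched as a simple left join. The example below uses a few in-memory toy rows; the field names (`claim_id`, `facility_id`, `paid_amount`, `bed_size`) and all values are hypothetical placeholders, not names from the SyH-DR Codebook, and the function `link_claims_to_providers` is illustrative only.

```python
# Toy rows standing in for SyH-DR records. All field names and values
# are hypothetical; consult the SyH-DR Codebook for the real variables.
medicare_inpatient = [
    {"claim_id": "c1", "facility_id": "F01", "paid_amount": 1200.0},
    {"claim_id": "c2", "facility_id": "F02", "paid_amount": 800.0},
    {"claim_id": "c3", "facility_id": "F99", "paid_amount": 450.0},
]
medicare_provider = [
    {"facility_id": "F01", "bed_size": "large"},
    {"facility_id": "F02", "bed_size": "small"},
]

def link_claims_to_providers(claims, providers, key="facility_id"):
    """Left-join claim rows to provider rows on a shared facility key."""
    by_key = {row[key]: row for row in providers}
    linked = []
    for claim in claims:
        provider = by_key.get(claim[key], {})
        # Claim fields win on a name clash; a claim with no matching
        # provider row keeps only its own fields.
        linked.append({**provider, **claim})
    return linked

linked = link_claims_to_providers(medicare_inpatient, medicare_provider)
for row in linked:
    print(row["claim_id"], row.get("bed_size", "no provider match"))
```

A left join keeps every claim even when no provider row matches, so unmatched facilities stay visible rather than being silently dropped. Because the file list includes provider files only for Medicaid and Medicare, this kind of link would apply to claims from those two payers.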
AHRQ approval is required for access to SyH-DR. To request access to SyH-DR, follow the steps included in the Getting Started Guide (PDF, 13 MB) and submit the required application form and data use agreement (PDF, 516 KB). Completed applications will be reviewed by AHRQ.
- SyH-DR Sampling, Weighting, and Synthetization Methodologies (PDF, 12 MB)
- SyH-DR Codebook (PDF, 3 MB)