Population Health: Behavioral and Social Science Insights

Mathematical and Computational Simulation

Full title
Mathematical and Computational Simulation of the Behavioral and Social Drivers of Population Health

By Mark G. Orr, Bryan Lewis, Kathryn Ziemer, and Sallie Keller


The use of computational and mathematical simulations is gaining momentum in population health. These methods are particularly useful when examining population health issues that implicate dynamic behavioral and social processes. The goal of this chapter is to familiarize the reader with a variety of modeling approaches, across scales and types of phenomena, and to discuss emerging issues and directions. We review some examples of recent work that addresses research questions that are difficult to penetrate precisely because of the complexity they represent: (1) the dynamics of health behavior within the individual, (2) policy effects on the black/white disparities in obesity-related behaviors, and (3) the effects of human behavioral choices on the spread of infectious disease. Then, we address some of the implications of simulation and modeling for public health practice, ranging from the obvious (e.g., incorporate simulation into public health practice where appropriate) to the more subtle (e.g., simulation may highlight unforeseen data needs), and offer guidance for potential users of the simulation approach. The directions for future research are many and multifaceted; we emphasize two key challenges in moving forward. First, the incorporation of "Big Data" into simulations is welcome and inevitable but comes with several substantial, though manageable, obstacles. Second, we suggest that the key challenge for moving forward is cultural: How do we build consensus that simulation should be a core methodological approach in population health?


Defining Computational/Mathematical Simulation and Its Purpose

A natural confusion arises when one tries to explain computational and mathematical modeling and the reasons why we use it to aid in understanding and leveraging changes in population health. Such confusion probably stems from the ubiquity of the statistical modeling approach in population health, a highly successful venture that has aided greatly in determining which variables are associated with key health-related outcomes. Computational and mathematical modeling has a different but related focus—i.e., to understand the systems, processes, and related dynamics of population health phenomena. Most often, this is accomplished using computer simulation of some sort, a method for understanding how the dynamics of a system unfold.

The strength of the computational/mathematical modeling approach is its focus on dynamics, feedback loops, and interdependent non-linear processes, all of which are notoriously difficult to estimate1 with the statistical modeling approaches commonly used in population health. Furthermore, it affords a useful mode for thinking outside of the available data, towards future data, and potentially, about what is not yet known. In short, computational and mathematical modeling offers an alternative and unique perspective in relation to statistical modeling. Thus, the two approaches are well poised to be mutually informative.

With the exception of mathematical models of infectious disease, the use of simulation in population health was relatively novel until recently (for two early exceptions, see Morris and Kretzschmar2 and Weinstein, Coxson, Williams, et al.3). The systems science approach, coming into prominence in the 2000s,4-7 has helped to spur the recognition that the tools used to study complex systems, including simulation, can be useful for studying population health phenomena, especially in three areas: theory development, intervention/prevention, and policy. All three use simulation to understand the implications of an idea, X, by providing insight into a "what-if" scenario. In theory development, the "what-if" can be translated to: what if X is a true process (e.g., would the HIV mortality rate increase or decrease, given an increase in antiretroviral therapy [ART], if it was known or hypothesized that ART caused an increase in risky behavior of a specific magnitude in a gay community)?8 In intervention/prevention work, the question becomes: what if we intervene by doing X (e.g., what outcomes can we expect from different social network-driven intervention strategies to reduce or halt the spread of obesity along social channels)?9 In policy, the "what-if" considers the effects of policy X (e.g., to what degree would the prevalence of cigarette use change given a ban on menthol cigarettes?).10

Our primary task here is to probe into the state of the art in simulation modeling that is of direct relevance to population health and that takes into account social and behavioral processes. We will do this by providing examples from the literature of selected cases describing recent work in this area. The examples we review were selected because they represent recent attempts to provide novel insights into difficult-to-understand phenomena—phenomena that call for the representation of complex and dynamic processes.

The first example focuses on individual-level health behavior theory (e.g., Reasoned Action Theory,11 Health Belief Model,12 Social Cognitive Theory13), borrowing techniques from cognitive science and computational psychology, to provide insights into how people dynamically integrate past experience with current social and environmental contexts to inform behavior. The second example, nearly opposite the first example in scale, addresses ways to leverage what are considered dynamic macrosocial determinants to reduce racial disparities in obesity-related behaviors. The third example explores the importance of understanding dynamic social and behavioral processes with respect to the spread of infectious disease within large segments of the U.S. population. Finally, we will explore the implications for public health practice and directions for future research.

Three Examples

Individual-Level Health Behavior

The closely related fields of health behavior change, health behavior and education, and behavioral medicine share the conviction, rightfully, that theoretically driven intervention and prevention efforts are more successful than those that are not theoretically driven. The published literature supports this conviction.14,15 Theory provides organizing frameworks for understanding what to measure, how to measure, and from whom to measure. Despite a deep commitment to theory, the health behavior field has yet to forge a strong bridge to population health, especially in reference to dynamic processes (e.g., the effects of peer influence and the built environment on behavior). These types of processes, although acknowledged in health behavior theory, are not well understood at the level of the individual and thus are not yet well integrated with the population health approach.

A significant barrier to integrating health behavior theory and population health is the dearth of work in behavior change that leverages simulation at the individual level of analysis. The exception is the work we describe next, previously published work of our own16 that was designed to explain some of the more difficult issues in health behavior theory related to the dynamics of behavior and learning—an issue for which simulation is well-suited if not required.

Our work re-conceptualizes health behavior theory using computational modeling practices developed in the fields of cognitive science and computational psychology (i.e., the general study of the mind as an information processing system). Health behavior theory, which stems largely from psychological principles and constructs, is at its essence about information processing; beliefs, learning by imitation, valuation, and so on are fully engaged with the notion of information and information processing.

Specifically, we developed a computational model of Reasoned Action Theory11 (hereafter RAT), a well-known theory that explains the performance (or not) of a behavior as driven by one's intention to perform the behavior where intention is driven by attitudes, norms, and perceptions of behavioral control, each of which is driven by a set of beliefs related to the behavior in question (i.e., beliefs → attitude, norms, control → intention). The innovation of our model is that it attempts to capture the dynamics of what we call intention formation, a conceptualization of how intention is generated on-the-fly that accounts for: (1) what a person has learned about a behavior (in terms of related beliefs) from past experience via social learning, and (2) the pressures from the more immediate social context also with respect to the same set of beliefs. For example, imagine this situation: a drip coffee aficionado is in conversation at a cocktail party hosted by the Society for Espresso Drinking on the topic of reasons to abandon drip coffee in favor of espresso (two very different types of coffee). Our model attempts to capture the on-the-fly intention formation (to consider switching from drip to espresso) of the drip coffee aficionado given his/her past experiences and the current conversation at the cocktail party. Outside of this more limiting case, consider these other, similar examples directly relevant to population health and health behavior: an adolescent moves to a new school and community; a woman is exposed to tobacco-related point-of-sale advertising; and, finally, a man changes jobs, relocating to a new community and new company.

Figure 1 represents the formal structure of our model—most prominent is the representation of each belief as a coupled set of memory units, one representing positive valence and the other representing negative valence that can vary in activation from not active to fully active. Our model assumes that the memory units are activated or cued by social exposure to relevant beliefs. Once cued, the system settles into a local equilibrium that is dictated by both the presence of the cued beliefs and the set of connection weights between memory units. The set of connection weights, importantly, are derived from past experience and encode a learned, cultural belief structure. The equilibrium state, in the aggregate across beliefs, captures the formation of intention given the cue—i.e., the current and immediate social context. This model structure was used as the basis for the simulation we describe next.
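
The settling process described above can be illustrated with a minimal sketch. It is not the published model: the four units, the weight values, the logistic update rule, and the function name `settle` are all assumptions chosen for illustration. Two beliefs (A and B) each get a positive-valence and a negative-valence unit, hypothetical "learned" weights tie the two beliefs together, and a cue from the current social context activates only the positive unit of belief A.

```python
import math

def settle(weights, cue, steps=200, rate=0.2):
    """Settle a small constraint-satisfaction network toward a local equilibrium.

    weights[i][j] is the symmetric connection weight between units i and j.
    cue[i] is the external input unit i receives from the current social context.
    Activations stay between 0 (not active) and 1 (fully active).
    """
    n = len(cue)
    act = [0.5] * n  # start every unit at a neutral activation
    for _ in range(steps):
        new_act = []
        for i in range(n):
            # Net input: external cue plus weighted input from the other units.
            net = cue[i] + sum(weights[i][j] * act[j] for j in range(n) if j != i)
            target = 1.0 / (1.0 + math.exp(-net))  # logistic squashing to (0, 1)
            new_act.append(act[i] + rate * (target - act[i]))  # damped step
        act = new_act
    return act

# Units: 0 = A positive, 1 = A negative, 2 = B positive, 3 = B negative.
# Hypothetical learned weights: the two valences of one belief inhibit each
# other, and same-valence units of different beliefs support each other.
weights = [
    [0.0, -2.0, 1.5, 0.0],
    [-2.0, 0.0, 0.0, 1.5],
    [1.5, 0.0, 0.0, -2.0],
    [0.0, 1.5, -2.0, 0.0],
]
cue = [2.0, 0.0, 0.0, 0.0]  # social exposure cues only "A positive"
equilibrium = settle(weights, cue)
print([round(a, 2) for a in equilibrium])
```

In this toy network the uncued belief B also ends up leaning positive, because the weights carry the cue across beliefs. That is the sense in which a learned belief structure and the immediate context jointly determine the equilibrium state.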

Figure 1. Schematic of the computational model of the Theory of Reasoned Action

Figure presents a schematic of the computational model of the Theory of Reasoned Action, representing a coupled set of memory units, one of which shows positive valence (attractiveness) and the other shows negative valence (aversiveness).

Source: Borrowed with permission from Orr MG, Thrush R, Plaut DC. The Theory of Reasoned Action as parallel constraint satisfaction: towards a dynamic computational model of health behavior. PLoS One 2013;8:e62409. Used with permission under Creative Commons Attribution (CC BY) license.

We present the main results of a simulation that attempted to capture a person's attitude formation given a shift in social contexts. Specifically, we focused on sexual behavior in adolescent females and a shift in social context from exposure to less positive to exposure to more positive beliefs about sexual intercourse. To do this, we trained our model to learn the belief structure in the context of 10th grade females who had never had sex. (The data to train our model came from a Reasoned Action Survey conducted in a U.S. school district in the Northwest17). Then, we shifted contexts of the model to 12th grade females who had never had sex and 12th grade females who had previously had sex. Thus, this simulation procedure probed the degree to which intention formation was constrained simultaneously by past (being exposed to 10th grade non-experienced females) and current contexts (sexually experienced females). The key finding for this simulation was that the immediate social context was able to influence but not override what was learned from past experience—i.e., both the past and the present were important. Specifically, while exposure to more positive beliefs about sexual intercourse in the immediate social context did in fact increase the intention to have sexual intercourse, this tendency was dampened by past learning from contexts with less positive beliefs about sexual intercourse.

One might be tempted to argue that these results reflect no more than common sense: yes, of course both the past and the present impact health behaviors. This interpretation, however, misses the point. What we have accomplished in this simulation is highly important for the health behavior field—the development of a formal and mechanistic account of how the past and present are simultaneously and seamlessly integrated to generate an intention state. The formalism, called constraint satisfaction, was derived from prior work in cognitive science18,19 and social psychology20 (to model personality, attitudes, and cognitive consistency) and thus has a non-negligible degree of generality.

Insights from our simulations were: (1) our model allows for explicit representation of how beliefs might be formed (this is not addressed to any satisfaction in the current health behavior literature); (2) it affords an integration directly with other types of simulations, ones that focus on population health phenomena (e.g., influence spreading on social networks); and (3) it provides very explicit, novel, and testable hypotheses (e.g., about the assumed belief structure and the learning rates of beliefs). We point the reader to another paper of ours that lays out more completely the implications of this approach for health behavior.21 In this work, we introduced into the literature the term "computational health behavior modeling" in an attempt to drive the development of a sub-discipline within the health behavior field—one that calls for the integration of contemporary health behavior theory and simulation.

Macrosocial Drivers of Disparities in Health Behavior

The macrosocial approach to population health addresses social processes that are potentially amenable to change at the societal scale.22 Changes in macrosocial factors are uniquely promising because of the potential for widespread and long-standing societal and structural changes that, in turn, may support long-term population-level changes in health. We focus here on the macrosocial drivers of health-related behaviors, an important aspect of non-communicable disease and one that has large potential to benefit from the simulation approach due to the potential for long- and short-term feedback among the many related components of the system.

Figure 2 illustrates the macrosocial approach. Macrosocial processes represent factors such as income and educational distributions, crime rates, political will and policy, and corporate practices that are amenable to change at the societal level. The macro-behavioral interface represents the way in which macrosocial processes shape people's physical and social contexts (e.g., norms; the built environment). The behavioral process level captures individual-level behaviors that are directly related to outcomes (e.g., physical activity and dietary behaviors → obesity). Figure 2 allows us to think about how to change health behaviors related to health outcomes in a way that respects the need to (1) make long-term structural changes, (2) engender contextual changes in individual-level behaviors, and (3) address the population as a whole. A key feature in Figure 2 is the bi-directional arrows, whereby the influences between the macrosocial processes, the macro-behavioral interface, and health behavior are mutual and bidirectional, thus illustrating the potential for dynamic feedback among these levels of analysis.

Figure 2. Schematic of the macrosocial approach to understanding population health dynamics

Figure presents a schematic of the macrosocial approach to understanding population health dynamics, where macrosocial processes (socioeconomic, political, and industrial) interface with the local environment and social and cultural norms that in turn affect health behavior and outcomes.

In this section, we present recently published work of our own that used simulation to address a pressing population health issue: racial/ethnic disparities in obesity.23 Over one-third of the U.S. adult population is obese (35.7 percent),24 leading to increasing rates of chronic disease (heart disease, stroke, type 2 diabetes, some cancers) and medical costs ($147 billion in 2008).25 The problem is even more severe for key racial/ethnic groups. The highest age-adjusted obesity rate is among non-Hispanic blacks (49.5 percent), followed by all Hispanics (39.1 percent), and non-Hispanic whites (34.3 percent).26

The basis for the simulation was an agent-based model that was developed within the macrosocial framework to capture behavioral, social, and environmental determinants of diet and physical activity, the primary determinants of obesity. For the simulation presented here, we concentrate on diet-related behavior alone (a measure of the quality of dietary intake as driven by individuals' decisions and choices).

The structure of the model was designed to represent the racial and economic distributions of black and non-Hispanic white residents in the 100 largest metropolitan statistical areas in the United States. Each agent resided in a household within a specific neighborhood and underwent a simulated developmental trajectory (birth, schooling, workforce, retirement, death) during which the agent engaged in health-related behaviors (smoking, dietary intake quality, degree of physical activity) to generate health outcomes (body mass index [BMI] among these). The determination of diet quality for each agent was structured so that neighborhood-level variables (school quality and access to good food stores) had a direct effect on each agent's dietary choices, as did the behavior of other agents via social network influence. Figure 3 shows the structure of the model in relation to dietary behavior.
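
To make the idea of combined neighborhood and social network influence concrete, here is a deliberately simplified sketch of one update step for a single agent. The function `update_diet`, the 0-1 scales, the simple averaging, and the weights are hypothetical choices made for illustration; the published model's actual update rules and calibration are described in the original paper.

```python
def update_diet(own, school_quality, food_access, peer_diets,
                w_context=0.3, w_peers=0.3):
    """Return an agent's diet quality after one hypothetical update step.

    All quantities are on a 0-1 scale (1 = best). The weights are illustrative
    assumptions, not calibrated values from the published model.
    """
    # Neighborhood context: average of school quality and food store access.
    context = (school_quality + food_access) / 2.0
    # With no network ties, the peer term falls back to the agent's own value.
    peers = sum(peer_diets) / len(peer_diets) if peer_diets else own
    w_self = 1.0 - w_context - w_peers
    return w_self * own + w_context * context + w_peers * peers

# Same agent and same peers, in two neighborhoods that differ in school quality.
low = update_diet(0.5, school_quality=0.2, food_access=0.5, peer_diets=[0.4, 0.6])
high = update_diet(0.5, school_quality=0.8, food_access=0.5, peer_diets=[0.4, 0.6])
print(low, high)
```

Because each agent's output becomes part of its neighbors' `peer_diets` at the next step, even a rule this simple produces feedback across the network when it is applied to many agents over time.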

Figure 3. Causal structure of the agent-based model of macrosocial drivers of obesity disparities

Figure depicts an agent-based model of macrosocial drivers of disparities in obesity, including neighborhood income, household income, school quality, school attendance, good food stores, and healthy diet, which in turn affect downstream variables like body mass index and cardiovascular disease.

Source: Borrowed with permission from Orr MG, Galea S, Riddle M, et al. Reducing racial disparities in obesity: simulating the effects of improved education and social network influence on diet behavior. Ann Epidemiol 2014;24:563‐9. Used with permission from Elsevier.

Using this model, we simulated the potential long-term (~2.5 generations in terms of agent lifecycles) effects of a social policy—improving the quality of schools in neighborhoods that suffer from poor school quality—on racial disparities in diet quality. We used the ratio of students to teachers as a proxy for school quality. For the policy manipulation, we simulated targeting neighborhoods in the lowest 20 percent of school quality where we reduced the student-to-teacher ratio by about 60 percent (from about 15:1 to 7:1). It is important to keep in mind that the policy manipulation was designed to not target race directly and was of considerable strength.

The simulations were focused on two questions: Question 1: To what extent does targeting school quality have an effect on racial disparities in dietary quality? Question 2: Are the effects of targeting school quality self-sustaining (what happens when we stop implementation of the policy)?

Regarding question 1, we found that targeting school quality reduced the black/white disparity in dietary quality by about 40 percent. Given the timeframe of the simulation (2.5 generations), we did not see any evidence that the disparity would be reduced further given more time. This probably stemmed, in part, from the structure of our manipulation. By targeting neighborhoods and not race, we effectively targeted about 40 percent of blacks and 15 percent of whites, so both blacks and whites saw an improvement in dietary quality. It is of interest that over time, the school quality manipulation affected the system globally. Both the levels of gross household income and the quality of food stores increased noticeably by the end of the simulation—the former driven by the increase in agents' combined education levels and the latter driven by the former (Figure 3).

Regarding question 2, because of the presence of social network effects (on dietary behavior) and the potential for system-level effects, we were surprised to find that the effects of the policy manipulation were not self-sustaining. Once the policy was no longer in effect, the initial disparities returned. Although we have yet to pinpoint exactly why the policy was not self-sustaining, this feature of the simulation points to the potential complexity in understanding dynamic processes that include many interdependent agents (in this case people).

In summary, our simulation was supportive of macrosocial policies and in line with the literature on the determinants of population health and racial disparities. The value here is to illustrate the potential of simulation modeling for thinking through the complexity of such issues in population health. We would like to emphasize that these conclusions should be taken only as suggestive of the potential of macrosocial, upstream policy effects and of the issues to consider, and that the reader should not rely on the precise numerical results. The simulation was designed primarily as a proof-of-concept to show this potential. We urge readers to consult the original work for details on the model assumptions, data calibration and validation, limitations, and simulation results.

Population-Level Infectious Disease and Individual Behavior

The modeling and simulation of population-level dynamics of the spread of infectious disease have undergone a wholesale increase over the past decade in the number of parameters used. To a large degree, the newer parameters capture aspects that are crucial for understanding and potentially predicting epidemics (e.g., inclusion of social networks and geospatial data). Not surprisingly, some of these parameters attempt to account for somewhat complex social and behavioral processes and policies, such as zoning laws, court-ordered quarantine, and vaccination requirements for school entry. Although such policies can work, they are susceptible to variations in human behavior and social processes. Consider the tension between herd immunity and the so-called anti-vaccination movement in the United States today—the effectiveness of our policies, and thus our herd immunity, is at risk of degradation due largely to social and behavioral processes (e.g., spread of values and attitudes).

The field of computational epidemiology arose in part from the supposition that in order to study relevant policy implications, not only did we need to improve our ability to rapidly simulate the large-scale spread of infectious diseases,27-31 but we needed to do so while incorporating key social and behavioral processes. Through the use of simulation techniques, we now have evidence supporting this supposition: incorporating individual-level behavior when attempting to model population-level infectious disease dynamics yields clear benefit.

Here we describe two simulation studies32,33 where we seek to explore the potential impact of individual-level behaviors on infectious disease dynamics under the condition of heterogeneity among individuals. These simulation studies represent individuals' behavior as a set of simple rules that alter the risk of exposure to or further transmission of disease. These studies were done with an agent-based network diffusion simulation platform, where the synthetic populations are painstakingly constructed through the fusion of many data sources (U.S. census, time-use, etc.) such that the household structures, geo-spatial distributions of demographics, and time use of the populations are preserved to the census block level.

As illustrated in Figure 4, and described in more detail elsewhere,34-36 census information at the block group level is further disaggregated to the individual level, while maintaining the joint distributions of demographics most predictive of time use patterns. The time use surveys are then assigned to demographically appropriate households and used in a microsimulation where every member of the population is assigned locations for each of their activities; steps of iterative refinement using capacity information for these locations (e.g., enrollment data for schools) ensure that the locations have the appropriate numbers of occupants and preserve the distribution of commuting distances. Once the synthetic society is constructed, discrete event simulation platforms are used to represent the percolation and time course of the processes being modeled. In the case of infectious diseases, the duration of various states of infectiousness and incubation, as well as differential levels of infectiousness based on receiving different treatments, are represented. Each simulation run yields a single realization of the complex processes, and thus the simulations are run many times to generate distributions of possible outcomes. The studies discussed below were conducted on large populations capturing a wide diversity of demographics, specifically synthetic representations of Miami and Seattle (2.1 million and 3.2 million individuals, respectively).
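
The point that one run is a single realization, so many runs are needed to see the distribution of outcomes, can be shown with a toy example. The sketch below swaps in a much simpler technique than the platform described above: a Reed-Frost chain-binomial outbreak in a well-mixed population, with no synthetic population, social network, or treatment states. The function name `run_outbreak` and all parameter values are assumptions for illustration.

```python
import random
import statistics

def run_outbreak(n, p_transmit, seed):
    """One stochastic realization of a Reed-Frost chain-binomial outbreak.

    Starts with one infectious person among n and returns the final attack
    rate (share of the population ever infected).
    """
    rng = random.Random(seed)  # a separate seeded generator per realization
    susceptible, infectious, recovered = n - 1, 1, 0
    while infectious > 0:
        # Chance that a susceptible is infected by at least one infectious person.
        p_infect = 1.0 - (1.0 - p_transmit) ** infectious
        new_cases = sum(1 for _ in range(susceptible) if rng.random() < p_infect)
        susceptible -= new_cases
        recovered += infectious  # infectious people recover after one generation
        infectious = new_cases
    return recovered / n

# Many realizations with identical parameters; only the random seed differs.
attack_rates = [run_outbreak(n=500, p_transmit=0.003, seed=s) for s in range(200)]
print("median attack rate:", statistics.median(attack_rates))
print("range:", min(attack_rates), "to", max(attack_rates))
```

Summaries such as the median and range across realizations, rather than any single run, are what a study of this kind would report.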

Figure 4. Basic ingredients for a highly-detailed agent-based synthetic population

Figure depicts the basic ingredients for a highly-detailed agent-based synthetic population, including disease and behavior states, dynamic social network generation, and percolation through the social network. A synthetic population can be used to simulate and/or model the potential impact of individual-level behaviors on, for example, infectious disease dynamics and the transmission of disease.

The first study33 illustrates the importance of care-taking behaviors within a household on the extent of disease spread across the population. Specifically, it looks at how a household alters its contact patterns when a single household member is symptomatic with influenza. Two extremes of responses are explored for comparison. In the first, the "single care-taker" scenario, the symptomatic individual withdraws from contact with all other members of the household except one person, who cares for the sick individual and thus is at risk for infection. At the opposite extreme, all members behave the same regardless of the symptoms of any of their members and are thus in contact with other household members while engaged in their home activity. To study the impact of these two behaviors, simulations were run for outbreaks of moderate to high levels of infection (cumulative attack rates of ~15 percent and ~25 percent, respectively) and in two cities with different demographic profiles (Miami: larger household sizes and older population; Seattle: smaller households and younger population). The "single care-taker" behavior was enough to reduce infection rates by 10-88 percent, with the greatest reductions occurring for epidemics most similar to an average influenza season. These reductions were analyzed across household sizes, which showed the reduction to be larger in larger households, further reinforcing the importance of the behavior within the household. Certainly the degree to which these behaviors are truly followed and how they are distributed in the population rests somewhere between the extremes analyzed; however, this study demonstrates that these behaviors are an important element of disease transmission.
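
The two household scenarios can be expressed as a simple rule over within-household contact pairs. The sketch below is an illustration only, not the study's implementation; the function name `household_contacts` and the member labels are hypothetical.

```python
from itertools import combinations

def household_contacts(members, sick=None, caretaker=None):
    """Return the set of within-household contact pairs.

    With no sick member, everyone is in contact with everyone else.
    In the single care-taker scenario, the sick member keeps only the
    contact with the caretaker; all other pairs are unchanged.
    """
    pairs = {frozenset(p) for p in combinations(members, 2)}
    if sick is None:
        return pairs
    return {p for p in pairs if sick not in p or p == frozenset((sick, caretaker))}

family = ["parent_1", "parent_2", "child_1", "child_2", "child_3"]
no_change = household_contacts(family)
single = household_contacts(family, sick="child_1", caretaker="parent_1")

# Household members directly exposed to the sick child under each scenario.
print(sum(1 for p in no_change if "child_1" in p))
print(sum(1 for p in single if "child_1" in p))
```

In a household of size n, the rule cuts the sick member's household contacts from n - 1 to 1, which is consistent with the study's finding that the reduction is larger in larger households.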

The second study32 compared the cost-effectiveness of an individually motivated behavior against a behavior driven from the top down. This was motivated by the need to evaluate policy choices between subsidies designed to enable more choice and more directed interventions requiring closer surveillance. Two behaviors were studied as they were dynamically applied during the course of an outbreak of influenza. The individually motivated behavior is based on a person's desire to seek care, in this case to get a vaccine or a course of antiviral prophylaxis, to reduce their chance of infection. In the simulation, this motivation is cued when the number of symptomatic people in an individual's social network exceeds a certain threshold, so that the individual's local social context becomes the driving force for mitigating the outbreak. This was compared to a more top-down approach that offers these interventions to sections of the population (census blocks or households with children in a particular school) when the number of people in that section exceeds a threshold. Care was taken to make these thresholds comparable and to explore the sensitivity of adherence with the offers. Even under the extreme conditions where adherence was absolute in the top-down branch, the number of cases averted per vaccine or course of antiviral prophylaxis was highest in the individually motivated care-seeking case. This finding was robust across a range of thresholds, diagnosing rates, and outbreak sizes. In many of these cases the number of cases averted per course is two orders of magnitude larger. This demonstrates the tremendous power of enabling individuals to take actions based on their particular environment rather than broadly applying actions to groups of individuals, even when precise aggregate knowledge about these groups is available. It also highlights the profound effectiveness that individual-level behaviors can have in mitigating future cases of infectious diseases, especially when individuals are enabled to take effective action.
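
The two triggering rules can be sketched as threshold checks. This is a hedged illustration rather than the study's code: the function names, the toy network, and the group definitions are hypothetical, and two details are assumptions made here, namely that already-symptomatic people are not counted as care seekers and that the top-down trigger counts symptomatic members of a group.

```python
def care_seekers(network, symptomatic, threshold):
    """Individually motivated rule: a person seeks care when the number of
    symptomatic people among their own contacts exceeds `threshold`.
    Assumption: people who are already symptomatic are not counted as seekers.
    """
    return {
        person
        for person, contacts in network.items()
        if person not in symptomatic
        and sum(1 for c in contacts if c in symptomatic) > threshold
    }

def triggered_groups(groups, symptomatic, threshold):
    """Top-down rule: a whole group (e.g., a census block) is offered the
    intervention when its count of symptomatic members exceeds `threshold`.
    Assumption: the trigger counts symptomatic members of the group.
    """
    return {
        name
        for name, members in groups.items()
        if sum(1 for m in members if m in symptomatic) > threshold
    }

network = {"ana": {"ben", "cy"}, "ben": {"ana"}, "cy": {"ana"},
           "dee": {"eli"}, "eli": {"dee"}}
groups = {"block_1": {"ana", "ben", "cy"}, "block_2": {"dee", "eli"}}
symptomatic = {"ben", "cy"}

print(care_seekers(network, symptomatic, threshold=1))
print(triggered_groups(groups, symptomatic, threshold=1))
```

The individual rule targets only the person whose own contacts are sick, while the group rule spends an intervention course on every member of a triggered group. That difference is what a cases-averted-per-course comparison measures.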

In summary, infectious disease dynamics at the population level are absolutely driven by individual-level behaviors. The studies described here illustrate how critically important it is to take these behaviors into account when simulating disease spread, and they argue for inclusion of individual-level behaviors in all simulation-based analyses of infectious disease spread. While the implementation of individual-level behaviors in these studies is still based on relatively simple rules, by taking the context of the individual into account, these simulations are a step toward including individual-level behaviors in large-scale simulations of infectious disease spread.

Implications for Public Health Practice

Hopefully, we have convinced the reader that the simulation approach has much to offer in terms of understanding population health phenomena, especially those that involve social and behavioral processes. As described above, the principal appeal of simulation is that it allows for thinking in terms of "what-if" scenarios related to theory development, intervention/prevention, and policy. So, the implication is simple: Public health practice, at least when social and behavioral constructs are implied, should incorporate "what-if" simulations whenever feasible. In fact, we want to push people to think of new and creative ways to ask "what-if?" Beyond this rather broad assertion, however, are the details. Under what conditions should we ask "what-if"? Are there classes of questions that are more amenable to "what-if" scenarios? And so on.

Our collective experiences provide some insights into these questions. First and foremost, developing a simulation is fundamentally about developing formal algorithms to study the time-dependent processes thought to underlie a phenomenon. So, when developing a "what-if" scenario, keep in mind that a set of formal algorithms will represent the "what-if." Although this might initially seem to constrain the potential questions you can pose (e.g., How do I make an algorithm for how self-efficacy changes?), this is not necessarily the case. A shift in how you see a particular problem or phenomenon may yield the development of feasible algorithms. Second, for many phenomena of interest, multiple levels of analysis will be represented in a simulation. Much of the difficulty will come from defining algorithms that address the interplay between levels of analysis (e.g., the interplay between a person's attitude and his/her built environment) because this is probably where there is a dearth of theoretical and empirical work. Third, there are different classes of algorithms; some are strictly probabilistic (e.g., given X days of exposure to others smoking, my chance of smoking will increase by Y), some are more mechanistic, and some are both. Being mindful of these types of distinctions can aid in the development of algorithms and help to sort out the relation between the data and theory that might serve to ground the algorithms.
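
As an illustration of what a strictly probabilistic algorithm might look like, the smoking example above can be written as a small rule. Everything specific here is an assumption for illustration: the function names, the linear form of the increase, the cap at 1.0, and the numeric values.

```python
import random

def smoking_probability(base, exposure_days, days_per_step, increase_per_step):
    """Chance of smoking after `exposure_days` of exposure to others smoking.

    Every full block of `days_per_step` days (the "X" in the text) raises the
    chance by `increase_per_step` (the "Y"), capped at 1.0.
    """
    steps = exposure_days // days_per_step
    return min(1.0, base + steps * increase_per_step)

def becomes_smoker(probability, rng):
    """Draw one random outcome for an agent with the given probability."""
    return rng.random() < probability

rng = random.Random(42)  # seeded so the simulated draws are repeatable
p = smoking_probability(base=0.05, exposure_days=30, days_per_step=10,
                        increase_per_step=0.02)
print(p, becomes_smoker(p, rng))
```

A more mechanistic algorithm would replace the fixed increment with a process model of how exposure changes beliefs or intentions, as in the individual-level example earlier in the chapter.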

Finally, we strongly suggest getting directly involved in building and using simulations, either through collaboration with experienced modelers or by self-guided exploration with dedicated software (see footnote a). Furthermore, there are useful textbooks that introduce the simulation approach for behavioral and social phenomena.37,38

Two other important but subtle implications remain for consideration. First, simulations have the potential to develop new social and behavioral theory or expand on existing theory (e.g., the individual-level health behavior simulation presented above) and thus may offer an opportunity for public health practitioners to apply new theoretical advances. In other words, public health practitioners should be concerned with both the potential of using simulations for thinking "what-if" and the integration of new theory arising from simulation work into practice. Second, the development of the algorithms that drive a simulation often draws a stark picture of what we don't know. This can be tremendously fruitful for both the science and practice of public health.

Directions for Future Research

The simulation modeling approach in population health is a method that weaves together several threads in applied mathematics, computer science, and complex systems. It represents some theoretical first principles on the nature of complexity and its potential characteristic signatures, as well as theoretical constructs from topical fields such as genetics, psychology, sociology, economics, public policy, and social epidemiology. When incorporated with social and behavioral sciences, the number of possibilities becomes rather large. In this light, we feel presumptuous to provide specific directions for future research. So, instead, we focus the directions for future research on the very important issue of how to increase the extent to which the simulation approach is useful for the understanding of and intervention in population health phenomena that implicate social and behavioral processes. We have identified two central challenges in moving forward: (1) incorporation of "Big Data" to be used for empirical grounding of the simulations and (2) integration of simulation modeling into mainstream population health. We address each in turn.

Challenges in Empirical Grounding with the Advent of "Big Data"

In the context of population health, empirical grounding refers to developing confidence that the assumptions of a simulation model represent the processes that occur in the real world. This issue is not new, either in population health modeling or in any other field that uses computational or mathematical simulation. At this point in history, however, any discussion of issues at the intersection of empirical grounding, population health, and the social and behavioral sciences must seriously consider the role of the so-called Big Data revolution. The availability of massive volumes of unstructured data on human behavior and social processes is increasing beyond precedent.

Generally, there are two principal sources of data to consider: administrative data and digital data. Administrative data are data collected for the administration of an organization or a program. Examples of these include Internal Revenue Service data for individuals and businesses, Social Security earnings records, Medicare and Medicaid health utilization data, and credit card transactions. Of particular interest for population health simulations is the use of data from electronic health records (EHRs) used by large, organized health systems, such as Kaiser Permanente, and by Federal entities, such as the Centers for Disease Control and Prevention. Digital data are data being reported for purposes other than statistical or administrative use. These data come from a wide variety of sensors, including mobile and wearable devices, and through the use of the Internet (e.g., Twitter). All of this has the potential to provide extremely large volumes of data in near real-time.

These two types of data offer possibilities for studying behavior and social drivers of population health at a finer level of geographic and/or demographic resolution and in more frequent time intervals, as well as studying human interactions at a societal scale, with rich spatial and temporal dynamics. In contrast to designed data collection, the cost per unit of data is low. Crowdsourcing organizations, such as Amazon Mechanical Turk or Jana.com, exist and offer inexpensive venues to collect data. The clear cost driver is no longer the data collection but rather the development and execution of the data analytics, a significant game-changer in how to think about social and bio-informatics and the interface with simulation modeling.

Naturally, there are issues, both known and unknown, with these new data sources. Their quality, population representativeness, and statistical properties are unknown. These factors may be knowable, but they simply have not been well studied. Also, the data come with little to no documentation about coverage, representativeness, bias, and longitudinal gaps. These data may present time continuity problems, e.g., companies may merge or change focus. Finally, as exemplified by EHRs, there are challenges of interoperability and privacy. Many of the data formats of EHRs lack interoperability, which, in turn, hinders effective exchange or integration of health information. None of these issues is insurmountable; thus, we should look forward to the integration of these data with simulation approaches.

Big Data, given its potential allure, needs to be considered carefully. The domains of interest for population health simulations often represent multiple levels of analysis and, thus, multiple disciplinary theories and notions of processes. Historically, data collection in the social and behavioral sciences has been purposeful and largely driven by theory. Given Big Data, we must be careful to avoid the trap of deriving simulation modeling assumptions that are driven by new data sources without new and well-tested theory. A related issue is that the empirical grounding of any assumption about a process should be considered as conditional on other relevant processes, e.g., people's individual decisionmaking strategies are embedded in neighborhoods and social contexts that interface with policy and industry concerns dynamically over time. Big Data, given its unstructured format and high-resolution temporal signature, should excel in this respect.

As a final consideration, it is instructive to consider work in other disciplines that face similar hurdles when attempting to build confidence in model assumptions. For example, consider the Across Trophic Level System Simulation (ATLSS) family of agent-based models used to understand ecological issues in South Florida (e.g., the long-term effects on population dynamics of white-tailed deer and the Florida panther in relation to water management policy).39 The degree to which many of the ATLSS assumptions are empirically grounded is unprecedented in terms of what we have accomplished in population health simulations, especially in relation to assumptions that govern the behavior and dynamics of model entities—e.g., data from radio-tagged panthers over time and space with high resolution to develop the agent-rules for habitat usage. We mention this contrast simply to generate discussion: Does population health need such temporal, spatial and process resolution? And, if yes, will Big Data provide for this need?

Integration into the Mainstream

The technical skills required for development and implementation of simulations are not typically found among researchers, educators, review panels, editors, and others in population health. Over time, there is potential for this to change if the next generation gains more exposure to simulations through pioneering programs like the National Institutes of Health (NIH) Institute on Systems Science and Health (see footnote b) and the newly developed core curriculum at the Columbia University Mailman School of Public Health.

The largest barrier for reaching this potential is cultural and normative—will we agree as a field that simulation should become a central approach in population health? And, will others (policymakers and the public) see population health as a discipline that can answer dynamic process questions? In order to remove this barrier, we need clarity in our collective expectations of what simulations actually do well, despite their limitations. Two issues are critically important.

First, the types of simulation models used in population health excel as qualitative tools for understanding alternative futures, situations, contexts, and, importantly, unanticipated levers of change. These models generally are not designed for predicting precisely what will happen and to what degree. At first glance, the notion that computationally explicit, highly quantitative models should really be used for qualitative understanding appears to be a glaring contradiction. However, this is simply not the case: it is well appreciated even in the most quantitative of sciences (e.g., physics) that many computational and mathematical models are meant to provide qualitative insights.

Second, models will contain a set of assumptions that are not empirically grounded—to include those that represent hypothesized but not yet fully understood processes. Mimetic social influence is a case in point, whereby we have evidence that it exists but the details of how it works with respect to other model assumptions are highly speculative in most simulation models. This limitation reflects the nature of simulation modeling. By definition, a simulation model must implement explicit assumptions about all hypothesized processes in the model. Otherwise, the simulation would not run, so to speak. When considering this issue, it is important to realize that one of the primary outcomes of simulation modeling is the uncovering of gaps in both theory and data.
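To illustrate why a simulation cannot run without explicit assumptions, consider the following minimal Python sketch of mimetic social influence. It is not the authors' model or any published model. The ring-shaped network, the copy-a-neighbor rule, and the parameter name imitation_strength are all hypothetical choices made for illustration. The sketch shows that the speculative quantity must be given some value before the code will execute, and that sweeping over plausible values is one way to see how much a qualitative conclusion depends on it.

```python
import random


def neighbor_agreement(imitation_strength, n_agents=200, sweeps=50, seed=1):
    """Simulate imitation on a ring and return the fraction of adjacent
    pairs that end up sharing the same behavior.

    imitation_strength is the explicit (and here purely hypothetical)
    assumption: the probability that a selected agent copies the
    behavior of one randomly chosen neighbor.
    """
    rng = random.Random(seed)
    behavior = [rng.random() < 0.5 for _ in range(n_agents)]
    for _ in range(sweeps * n_agents):
        i = rng.randrange(n_agents)
        if rng.random() < imitation_strength:
            j = (i + rng.choice((-1, 1))) % n_agents  # left or right neighbor
            behavior[i] = behavior[j]
    agree = sum(behavior[i] == behavior[(i + 1) % n_agents]
                for i in range(n_agents))
    return agree / n_agents


if __name__ == "__main__":
    # Sweep the speculative assumption instead of fixing a single value.
    for strength in (0.0, 0.1, 0.5, 1.0):
        print(f"imitation_strength={strength}: "
              f"neighbor agreement={neighbor_agreement(strength):.2f}")
```

In this sketch, the useful output is the comparison across values of the assumption rather than any single number. If the qualitative pattern (more imitation, more local clustering of behavior) holds across the range, the conclusion is less sensitive to the unknown detail. If it does not, the sweep has identified a gap in theory or data.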

John Sterman's central thesis on the use of simulation in public health is apropos: simulation provides a useful approach for learning in a complex world.40 Some phenomena are complex and difficult to understand and thus call for the simulation approach. Learning is not gained from precise predictions and certainly does not require all assumptions to be known. It simply requires agreement as a field that we need new and sometimes more appropriate tools for understanding the core issues in population health.41


Acknowledgments

Collectively, the work presented above was funded in part by the following: Defense Threat Reduction Agency Comprehensive National Incident Management System Contract HDTRA1-11-D-0016-0001; National Institutes of Health Grants 5U01GM070694-11, 1R01GM109718, and 1R21HD067570-01; Grant No. 60466 from the Robert Wood Johnson Foundation; The Network on Inequalities, Complex Systems, and Health (HHSN276200800013C); Columbia University Mailman School of Public Health Epidemiology Merit Fellowship to Mark Orr; and an Institute for Integrative Health TIIH Scholar Award to George A. Kaplan. We thank Nathan Osgood, Ronald Mintz, Dylan Knowles, David Plaut, and members of the Social and Decision Analytics Lab and the Network Dynamics and Simulation Science Laboratory (NDSSL) at Virginia Tech for their suggestions and comments. The opinions presented herein are those of the authors and do not necessarily reflect the official position of the Agency for Healthcare Research and Quality, the National Institutes of Health, or the U.S. Department of Health and Human Services.

Authors' Affiliations

Mark G. Orr, PhD, is Associate Research Professor at Virginia Tech; Bryan Lewis, PhD, MPH, is an Assistant Research Professor at the Virginia Bioinformatics Institute at Virginia Tech; Kathryn Ziemer, PhD, is a Postdoctoral Associate at the Social and Decision Analytics Laboratory, Virginia Bioinformatics Institute, Virginia Tech; and Sallie Keller, PhD, is Professor of Statistics and Director of the Social and Decision Analytics Laboratory, Virginia Bioinformatics Institute, Virginia Tech.

Address correspondence to: Mark Orr, Research Associate Professor, Social and Decision Analytics Laboratory, Virginia Tech-National Capital Region, 900 North Glebe Road, Arlington, VA 22203; email morr9@vbi.vt.edu.


References

  1. Senge PM. Statistical estimation of feedback models. Simulation 1977;28:177-84.
  2. Morris M, Kretzschmar M. Concurrent partnerships and transmission dynamics in networks. Soc Net 1995;17:299-318.
  3. Weinstein MC, Coxson PG, Williams LW, et al. Forecasting coronary heart disease incidence, mortality, and cost: the Coronary Heart Disease Policy Model. Am J Public Health 1987;77:1417-26.
  4. Complex systems approaches to population health. May 2007 conference; University of Michigan. Available at http://sitemaker.umich.edu/complexsystemspopulationhealth/home. Accessed May 5, 2015.
  5. Systems thinking (special issue). Am J Public Health 2006 Mar;96(3).
  6. Annual Institute on Systems Science and Health. Bethesda, MD: Office of Behavioral and Social Sciences Research, National Institutes of Health. Available at http://obssr.od.nih.gov/training_and_education/issh/. Accessed May 5, 2015.
  7. Calhoun JG, Ramiah K, Weist EM, et al. Development of a core competency model for the master of public health degree. Am J Public Health 2008;98:1598-607.
  8. Blower SM, Gershengorn HB, Grant RM. A tale of two futures: HIV and antiretroviral therapy in San Francisco. Science 2000;287:650-4.
  9. Bahr DB, Browning RC, Wyatt HR, et al. Exploiting social networks to mitigate the obesity epidemic. Obesity 2009;17:723-8.
  10. Levy DT, Pearson JL, Villanti AC, et al. Modeling the future effects of a menthol ban on smoking prevalence and smoking-attributable deaths in the United States. Am J Public Health 2011;101:1236-40.
  11. Fishbein M, Ajzen I. Predicting and changing behavior: the reasoned action approach. New York, NY: Taylor and Francis, Psychology Press; 2010.
  12. Janz N, Becker M. The health belief model: a decade later. Health Educ Q 1984;11:1-47.
  13. Bandura A. Social foundations of thought and action: a social cognitive theory. Englewood Cliffs, NJ: Prentice Hall; 1986.
  14. Weinstein N. Testing four competing theories of health-protective behavior. Health Psychol 1993;12:324-33.
  15. Painter JE, Borba CPC, Hynes M, et al. The use of theory in health behavior research from 2000 to 2005: a systematic review. Ann Behav Med 2008;35:358-62.
  16. Orr MG, Thrush R, Plaut DC. The Theory of Reasoned Action as parallel constraint satisfaction: towards a dynamic computational model of health behavior. PLoS One 2013;8:e62409.
  17. Gillmore MR, Archibald ME, Morrison DM, et al. Teen sexual behavior: applicability of the theory of reasoned action. J Marriage Fam 2002;64:885-97.
  18. Rumelhart DE, McClelland JL, Group PR, eds. Parallel distributed processing: explorations in the microstructure of cognition. Vol 2: Psychological and biological models. Cambridge, MA: MIT Press; 1986.
  19. McClelland JL, Rumelhart DE, Group PR, eds. Parallel distributed processing: Explorations in the microstructure of cognition. Vol 1: Foundations. Cambridge, MA: MIT Press; 1986.
  20. Read SJ, Miller LC, eds. Connectionist models of social reasoning and social behavior. Mahwah, NJ: Lawrence Erlbaum Associates; 1998.
  21. Orr MG, Plaut DC. Complex systems and health behavior change: insights from cognitive science. Am J Health Behav 2014;38:404-13.
  22. Galea S, ed. Macrosocial determinants of population health. New York: Springer; 2007.
  23. Orr MG, Galea S, Riddle M, et al. Reducing racial disparities in obesity: simulating the effects of improved education and social network influence on diet behavior. Ann Epidemiol 2014;24:563-9.
  24. Ogden CL, Carroll MD, Kit BK, et al. Prevalence of obesity in the United States, 2009-2010. Hyattsville, MD: National Center for Health Statistics; 2012.
  25. Finkelstein EA, Trogdon JG, Cohen JW, et al. Annual medical spending attributable to obesity: payer- and service-specific estimates. Health Aff 2009;28:w822-31.
  26. Flegal KM, Carroll MD, Kit BK, et al. Prevalence of obesity and trends in the distribution of body mass index among US adults, 1999-2010. JAMA 2012;307:491-7.
  27. Chao DL, Matrajt L, Basta NE, et al. Planning for the control of pandemic influenza A (H1N1) in Los Angeles County and the United States. Am J Epidemiol 2011;173:1121-30.
  28. Eisenberg JNS, Scott JC, Porco T. Integrating disease control strategies: balancing water sanitation and hygiene interventions to reduce diarrheal disease burden. Am J Public Health 2007;97:846-52.
  29. Goldstein E, Apolloni A, Lewis B, et al. Distribution of vaccine/antivirals and the 'least spread line' in a stratified population. J R Soc Interface 2010;7:755-64.
  30. Halloran ME, Ferguson NM, Eubank S, et al. Modeling targeted layered containment of an influenza pandemic in the United States. Proc Natl Acad Sci 2008;105:4639-44.
  31. Kumar S, Grefenstette JJ, Galloway D, et al. Policies to reduce influenza in the workplace: Impact assessments using an agent-based model. Am J Public Health 2013;103:1406-11.
  32. Marathe A, Lewis B, Barrett C, et al. Comparing effectiveness of top-down and bottom-up strategies in containing influenza. PLoS One 2011;6:e25149.
  33. Marathe A, Lewis B, Chen J, et al. Sensitivity of household transmission to household contact structure and size. PLoS One 2011;6:e22461.
  34. Barrett CL, Bisset KR, Eubank SG, et al. EpiSimdemics: an efficient algorithm for simulating the spread of infectious disease over large realistic social networks. Proceedings of the 2008 ACM/IEEE Conference on Supercomputing; 2008: IEEE Press. p. 37.
  35. Bisset KR, Chen J, Feng X, et al. EpiFast: a fast algorithm for large scale realistic epidemic simulations on distributed memory systems. Proceedings of the 23rd International Conference on Supercomputing; 2009: ACM. p. 430-9.
  36. Eubank S, Guclu H, Kumar VS, et al. Modelling disease outbreaks in realistic urban social networks. Nature 2004;429:180-4.
  37. Sterman JD. Business dynamics: systems thinking and modeling for a complex world. New York: McGraw-Hill; 2000.
  38. Miller JH, Page SE. Complex adaptive systems: an introduction to complex models of social life. Princeton: Princeton University Press; 2007.
  39. DeAngelis DL, Gross LJ, Huston MA, et al. Landscape modeling for Everglades ecosystem restoration. Ecosystems 1998;1:64-75.
  40. Sterman JD. Learning from evidence in a complex world. Am J Public Health 2006;96:505-14.
  41. Diez Roux A. Complex systems thinking and current impasses in health disparities research. Am J Public Health 2011;101:1627-34.

Footnotes

a. The free software platform NetLogo serves this purpose very well. Go to https://ccl.northwestern.edu/netlogo/.

b. Go to http://obssr.od.nih.gov/training_and_education/issh/.

Mark G. Orr, PhD, is an Associate Research Professor at Virginia Tech. He was originally trained as a cognitive psychologist at the University of Illinois at Chicago. Dr. Orr received augmentation to this training with postdoctoral fellowships in computational modeling (Carnegie Mellon), neuroscience (Albert Einstein College of Medicine), and epidemiology/complex systems (Columbia University). Over the past decade, he has become heavily involved in understanding dynamic processes and drivers of risky behavior and decisionmaking, primarily in a public health context, at the individual and population levels.
Bryan Lewis, PhD, MPH, is an Assistant Research Professor at the Virginia Bioinformatics Institute at Virginia Tech. His work focuses on the use of highly-detailed computer simulations of human society to support public health decisionmaking and policies.
Kathryn Ziemer, PhD, is a Postdoctoral Associate at the Social and Decision Analytics Laboratory at Virginia Tech. Her research focuses on social cognitive theory, attitude formation and change, decisionmaking, and emotion regulation. Dr. Ziemer received her PhD in Counseling Psychology from the University of Maryland in 2014. Her dissertation involved creating a novel intervention using concepts from social cognitive theory and emotion regulation theory to improve the psychological and physical outcomes of individuals with chronic pain. More recently, she has focused on novel methods for measuring attitudes and the diffusion of attitudes across social networks.
Sallie Keller, PhD, is Professor of Statistics and Director of the Social and Decision Analytics Laboratory at the Virginia Bioinformatics Institute, Virginia Tech. Formerly, she was Professor of Statistics and Academic Vice-President and Provost at the University of Waterloo, Director of the IDA Science and Technology Policy Institute, and Professor of Statistics and the William and Stephanie Sick Dean of Engineering at Rice University. She has served as a member of the National Academy of Sciences Board on Mathematical Sciences and Their Applications, chaired the Committee on Applied and Theoretical Statistics, and currently is a member of the Committee on National Statistics. She is a national associate of the National Academy of Sciences, fellow of the American Association for the Advancement of Science, elected member of the International Statistics Institute, and member of the JASON advisory group. Dr. Keller is a fellow and past president of the American Statistical Association.


Page last reviewed July 2015
Page originally created September 2015
Internet Citation: Mathematical and Computational Simulation of the Behavioral and Social Drivers of Population Health. Content last reviewed July 2015. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/professionals/education/curriculum-tools/population-health/orr.html