Disease Model

Todo

Motivate the development of the disease model. We’re trying to understand the impact of interventions.

Here we’ll produce a data-free disease model focusing on core Vivarium concepts. You can find more complicated versions of the components built here in the vivarium_public_health library. Those components must additionally deal with manipulating complex data which makes understanding what’s going on more complicated.

After this tutorial, you should be well poised to begin working with and examining those components.

Setup

I’m assuming you’ve read through the material in getting started and are working in your vivarium/examples package. If not, you should go there first.

Todo

package setup with __init__ and stuff

Building a population

In many ways, this is a bad place to start. The population component is one of the more complicated components in the simulation as it typically is responsible for bootstrapping some of the more interesting features in vivarium.engine.

We need a population, though, so we’ll start with one here and defer explanation of some of the more complex pieces/systems until later.

File: ~/code/vivarium/examples/disease_model/population.py
  1# docs-start: imports
  2from datetime import timedelta
  3from typing import Any
  4
  5import pandas as pd
  6
  7from vivarium.engine import Component
  8from vivarium.engine.examples.disease_model import Mortality
  9from vivarium.engine.framework.engine import Builder
 10from vivarium.engine.framework.event import Event
 11from vivarium.engine.framework.population import SimulantData
 12
 13# docs-end: imports
 14
 15
 16class BasePopulation(Component):
 17    """Generates a base population with a uniform distribution of age and sex."""
 18
 19    ##############
 20    # Properties #
 21    ##############
 22
 23    # docs-start: configuration_defaults
 24    @property
 25    def configuration_defaults(self) -> dict[str, Any]:
 26        """A set of default configuration values for this component.
 27
 28        These can be overwritten in the simulation model specification or by
 29        providing override values when constructing an interactive simulation.
 30        """
 31        return {
 32            "population": {
 33                # The range of ages to be generated in the initial population
 34                "age_start": 0,
 35                "age_end": 100,
 36                # Note: There is also a 'population_size' key.
 37            },
 38        }
 39
 40    # docs-end: configuration_defaults
 41
 42    # docs-start: sub_components
 43    @property
 44    def sub_components(self) -> list[Component]:
 45        return [Mortality()]
 46
 47    # docs-end: sub_components
 48
 49    #####################
 50    # Lifecycle methods #
 51    #####################
 52
 53    # noinspection PyAttributeOutsideInit
 54    # docs-start: setup
 55    def setup(self, builder: Builder) -> None:
 56        """Performs this component's simulation setup.
 57
 58        The ``setup`` method is automatically called by the simulation
 59        framework. The framework passes in a ``builder`` object which
 60        provides access to a variety of framework subsystems and metadata.
 61
 62        Parameters
 63        ----------
 64        builder
 65            Access to simulation tools and subsystems.
 66        """
 67        self.config = builder.configuration
 68
 69        # docs-start: crn
 70        self.with_common_random_numbers = bool(self.config.randomness.key_columns)
 71        self.register = builder.randomness.register_simulants
 72        if (
 73            self.with_common_random_numbers
 74            and not ["entrance_time", "age"] == self.config.randomness.key_columns
 75        ):
 76            raise ValueError(
 77                "If running with CRN, you must specify ['entrance_time', 'age'] as"
 78                "the randomness key columns."
 79            )
 80        # docs-end: crn
 81
 82        # docs-start: randomness
 83        self.age_randomness = builder.randomness.get_stream(
 84            "age_initialization",
 85            initializes_crn_attributes=self.with_common_random_numbers,
 86        )
 87        self.sex_randomness = builder.randomness.get_stream("sex_initialization")
 88        # docs-end: randomness
 89
 90        # docs-start: initializers
 91        builder.population.register_initializer(
 92            initializer=self.initialize_entrance_time_and_age,
 93            columns=["entrance_time", "age"],
 94            required_resources=[self.age_randomness],
 95        )
 96        builder.population.register_initializer(
 97            initializer=self.initialize_sex,
 98            columns="sex",
 99            required_resources=[self.sex_randomness],
100        )
101        # docs-end: initializers
102
103    # docs-end: setup
104
105    ########################
106    # Event-driven methods #
107    ########################
108
109    # docs-start: initialize_entrance_time_and_age
110    def initialize_entrance_time_and_age(self, pop_data: SimulantData) -> None:
111        # docs-start: ages
112        age_start = pop_data.user_data.get("age_start", self.config.population.age_start)
113        age_end = pop_data.user_data.get("age_end", self.config.population.age_end)
114
115        if age_start == age_end:
116            if not isinstance(pop_data.creation_window, (pd.Timedelta, timedelta)):
117                raise TypeError(
118                    f"Expected creation_window to be a Timedelta, got {type(pop_data.creation_window)}"
119                )
120            age_window = pop_data.creation_window / pd.Timedelta(days=365)
121        else:
122            age_window = age_end - age_start
123
124        age_draw = self.age_randomness.get_draw(pop_data.index)
125        age = age_start + age_draw * age_window
126        # docs-end: ages
127        # docs-start: population_dataframe
128        population = pd.DataFrame(
129            {
130                "entrance_time": pop_data.creation_time,
131                "age": age.values,
132            },
133            index=pop_data.index,
134        )
135        # docs-end: population_dataframe
136
137        # docs-start: crn_registration
138        if self.with_common_random_numbers:
139            self.register(population)
140        # docs-end: crn_registration
141
142        # docs-start: update_entrance_time_and_age
143        self.population_view.initialize(population)
144        # docs-end: update_entrance_time_and_age
145
146    # docs-end: initialize_entrance_time_and_age
147
148    # docs-start: initialize_sex
149    def initialize_sex(self, pop_data: SimulantData) -> None:
150        self.population_view.initialize(
151            pd.Series(
152                self.sex_randomness.choice(pop_data.index, ["Male", "Female"]), name="sex"
153            )
154        )
155
156    # docs-end: initialize_sex
157
158    # docs-start: on_time_step
159    def on_time_step(self, event: Event) -> None:
160        """Updates simulant age on every time step.
161
162        Parameters
163        ----------
164        event
165            An event object emitted by the simulation containing an index
166            representing the simulants affected by the event and timing
167            information.
168        """
169        living_index = self.population_view.get_filtered_index(
170            event.index, query="is_alive == True"
171        )
172        if not isinstance(event.step_size, (pd.Timedelta, timedelta)):
173            raise TypeError(
174                f"Expected step_size to be a Timedelta, got {type(event.step_size)}"
175            )
176        delta = event.step_size / pd.Timedelta(days=365)
177        self.population_view.update(
178            "age",
179            lambda age: age.loc[living_index] + delta,
180        )
181
182    # docs-end: on_time_step

There are a lot of things here. Let’s take them piece by piece.

Imports

from datetime import timedelta
from typing import Any

import pandas as pd

from vivarium.engine import Component
from vivarium.engine.examples.disease_model import Mortality
from vivarium.engine.framework.engine import Builder
from vivarium.engine.framework.event import Event
from vivarium.engine.framework.population import SimulantData

It’s typical to import all required objects at the top of each module. In this case, we are importing pandas and the Vivarium Component class because they are used explicitly throughout the file. Further, we import several objects from python’s typing package as well as three classes from the core Vivarium framework which are used solely for typing information in method signatures.

Note

Providing type hints in Python is totally optional, but if you’re using a modern python IDE or plugins for traditional text editors, they can offer you completion options and easy access to interface documentation. It also enables the use of other static analysis tools like mypy.

BasePopulation Instantiation

We define a class called BasePopulation that inherits from the Vivarium Component. This inheritance is what makes a class a proper Vivarium component and all the affordances that come with that.

Default Configuration

    @property
    def configuration_defaults(self) -> dict[str, Any]:
        """A set of default configuration values for this component.

        These can be overwritten in the simulation model specification or by
        providing override values when constructing an interactive simulation.
        """
        return {
            "population": {
                # The range of ages to be generated in the initial population
                "age_start": 0,
                "age_end": 100,
                # Note: There is also a 'population_size' key.
            },
        }

You’ll see this sort of pattern repeated in many Vivarium components.

We declare a configuration block as a property for components. Vivarium has a cascading configuration system that aggregates configuration data from many locations. The configuration is essentially a declaration of the parameter space for the simulation.

The most important thing to understand is that configuration values are given default values provided by the components and that they can be overriden with a higher level system like a command line argument later.

This component specifically declares defaults for the age range for the initial population of simulants. It also notes that there is a ‘population_size’ key. This key has a default value set by Vivarium’s population management system.

Sub-components

    @property
    def sub_components(self) -> list[Component]:
        return [Mortality()]

This property is a list of components that are managed by this component. In this case, we see that when BasePopulation is set up, it will also set up an instance of the Mortality component.

The __init__() method

Though Vivarium components are specific implementations of Python classes you’ll notice that many of the classes have very sparse __init__ methods. Indeed, this BasePopulation class does not even have one defined at this level (though there is one in the Component parent class it inherits from).

Due to the way the simulation bootstraps itself, the __init__ method is usually only used to assign names to generic components and muck with the configuration_defaults a bit. We’ll see more of this later.

The setup method

Instead of the __init__ method, most of the component initialization takes place in the setup method.

The signature for the setup method is the same in every component. When the framework is constructing the simulation it looks for a setup method on each component and calls that method with a Builder instance.

Note

The Builder

The builder object is essentially the simulation toolbox. It provides access to several simulation subsystems:

  • builder.configuration : A dictionary-like representation of all of the parameters in the simulation.

  • builder.lookup : A service for generating interpolated lookup tables. We won’t use these in this tutorial.

  • builder.value : The value pipeline system. In many ways this is the heart of any Vivarium simulation. We’ll discuss this in great detail as we go.

  • builder.event : Access to Vivarium’s event system. The primary use is to register listeners for 'time_step' events.

  • builder.population : The population management system. Registers population initializers (functions that fill in initial state information about simulants), gives access to views of the simulation state, and mediates updates to the simulation state. It also provides access to functionality for generating new simulants (e.g. via birth or migration), though we won’t use that feature in this tutorial.

  • builder.randomness : Vivarium uses a variance reduction technique called Common Random Numbers to perform counterfactual analysis. In order for this to work, the simulation provides a centralized source of randomness.

  • builder.time : The simulation clock.

  • builder.components : The component management system. Primarily used for registering subcomponents for setup.

  • builder.results : The results management system. This provides access to stratification and observation registration functions.

Let’s step through the setup method and examine what’s happening.

    def setup(self, builder: Builder) -> None:
        """Performs this component's simulation setup.

        The ``setup`` method is automatically called by the simulation
        framework. The framework passes in a ``builder`` object which
        provides access to a variety of framework subsystems and metadata.

        Parameters
        ----------
        builder
            Access to simulation tools and subsystems.
        """
        self.config = builder.configuration

        # docs-start: crn
        self.with_common_random_numbers = bool(self.config.randomness.key_columns)
        self.register = builder.randomness.register_simulants
        if (
            self.with_common_random_numbers
            and not ["entrance_time", "age"] == self.config.randomness.key_columns
        ):
            raise ValueError(
                "If running with CRN, you must specify ['entrance_time', 'age'] as"
                "the randomness key columns."
            )
        # docs-end: crn

        # docs-start: randomness
        self.age_randomness = builder.randomness.get_stream(
            "age_initialization",
            initializes_crn_attributes=self.with_common_random_numbers,
        )
        self.sex_randomness = builder.randomness.get_stream("sex_initialization")
        # docs-end: randomness

        # docs-start: initializers
        builder.population.register_initializer(
            initializer=self.initialize_entrance_time_and_age,
            columns=["entrance_time", "age"],
            required_resources=[self.age_randomness],
        )
        builder.population.register_initializer(
            initializer=self.initialize_sex,
            columns="sex",
            required_resources=[self.sex_randomness],
        )
        # docs-end: initializers

To start, we simply grab a copy of the simulation configuration. This is essentially a dictionary that supports .-access notation.

The next handful of lines interact with Vivarium’s randomness system. Several things are happening here.

First, we deal with the topic of Common Random Numbers, a variance reduction technique employed by the Vivarium framework to make it easier to perform counterfactual analysis. It’s not important to have a full grasp of this system at this point.

    self.with_common_random_numbers = bool(self.config.randomness.key_columns)
    self.register = builder.randomness.register_simulants
    if (
        self.with_common_random_numbers
        and not ["entrance_time", "age"] == self.config.randomness.key_columns
    ):
        raise ValueError(
            "If running with CRN, you must specify ['entrance_time', 'age'] as"
            "the randomness key columns."
        )

Note

Common Random Numbers

The idea behind Common Random Numbers (or CRN) is to enable comparison between two simulations running under slightly different conditions. Conceptually, we achieve this by guaranteeing that the same events occur to the same people at the same time across simulations with the same random seed.

For example, suppose we have two simulations of the world. We model the world as it is in the first simulation and we introduce a vaccine for the flu in the second simulation. Unless the model explicitly encodes the causal relationship between flu vaccination and vehicle traffic patterns, the person who died in a vehicle accident on the 43rd time step in the first simulation will also die in a vehicle accident on the 43rd time step in the second simulation.

In practice, what the CRN system requires is a way to uniquely identify simulants across simulations. We need to randomly generate some simulant characteristics in a repeatable fashion and then use those characteristics to identify the simulants in the randomness system later. This is (typically) only handled by the population component. It’s vitally important to get right when doing counterfactual analysis, but it’s not especially important that you understand the mechanics of the implementation.

In this component we’re using some information about the configuration of the randomness system to let us know whether or not we care about using CRN. We’ll explore this later when we’re looking at running simulations with interventions.

Next, we grab actual randomness streams from the framework.

    self.age_randomness = builder.randomness.get_stream(
        "age_initialization",
        initializes_crn_attributes=self.with_common_random_numbers,
    )
    self.sex_randomness = builder.randomness.get_stream("sex_initialization")

get_stream is the only call most components make to the randomness system. The best way to think about randomness streams is as decision points in your simulation. Any time you need to answer a question that requires a random number, you should be using a randomness stream linked to that question.

Here we have the questions “What age are my simulants when they enter the simulation?” and “What sex are my simulants?”; we have assigned their corresponding randomness streams to age_randomness and sex_randomness attributes, respectively.

For age_randomness, the initializes_crn_attributes argument tells the stream that the simulants you’re asking this question about won’t already be registered with the randomness system; this is the bootstrapping part. Here we’re using the 'entrance_time' and 'age' to identify a simulant and so we need a stream to initialize ages with. There is should really only be one of these initialization streams in a simulation.

The 'sex_randomness' is a much more typical example of how to interact with the randomness system - we are simply getting the stream.

Finally, we register two initializers with the population manager. This tells vivarium which initializers to call when creating the component as well as which columns (if any) each initializer is responsible for creating.

    builder.population.register_initializer(
        initializer=self.initialize_entrance_time_and_age,
        columns=["entrance_time", "age"],
        required_resources=[self.age_randomness],
    )
    builder.population.register_initializer(
        initializer=self.initialize_sex,
        columns="sex",
        required_resources=[self.sex_randomness],
    )

Note that each initializer registration requires us to specify which columns that initializer is creating (if any), the initializer method itself, and any required resources. In this case, each initializer depends on a randomness stream. The system will ensure that these required resources are set up before calling the initializer methods.

That was a lot of stuff

As I mentioned at the top, the population component is one of the more complicated pieces of any simulation. It’s not important to grasp everything right now. We’ll see many of the same patterns repeated in the setup method of other components later. The unique things here are worth coming back to at a later point once you have more familiarity with the framework conventions.

The initializers

Two methods are registered as initializers. The primary purpose of initializer methods is to generate the initial population. Specifically (for this class), the initialize_entrance_time_and_age method is registered to create the ‘entrance_time’ and ‘age’ columns (with age_randomness as a dependency) and the initialize_sex method is registered to create the ‘sex’ column (with sex_randomness as a dependency).

Note

The Population State Table

When we talk about columns in the context of Vivarium, we are typically talking about simulant attributes. All attributes for all simulants can be represented by a single pandas.DataFrame where each simulant is a row and each attribute is a column. This dataframe is referred to as the population state table (or simply “state table”).

As previously mentioned, this class is a proper Vivarium Component. Among other things, this means that much of the setup happens automatically during the simulation’s Setup lifecycle phase. There are several methods available to define for a component’s setup, depending on what you want to happen when: on_post_setup() on_time_step_prepare(), on_time_step(), on_time_step_cleanup(), on_collect_metrics(), and on_simulation_end(). The framework looks for any of these methods during the setup phase and calls them if they are defined. Further, the framework calls any registered initializer methods during population creation. The fact that initialize_entrance_time_and_age and initialize_sex were registered guarantees that they will be called during the population initialization phase of the simulation.

Initializer methods are called by the population manager whenever simulants are created. For our purposes, this happens only once at the very beginning of the simulation. Typically, we’d task another component with responsibility for managing other ways simulants might enter (we might, for instance, have a Migration component that knows about how and when people enter and exit our location of interest or a Fertility component that handles new simulants being born).

Let’s inspect the two initializer methods line by line as we did with setup.

The initialize_entrance_time_and_age method

def initialize_entrance_time_and_age(self, pop_data: SimulantData) -> None:
    # docs-start: ages
    age_start = pop_data.user_data.get("age_start", self.config.population.age_start)
    age_end = pop_data.user_data.get("age_end", self.config.population.age_end)

    if age_start == age_end:
        if not isinstance(pop_data.creation_window, (pd.Timedelta, timedelta)):
            raise TypeError(
                f"Expected creation_window to be a Timedelta, got {type(pop_data.creation_window)}"
            )
        age_window = pop_data.creation_window / pd.Timedelta(days=365)
    else:
        age_window = age_end - age_start

    age_draw = self.age_randomness.get_draw(pop_data.index)
    age = age_start + age_draw * age_window
    # docs-end: ages
    # docs-start: population_dataframe
    population = pd.DataFrame(
        {
            "entrance_time": pop_data.creation_time,
            "age": age.values,
        },
        index=pop_data.index,
    )
    # docs-end: population_dataframe

    # docs-start: crn_registration
    if self.with_common_random_numbers:
        self.register(population)
    # docs-end: crn_registration

    # docs-start: update_entrance_time_and_age
    self.population_view.initialize(population)
    # docs-end: update_entrance_time_and_age

First, we see that this method takes in a special argument that we don’t provide. This argument, pop_data, is an instance of SimulantData containing a handful of information useful when initializing simulants.

Note

SimulantData

This simple structure only has four attributes (used here in the generic Python sense of the word).

  • index : The population table index of the simulants being initialized.

  • user_data : A (potentially empty) dictionary generated by the user in components that directly create simulants.

  • creation_time : The current simulation time. A pandas.Timestamp.

  • creation_window : The size of the time step over which the simulants are created. A pandas.Timedelta.

The most interesting thing that that the BasePopulation component does is manage the age of our simulants. Back in the configuration_defaults property we specified an 'age_start' and 'age_end'. Here we use these to generate the age distribution of our initial population.

    age_start = pop_data.user_data.get("age_start", self.config.population.age_start)
    age_end = pop_data.user_data.get("age_end", self.config.population.age_end)

    if age_start == age_end:
        if not isinstance(pop_data.creation_window, (pd.Timedelta, timedelta)):
            raise TypeError(
                f"Expected creation_window to be a Timedelta, got {type(pop_data.creation_window)}"
            )
        age_window = pop_data.creation_window / pd.Timedelta(days=365)
    else:
        age_window = age_end - age_start

    age_draw = self.age_randomness.get_draw(pop_data.index)
    age = age_start + age_draw * age_window

We’ve built in support for two different kinds of populations based on the 'age_start' and 'age_end' specified in the configuration. If we get the same 'age_start' and 'age_end', we have a cohort, and so we smear out ages within the width of a single time step (the pop_data.creation_window). Otherwise, we assume our population is uniformly distributed within the age window bounded by 'age_start' and 'age_end'. You can use demographic data here to generate arbitrarily complex starting populations.

The only thing really of note here is the call to self.age_randomness.get_draw. If we recall from the setup method, self.age_randomness is an instance of a RandomnessStream which supports several convenience methods for interacting with random numbers. get_draw takes in an index representing particular simulants and returns a pandas.Series with a uniformly drawn random number for each simulant in the index.

Note

The Population Index

The population state table we described before has an index that uniquely identifies each simulant. This index is used in several places in the simulation to look up information, calculate simulant-specific values, and update information about the simulants’ state.

With the simulant ages defined, we then create a dataframe of simulant ages and entrance times.

    population = pd.DataFrame(
        {
            "entrance_time": pop_data.creation_time,
            "age": age.values,
        },
        index=pop_data.index,
    )

We then come back to the question of whether or not we’re using common random numbers in our system. In the setup method, our criteria for using common random numbers was that 'entrance_time' and 'age' were specified as the randomness key_columns in the configuration. These key_columns are what the randomness system uses to uniquely identify simulants across simulations.

Note that if we are using CRN, we must generate these columns before any other calls are made to the randomness system with the population index. We then register these simulants with the randomness system using self.register, a reference to the register_simulants method in the randomness management system. This is responsible for mapping the attributes of interest (here 'entrance_time' and 'age') to a particular set of random numbers that will be used across simulations with the same random seed.

    if self.with_common_random_numbers:
        self.register(population)

Once registered, we can generate the remaining attributes of our simulants with guarantees around reproducibility.

If we’re not using CRN, we simply don’t register the simulants and move on.

In either case, we are hanging on to a table representing some attributes of our new simulants. However, this table does not matter yet because the simulation’s population system doesn’t know anything about it. We must first inform the simulation by passing in the DataFrame to our population view's initialize method.

Note

Population Views

A PopulationView is a read/write interface to the population state table. It provides methods for reading attributes, initializing private columns, and updating private column data over the course of a simulation.

Warning

The data passed to the population view’s initialize method must have an index that is a subset of the pop_data.index. During initial population creation every simulant should be covered, but when simulants are added later (e.g. via fertility) the initializer may legitimately receive only a subset of the new simulants. Passing an index that contains simulants not in pop_data.index will cause hard-to-debug errors.

    self.population_view.initialize(population)

The initialize_sex method

def initialize_sex(self, pop_data: SimulantData) -> None:
    self.population_view.initialize(
        pd.Series(
            self.sex_randomness.choice(pop_data.index, ["Male", "Female"]), name="sex"
        )
    )

Thankfully, the initialize_sex method is much simpler than the previous initializer. Here, we simply call another randomness stream convenience method self.sex_randomness.choice to randomly assign a sex to each simulant. We then initialize the population via population view with these new values just like before.

The on_time_step method

The last piece of our population component is the 'time_step' listener method on_time_step.

def on_time_step(self, event: Event) -> None:
    """Updates simulant age on every time step.

    Parameters
    ----------
    event
        An event object emitted by the simulation containing an index
        representing the simulants affected by the event and timing
        information.
    """
    living_index = self.population_view.get_filtered_index(
        event.index, query="is_alive == True"
    )
    if not isinstance(event.step_size, (pd.Timedelta, timedelta)):
        raise TypeError(
            f"Expected step_size to be a Timedelta, got {type(event.step_size)}"
        )
    delta = event.step_size / pd.Timedelta(days=365)
    self.population_view.update(
        "age",
        lambda age: age.loc[living_index] + delta,
    )

This method takes in an ~ argument provided by the simulation. This is very similar to the SimulantData argument provided to initializer methods. It carries around some information about what’s happening in the event.

Note

Event

The event also has four attributes.

  • index : The population table index of the simulants responding to the event.

  • user_data : A (potentially empty) dictionary generated by the user in components that directly events.

  • time : The current simulation time. A pandas.Timestamp.

  • step_size : The size of the time step we’re about to take. A pandas.Timedelta.

It also supports some method for generating new events that we don’t care about here.

In order to age our simulants, we call the population view’s update() method with the column name and a modifier function. The modifier receives the current values of the private column as a Series and returns the updated values. The update method handles reading the current data from the component’s private columns and writing the result back to the state table.

Note

Private Columns vs. Attributes

The key distinction between attributes and private columns is that attributes are dynamically-calculated values while private columns are static values that act as the source of attribute pipelines. Attributes are read-only and are modified by components registering modifiers to their corresponding attribute pipelines. Private columns, on the other hand, are read and written by components directly - but only by the component that created them in the first place.

Refer to the population management documentation for more details.

The methods available to read attributes are:

The methods available to update private columns are: - initialize() - update()

Examining our work

Now that we’ve done all this hard work, let’s see what it gives us.

from vivarium.engine import InteractiveContext
from vivarium.engine.examples.disease_model.population import BasePopulation

config = {'randomness': {'key_columns': ['entrance_time', 'age']}}

sim = InteractiveContext(components=[BasePopulation()], configuration=config)

print(sim.get_population(['age', 'sex']).head())
         age     sex
0  13.806776  Female
1  59.172893    Male
2  11.030887  Female
3  27.723191  Female
4  51.052188  Female
         age     sex
0  13.806776  Female
1  59.172893    Male
2  11.030887  Female
3  27.723191  Female
4  51.052188  Female

Great! We generate a population with a non-trivial age and sex distribution. Let’s see what happens when our simulation takes a time step.

sim.step()
print(sim.get_population(['age', 'sex']).head())
         age     sex
0  13.809516  Female
1  59.175633    Male
2  11.033627  Female
3  27.725931  Female
4  51.054928  Female
         age     sex
0  13.809516  Female
1  59.175633    Male
2  11.033627  Female
3  27.725931  Female
4  51.054928  Female

Everyone gets older by exactly one time step! We could just keep taking steps in our simulation and people would continue getting infinitely older. This, of course, does not reflect how the world goes. Time to introduce the grim reaper.

Mortality

Now that we have demonstrated that population generation and aging works, let’s investigate the Mortality component. Note that Mortality is a sub-component of the BasePopulation component and comes for free when we request BasePopulation via the model specification; there is no need to add Mortality separately.

File: ~/code/vivarium/examples/disease_model/mortality.py
  1from __future__ import annotations
  2
  3from typing import Any
  4
  5import numpy as np
  6import pandas as pd
  7
  8from vivarium.engine import Component
  9from vivarium.engine.framework.engine import Builder
 10from vivarium.engine.framework.event import Event
 11from vivarium.engine.framework.population import SimulantData
 12
 13
 14class Mortality(Component):
 15
 16    ##############
 17    # Properties #
 18    ##############
 19
 20    # docs-start: configuration_defaults
 21    @property
 22    def configuration_defaults(self) -> dict[str, Any]:
 23        """A set of default configuration values for this component.
 24
 25        These can be overwritten in the simulation model specification or by
 26        providing override values when constructing an interactive simulation.
 27        """
 28        return {self.name: {"data_sources": {"mortality_rate": 0.01}}}
 29
 30    # docs-end: configuration_defaults
 31
 32    #####################
 33    # Lifecycle methods #
 34    #####################
 35
 36    # noinspection PyAttributeOutsideInit
 37    # docs-start: setup
 38    def setup(self, builder: Builder) -> None:
 39        """Performs this component's simulation setup.
 40
 41        The ``setup`` method is automatically called by the simulation
 42        framework. The framework passes in a ``builder`` object which
 43        provides access to a variety of framework subsystems and metadata.
 44
 45        Parameters
 46        ----------
 47        builder
 48            Access to simulation tools and subsystems.
 49        """
 50        self.randomness = builder.randomness.get_stream("mortality")
 51        builder.value.register_rate_producer(
 52            "mortality_rate", source=self.build_lookup_table(builder, "mortality_rate")
 53        )
 54        builder.population.register_initializer(
 55            initializer=self.initialize_is_alive, columns="is_alive", required_resources=[]
 56        )
 57
 58    # docs-end: setup
 59
 60    ########################
 61    # Event-driven methods #
 62    ########################
 63
 64    # docs-start: initialize_is_alive
 65    def initialize_is_alive(self, pop_data: SimulantData) -> None:
 66        """Called by the simulation whenever new simulants are added.
 67
 68        This component is responsible for creating and filling the 'is_alive' column
 69        in the population state table.
 70
 71        Parameters
 72        ----------
 73        pop_data
 74            A record containing the index of the new simulants, the
 75            start of the time step the simulants are added on, the width
 76            of the time step, and the age boundaries for the simulants to
 77            generate.
 78        """
 79        self.population_view.initialize(
 80            pd.Series(True, index=pop_data.index, name="is_alive")
 81        )
 82
 83    # docs-end: initialize_is_alive
 84
 85    # docs-start: on_time_step
 86    def on_time_step(self, event: Event) -> None:
 87        """Determines who dies each time step.
 88
 89        Parameters
 90        ----------
 91        event
 92            An event object emitted by the simulation containing an index
 93            representing the simulants affected by the event and timing
 94            information.
 95        """
 96        effective_rate = self.population_view.get(event.index, "mortality_rate")
 97        effective_probability = 1 - np.exp(-effective_rate)
 98        draw = self.randomness.get_draw(event.index)
 99        affected_simulants = draw < effective_probability
100        self.population_view.update(
101            "is_alive", lambda _: pd.Series(False, index=event.index[affected_simulants])
102        )
103
104    # docs-end: on_time_step

The purpose of this component is to determine who dies every time step based on a mortality rate. You’ll see many of the same framework features we used in the BasePopulation component used again here and a few new things.

Let’s dive in.

Default Configuration

Since we’re building our disease model without data to inform it, we’ll expose all the important bits of the model as parameters in the configuration.

    @property
    def configuration_defaults(self) -> dict[str, Any]:
        """A set of default configuration values for this component.

        These can be overwritten in the simulation model specification or by
        providing override values when constructing an interactive simulation.
        """
        return {self.name: {"data_sources": {"mortality_rate": 0.01}}}

Here we’re specifying the overall mortality rate in our simulation. Rates have units! We’ll phrase our model with rates specified in terms of events per person-year. So here we’re specifying a uniform mortality rate of 0.01 deaths per person-year. This is obviously not realistic, but using toy data like this is often extremely useful in validating a model.

The setup method

There is not a whole lot going on in this setup method, but there is one new concept we should discuss.

    def setup(self, builder: Builder) -> None:
        """Performs this component's simulation setup.

        The ``setup`` method is automatically called by the simulation
        framework. The framework passes in a ``builder`` object which
        provides access to a variety of framework subsystems and metadata.

        Parameters
        ----------
        builder
            Access to simulation tools and subsystems.
        """
        self.randomness = builder.randomness.get_stream("mortality")
        builder.value.register_rate_producer(
            "mortality_rate", source=self.build_lookup_table(builder, "mortality_rate")
        )
        builder.population.register_initializer(
            initializer=self.initialize_is_alive, columns="is_alive", required_resources=[]
        )

The first line simply adds a useful class attribute: the mortality randomness stream (which is used to answer the question “which simulants died at this time step?”).

The next bit is the main feature of note: the introduction of the values system. The values system provides a way of distributing the computation of a value over multiple components. This can be a bit difficult to grasp, but is vital to the way we think about components in Vivarium. The best way to understand this system is by example.

Note

Values vs Attributes

A value is generic and is simply something that is computed from a :term: pipeline <pipeline>. An attribute is a specific kind of value that is simulant-specific, stored in the population state table, and is computed from a attribute pipeline. Most values in vivarium are attributes.

In our current context we register a named attribute pipeline into the simulation called 'mortality_rate' via the builder.value.register_rate_producer method. The source for a value is always a callable which typically takes in a pandas.Index as its only argument. In this case, the source is a LookupTable, which is callable, so meets this requirement. Other things are possible, but not necessary for our current use case.

The 'mortality_rate' source is then responsible for returning a pandas.Series containing a base mortality rate for each simulant in the index to the values system. Other components may register themselves as modifiers to this base rate. We’ll see more of this once we get to the disease modeling portion of the tutorial.

The value system will coordinate how the base value is modified behind the scenes and return the results of all computations whenever the pipeline is called (in the on_time_step method in this case - see below).

Finally, we register an initializer method which is responsible for creating an 'is_alive' column in the state table.

The initialize_is_alive method

def initialize_is_alive(self, pop_data: SimulantData) -> None:
    """Called by the simulation whenever new simulants are added.

    This component is responsible for creating and filling the 'is_alive' column
    in the population state table.

    Parameters
    ----------
    pop_data
        A record containing the index of the new simulants, the
        start of the time step the simulants are added on, the width
        of the time step, and the age boundaries for the simulants to
        generate.
    """
    self.population_view.initialize(
        pd.Series(True, index=pop_data.index, name="is_alive")
    )

This very simple initializer method simply creates an 'is_alive' column in the state table and sets it to True for all simulants being initialized. Note again that we need to call the population view’s initialize method to actually modify the state table.

Notice also that when registering this method, we did not specify any required resources (since every simulant is set as alive regardless of anything else).

The on_time_step method

Similar to how we aged simulants in the population component, we determine which simulants die during 'time_step' events.

def on_time_step(self, event: Event) -> None:
    """Determines who dies each time step.

    Parameters
    ----------
    event
        An event object emitted by the simulation containing an index
        representing the simulants affected by the event and timing
        information.
    """
    effective_rate = self.population_view.get(event.index, "mortality_rate")
    effective_probability = 1 - np.exp(-effective_rate)
    draw = self.randomness.get_draw(event.index)
    affected_simulants = draw < effective_probability
    self.population_view.update(
        "is_alive", lambda _: pd.Series(False, index=event.index[affected_simulants])
    )

The very first thing we do is get the 'mortality_rate' attribute (which is calculated from the attribute pipeline we constructed during setup); these are the effective mortality rate for each person in the simulation. Right now this will just be the base mortality rate, but we’ll see how this changes once we bring in a disease. Importantly for now though, the pipeline is automatically rescaling the rate down to the size of the time steps we’re taking.

We then determine who died this time step. We turn our mortality rate into a probability of death in the given time step by assuming deaths are exponentially distributed and using the inverse distribution function. We then draw a uniformly distributed random number for each person and determine who died by comparing that number to the computed probability of death for the individual.

Finally, we update the state table is_alive column with the newly dead simulants.

Supplying a base mortality rate

As discussed above, the source for the 'mortality_rate' value is a LookupTable defined in component’s configuration.

In an actual simulation, we’d inform the base mortality rate with data specific to the age, sex, location, year (and potentially other demographic factors) that represent each simulant. We might disaggregate or interpolate our data here as well. Which is all to say, the source of a pipeline can do some pretty complicated stuff.

Did it work?

It’s a good time to check and make sure that what we did works. We’ve got a mortality rate of 0.01 deaths per person-year and we’re taking 1 day time steps, so we give ourselves a relatively large population this time so we can see the impact of our mortality component without taking too many steps.

from vivarium.engine import InteractiveContext
from vivarium.engine.examples.disease_model.population import BasePopulation

config = {
   'population': {
      'population_size': 100_000
   },
   'randomness': {
      'key_columns': ['entrance_time', 'age']
   }
}

sim = InteractiveContext(components=[BasePopulation()], configuration=config)
print(sim.get_population(['age', 'sex', 'mortality_rate', 'is_alive']).head())
         age     sex  mortality_rate  is_alive
0  13.806776  Female        0.000027      True
1  59.172893    Male        0.000027      True
2  11.030887  Female        0.000027      True
3  27.723191  Female        0.000027      True
4  51.052188  Female        0.000027      True
         age     sex  mortality_rate  is_alive
0  13.806776  Female        0.000027      True
1  59.172893    Male        0.000027      True
2  11.030887  Female        0.000027      True
3  27.723191  Female        0.000027      True
4  51.052188  Female        0.000027      True

Note that aside from modifying the population size in the config, we haven’t actually done anything different than before. Indeed, the ages and sexes of the first five simulants are the same. Here, however, we are not subsetting the dataframe to only show the 'age' and 'sex' columns, however, and so we see various others (notably, the 'mortality_rate' and 'is_alive' columns created by the Mortality component).

As we haven’t taken a time step yet, everyone should still be alive.

print(sim.get_population("is_alive").value_counts())
is_alive
True     100000
Name: count, dtype: int64
is_alive
True    100000
Name: count, dtype: int64

Now let’s run our simulation for a while and see what happens.

sim.take_steps(365)  # Run for one year with one day time steps
sim.get_population("is_alive").value_counts()
is_alive
True     99023
False      977
Name: count, dtype: int64

We simulated somewhere between 99,023 (if everyone died in the first time step) and 100,000 (if everyone died in the last time step) living person-years and saw 977 deaths. This means our empirical mortality rate is somewhere close to 0.0098 deaths per person-year, very close to the 0.01 rate we provided.

Disease

Todo

disease

Risk

Todo

risk

Intervention

Todo

interventions

Observer

We’ve spent some time showing how we can look at the population state table to see how it changes during an interactive simulation. However, we also typically want the simulation itself to record more sophisticated output. Further, we frequently work in non-interactive (or even distributed) environments where we simply don’t have access to the simulation object and so would like to write our output to disk. These recorded outputs (i.e. results) are referred to in vivarium as observations and it is the job of observers to register them to the simulation. Observers are vivarium components that are created by the user and added to the simulation via the model specification.

This example’s observers are shown below.

File: ~/code/vivarium/examples/disease_model/observer.py
 1from typing import Any
 2
 3import pandas as pd
 4
 5from vivarium.engine.framework.engine import Builder
 6from vivarium.engine.framework.event import Event
 7from vivarium.engine.framework.population import SimulantData
 8from vivarium.engine.framework.results import Observer
 9
10
11class DeathsObserver(Observer):
12    """Observes the number of deaths."""
13
14    #################
15    # Setup methods #
16    #################
17
18    def setup(self, builder: Builder) -> None:
19        builder.population.register_initializer(
20            initializer=self.initialize_previous_alive,
21            columns="previous_alive",
22            required_resources=["is_alive"],
23        )
24
25    def register_observations(self, builder: Builder) -> None:
26        builder.results.register_adding_observation(
27            name="dead",
28            requires_attributes=["is_alive", "previous_alive"],
29            pop_filter="previous_alive == True and is_alive == False",
30        )
31
32    ########################
33    # Event-driven methods #
34    ########################
35
36    def initialize_previous_alive(self, pop_data: SimulantData) -> None:
37        """Initialize simulants as alive"""
38        self.population_view.initialize(
39            pd.Series(True, index=pop_data.index, name="previous_alive")
40        )
41
42    def on_time_step_prepare(self, event: Event) -> None:
43        """Update the previous deaths column to the current deaths."""
44        previous_alive = self.population_view.get(event.index, "is_alive")
45        previous_alive.name = "previous_alive"
46        self.population_view.update("previous_alive", lambda _: previous_alive)
47
48
49class YllsObserver(Observer):
50    """Observes the years of lives lost."""
51
52    ##############
53    # Properties #
54    ##############
55
56    @property
57    def configuration_defaults(self) -> dict[str, Any]:
58        config = super().configuration_defaults
59        config["mortality"] = {"life_expectancy": 80.0}
60        return config
61
62    #####################
63    # Lifecycle methods #
64    #####################
65
66    def setup(self, builder: Builder) -> None:
67        self.life_expectancy = float(builder.configuration.mortality.life_expectancy)
68
69    #################
70    # Setup methods #
71    #################
72
73    def register_observations(self, builder: Builder) -> None:
74        builder.results.register_adding_observation(
75            name="ylls",
76            requires_attributes=["age", "is_alive", "previous_alive"],
77            pop_filter="previous_alive == True and is_alive == False",
78            aggregator=self.calculate_ylls,
79        )
80
81    def calculate_ylls(self, df: pd.DataFrame) -> float:
82        return float((self.life_expectancy - df["age"]).sum())

There are two observers that have each registered a single observation to the simulation: deaths and years of life lost (YLLs). It is important to note that neither of those observations are population state table columns; they are more complex results that require some computation to determine.

Note that the deaths observer actually creates a private column called 'previous_alive'. The purpose of this column is to distinguish newly-dead simulants (for adding purposes) from those that died in previous time steps. We update this column in the on_time_step_prepare method of the observer.

Why didn’t we update the 'previous_alive' values at the same time as the 'is_alive' values in the Mortality component’s on_time_step method? The reason is that the deaths observer records the number of deaths that occurred during the previous time step during the collect_metrics phase. By updating the 'is_alive' column during the time_step phase (which occurs before collect_metrics) and the 'previous_alive' column during the time_step_prepare phase (which occurs after collect_metrics), we ensure that the observer can distinguish which simulants died specifically during the previous time step.

In an interactive setting, we can access these observations via the sim.get_results() command. This will return a dictionary of all observations up to this point in the simulation.

from vivarium.engine import InteractiveContext
from vivarium.engine.examples.disease_model.population import BasePopulation
from vivarium.engine.examples.disease_model.observer import DeathsObserver, YllsObserver

config = {
    'population': {
        'population_size': 100_000
    },
    'randomness': {
        'key_columns': ['entrance_time', 'age']
    }
}

sim = InteractiveContext(
   components=[
      BasePopulation(),
      DeathsObserver(),
      YllsObserver(),
   ],
   configuration=config
)
sim.take_steps(365)  # Run for one year with one day time steps

print(sim.get_results()["dead"])
print(sim.get_results()["ylls"])
  stratification  value
0            all  977.0

  stratification         value
0            all  27720.319912

We see that after 365 days of simulation, 977 simlants have died and there has been a total of 27,720 years of life lost.

Note

The observer is responsible for recording observations in memory, but it is the responsibility of the user to write them to disk when in an interactive environment. When running a full simulation from the command line (i.e. in a non-interactive environment), the vivarium engine itself will automatically write the results to disk at the end of the simulation.

Running from the command line

Todo

running from the command line

Exploring some results

Todo

exploring some results