Disease Model
Todo
Motivate the development of the disease model. We’re trying to understand the impact of interventions.
Here we’ll produce a data-free disease model focusing on core Vivarium concepts. You can find more complicated versions of the components built here in the vivarium_public_health library. Those components must additionally deal with manipulating complex data which makes understanding what’s going on more complicated.
After this tutorial, you should be well poised to begin working with and examining those components.
Setup
I’m assuming you’ve read through the material in
getting started and are working in your
vivarium/examples package. If not, you should go there first.
Todo
package setup with __init__ and stuff
Building a population
In many ways, this is a bad place to start. The population component
is one of the more complicated components in the simulation as it typically is
responsible for bootstrapping some of the more interesting features in
vivarium.engine.
We need a population, though, so we’ll start with one here and defer explanation of some of the more complex pieces/systems until later.
~/code/vivarium/examples/disease_model/population.py 1# docs-start: imports
2from datetime import timedelta
3from typing import Any
4
5import pandas as pd
6
7from vivarium.engine import Component
8from vivarium.engine.examples.disease_model import Mortality
9from vivarium.engine.framework.engine import Builder
10from vivarium.engine.framework.event import Event
11from vivarium.engine.framework.population import SimulantData
12
13# docs-end: imports
14
15
16class BasePopulation(Component):
17 """Generates a base population with a uniform distribution of age and sex."""
18
19 ##############
20 # Properties #
21 ##############
22
23 # docs-start: configuration_defaults
24 @property
25 def configuration_defaults(self) -> dict[str, Any]:
26 """A set of default configuration values for this component.
27
28 These can be overwritten in the simulation model specification or by
29 providing override values when constructing an interactive simulation.
30 """
31 return {
32 "population": {
33 # The range of ages to be generated in the initial population
34 "age_start": 0,
35 "age_end": 100,
36 # Note: There is also a 'population_size' key.
37 },
38 }
39
40 # docs-end: configuration_defaults
41
42 # docs-start: sub_components
43 @property
44 def sub_components(self) -> list[Component]:
45 return [Mortality()]
46
47 # docs-end: sub_components
48
49 #####################
50 # Lifecycle methods #
51 #####################
52
53 # noinspection PyAttributeOutsideInit
54 # docs-start: setup
55 def setup(self, builder: Builder) -> None:
56 """Performs this component's simulation setup.
57
58 The ``setup`` method is automatically called by the simulation
59 framework. The framework passes in a ``builder`` object which
60 provides access to a variety of framework subsystems and metadata.
61
62 Parameters
63 ----------
64 builder
65 Access to simulation tools and subsystems.
66 """
67 self.config = builder.configuration
68
69 # docs-start: crn
70 self.with_common_random_numbers = bool(self.config.randomness.key_columns)
71 self.register = builder.randomness.register_simulants
72 if (
73 self.with_common_random_numbers
74 and not ["entrance_time", "age"] == self.config.randomness.key_columns
75 ):
76 raise ValueError(
77 "If running with CRN, you must specify ['entrance_time', 'age'] as"
78 "the randomness key columns."
79 )
80 # docs-end: crn
81
82 # docs-start: randomness
83 self.age_randomness = builder.randomness.get_stream(
84 "age_initialization",
85 initializes_crn_attributes=self.with_common_random_numbers,
86 )
87 self.sex_randomness = builder.randomness.get_stream("sex_initialization")
88 # docs-end: randomness
89
90 # docs-start: initializers
91 builder.population.register_initializer(
92 initializer=self.initialize_entrance_time_and_age,
93 columns=["entrance_time", "age"],
94 required_resources=[self.age_randomness],
95 )
96 builder.population.register_initializer(
97 initializer=self.initialize_sex,
98 columns="sex",
99 required_resources=[self.sex_randomness],
100 )
101 # docs-end: initializers
102
103 # docs-end: setup
104
105 ########################
106 # Event-driven methods #
107 ########################
108
109 # docs-start: initialize_entrance_time_and_age
110 def initialize_entrance_time_and_age(self, pop_data: SimulantData) -> None:
111 # docs-start: ages
112 age_start = pop_data.user_data.get("age_start", self.config.population.age_start)
113 age_end = pop_data.user_data.get("age_end", self.config.population.age_end)
114
115 if age_start == age_end:
116 if not isinstance(pop_data.creation_window, (pd.Timedelta, timedelta)):
117 raise TypeError(
118 f"Expected creation_window to be a Timedelta, got {type(pop_data.creation_window)}"
119 )
120 age_window = pop_data.creation_window / pd.Timedelta(days=365)
121 else:
122 age_window = age_end - age_start
123
124 age_draw = self.age_randomness.get_draw(pop_data.index)
125 age = age_start + age_draw * age_window
126 # docs-end: ages
127 # docs-start: population_dataframe
128 population = pd.DataFrame(
129 {
130 "entrance_time": pop_data.creation_time,
131 "age": age.values,
132 },
133 index=pop_data.index,
134 )
135 # docs-end: population_dataframe
136
137 # docs-start: crn_registration
138 if self.with_common_random_numbers:
139 self.register(population)
140 # docs-end: crn_registration
141
142 # docs-start: update_entrance_time_and_age
143 self.population_view.initialize(population)
144 # docs-end: update_entrance_time_and_age
145
146 # docs-end: initialize_entrance_time_and_age
147
148 # docs-start: initialize_sex
149 def initialize_sex(self, pop_data: SimulantData) -> None:
150 self.population_view.initialize(
151 pd.Series(
152 self.sex_randomness.choice(pop_data.index, ["Male", "Female"]), name="sex"
153 )
154 )
155
156 # docs-end: initialize_sex
157
158 # docs-start: on_time_step
159 def on_time_step(self, event: Event) -> None:
160 """Updates simulant age on every time step.
161
162 Parameters
163 ----------
164 event
165 An event object emitted by the simulation containing an index
166 representing the simulants affected by the event and timing
167 information.
168 """
169 living_index = self.population_view.get_filtered_index(
170 event.index, query="is_alive == True"
171 )
172 if not isinstance(event.step_size, (pd.Timedelta, timedelta)):
173 raise TypeError(
174 f"Expected step_size to be a Timedelta, got {type(event.step_size)}"
175 )
176 delta = event.step_size / pd.Timedelta(days=365)
177 self.population_view.update(
178 "age",
179 lambda age: age.loc[living_index] + delta,
180 )
181
182 # docs-end: on_time_step
There are a lot of things here. Let’s take them piece by piece.
Imports
from datetime import timedelta
from typing import Any
import pandas as pd
from vivarium.engine import Component
from vivarium.engine.examples.disease_model import Mortality
from vivarium.engine.framework.engine import Builder
from vivarium.engine.framework.event import Event
from vivarium.engine.framework.population import SimulantData
It’s typical to import all required objects at the top of each module. In this case,
we are importing pandas and the Vivarium
Component class because they are used
explicitly throughout the file. Further, we import several objects from python’s
typing package as well as three classes from the core Vivarium framework
which are used solely for typing
information in method signatures.
BasePopulation Instantiation
We define a class called BasePopulation that inherits from the Vivarium
Component. This inheritance is what
makes a class a proper Vivarium component and all the affordances that
come with that.
Default Configuration
@property
def configuration_defaults(self) -> dict[str, Any]:
"""A set of default configuration values for this component.
These can be overwritten in the simulation model specification or by
providing override values when constructing an interactive simulation.
"""
return {
"population": {
# The range of ages to be generated in the initial population
"age_start": 0,
"age_end": 100,
# Note: There is also a 'population_size' key.
},
}
You’ll see this sort of pattern repeated in many Vivarium components.
We declare a configuration block as a property for components. Vivarium has a cascading configuration system that aggregates configuration data from many locations. The configuration is essentially a declaration of the parameter space for the simulation.
The most important thing to understand is that configuration values are given default values provided by the components and that they can be overriden with a higher level system like a command line argument later.
This component specifically declares defaults for the age range for the initial population of simulants. It also notes that there is a ‘population_size’ key. This key has a default value set by Vivarium’s population management system.
Sub-components
@property
def sub_components(self) -> list[Component]:
return [Mortality()]
This property is a list of components that are managed by this component. In this
case, we see that when BasePopulation is set up, it will also set up an instance
of the Mortality component.
The __init__() method
Though Vivarium components are specific implementations of Python
classes you’ll notice
that many of the classes have very sparse __init__ methods. Indeed, this
BasePopulation class does not even have one defined at this level (though
there is one in the Component parent class it inherits from).
Due to the way the simulation bootstraps itself, the __init__ method is
usually only used to assign names to generic components and muck with
the configuration_defaults a bit. We’ll see more of this later.
The setup method
Instead of the __init__ method, most of the component initialization
takes place in the setup method.
The signature for the setup method is the same in every component.
When the framework is constructing the simulation it looks for a setup
method on each component and calls that method with a
Builder instance.
Note
The Builder
The builder object is essentially the simulation toolbox. It provides
access to several simulation subsystems:
builder.configuration: A dictionary-like representation of all of the parameters in the simulation.builder.lookup: A service for generating interpolated lookup tables. We won’t use these in this tutorial.builder.value: The value pipeline system. In many ways this is the heart of any Vivarium simulation. We’ll discuss this in great detail as we go.builder.event: Access to Vivarium’s event system. The primary use is to register listeners for'time_step'events.builder.population: The population management system. Registers population initializers (functions that fill in initial state information about simulants), gives access to views of the simulation state, and mediates updates to the simulation state. It also provides access to functionality for generating new simulants (e.g. via birth or migration), though we won’t use that feature in this tutorial.builder.randomness: Vivarium uses a variance reduction technique called Common Random Numbers to perform counterfactual analysis. In order for this to work, the simulation provides a centralized source of randomness.builder.time: The simulation clock.builder.components: The component management system. Primarily used for registering subcomponents for setup.builder.results: The results management system. This provides access to stratification and observation registration functions.
Let’s step through the setup method and examine what’s happening.
def setup(self, builder: Builder) -> None:
"""Performs this component's simulation setup.
The ``setup`` method is automatically called by the simulation
framework. The framework passes in a ``builder`` object which
provides access to a variety of framework subsystems and metadata.
Parameters
----------
builder
Access to simulation tools and subsystems.
"""
self.config = builder.configuration
# docs-start: crn
self.with_common_random_numbers = bool(self.config.randomness.key_columns)
self.register = builder.randomness.register_simulants
if (
self.with_common_random_numbers
and not ["entrance_time", "age"] == self.config.randomness.key_columns
):
raise ValueError(
"If running with CRN, you must specify ['entrance_time', 'age'] as"
"the randomness key columns."
)
# docs-end: crn
# docs-start: randomness
self.age_randomness = builder.randomness.get_stream(
"age_initialization",
initializes_crn_attributes=self.with_common_random_numbers,
)
self.sex_randomness = builder.randomness.get_stream("sex_initialization")
# docs-end: randomness
# docs-start: initializers
builder.population.register_initializer(
initializer=self.initialize_entrance_time_and_age,
columns=["entrance_time", "age"],
required_resources=[self.age_randomness],
)
builder.population.register_initializer(
initializer=self.initialize_sex,
columns="sex",
required_resources=[self.sex_randomness],
)
# docs-end: initializers
To start, we simply grab a copy of the simulation
configuration. This is essentially
a dictionary that supports .-access notation.
The next handful of lines interact with Vivarium’s
randomness system.
Several things are happening here.
First, we deal with the topic of Common Random Numbers, a variance reduction technique employed by the Vivarium framework to make it easier to perform counterfactual analysis. It’s not important to have a full grasp of this system at this point.
self.with_common_random_numbers = bool(self.config.randomness.key_columns)
self.register = builder.randomness.register_simulants
if (
self.with_common_random_numbers
and not ["entrance_time", "age"] == self.config.randomness.key_columns
):
raise ValueError(
"If running with CRN, you must specify ['entrance_time', 'age'] as"
"the randomness key columns."
)
Note
Common Random Numbers
The idea behind Common Random Numbers (or CRN) is to enable comparison between two simulations running under slightly different conditions. Conceptually, we achieve this by guaranteeing that the same events occur to the same people at the same time across simulations with the same random seed.
For example, suppose we have two simulations of the world. We model the world as it is in the first simulation and we introduce a vaccine for the flu in the second simulation. Unless the model explicitly encodes the causal relationship between flu vaccination and vehicle traffic patterns, the person who died in a vehicle accident on the 43rd time step in the first simulation will also die in a vehicle accident on the 43rd time step in the second simulation.
In practice, what the CRN system requires is a way to uniquely identify simulants across simulations. We need to randomly generate some simulant characteristics in a repeatable fashion and then use those characteristics to identify the simulants in the randomness system later. This is (typically) only handled by the population component. It’s vitally important to get right when doing counterfactual analysis, but it’s not especially important that you understand the mechanics of the implementation.
In this component we’re using some information about the configuration of the randomness system to let us know whether or not we care about using CRN. We’ll explore this later when we’re looking at running simulations with interventions.
Next, we grab actual randomness streams
from the framework.
self.age_randomness = builder.randomness.get_stream(
"age_initialization",
initializes_crn_attributes=self.with_common_random_numbers,
)
self.sex_randomness = builder.randomness.get_stream("sex_initialization")
get_stream is the only call most components make to the randomness system.
The best way to think about randomness streams is as decision points in your
simulation. Any time you need to answer a question that requires a random
number, you should be using a randomness stream linked to that question.
Here we have the questions “What age are my simulants when they enter the
simulation?” and “What sex are my simulants?”; we have assigned their corresponding
randomness streams to age_randomness and sex_randomness attributes, respectively.
For age_randomness, the initializes_crn_attributes argument
tells the stream that the simulants you’re asking this question about won’t already
be registered with the randomness system; this is the bootstrapping part. Here we’re
using the 'entrance_time' and 'age' to identify a simulant and so we need a
stream to initialize ages with. There is should really only be one of these
initialization streams in a simulation.
The 'sex_randomness' is a much more typical example of how to interact
with the randomness system - we are simply getting the stream.
Finally, we register two initializers with the population manager. This tells vivarium which initializers to call when creating the component as well as which columns (if any) each initializer is responsible for creating.
builder.population.register_initializer(
initializer=self.initialize_entrance_time_and_age,
columns=["entrance_time", "age"],
required_resources=[self.age_randomness],
)
builder.population.register_initializer(
initializer=self.initialize_sex,
columns="sex",
required_resources=[self.sex_randomness],
)
Note that each initializer registration requires us to specify which columns that initializer is creating (if any), the initializer method itself, and any required resources. In this case, each initializer depends on a randomness stream. The system will ensure that these required resources are set up before calling the initializer methods.
That was a lot of stuff
As I mentioned at the top, the population component is one of the more
complicated pieces of any simulation. It’s not important to grasp everything
right now. We’ll see many of the same patterns repeated in the setup
method of other components later. The unique things here are worth coming
back to at a later point once you have more familiarity with the framework
conventions.
The initializers
Two methods are registered as initializers. The primary purpose of initializer methods
is to generate the initial population. Specifically (for this class), the
initialize_entrance_time_and_age method is registered to create the ‘entrance_time’
and ‘age’ columns (with age_randomness as a dependency) and the
initialize_sex method is registered to create the ‘sex’ column (with
sex_randomness as a dependency).
Note
The Population State Table
When we talk about columns in the context of Vivarium, we are typically
talking about simulant attributes. All attributes for all
simulants can be represented by a single pandas.DataFrame where each simulant
is a row and each attribute is a column. This dataframe is referred to as the
population state table (or simply “state table”).
As previously mentioned, this class is a proper Vivarium Component. Among
other things, this means that much of the setup happens automatically during the
simulation’s Setup lifecycle phase.
There are several methods available to define for a component’s setup, depending
on what you want to happen when: on_post_setup()
on_time_step_prepare(), on_time_step(), on_time_step_cleanup(),
on_collect_metrics(), and on_simulation_end(). The framework looks for
any of these methods during the setup phase and calls them if they are defined.
Further, the framework calls any registered initializer methods during population
creation. The fact that initialize_entrance_time_and_age and initialize_sex
were registered guarantees that they will be called during the population initialization
phase of the simulation.
Initializer methods are called by the population manager whenever simulants
are created. For our purposes, this happens only once at the very beginning of
the simulation. Typically, we’d task another component with responsibility for
managing other ways simulants might enter (we might, for instance, have a
Migration component that knows about how and when people enter and exit
our location of interest or a Fertility component that handles new simulants
being born).
Let’s inspect the two initializer methods line by line as we did with setup.
The initialize_entrance_time_and_age method
def initialize_entrance_time_and_age(self, pop_data: SimulantData) -> None:
# docs-start: ages
age_start = pop_data.user_data.get("age_start", self.config.population.age_start)
age_end = pop_data.user_data.get("age_end", self.config.population.age_end)
if age_start == age_end:
if not isinstance(pop_data.creation_window, (pd.Timedelta, timedelta)):
raise TypeError(
f"Expected creation_window to be a Timedelta, got {type(pop_data.creation_window)}"
)
age_window = pop_data.creation_window / pd.Timedelta(days=365)
else:
age_window = age_end - age_start
age_draw = self.age_randomness.get_draw(pop_data.index)
age = age_start + age_draw * age_window
# docs-end: ages
# docs-start: population_dataframe
population = pd.DataFrame(
{
"entrance_time": pop_data.creation_time,
"age": age.values,
},
index=pop_data.index,
)
# docs-end: population_dataframe
# docs-start: crn_registration
if self.with_common_random_numbers:
self.register(population)
# docs-end: crn_registration
# docs-start: update_entrance_time_and_age
self.population_view.initialize(population)
# docs-end: update_entrance_time_and_age
First, we see that this method takes in a special argument that we don’t provide.
This argument, pop_data, is an instance of
SimulantData containing a
handful of information useful when initializing simulants.
Note
SimulantData
This simple structure only has four attributes (used here in the generic Python sense of the word).
index: The population table index of the simulants being initialized.user_data: A (potentially empty) dictionary generated by the user in components that directly create simulants.creation_time: The current simulation time. Apandas.Timestamp.creation_window: The size of the time step over which the simulants are created. Apandas.Timedelta.
The most interesting thing that that the BasePopulation component does
is manage the age of our simulants. Back in the configuration_defaults
property we specified an 'age_start' and 'age_end'. Here we use these
to generate the age distribution of our initial population.
age_start = pop_data.user_data.get("age_start", self.config.population.age_start)
age_end = pop_data.user_data.get("age_end", self.config.population.age_end)
if age_start == age_end:
if not isinstance(pop_data.creation_window, (pd.Timedelta, timedelta)):
raise TypeError(
f"Expected creation_window to be a Timedelta, got {type(pop_data.creation_window)}"
)
age_window = pop_data.creation_window / pd.Timedelta(days=365)
else:
age_window = age_end - age_start
age_draw = self.age_randomness.get_draw(pop_data.index)
age = age_start + age_draw * age_window
We’ve built in support for two different kinds of populations based on the
'age_start' and 'age_end' specified in the configuration. If we get
the same 'age_start' and 'age_end', we have a cohort, and so we smear
out ages within the width of a single time step (the pop_data.creation_window).
Otherwise, we assume our population is uniformly distributed within the age
window bounded by 'age_start' and 'age_end'. You can use demographic
data here to generate arbitrarily complex starting populations.
The only thing really of note here is the call to
self.age_randomness.get_draw. If we recall from the setup method,
self.age_randomness is an instance of a
RandomnessStream which supports several
convenience methods for interacting with random numbers. get_draw takes
in an index representing particular simulants and returns a
pandas.Series with a uniformly drawn random number for each simulant
in the index.
Note
The Population Index
The population state table we described before has an index that uniquely identifies each simulant. This index is used in several places in the simulation to look up information, calculate simulant-specific values, and update information about the simulants’ state.
With the simulant ages defined, we then create a dataframe of simulant ages and entrance times.
population = pd.DataFrame(
{
"entrance_time": pop_data.creation_time,
"age": age.values,
},
index=pop_data.index,
)
We then come back to the question of whether or not we’re using common
random numbers in our system. In the setup method, our criteria for
using common random numbers was that 'entrance_time' and 'age'
were specified as the randomness key_columns in the configuration.
These key_columns are what the randomness system uses to uniquely
identify simulants across simulations.
Note that if we are using CRN, we must generate these columns before any other calls
are made to the randomness system with the population index. We then
register these simulants with the randomness system using self.register,
a reference to the register_simulants method in the randomness management
system. This is responsible for mapping the attributes of interest (here
'entrance_time' and 'age') to a particular set of random numbers
that will be used across simulations with the same random seed.
if self.with_common_random_numbers:
self.register(population)
Once registered, we can generate the remaining attributes of our simulants with guarantees around reproducibility.
If we’re not using CRN, we simply don’t register the simulants and move on.
In either case, we are hanging on to a table representing some attributes of
our new simulants. However, this table does not matter yet because the
simulation’s population system doesn’t know anything about it. We must first
inform the simulation by passing in the DataFrame to our
population view's
initialize method.
Note
Population Views
A PopulationView is a
read/write interface to the population state table. It provides methods for
reading attributes, initializing private columns, and updating private column
data over the course of a simulation.
Warning
The data passed to the population view’s initialize method must have an
index that is a subset of the pop_data.index. During initial population
creation every simulant should be covered, but when simulants are added later
(e.g. via fertility) the initializer may legitimately receive only a subset of
the new simulants. Passing an index that contains simulants not in pop_data.index
will cause hard-to-debug errors.
self.population_view.initialize(population)
The initialize_sex method
def initialize_sex(self, pop_data: SimulantData) -> None:
self.population_view.initialize(
pd.Series(
self.sex_randomness.choice(pop_data.index, ["Male", "Female"]), name="sex"
)
)
Thankfully, the initialize_sex method is much simpler than the previous
initializer. Here, we simply call another randomness stream convenience method
self.sex_randomness.choice to randomly assign a sex to each simulant. We then
initialize the population via population view with these new values just like before.
The on_time_step method
The last piece of our population component is the 'time_step' listener
method on_time_step.
def on_time_step(self, event: Event) -> None:
"""Updates simulant age on every time step.
Parameters
----------
event
An event object emitted by the simulation containing an index
representing the simulants affected by the event and timing
information.
"""
living_index = self.population_view.get_filtered_index(
event.index, query="is_alive == True"
)
if not isinstance(event.step_size, (pd.Timedelta, timedelta)):
raise TypeError(
f"Expected step_size to be a Timedelta, got {type(event.step_size)}"
)
delta = event.step_size / pd.Timedelta(days=365)
self.population_view.update(
"age",
lambda age: age.loc[living_index] + delta,
)
This method takes in an ~ argument
provided by the simulation. This is very similar to the SimulantData
argument provided to initializer methods. It carries around
some information about what’s happening in the event.
Note
Event
The event also has four attributes.
index: The population table index of the simulants responding to the event.user_data: A (potentially empty) dictionary generated by the user in components that directly events.time: The current simulation time. Apandas.Timestamp.step_size: The size of the time step we’re about to take. Apandas.Timedelta.
It also supports some method for generating new events that we don’t care about here.
In order to age our simulants, we call the population view’s
update()
method with the column name and a modifier function. The modifier receives
the current values of the private column as a Series and
returns the updated values. The update method handles reading the current
data from the component’s private columns and writing the result back to the
state table.
Note
Private Columns vs. Attributes
The key distinction between attributes and private columns is that attributes are dynamically-calculated values while private columns are static values that act as the source of attribute pipelines. Attributes are read-only and are modified by components registering modifiers to their corresponding attribute pipelines. Private columns, on the other hand, are read and written by components directly - but only by the component that created them in the first place.
Refer to the population management documentation for more details.
- The methods available to read attributes are:
-
The methods available to update private columns are: -
initialize()-update()
Examining our work
Now that we’ve done all this hard work, let’s see what it gives us.
from vivarium.engine import InteractiveContext
from vivarium.engine.examples.disease_model.population import BasePopulation
config = {'randomness': {'key_columns': ['entrance_time', 'age']}}
sim = InteractiveContext(components=[BasePopulation()], configuration=config)
print(sim.get_population(['age', 'sex']).head())
age sex
0 13.806776 Female
1 59.172893 Male
2 11.030887 Female
3 27.723191 Female
4 51.052188 Female
age sex
0 13.806776 Female
1 59.172893 Male
2 11.030887 Female
3 27.723191 Female
4 51.052188 Female
Great! We generate a population with a non-trivial age and sex distribution. Let’s see what happens when our simulation takes a time step.
sim.step()
print(sim.get_population(['age', 'sex']).head())
age sex
0 13.809516 Female
1 59.175633 Male
2 11.033627 Female
3 27.725931 Female
4 51.054928 Female
age sex
0 13.809516 Female
1 59.175633 Male
2 11.033627 Female
3 27.725931 Female
4 51.054928 Female
Everyone gets older by exactly one time step! We could just keep taking steps in our simulation and people would continue getting infinitely older. This, of course, does not reflect how the world goes. Time to introduce the grim reaper.
Mortality
Now that we have demonstrated that population generation and aging works, let’s investigate the Mortality component. Note that Mortality is a sub-component of the BasePopulation component and comes for free when we request BasePopulation via the model specification; there is no need to add Mortality separately.
~/code/vivarium/examples/disease_model/mortality.py 1from __future__ import annotations
2
3from typing import Any
4
5import numpy as np
6import pandas as pd
7
8from vivarium.engine import Component
9from vivarium.engine.framework.engine import Builder
10from vivarium.engine.framework.event import Event
11from vivarium.engine.framework.population import SimulantData
12
13
14class Mortality(Component):
15
16 ##############
17 # Properties #
18 ##############
19
20 # docs-start: configuration_defaults
21 @property
22 def configuration_defaults(self) -> dict[str, Any]:
23 """A set of default configuration values for this component.
24
25 These can be overwritten in the simulation model specification or by
26 providing override values when constructing an interactive simulation.
27 """
28 return {self.name: {"data_sources": {"mortality_rate": 0.01}}}
29
30 # docs-end: configuration_defaults
31
32 #####################
33 # Lifecycle methods #
34 #####################
35
36 # noinspection PyAttributeOutsideInit
37 # docs-start: setup
38 def setup(self, builder: Builder) -> None:
39 """Performs this component's simulation setup.
40
41 The ``setup`` method is automatically called by the simulation
42 framework. The framework passes in a ``builder`` object which
43 provides access to a variety of framework subsystems and metadata.
44
45 Parameters
46 ----------
47 builder
48 Access to simulation tools and subsystems.
49 """
50 self.randomness = builder.randomness.get_stream("mortality")
51 builder.value.register_rate_producer(
52 "mortality_rate", source=self.build_lookup_table(builder, "mortality_rate")
53 )
54 builder.population.register_initializer(
55 initializer=self.initialize_is_alive, columns="is_alive", required_resources=[]
56 )
57
58 # docs-end: setup
59
60 ########################
61 # Event-driven methods #
62 ########################
63
64 # docs-start: initialize_is_alive
65 def initialize_is_alive(self, pop_data: SimulantData) -> None:
66 """Called by the simulation whenever new simulants are added.
67
68 This component is responsible for creating and filling the 'is_alive' column
69 in the population state table.
70
71 Parameters
72 ----------
73 pop_data
74 A record containing the index of the new simulants, the
75 start of the time step the simulants are added on, the width
76 of the time step, and the age boundaries for the simulants to
77 generate.
78 """
79 self.population_view.initialize(
80 pd.Series(True, index=pop_data.index, name="is_alive")
81 )
82
83 # docs-end: initialize_is_alive
84
85 # docs-start: on_time_step
86 def on_time_step(self, event: Event) -> None:
87 """Determines who dies each time step.
88
89 Parameters
90 ----------
91 event
92 An event object emitted by the simulation containing an index
93 representing the simulants affected by the event and timing
94 information.
95 """
96 effective_rate = self.population_view.get(event.index, "mortality_rate")
97 effective_probability = 1 - np.exp(-effective_rate)
98 draw = self.randomness.get_draw(event.index)
99 affected_simulants = draw < effective_probability
100 self.population_view.update(
101 "is_alive", lambda _: pd.Series(False, index=event.index[affected_simulants])
102 )
103
104 # docs-end: on_time_step
The purpose of this component is to determine who dies every time step based
on a mortality rate. You’ll see many of the same framework features we used
in the BasePopulation component used again here and a few new things.
Let’s dive in.
Default Configuration
Since we’re building our disease model without data to inform it, we’ll expose all the important bits of the model as parameters in the configuration.
@property
def configuration_defaults(self) -> dict[str, Any]:
"""A set of default configuration values for this component.
These can be overwritten in the simulation model specification or by
providing override values when constructing an interactive simulation.
"""
return {self.name: {"data_sources": {"mortality_rate": 0.01}}}
Here we’re specifying the overall mortality rate in our simulation. Rates have units! We’ll phrase our model with rates specified in terms of events per person-year. So here we’re specifying a uniform mortality rate of 0.01 deaths per person-year. This is obviously not realistic, but using toy data like this is often extremely useful in validating a model.
The setup method
There is not a whole lot going on in this setup method, but there is one new concept we should discuss.
def setup(self, builder: Builder) -> None:
"""Performs this component's simulation setup.
The ``setup`` method is automatically called by the simulation
framework. The framework passes in a ``builder`` object which
provides access to a variety of framework subsystems and metadata.
Parameters
----------
builder
Access to simulation tools and subsystems.
"""
self.randomness = builder.randomness.get_stream("mortality")
builder.value.register_rate_producer(
"mortality_rate", source=self.build_lookup_table(builder, "mortality_rate")
)
builder.population.register_initializer(
initializer=self.initialize_is_alive, columns="is_alive", required_resources=[]
)
The first line simply adds a useful class attribute: the mortality randomness stream (which is used to answer the question “which simulants died at this time step?”).
The next bit is the main feature of note: the introduction of the
values system.
The values system provides a way of distributing the computation of a
value over multiple components. This can be a bit difficult to grasp,
but is vital to the way we think about components in Vivarium. The best
way to understand this system is by example.
Note
Values vs Attributes
A value is generic and is simply something that is computed from a :term: pipeline <pipeline>. An attribute is a specific kind of value that is simulant-specific, stored in the population state table, and is computed from a attribute pipeline. Most values in vivarium are attributes.
In our current context we register a named attribute pipeline into the
simulation called 'mortality_rate' via the builder.value.register_rate_producer
method. The source for a value is always a callable which typically takes in a
pandas.Index as its only argument. In this case, the source is a LookupTable,
which is callable, so meets this requirement. Other things are possible, but not
necessary for our current use case.
The 'mortality_rate' source is then responsible for returning a
pandas.Series containing a base mortality rate for each simulant
in the index to the values system. Other components may register themselves
as modifiers to this base rate. We’ll see more of this once we get to the
disease modeling portion of the tutorial.
The value system will coordinate how the base value is modified behind the
scenes and return the results of all computations whenever the pipeline is
called (in the on_time_step method in this case - see below).
Finally, we register an initializer method which is responsible for creating an
'is_alive' column in the state table.
The initialize_is_alive method
def initialize_is_alive(self, pop_data: SimulantData) -> None:
"""Called by the simulation whenever new simulants are added.
This component is responsible for creating and filling the 'is_alive' column
in the population state table.
Parameters
----------
pop_data
A record containing the index of the new simulants, the
start of the time step the simulants are added on, the width
of the time step, and the age boundaries for the simulants to
generate.
"""
self.population_view.initialize(
pd.Series(True, index=pop_data.index, name="is_alive")
)
This very simple initializer method simply creates an 'is_alive' column in the state
table and sets it to True for all simulants being initialized. Note again
that we need to call the population view’s initialize method to actually modify
the state table.
Notice also that when registering this method, we did not specify any required resources (since every simulant is set as alive regardless of anything else).
The on_time_step method
Similar to how we aged simulants in the population component, we determine which
simulants die during 'time_step' events.
def on_time_step(self, event: Event) -> None:
"""Determines who dies each time step.
Parameters
----------
event
An event object emitted by the simulation containing an index
representing the simulants affected by the event and timing
information.
"""
effective_rate = self.population_view.get(event.index, "mortality_rate")
effective_probability = 1 - np.exp(-effective_rate)
draw = self.randomness.get_draw(event.index)
affected_simulants = draw < effective_probability
self.population_view.update(
"is_alive", lambda _: pd.Series(False, index=event.index[affected_simulants])
)
The very first thing we do is get the 'mortality_rate' attribute (which is calculated
from the attribute pipeline we constructed during setup); these are the effective
mortality rate for each person in the simulation.
Right now this will just be the base mortality rate, but we’ll see how
this changes once we bring in a disease. Importantly for now though, the
pipeline is automatically rescaling the rate down to the size of the time
steps we’re taking.
We then determine who died this time step. We turn our mortality rate into a probability of death in the given time step by assuming deaths are exponentially distributed and using the inverse distribution function. We then draw a uniformly distributed random number for each person and determine who died by comparing that number to the computed probability of death for the individual.
Finally, we update the state table is_alive column with the newly dead simulants.
Supplying a base mortality rate
As discussed above, the source for the 'mortality_rate' value is a LookupTable
defined in component’s configuration.
In an actual simulation, we’d inform the base mortality rate with data specific to the age, sex, location, year (and potentially other demographic factors) that represent each simulant. We might disaggregate or interpolate our data here as well. Which is all to say, the source of a pipeline can do some pretty complicated stuff.
Did it work?
It’s a good time to check and make sure that what we did works. We’ve got a mortality rate of 0.01 deaths per person-year and we’re taking 1 day time steps, so we give ourselves a relatively large population this time so we can see the impact of our mortality component without taking too many steps.
from vivarium.engine import InteractiveContext
from vivarium.engine.examples.disease_model.population import BasePopulation
config = {
'population': {
'population_size': 100_000
},
'randomness': {
'key_columns': ['entrance_time', 'age']
}
}
sim = InteractiveContext(components=[BasePopulation()], configuration=config)
print(sim.get_population(['age', 'sex', 'mortality_rate', 'is_alive']).head())
age sex mortality_rate is_alive
0 13.806776 Female 0.000027 True
1 59.172893 Male 0.000027 True
2 11.030887 Female 0.000027 True
3 27.723191 Female 0.000027 True
4 51.052188 Female 0.000027 True
age sex mortality_rate is_alive
0 13.806776 Female 0.000027 True
1 59.172893 Male 0.000027 True
2 11.030887 Female 0.000027 True
3 27.723191 Female 0.000027 True
4 51.052188 Female 0.000027 True
Note that aside from modifying the population size in the config, we haven’t actually
done anything different than before. Indeed, the ages and sexes of the first five
simulants are the same. Here, however, we are not subsetting the dataframe to only
show the 'age' and 'sex' columns, however, and so we see various others
(notably, the 'mortality_rate' and 'is_alive' columns created by the Mortality
component).
As we haven’t taken a time step yet, everyone should still be alive.
print(sim.get_population("is_alive").value_counts())
is_alive
True 100000
Name: count, dtype: int64
is_alive
True 100000
Name: count, dtype: int64
Now let’s run our simulation for a while and see what happens.
sim.take_steps(365) # Run for one year with one day time steps
sim.get_population("is_alive").value_counts()
is_alive
True 99023
False 977
Name: count, dtype: int64
We simulated somewhere between 99,023 (if everyone died in the first time step) and 100,000 (if everyone died in the last time step) living person-years and saw 977 deaths. This means our empirical mortality rate is somewhere close to 0.0098 deaths per person-year, very close to the 0.01 rate we provided.
Disease
Todo
disease
Risk
Todo
risk
Intervention
Todo
interventions
Observer
We’ve spent some time showing how we can look at the population state table to see
how it changes during an interactive simulation. However, we also typically want
the simulation itself to record more sophisticated output. Further, we frequently
work in non-interactive (or even distributed) environments where we simply don’t
have access to the simulation object and so would like to write our output to disk.
These recorded outputs (i.e. results) are referred to in vivarium as observations
and it is the job of observers to register them to the simulation.
Observers are vivarium
components that are created by the user
and added to the simulation via the model specification.
This example’s observers are shown below.
~/code/vivarium/examples/disease_model/observer.py 1from typing import Any
2
3import pandas as pd
4
5from vivarium.engine.framework.engine import Builder
6from vivarium.engine.framework.event import Event
7from vivarium.engine.framework.population import SimulantData
8from vivarium.engine.framework.results import Observer
9
10
11class DeathsObserver(Observer):
12 """Observes the number of deaths."""
13
14 #################
15 # Setup methods #
16 #################
17
18 def setup(self, builder: Builder) -> None:
19 builder.population.register_initializer(
20 initializer=self.initialize_previous_alive,
21 columns="previous_alive",
22 required_resources=["is_alive"],
23 )
24
25 def register_observations(self, builder: Builder) -> None:
26 builder.results.register_adding_observation(
27 name="dead",
28 requires_attributes=["is_alive", "previous_alive"],
29 pop_filter="previous_alive == True and is_alive == False",
30 )
31
32 ########################
33 # Event-driven methods #
34 ########################
35
36 def initialize_previous_alive(self, pop_data: SimulantData) -> None:
37 """Initialize simulants as alive"""
38 self.population_view.initialize(
39 pd.Series(True, index=pop_data.index, name="previous_alive")
40 )
41
42 def on_time_step_prepare(self, event: Event) -> None:
43 """Update the previous deaths column to the current deaths."""
44 previous_alive = self.population_view.get(event.index, "is_alive")
45 previous_alive.name = "previous_alive"
46 self.population_view.update("previous_alive", lambda _: previous_alive)
47
48
49class YllsObserver(Observer):
50 """Observes the years of lives lost."""
51
52 ##############
53 # Properties #
54 ##############
55
56 @property
57 def configuration_defaults(self) -> dict[str, Any]:
58 config = super().configuration_defaults
59 config["mortality"] = {"life_expectancy": 80.0}
60 return config
61
62 #####################
63 # Lifecycle methods #
64 #####################
65
66 def setup(self, builder: Builder) -> None:
67 self.life_expectancy = float(builder.configuration.mortality.life_expectancy)
68
69 #################
70 # Setup methods #
71 #################
72
73 def register_observations(self, builder: Builder) -> None:
74 builder.results.register_adding_observation(
75 name="ylls",
76 requires_attributes=["age", "is_alive", "previous_alive"],
77 pop_filter="previous_alive == True and is_alive == False",
78 aggregator=self.calculate_ylls,
79 )
80
81 def calculate_ylls(self, df: pd.DataFrame) -> float:
82 return float((self.life_expectancy - df["age"]).sum())
There are two observers that have each registered a single observation to the simulation: deaths and years of life lost (YLLs). It is important to note that neither of those observations are population state table columns; they are more complex results that require some computation to determine.
Note that the deaths observer actually creates a private column called 'previous_alive'.
The purpose of this column is to distinguish newly-dead simulants (for adding purposes)
from those that died in previous time steps. We update this column in the
on_time_step_prepare method of the observer.
Why didn’t we update the 'previous_alive' values at the same time
as the 'is_alive' values in the Mortality component’s on_time_step method?
The reason is that the deaths observer records the number of deaths that occurred during
the previous time step during the collect_metrics phase. By updating
the 'is_alive' column during the time_step phase (which occurs before
collect_metrics) and the 'previous_alive' column during the time_step_prepare
phase (which occurs after collect_metrics), we ensure that the observer
can distinguish which simulants died specifically during the previous time step.
In an interactive setting, we can access these observations via the
sim.get_results() command. This will return a dictionary of all
observations up to this point in the simulation.
from vivarium.engine import InteractiveContext
from vivarium.engine.examples.disease_model.population import BasePopulation
from vivarium.engine.examples.disease_model.observer import DeathsObserver, YllsObserver
config = {
'population': {
'population_size': 100_000
},
'randomness': {
'key_columns': ['entrance_time', 'age']
}
}
sim = InteractiveContext(
components=[
BasePopulation(),
DeathsObserver(),
YllsObserver(),
],
configuration=config
)
sim.take_steps(365) # Run for one year with one day time steps
print(sim.get_results()["dead"])
print(sim.get_results()["ylls"])
stratification value
0 all 977.0
stratification value
0 all 27720.319912
We see that after 365 days of simulation, 977 simlants have died and there has been a total of 27,720 years of life lost.
Note
The observer is responsible for recording observations in memory, but it is the responsibility of the user to write them to disk when in an interactive environment. When running a full simulation from the command line (i.e. in a non-interactive environment), the vivarium engine itself will automatically write the results to disk at the end of the simulation.
Running from the command line
Todo
running from the command line
Exploring some results
Todo
exploring some results