
Getting started¶
Installation¶
We use some functions of our toolbox for inference and forecast of the spread of the Coronavirus. We supply this toolbox via a github submodule, which can to be initialized while cloning the repository. Alternatively our toolbox can be installed with pip.
git clone --recurse-submodules git@github.com:Priesemann-Group/covid19_npis_europe.git
Next install all required python libaries. Tensorflow isn’t put in the requirements.txt to allow the installation of another distribution (tensorflow-gpu for instance)
pip install tensorflow==2.4.1
pip install -r requirements.txt
Now we should be setup and ready to run our code or look into the source code.
Running the simulation(s)¶
If you want to run our code there is a work flow you can follow.
- Create or generate a dataset.
We supply multiple scripts to generate different datasets. All of them can be found in the scripts/data_generators/ folder. All of them create the data files inside the data folder. You can run them easily with
python script.py
Alternatively you create your own dataset, we wrote are short guide which can help you get your data inserted into our model see here.
- Load dataset
Before we can start to fit with our model we have to load our data files. There are multiple ways to do this but all of them rely on the
covid19_npis.ModelParams
class. Have a look into the constructors to see all possibilities.The easiest way is to load a complete data folder e.g. data/Germany_bundesländer (generated with the Germany_bundesländer.py script).
modelParams = covid19_npis.ModelParams.from_folder("./data/Germany_bundesländer/")
- Generate model with data
Now according to the dimensions of our data (i.e. time,number of countries…) we create our model. This is done by passing the modelParams to our pymc4 model instance.
this_model = covid19_npis.model.main_model(modelParams)
- Sampling
Sampling is done with the
pymc4.sample()
function and our model from above. The sampling function generates anarviz.InferenceData
object, which we can later use for the plotting or for other sample stats.# A typical sample function call begin_time = time.time() log.info("Start sampling") trace = pm.sample( this_model, burn_in=200, num_samples=100, num_chains=2, xla=True, step_size=0.01, ) end_time = time.time() log.info("running time: {:.1f}s".format(end_time - begin_time)) Best practise is to measure the time the sampling takes and to save the trace after sampling. # Save the trace name, fpath = covid19_npis.utils.save_trace( trace, modelParams, fpath="./traces", )
- Plotting
Todo
Understanding our model¶
We supply our model which we used in our publication (wip). If you want to know how it works in detail have a look into our Methods section in the publication and the documentation here. You can also use our functions to create your own model but that could take some effort.
We suggest you start with the covid19_npis.model.main_model
and work your way threw from top to bottom. It is always helpful to have the tensorflow documentation. opened. We use tf.einsum
so you should have a look at Einstein notation if you don’t know it by heart yet.
Model¶
Main model¶
Disease spread¶
-
covid19_npis.model.disease_spread.
InfectionModel
(N, E_0_t, R_t, C, gen_kernel)[source]¶ This function combines a variety of different steps:
Converts the given
values to an exponential distributed initial
with an length of
this can be seen in
_construct_E_0_t()
.Calculates
for each time step using the given contact matrix
:
Calculates the
arrays i.e. new infectious for each age group and country, with the efficient reproduction matrix
, the susceptible pool
, the population size
and the generation interval
. This is done recursive for every time step.
- Parameters
E_0 – Initial number of infectious.
Dimensions: batch_dims, country, age_groupR_t – Reproduction number matrix.
Dimensions: time, batch_dims, country, age_groupN – Total population per country
Dimensions: country, age_groupC – inter-age-group Contact-Matrix (see 8)
Dimensions: country, age_group, age_groupgen_kernel – Normalized PDF of the generation interval
Dimensions: batch_dims(?), l
- Returns
Sample from distribution of new, daily cases
-
covid19_npis.model.disease_spread.
construct_generation_interval
(name='g', mu_k=120.0, mu_theta=0.04, theta_k=8.0, theta_theta=0.1, l=16)[source]¶ Generates the generation interval with two underlying gamma distributions for mu and theta
whereby the underlying distribution are modeled as follows
- Parameters
name (string) – Name of the distribution for trace and debugging.
mu_k (number, optional) – Concentration/k parameter for underlying gamma distribution of mu (
).
Default: 120mu_theta (number, optional) – Scale/theta parameter for underlying gamma distribution of mu (
).
Default: 0.04theta_k (number, optional) – Concentration/k parameter for underlying gamma distribution of theta (
).
Default: 8theta_theta (number, optional) – Scale/theta parameter for underlying gamma distribution of theta (
).
Default: 0.1l (number, optional) – Length of generation interval i.e
in the formula above
Default: 16
- Returns
Normalized generation interval pdf
-
covid19_npis.model.disease_spread.
construct_E_0_t
(modelParams, len_gen_interv_kernel, R_t, mean_gen_interv, mean_test_delay=10)[source]¶ Generates a prior for E_0_t, based on the observed number of cases during the first 5 days. Currently it is implemented to take the first value of R_t, and multiply the inverse of R_t with first observed values until the begin of the simulation is reached. This is then used as a prior for a lognormal distribution which set the E_0_t.
- Parameters
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.len_gen_interv_kernel (number) – …some description
R_t (tf.tensor) – Time dependent reproduction number tensor
.
Dimensions: time, batch, country, age groupmean_gen_interv (countries) – …some description
mean_test_delay (number, optional) – …some description
Default: 10
- Returns
- E_0_t:
some description
Dimensions: time, batch, country, age_group
-
covid19_npis.model.disease_spread.
construct_delay_kernel
(name, modelParams, loc, scale, length_kernel)[source]¶ Constructs delay
in hierarchical manner:
- Parameters
name – Name of the delay distribution
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.loc – Location of the hierarchical Lognormal distribution for the mean of the delay.
scale – Theta parameter for now#
length_kernel – Length of the delay kernel in days.
- Returns
Generator for gamma probability density function.
Dimensions: batch, country, kernel(time)
Todo
Think about sigma distribution and how to parameterize it. Also implement that.
Reproduction number¶
-
covid19_npis.model.reproduction_number.
_fsigmoid
(t, l, d)[source]¶ Calculates and returns
- Parameters
t – Time, “variable”
l – Length of the change point, determines scale
d – Date of the change point, determines location
-
covid19_npis.model.reproduction_number.
_create_distributions
(modelParams)[source]¶ Returns a dict of distributions for further processing/sampling with the following priors:
- Parameters
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.- Returns
interventions, distributions
-
covid19_npis.model.reproduction_number.
construct_R_t
(name, modelParams, R_0, include_noise=True)[source]¶ Constructs the time dependent reproduction number
for every country and age group. There are a lot of things happening here be sure to check our paper for more indepth explanations!
We build the effectivity in an hierarchical manner in the unbounded space:
The length of the change point depends on the intervention and whether the strength is increasing or decreasing:
The date of the begin of the intervention is also allowed to vary slightly around the date
given by the Oxford government response tracker:
And finally the time dependent reproduction number
:
We also sometimes call the time dependent reproduction number R_t!
- Parameters
name (str) – Name of the distribution (gets added to trace).
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.R_0 (tf.tensor) – Initial reproduction number. Should be constructed using
construct_R_0()
orconstruct_R_0_old()
.
Dimensions: batch, country, age group
- Returns
Time dependent reproduction number tensor
.
Dimensions: time, batch, country, age group
-
covid19_npis.model.reproduction_number.
construct_R_0
(name, modelParams, loc, scale, hn_scale)[source]¶ Constructs R_0 in the following hierarchical manner:
- Parameters
name (str) – Name of the distribution (gets added to trace).
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.loc (number) – Location parameter of the R^*_0 Normal distribution.
scale (number) – Scale paramter of the R^*_0 Normal distribution.
hn_scale (number) – Scale parameter of the sigma_{R^*, text{country}} HaflNormal distribution.
- Returns
R_0 tensor
Dimensions: batch, country, age_group
-
covid19_npis.model.reproduction_number.
construct_lambda_0
(name, modelParams, loc, scale, hn_scale)[source]¶ Constructs lambda_0 in the following hierarchical manner:
- Parameters
name (str) – Name of the distribution (gets added to trace).
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.loc (number) – Location parameter of the R^*_0 Normal distribution.
scale (number) – Scale paramter of the R^*_0 Normal distribution.
hn_scale (number) – Scale parameter of the sigma_{R^*, text{country}} HaflNormal distribution.
- Returns
R_0 tensor
Dimensions: batch, country, age_group
-
covid19_npis.model.reproduction_number.
construct_R_0_old
(name, modelParams, mean, beta)[source]¶ Old constructor of
using a gamma distribution:
- Parameters
name (string) – Name of the distribution for trace and debugging.
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.mean – Mean
of the gamma distribution.
beta – Rate
of the gamma distribution.
- Returns
R_0 tensor
Dimensions: batch, country, age_group
Number of tests¶
-
covid19_npis.model.number_of_tests.
weekly_modulation
(name, modelParams, cases)[source]¶ Adds a weekly modulation of the number of new cases:
The modulation is assumed to be the same for all age-groups within one country and determined by the “weight” and “offset” parameters. The weight follows a sigmoidal distribution with normal prior of “weight_cross”. The “offset” follows a VonMises distribution centered around 0 (Mondays) and a wide SD (concentration parameter = 2).
- Parameters
name (str or None,) – The name of the cases to be modulated (gets added to trace).
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.cases (tf.tensor) – The input array of daily new cases for countries and age groups
- Returns
cases_modulated
- Return type
tf.tensor
Todo
check prior parameters
different modulations across: age, country?
check: are (cumulative) case numbers same as in unmodulated case? need some kind of normalization?
store and plot parameters at end
-
covid19_npis.model.number_of_tests.
generate_testing
(name_total, name_positive, modelParams, new_E_t)[source]¶ High level function for generating/simulating testing behaviour.
Constructs B splines Delay cases
- Parameters
name_total (str,) – Name for the total tests performed
name_positive (str,) – Name for the positive tests performed
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.new_E_t (tf.Tensor) – New cases
Dimensions: batch, time, country, age_group
- Returns
(
,
Total and positive tests by age group and country
Dimensions: (batch, time, country, age_group) x 2
Todo
Add more documenation for this function
-
covid19_npis.model.number_of_tests.
_calc_positive_tests
(new_E_t_delayed, phi_plus, phi_age)[source]¶ - Parameters
name (str) – Name of the variable for the new positive cases
in the trace.
new_E_t_delayed (tf.Tensor) – New cases with reporting delay
Dimensions: batch, time, country, age_groupphi_plus (tf.Tensor) – Fraction of positive tests
Dimensions: batch, time, countryphi_age (tf.Tensor) – Fraction of positive tests
Dimensions: batch, age_group
- Returns
Dimensions: batch, time, country, age_group
-
covid19_npis.model.number_of_tests.
_calc_total_number_of_tests_performed
(new_E_t_delayed, phi_tests_reported, phi_plus, eta, xi)[source]¶ - Parameters
name (str) – Name of the variable for the total number of tests performed
in the trace.
new_E_t_delayed (tf.Tensor) – New cases with reporting delay
Dimensions: batch, time, country, age_groupphi_tests_reported (tf.Tensor) – Difference in fraction for different countries
Dimensions: batch, countryphi_plus (tf.Tensor) – Fraction of positive tests
Dimensions: batch, time, countryeta (tf.Tensor) – Number of traced persons per case with subsequent negative test per case
Dimensions: batch, time, countryxi (tf.Tensor) – Base rate of testing per day that leads to negative tests
Dimensions: batch, time, country
- Returns
Dimensions: batch, time, country, age_group
-
covid19_npis.model.number_of_tests.
_construct_phi_tests_reported
(name, modelParams, mu_loc=1.0, mu_scale=1.0, sigma_scale=1.0)[source]¶ Construct the different of the fraction of tests for each country in the following hierarchical manner:
- Parameters
name (str) – Name of the variable
. Will also appear in the trace with this name.
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.mu_loc (optional) – Location parameter for the Normal distribution
.
Default: 1.0mu_scale (optional) – Scale parameter for the Normal distribution
.
Default: 1.0sigma_scale (optional) – Scale parameter for the
HalfCauchy distribution.
Default: 1.0
- Returns
Dimensions: batch, country
-
covid19_npis.model.number_of_tests.
_construct_phi_age
(name, modelParams, sigma_scale=0.2)[source]¶ Fraction of positive tests
- Parameters
name (str) – Name of the variable
. Will also appear in the trace with this name.
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.sigma_scale – Scale parameter for the HalfNormal distribution
Default: 0.2
- Returns
Dimensions: batch, age_group
-
covid19_npis.model.number_of_tests.
_construct_reporting_delay
(name, modelParams, m_ast, mu_loc=1.5, mu_scale=0.4, theta_sigma_scale=0.2, m_sigma_scale=3.0)[source]¶ - Parameters
name (str) – Name of the reporting delay variable
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.m_ast (tf.Tensor) –
Dimensions: batch, country, splinemu_loc (optional) – Location parameter for the Normal distribution
Default: 1.5mu_scale (optional) – Scale parameter for the Normal distribution
Default: 0.4theta_sigma_scale (optional) – Scale parameter for the HalfNorml distribution
Default: 0.2m_sigma_scale (optional) – Scale parameter for the HalfNorml distribution
Default: 3.0
- Returns
Dimensions: batch, country, spline
-
covid19_npis.model.number_of_tests.
_calc_reporting_delay_kernel
(name, m, theta, length_kernel=14)[source]¶ Calculates the pdf for the gamma reporting kernel.
- Parameters
name – Name of the reporting delay kernel
m –
Dimensions: batch, time, countrytheta –
Dimensions: batch, countrylength_kernel (optional) – Length of the kernel in days
Default: 14 days
- Returns
Dimensions: batch,country, kernel, time
-
covid19_npis.model.number_of_tests.
construct_testing_state
(name_phi, name_eta, name_xi, name_m_ast, modelParams, num_knots, mu_cross_loc=0.0, mu_cross_scale=10.0, m_mu_loc=12.0, m_mu_scale=2.0, sigma_cross_scale=10.0, m_sigma_scale=1.0)[source]¶ where
with the distributions parametarized in the following hierarchical manner:
at last we transform the variables
- Parameters
name_phi (str) – Name of the fraction of positive tests variable
name_eta (str) – Name of the number of traced persons variable
name_xi (str) – Name of the base tests rate variable
name_m_ast (str) – Name of the testing delay variable
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.num_knots – Number of knots for the Bspline dimension.
mu_cross_loc (optional) – Location parameter for the three Normal distributions
Default: 0.0mu_cross_scale (optional) – Scale parameter for the three Normal distributions
Default: 10.0m_mu_loc (optional) – Location parameter for the Normal distribution
Default: 12.0m_mu_scale (optional) – Scale parameter for the Normal distribution
Default: 2.0sigma_cross_scale (optional) – Scale parameter for the three HalfCauchy distributions
Default: 10.0m_sigma_scale (optional) – Scale parameter for the HalfNormal distribution
Default: 1.0
- Returns
Testing state tuple
Dimensions: 4 x (batch, country, spline),
-
covid19_npis.model.number_of_tests.
construct_Bsplines_basis
(modelParams)[source]¶ Function to construct the basis functions for all BSplines, should only be called once. Uses splipy python library.
- Parameters
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.degree (optional) – degree corresponds to exponent of the splines i.e. degree of three corresponds to a cubic spline.
Default: 3knots (list, optional) – Knots array used for constructing the BSplines.
Default: one knot every 7 days
- Returns
Dimensions: time, knots?
-
covid19_npis.model.number_of_tests.
_calculate_Bsplines
(coef, basis)[source]¶ Calculates the Bsplines given the basis functions B and the coefficients x.
- Parameters
coef – Coefficients
.
Dimensions: …,country, splinebasis – Basis functions tensor
Dimensions: time, spline
- Returns
Dimensions: …,time, country
Deaths¶
-
covid19_npis.model.deaths.
_construct_reporting_delay
(name, modelParams, theta_sigma_scale=0.3, theta_mu_loc=1.5, theta_mu_scale=0.3, m_sigma_scale=4.0, m_mu_loc=21.0, m_mu_scale=2.0)[source]¶ - Parameters
name (str) – Name of the reporting delay variable
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.theta_sigma_scale (optional) – Scale parameter for the Normal distribution
Default: 0.3theta_mu_loc (optional) – Location parameter for the Normal distribution
Default: 1.5theta_mu_scale (optional) – Scale parameter for the HalfNormal distribution
Default: 0.3m_sigma_scale (optional) – Scale parameter for the HalfNormal distribution
Default: 4.0m_mu_loc (optional) – Location parameter for the Normal distribution
Default: 21.0m_mu_scale (optional) – Scale parameter for the Normal distribution
Default: 2.0
- Returns
(m, theta)
Dimensions: (batch, country) x 2
-
covid19_npis.model.deaths.
_calc_Phi_IFR
(name, modelParams, alpha_loc=0.119, alpha_scale=0.003, beta_loc=-7.53, beta_scale=0.4)[source]¶ Calculates and construct the IFR and Phi_IFR:
- Parameters
name (str) – Name of the infection fatatlity ratio variable
modelParams (
covid19_npis.ModelParams
) – Instance of modelParams, mainly used for number of age groups and number of countries.alpha_loc (optional) –
Default: 0.119alpha_scale (optional) –
Default: 0.003beta_loc (optional) –
Default: -7.53beta_scale (optional) –
Default: 0.4
- Returns
Phi_IFR
Dimensions: batch, country, age brackets
-
covid19_npis.model.deaths.
calc_delayed_deaths
(name, new_cases, Phi_IFR, m, theta, length_kernel=40)[source]¶ Calculates delayed deahs from IFR and delay kernel.
- Parameters
name (str) – Name of the delayed deaths variable
new_cases (tf.Tensor) – New cases without reporting delay
Dimensions: batch, time, country, age_groupPhi_IFR (tf.Tensor) – Infection fatality ratio of the age brackets
Dimensions: batch, country, age_groupm (tf.Tensor) – Median fatality delay for the delay kernel
Dimensions: batch, countrytheta (tf.Tensor) – Scale fatality delay for the delay kernel
Dimensions: batchlength_kernel (optional) – Length of the kernel in days
Default: 14 days
- Returns
Dimensions: batch, time, country, age_group
Utility¶
-
covid19_npis.model.utils.
gamma
(x, alpha, beta)[source]¶ Returns a gamma kernel evaluated at x. The implementation is the same as defined in the tfp.gamma distribution which is probably quiet numerically stable. :param x: :param alpha: :param beta:
-
covid19_npis.model.utils.
positive_axes
(axes, ndim)[source]¶ Given a list of axes, returns them written as positive numbers
-
covid19_npis.model.utils.
match_axes
(tensor, target_axes, ndim=None)[source]¶ Extend and transpose dimensions, such that the dimension i of tensor is at the position target_axes[i]. Missing dimension are filled with size 1 dimensions. This is therefore a generalization of tf.expand_dims and tf.transpose and implemented using these. If ndim is None, the number of the dimensions of the result is the minimum fullfilling the requirements of target_axes
- Parameters
tensor (tf.Tensor) – The input tensor with len(tensor.dims) == len(target_axes)
target_axes (list of ints) – Target positions of the dimensions. Can be negative.
- Returns
The transposed and expanded tensor.
- Return type
tensor
-
covid19_npis.model.utils.
einsum_indexed
(tensor1, tensor2, inner1=(), inner2=(), outer1=(), outer2=(), vec1=(), vec2=(), targ_outer1=(), targ_outer2=())[source]¶ Calling tf.einsum with indices instead of a string. For example einsum_indexed(t1, t2, inner1=1, inner2=0, outer1=0, outer2=1) corresponds to the tf.einsum string “ab…,bc…->ac…” (Matrix product) and a matrix vector product “…ab,…b,->…a” is parameterized by einsum_indexed(t1, t2, inner1=-1, inner2=-1, vec1=-2)
- Parameters
tensor1 (tensor) – Input tensor 1
tensor2 (tensor) – Input tensor 2
inner1 (int or list) – The axes in tensor 1 over which a inner product is taken
inner2 (int or list) – The axes indices in tensor 2 over which a inner product is taken
outer1 (int or list) – The axes indices in tensor 1 over which a outer product is taken
outer2 (int or list) – The axes indices in tensor 2 over which a outer product is taken
vec1 (int or list) – The axes indices of the matrix in a matrix-vector product which are “staying” in the result. This is for the case where tensor1 corresponds to the matrix.
vec2 (int or list) – The axes indices of the matrix in a matrix-vector product which are “staying” in the result. This is for the case where tensor2 corresponds to the matrix.
targ_outer1 (int or list) – The axes indices in the result where the outer product axes of tensor 1 is mapped to. If omitted, the position is inferred such that the order stays the same, and, if equal, the indices of tensor 1 are to the left of the indices of tensor2 for outer products.
targ_outer2 (int or list) – The axes indices in the result where the outer product axes of tensor 2 is mapped to. If omitted, the position is inferred such that the order stays the same, and, if equal, the indices of tensor 1 are to the left of the indices of tensor2 for outer products.
- Returns
- Return type
tensor
-
covid19_npis.model.utils.
concatenate_axes
(tensor, axis1, axis2)[source]¶ Concatenates two consecutive axess
-
covid19_npis.model.utils.
slice_of_axis
(tensor, axis, begin, end)[source]¶ Returns the tensor where the axis axis is sliced from begin to end
-
covid19_npis.model.utils.
convolution_with_fixed_kernel
(data, kernel, data_time_axis, filter_axes_data=())[source]¶ Convolve data with a time independent kernel. The returned shape is equal to the shape of data. In order avoid constructing a time_length x time_length kernel, the data is decomposed in overlapping frames, with a stride of padding, allowing to construct a only padding x time_length sized kernel.
- Parameters
data (tensor) – The input tensor
kernel (tensor) – Has as shape filter_axes x time. filter_axes can be several axes, where in each dimension a difference kernel is located
data_time_axis (int) – the axis of data which corresponds to the time axis
filter_axes_data (tuple) – the axes of data, to which the filter_axes of kernel should be mapped to. Each of this dimension is therefore subject to a different filter
- Returns
- Return type
A convolved tensor with the same shape as data.
-
covid19_npis.model.utils.
convolution_with_varying_kernel
(data, kernel, data_time_axis, filter_axes_data=())[source]¶ Convolve data with a time dependent kernel. The returned shape is equal to the shape of data. In this implementation, the kernel will be augmented by a time_data axis, and then the inner product with the date will be taken. This is not an optimal implementation, as the most of the entries of the kernel inner product matrix will be zero.
- Parameters
data (tensor) – The input tensor
kernel (tensor) – Has as shape filter_axes x time_kernel x time_data. filter_axes can be several axes, where in each dimension a difference kernel is located
data_time_axis (int) – the axis of data which corresponds to the time axis
filter_axes_data (tuple) – the axes of data, to which the filter_axes of kernel should be mapped to. Each of this dimension is therefore subject to a different filter
- Returns
- Return type
A convolved tensor with the same shape as data.
Plot¶
There are multiple stages involved before one can start to plot the obtained data.

For model description, see Model.
Trace data can be converted with
covid19_npis.data.convert_trace_to_pandas_list()
.Plotting WIP
Data converter¶
-
covid19_npis.data.
convert_trace_to_dataframe_list
(trace, sample_state)[source]¶ Converts the pymc4 arviz trace to multiple pandas dataframes. Also sets the right labels for the dimensions i.e splits data by country and age group.
Do not look too much into this function if you want to keep your sanity!
- Parameters
trace (arivz InferenceData) –
sample_state (pymc4 sample state) –
- Returns
Multiindex dataframe containing all samples by chain and other dimensions defined in config.py
- Return type
list of pd.DataFrame
-
covid19_npis.data.
convert_trace_to_dataframe
(trace, sample_state, key, data_type=None)[source]¶ Converts the pymc4 arviz trace for a single key to a pandas dataframes. Also sets the right labels for the dimensions i.e splits data by country and age group.
Do not look too much into this function if you want to keep your sanity!
- Parameters
trace (arivz InferenceData) –
sample_state (pymc4 sample state) –
key (str) – Name of variable in modelParams
data_type (str) – Type of trace, gets detected automatically normally. Possible values are: “posterior”, “prior_predictive”, “posterior_predictive”. Overwrites automatic behaviour! default: None
- Returns
Multiindex dataframe containing all samples by chain and other dimensions defined in modelParams.py
- Return type
pd.DataFrame
Distribution¶
-
covid19_npis.plot.distributions.
distribution
(trace, sample_state, key, dir_save=None, plot_age_groups_together=True, force_matrix=False)[source]¶ High level plotting function for distributions, plots prior and posterior if they are given in the trace.
- Parameters
trace (arivz.InferenceData) – Raw data from pymc4 sampling, can contain both posterior data and prior data. Or only one of both!
sample_state (pymc4 sample state) – Used mainly for shape labels
key (str) – Name of the variable to plot
dir_save (str, optional) – where to save the the figures (expecting a folder). Does not save if None
Default: Noneforce_matrix (bool, optional) – Forces matrix plotting behaviour for last two dimensions
Default: False
- Returns
one figure for each country
- Return type
array of mpl figures
-
covid19_npis.plot.distributions.
distribution_matrix
(trace, sample_state, key, dir_save=None)[source]¶ High level function to create a distribution plot for matrix like variables e.g. ‘C’. Uses last two dimensions for matrix like plot.
- Parameters
trace (arivz.InferenceData) – Raw data from pymc4 sampling, can contain both posterior data and prior data. Or only one of both!
sample_state (pymc4 sample state) – Used mainly for shape labels
key (str) – Name of the variable to plot
dir_save (str, optional) – where to save the the figures (expecting a folder). Does not save if None
Default: None
- Returns
- Return type
axes
-
covid19_npis.plot.distributions.
_distribution
(array_posterior, array_prior, dist_name, dist_math, suffix='', ax=None)[source]¶ Low level function to plots posterior and prior from arrays.
- Parameters
array_prior (array_posterior,) – Sampling data as array, should be filtered beforehand. If none it does not get plotted!
dist_name (str) – name of distribution for plotting
dist_math (str) – math of distribution for plotting
suffix (str,optional) – Suffix for the plot title e.g. “age_group_1”
Default: “”ax (mpl axes element, optional) – Plot into an existing axes element
Default:None
-
covid19_npis.plot.distributions.
_plot_prior
(x, ax=None, **kwargs)[source]¶ Low level plotting function, plots the prior as line for sampling data by using kernel density estimation. For more references see scipy documentation.
It is highly recommended to pass an axis otherwise the xlim may be a bit wonky.
- Parameters
x – Input values, from sampling
ax (mpl axes element, optional) – Plot into an existing axes element
Default:None
kwargs (dict, optional) – Directly passed to plotting mpl.
-
covid19_npis.plot.distributions.
_plot_posterior
(x, bins=50, ax=None, **kwargs)[source]¶ Low level plotting function to plot an sampling data as histogram.
Time series¶
-
covid19_npis.plot.time_series.
timeseries
(trace, sample_state, key, sampling_type='posterior', plot_chain_separated=False, plot_age_groups_together=True, dir_save=None, observed=None)[source]¶ High level plotting fucntion to create time series for a a give variable, i.e. plot for every additional dimension. Can only be done for variables with a time or date in shape_labels!
Does NOT plot observed cases, these have to be added manually for now.
- Parameters
trace_prior (trace_posterior,) – Raw data from pymc4 sampling
sample_state (pymc4 sample stae) –
key (str) – Name of the timeseries variable to plot. Same name as in the model definitions.
sampling_type (str, optional) – Name of the type (group) in the arviz inference data.
Default: posteriordir_save (str, optional) – where to save the the figures (expecting a folder). Does not save if None
Default: Noneobserved (pd.DataFrame, optional) – modelParams dataframe for the corresponding observed values for the variable e.g. modelParams.pos_tests_dataframe
-
covid19_npis.plot.time_series.
_timeseries
(x, y, ax=None, what='data', draw_ci_95=None, draw_ci_75=None, draw_ci_50=None, **kwargs)[source]¶ low-level function to plot anything that has a date on the x-axis.
- Parameters
x (array of datetime.datetime) – times for the x axis
y (array, 1d or 2d) – data to plot. if 2d, we plot the CI as fill_between (if CI enabled in rc params) if 2d, then first dim is realization and second dim is time matching x if 1d then first tim is time matching x
ax (mpl axes element, optional) – plot into an existing axes element. default: None
what (str, optional) – what type of data is provided in x. sets the style used for plotting: * data for data points * fcast for model forecast (prediction) * model for model reproduction of data (past)
kwargs (dict, optional) – directly passed to plotting mpl.
- Returns
- Return type
ax
Data & ModelParams¶
We apply a utility/abstraction layer to our data to simplify plotting and other operations later.
Before we can construct our model Parameters, we have to import and manipulate/restructure our data a bit as follows:

Data¶
-
class
covid19_npis.data.
Country
(path_to_folder)[source]¶ Country data class! Contains death, new_cases/positive tests, daily tests, interventions and config data for a specific country. Retrieves this data from a gives folder. There are the following specifications for the data:
- new_cases.csv
Time/Date column has to be named “date” or “time”
Age group columns have to be named consistent between different data and countries
- interventions.csv
Time/Date column has to be named “date” or “time”
Different intervention as additional columns with intervention name as column name
- tests.csv
Time/Date column has to be named “date” or “time”
Daily performed tests column with name “tests”
- deaths.csv
Time/Date column has to be named “date” or “time”
Daily deaths column has to be named “deaths”
Optional: Daily deaths per age group same column names as in new_cases
- population.csv
Age column named “age”
Column Number of people per age named “PopTotal”
- config.json, dict:
name : “country_name”
- age_groupsdict
“column_name” : [age_lower, age_upper]
Also calculates change points and interventions automatically on init.
- Parameters
path_to_folder (string) – Filepath to the folder, which holds all the data for the country! Should be something like “../data/Germany”. That is new_cases.csv, interventions.csv, population.csv
-
create_change_points
(df)[source]¶ Create change points for a single intervention and also adds interventions if they do not exist yet.
- Parameters
df (pandas.DataFrame) – Single intervention column with datetime index.
- Returns
Change points dict
{name:[cps]}
-
classmethod
add_intervention
(name, num_stages)[source]¶ Constructs and adds intervention to the class attributes if that intervention does not exist yet! This is done by name check.
- Parameters
name (string) – Name of the intervention
time_series (pandas.DataFrame) – Intervention indexs as time series with datetime index!
-
classmethod
set_intervention_alpha_prior
(name, prior_loc, prior_scale)[source]¶ Manual set prior for effectivity alpha for a intervention via the name. That is it set prior_alpha_loc and prior_alpha_scale of a Intervention instance.
- Parameters
name (string) – Name of intervention
prior_loc (number) –
prior_scale (number) –
-
class
covid19_npis.data.
Intervention
(name, num_stages, prior_alpha_loc=0.1, prior_alpha_scale=0.1)[source]¶ - Parameters
name (string) – Name of the intervention
num_stages (int,) – Number of different stages the intervention can have.
prior_alpha_loc (number, optional) –
prior_alpha_scale (number, optional) –
-
class
covid19_npis.data.
Change_point
(date_data, gamma_max)[source]¶ - Parameters
prior_date_loc (number) – Mean of prior distribution for the location (date) of the change point.
gamma_max – Gamma max value for change point
length (number, optional) – Length of change point
prior_date_scale (number, optional) – Scale of prior distribution for the location (date) of the change point.
ModelParams¶
-
class
covid19_npis.
ModelParams
(countries, offset_sim_data=20, minimal_daily_cases=40, min_offset_sim_death_data=40, minimal_daily_deaths=10, spline_degree=3, spline_stride=7, dtype='float32')[source]¶ This is a class for all model parameters. It is mainly used to have a convenient to access data in model wide parameters e.g. start date for simulation.
This class also contains the data used for fitting. dataframe is the original dataframe. data_tensor is a tensor in the correct shape (time x countries x age) with values replace by nans when no data is available.
- Parameters
countries (list,
covid19_npis.data.Country
) – Data objects for multiple countries
-
classmethod
from_folder
(fpath, **kwargs)[source]¶ Create modelParams class from folder containing differet regions or countrys
-
property
countries
¶ Data objectes for each country.
- Returns
List of all country object
-
property
data_summary
¶ Data summary for modelParams object.
-
property
gamma_data_tensor
¶ Creates a ragged tensor with dimension intervention, country, change_points The change points dimension can have different sizes.
-
property
date_data_tensor
¶ Creates a tensor with dimension intervention, country, change_points Padded with 0.0 for none existing change points
-
property
pos_tests_dataframe
¶ New cases as multiColumn dataframe level 0 = country/region and level 1 = age group.
-
property
pos_tests_data_array
¶ Numpy Array of daily new cases / positive tests for countries/regions and age groups.
- Returns
Dimensions: time, country, agegroup- Return type
-
property
pos_tests_data_tensor
¶ Tensor of daily new cases / positive tests for countries/regions and age groups.
- Returns
Dimensions: time, country, agegroup- Return type
-
property
total_tests_dataframe
¶ Dataframe of total tests in all countries. Datetime index and country columns as Multiindex.
-
property
total_tests_data_tensor
¶ returns:
Dimensions: time, country :rtype: tf.Tensor
-
property
deaths_dataframe
¶ Dataframe of deaths in all countries. Datetime index and country columns as Multiindex.
-
property
deaths_data_tensor
¶ returns:
Dimensions: time, country :rtype: tf.Tensor
-
property
N_dataframe
¶ Dataframe of population in all countries. Datetime index and country columns as Multiindex.
-
property
N_data_tensor
¶ Creates the population tensor with automatically calculated age strata/brackets.
Dimensions: country, age_groups
-
property
N_data_tensor_total
¶ Creates the population tensor for every age.
Dimensions: country, age
-
property
indices_begin_data
¶ Returns the index of every country when the first case is reported. It could be that for some countries, the index is later than self.offset_sim_data.
-
property
length_data
¶ returns: Length of the inserted/loaded data in days :rtype: number
-
property
length_sim
¶ returns: Length of the simulation in days. :rtype: number
-
property
spline_basis
¶ Calculates B-spline basis.
- Returns
- Return type
Dimensions: modelParams.length_sim, modelParams.num_splines
Contributing¶
Code formatting¶
We use black https://github.com/psf/black as automatic code formatter. Please run your code through it before you open a pull request.
We do not check for formatting in the testing (travis) but have a config in the repository that uses black as a pre-commit hook.
This snippet should get you up and running:
conda install -c conda-forge black
conda install -c conda-forge pre-commit
pre-commit install
Try to stick to PEP 8. You can use type annotations if you want, but it is not necessary or encouraged.
Testing¶
We use travis and pytest. To check your changes locally:
python -m pytest --log-level=INFO --log-cli-level=INFO
It would be great if anything that is added to the code-base has an according test in the tests
folder. We are not there yet, but it is on the todo. Be encouraged to add tests :)
Documentation¶
The documentation is built using Sphinx from the docstrings. To test it before submitting, navigate with a terminal to the docs/ directory. Install (if necessary) the required python packages for the documentation and compile the documentation.
cd docs
pip install -r piprequirements.txt
make html
The documentation can now be be accessed locally in docs/_build/html/index.html
. As an example for the docstring formatting you can look at the documentation of covid19_npis.model.disease_spread()
. We try to use the numpydoc style.
Debugging¶
This is a small list of debug code snippets.
Debugging nans with tensorflow¶
It is a little problematic, because some nans occur during the runtime without being an error. Often these are cases where an operation has different implementations based on the value of the input, because it would otherwise lead to a loss of precision.
Therefore we wrote some patches, which put try-except blocks around these code parts and if a error occurs, disable check_numeric for this part.
For patching tensorflow_probability (replace the variables by the correct path):
cd scripts/debugging_patches
patch -d {$CONDA_PATH}/envs/{$ENVIRONMENT_NAME}/ -p 0 < filter_nan_errors1.patch
patch -d {$CONDA_PATH}/envs/{$ENVIRONMENT_NAME}/ -p 0 < filter_nan_errors2.patch
patch -d {$CONDA_PATH}/envs/{$ENVIRONMENT_NAME}/ -p 0 < filter_nan_errors3.patch
And then uncomment these line of codes in the run_script. Check_numerics has to enabled only before the optimization, not before the initial sampling, because the nan occuring during the sampling of the gamma distribution hasn’t been patched.
tf.config.run_functions_eagerly(True)
tf.debugging.enable_check_numerics(stack_height_limit=30, path_length_limit=50)
For debugging the VI, it is reasonable to increase the step size, to run more quickly into errors
Basic usage of logger¶
# Change to debug mode i.e all log.debug is printed
logging.basicConfig(level=logging.DEBUG)
# Use log.debug instead of print
log.debug(f"My var {var}")
Force cpu or other device¶
my_devices = tf.config.experimental.list_physical_devices(device_type="CPU")
tf.config.experimental.set_visible_devices(devices=my_devices, device_type="CPU")
tf.config.set_visible_devices([], "GPU")
How to build a dataset for our model¶
To use our model you may want to create your own dataset. In the following we try to guide you through the process of creating your own dataset. Feel free to take a look into our script. we use to create our dataset.
We use a hierarchical for our data as for our model. To add new country or region to our model we first create a folder containing the data.
mkdir test_country
Next we create a config.json file inside this folder. The json has to contain a unique name for the country/region and the age group brackets. You can add any number of age groups, the name of the groups should be the same across all countries though! We use four different age groups for most of our analysis as follows.
{
"name": "test_country",
"age_groups": {
"age_group_0" : [0,34],
"age_group_1" : [35,59],
"age_group_2" : [60,79],
"age_group_3" : [80,100]
}
}
- config.json, dict:
name : “country_name”
- age_groupsdict
“column_name” : [age_lower, age_upper]
Population data¶
Each dataset for a country/region needs to contain population data for every age from 0 to 100. The data should be saved as population.csv! Most of the population data can be found on the UN website.
age |
PopTotal |
---|---|
0 |
831175 |
1 |
312190 |
… |
… |
Age column named “age”
Column Number of people per age named “PopTotal”
New cases/ Positive tests data¶
We supply the number of positive tested persons per day and age group as a csv file for our country/region. The file has to be named “new_cases.csv” and has to contain the same column names as defined in the config.json! That is the age groups. Date Format should be “%d.%m.%y”.
date |
age_group_0 |
age_group_1 |
age_group_2 |
age_group_3 |
---|---|---|---|---|
01.01.20 |
103 |
110 |
13 |
130 |
02.01.20 |
103 |
103 |
103 |
103 |
… |
… |
… |
… |
… |
Time/Date column has to be named “date” or “time”
Age group columns have to be named consistent between different data and countries!
Total tests data¶
The number of total tests performed per day in the country/region is also supplied as a csv file called “tests.csv”. The format should be as follows:
date |
tests |
---|---|
01.01.20 |
10323 |
02.01.20 |
13032 |
… |
… |
Time/Date column has to be named “date” or “time”
Daily performed tests column with name “tests”
Number of deaths data¶
The number of deaths per day in the country/region also supplied as csv file nameed “deaths.csv”.
date |
deaths |
---|---|
01.01.20 |
10 |
02.01.20 |
35 |
… |
… |
Time/Date column has to be named “date” or “time”
Daily deaths column has to be named “deaths”
Optional(not working yet): Daily deaths per age group same column names as in new_cases
Interventions data¶
The intervention is also added as csv file. The file has to be named “interventions.csv” and can contain any number of interventions. We use the the oxford response tracker for this purpose, but you can also construct your own time series.
You can call/name the interventions whatever you like. The index should be an integer though.
date |
school_closing |
cancel_events |
curfew |
… |
---|---|---|---|---|
01.01.20 |
1 |
0 |
0 |
… |
02.01.20 |
1 |
0 |
0 |
… |
03.01.20 |
1 |
2 |
3 |
… |
04.01.20 |
2 |
2 |
3 |
… |
05.01.20 |
2 |
1 |
0 |
… |
… |
… |
… |
… |
… |
Time/Date column has to be named “date” or “time”
Different intervention as additional columns with intervention name as column name