How to build a dataset for our model

To use our model, you may want to create your own dataset. In the following, we guide you through the process of creating one. Feel free to also take a look at the script we use to create our own dataset.

We use a hierarchical structure for our data, just as for our model. To add a new country or region to our model, we first create a folder that will contain its data.

mkdir test_country

Next, we create a config.json file inside this folder. The JSON file has to contain a unique name for the country/region and the age-group brackets. You can add any number of age groups, but the group names must be the same across all countries! For most of our analyses we use the following four age groups:

{
    "name": "test_country",
    "age_groups": {
        "age_group_0": [0, 34],
        "age_group_1": [35, 59],
        "age_group_2": [60, 79],
        "age_group_3": [80, 100]
    }
}
  • config.json, dict:
    • name : “country_name”

    • age_groups : dict
      • “column_name” : [age_lower, age_upper]
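The folder and config.json can also be created programmatically. The following is a minimal Python sketch; `write_config` and `read_config` are hypothetical helpers (not part of our code), and the bracket check (0 <= lower <= upper <= 100) is an assumption based on the 0 to 100 age range of the population data.

```python
import json
import os

# Age-group brackets from the example above; the group names must be
# identical across all countries/regions.
AGE_GROUPS = {
    "age_group_0": [0, 34],
    "age_group_1": [35, 59],
    "age_group_2": [60, 79],
    "age_group_3": [80, 100],
}


def write_config(folder, name, age_groups):
    """Create the country folder (like `mkdir`) and write config.json into it."""
    # Assumed sanity check: each bracket is [lower, upper] within ages 0 to 100.
    for group, (lower, upper) in age_groups.items():
        if not 0 <= lower <= upper <= 100:
            raise ValueError(f"invalid bracket for {group}: {[lower, upper]}")
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "config.json")
    with open(path, "w") as f:
        json.dump({"name": name, "age_groups": age_groups}, f, indent=4)
    return path


def read_config(folder):
    """Load config.json from the country folder and return (name, age_groups)."""
    with open(os.path.join(folder, "config.json")) as f:
        config = json.load(f)
    return config["name"], config["age_groups"]
```

For example, `write_config("test_country", "test_country", AGE_GROUPS)` creates test_country/config.json with the four age groups shown above.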

Population data

Each dataset for a country/region needs to contain population data for every age from 0 to 100. The data has to be saved as population.csv. Population data for most countries can be found on the UN website.

age,PopTotal
0,831175
1,312190

  • Age column named “age”

  • Column with the number of people per age, named “PopTotal”
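A quick way to check population.csv before using it is sketched below with Python's standard csv module. `read_population` and `population_per_group` are hypothetical helpers; summing the population inside each config.json bracket (bounds treated as inclusive) only illustrates how the brackets map onto single ages and is not necessarily how our model aggregates them.

```python
import csv


def read_population(path):
    """Read population.csv into a dict {age: number of people}."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # float() also accepts decimal values, not only whole numbers.
        population = {int(row["age"]): float(row["PopTotal"]) for row in reader}
    # Every single age from 0 to 100 has to be present.
    missing = sorted(set(range(101)) - set(population))
    if missing:
        raise ValueError(f"population.csv is missing ages: {missing}")
    return population


def population_per_group(population, age_groups):
    """Sum the population inside each [lower, upper] bracket, bounds inclusive."""
    return {
        group: sum(population[age] for age in range(lower, upper + 1))
        for group, (lower, upper) in age_groups.items()
    }
```

Passing the "age_groups" dict from config.json as `age_groups` yields one total per age group.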

New cases / positive tests data

We supply the number of people who tested positive per day and age group as a CSV file for our country/region. The file has to be named “new_cases.csv”, and its age-group columns must have the same names as the age groups defined in config.json! Dates should use the format “%d.%m.%y” (e.g. 01.01.20).

date,age_group_0,age_group_1,age_group_2,age_group_3
01.01.20,103,110,13,130
02.01.20,103,103,103,103

  • Time/Date column has to be named “date” or “time”

  • Age group columns have to be named consistently across the different data files and countries!
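The sketch below reads new_cases.csv and checks its columns against the age groups from config.json. `read_new_cases` is a hypothetical helper; it assumes the age-group columns must match the config exactly (no extra or missing groups) and that the counts are integers.

```python
import csv
from datetime import datetime

DATE_FORMAT = "%d.%m.%y"  # e.g. "01.01.20" is 1 January 2020


def read_new_cases(path, age_groups):
    """Read new_cases.csv into {date: {age_group: cases}}.

    `age_groups` is the "age_groups" dict from config.json.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        time_column = next((c for c in ("date", "time") if c in columns), None)
        if time_column is None:
            raise ValueError('new_cases.csv needs a "date" or "time" column')
        # Assumed rule: the remaining columns are exactly the configured age groups.
        groups = set(columns) - {time_column}
        if groups != set(age_groups):
            raise ValueError(f"age-group columns {sorted(groups)} do not match config.json")
        cases = {}
        for row in reader:
            day = datetime.strptime(row[time_column], DATE_FORMAT).date()
            cases[day] = {group: int(row[group]) for group in age_groups}
    return cases
```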

Total tests data

The total number of tests performed per day in the country/region is also supplied as a CSV file, named “tests.csv”, with the following format:

date,tests
01.01.20,10323
02.01.20,13032

  • Time/Date column has to be named “date” or “time”

  • Column with the number of tests performed per day, named “tests”
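Since tests.csv holds a single value per day, a small generic reader is enough. This is a sketch with a hypothetical `read_daily_series` helper; it assumes tests.csv uses the same “%d.%m.%y” date format as new_cases.csv, which the description above only states for new_cases.csv.

```python
import csv
from datetime import datetime


def read_daily_series(path, value_column, date_format="%d.%m.%y"):
    """Read a CSV with a "date" (or "time") column and one integer value column.

    Returns a list of (date, value) pairs in file order.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        time_column = next((c for c in ("date", "time") if c in columns), None)
        if time_column is None or value_column not in columns:
            raise ValueError(f'{path} needs a "date"/"time" column and a "{value_column}" column')
        return [
            (datetime.strptime(row[time_column], date_format).date(), int(row[value_column]))
            for row in reader
        ]
```

For example, `read_daily_series("test_country/tests.csv", "tests")` returns one (date, tests) pair per row.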

Number of deaths data

The number of deaths per day in the country/region is also supplied as a CSV file, named “deaths.csv”:

date,deaths
01.01.20,10
02.01.20,35

  • Time/Date column has to be named “date” or “time”

  • Daily deaths column has to be named “deaths”

  • Optional (not working yet): daily deaths per age group, using the same column names as in new_cases.csv
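If your death counts come as a plain list of daily numbers, deaths.csv can be written as in the following sketch. `write_deaths_csv` is a hypothetical helper, and writing the dates as “%d.%m.%y” is an assumption carried over from new_cases.csv.

```python
import csv
from datetime import date, timedelta


def write_deaths_csv(path: str, start: date, daily_deaths: list[int]) -> None:
    """Write deaths.csv with one row per consecutive day, beginning at `start`."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "deaths"])
        for offset, deaths in enumerate(daily_deaths):
            day = start + timedelta(days=offset)
            # Assumed date format, same as for new_cases.csv.
            writer.writerow([day.strftime("%d.%m.%y"), deaths])
```

For example, `write_deaths_csv("test_country/deaths.csv", date(2020, 1, 1), [10, 35])` writes the two example rows shown above.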

Interventions data

The interventions are also supplied as a CSV file. The file has to be named “interventions.csv” and can contain any number of interventions. We use the Oxford COVID-19 Government Response Tracker for this purpose, but you can also construct your own time series.

You can name the interventions however you like, but their values (the intervention levels) should be integers.

date,school_closing,cancel_events,curfew
01.01.20,1,0,0
02.01.20,1,0,0
03.01.20,1,2,3
04.01.20,2,2,3
05.01.20,2,1,0

  • Time/Date column has to be named “date” or “time”

  • Each intervention as an additional column, with the intervention name as the column name
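The sketch below reads interventions.csv, treating every column other than “date”/“time” as one intervention and rejecting non-integer values. `read_interventions` is a hypothetical helper; the date format is again assumed to be “%d.%m.%y”.

```python
import csv
from datetime import datetime


def read_interventions(path, date_format="%d.%m.%y"):
    """Read interventions.csv into (dates, {intervention_name: [levels]})."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        columns = reader.fieldnames or []
        time_column = next((c for c in ("date", "time") if c in columns), None)
        if time_column is None:
            raise ValueError('interventions.csv needs a "date" or "time" column')
        # Every other column is one intervention, named freely by the user.
        names = [c for c in columns if c != time_column]
        dates, levels = [], {name: [] for name in names}
        for row in reader:
            dates.append(datetime.strptime(row[time_column], date_format).date())
            for name in names:
                # int() raises ValueError for non-integer levels such as "1.5".
                levels[name].append(int(row[name]))
    return dates, levels
```

Applied to the example table above, `levels["curfew"]` is `[0, 0, 3, 3, 0]`.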