9  COVID-19 Reports

Author

Thieu Nguyen

9.0.1 Introduction

Use the SyntheaMass data and rerun the code from this website to practice analyzing healthcare data using Python. I modified some code blocks to handle analysis involving several related tables in the healthcare database.

In this post, we will learn how to:

  • Define specific groups of patients, such as those who contracted COVID-19, were hospitalized, admitted to the ICU, placed on ventilators, and more

  • Calculate age, age at death, time points of COVID-19 events, symptom timelines, and hospital stay lengths

  • Aggregate metrics such as counts, means, and sums of symptoms and COVID-19 outcomes

  • Explore lab values Plot key metrics

  • Use Python functions to simplify code blocks in the report by calling external .py files such as analysis.py

9.0.2 Loading necessary python packages and Data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
import datetime
# Ensure all columns are shown
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

Load the SyntheaMass data from the CSV files.

# Load the data
conditions = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/conditions.csv")
patients = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/patients.csv")
observations = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/observations.csv")
care_plans = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/careplans.csv")
encounters = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/encounters.csv")
devices = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/devices.csv")
supplies = pd.read_csv('/Users/nnthieu/SyntheaData/SyntheaCovid19/supplies.csv')
procedures = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/procedures.csv")
medications = pd.read_csv("/Users/nnthieu/SyntheaData/SyntheaCovid19/medications.csv")

9.0.3 Define COVID-19 Patients

covid_patient_ids = conditions[conditions.CODE == 840539006].PATIENT.unique()
inpatient_ids = encounters[(encounters.REASONCODE == 840539006) & (encounters.CODE == 1505002)].PATIENT.unique()
isolation_ids = care_plans[(care_plans.CODE == 736376001) & (care_plans.REASONCODE == 840539006)].PATIENT.unique()
icu_patient_ids = encounters[(encounters.REASONCODE == 840539006) & (encounters.CODE == 305351004)].PATIENT.unique()
vent_ids = procedures[procedures.CODE == 26763009].PATIENT.unique()
deceased_ids = patients[patients.DEATHDATE.notna()].Id.unique()
negative_covid_patient_ids = observations[(observations.CODE == '94531-1') & 
                                    (observations.VALUE == 'Not detected (qualifier value)')].PATIENT.unique()
completed_isolation_patients = care_plans[(care_plans.CODE == 736376001) & (care_plans.STOP.notna()) & 
                                          (care_plans.REASONCODE == 840539006)].PATIENT.unique()
survivor_ids = np.union1d(completed_isolation_patients, negative_covid_patient_ids)

# Filter patients who have COVID-19
covid_patients = patients[patients.Id.isin(covid_patient_ids)]  # Correct column name to 'Id'
len(covid_patient_ids)
88166

The number of inpatient survivors

np.intersect1d(inpatient_ids, survivor_ids).shape
(14654,)

The number of inpatient non-survivors

np.intersect1d(inpatient_ids, deceased_ids).shape[0]
3548

Mortality rate of covid-19 patients

round((np.intersect1d(covid_patient_ids, deceased_ids).shape[0]/len(covid_patient_ids))* 100, 2)
4.13

Prepare a DataFrame with the number of patients in each category

# 1. Find the earliest COVID-19 start date
covid_start_date = pd.to_datetime(conditions[conditions['DESCRIPTION'] == 'COVID-19']['START'].min())

# 2. Set the maximum allowed COVID-19 date (80 days later)
covid_max_date = covid_start_date + pd.Timedelta(days=80)

# Ensure the 'START' column is in datetime format
conditions['START'] = pd.to_datetime(conditions['START'], errors='coerce')
observations['DATE'] = pd.to_datetime(observations['DATE'], errors='coerce')

# 3. Create a boolean mask for rows within the COVID-19 date range and for COVID-19 diagnosis
covid_date_mask = (conditions['START'] >= covid_start_date) & (conditions['START'] <= covid_max_date)

# 4. Filter `conditions` and `observations` based on the date range
covid_cons = conditions[covid_date_mask]
covid_obs = observations[
    (observations['DATE'] >= covid_start_date) & (observations['DATE'] <= covid_max_date)
]

# 5. Display first few rows
print(covid_cons.head(3))
covid_cons.shape
       START        STOP                               PATIENT  \
2 2020-03-01  2020-03-30  1ff7f10f-a204-4bb1-aa72-dd763fa99482   
3 2020-03-01  2020-03-01  1ff7f10f-a204-4bb1-aa72-dd763fa99482   
4 2020-03-01  2020-03-30  1ff7f10f-a204-4bb1-aa72-dd763fa99482   

                              ENCOUNTER       CODE         DESCRIPTION  
2  60584c6a-e26b-4176-a660-86ad84c7ceca  386661006     Fever (finding)  
3  60584c6a-e26b-4176-a660-86ad84c7ceca  840544004  Suspected COVID-19  
4  60584c6a-e26b-4176-a660-86ad84c7ceca  840539006            COVID-19  
(654569, 6)

9.0.4 Health outcomes experienced by COVID-19 patients

The following table shows different health outcomes that were experienced by COVID-19 patients during the course of the disease.

File analysis.py is written and saved in a local working folder.

write a function in analysis.py like import analysis analysis.outcome_table(inpatient_ids, survivor_ids, deceased_patients, covid_cons) to create a table as: outcome total percent of inpatient survivors percent survivors non survivors percent non survivors with outcome are values of covid_condition.DESCRIPTION containing (disorder)

def outcome_table(inpatient_ids, survivor_ids, deceased_ids, covid_cons):
    # Filter conditions containing "(disorder)" in the DESCRIPTION
    disorder_conditions = covid_cons[
        (covid_cons['DESCRIPTION'].str.contains(r'\(disorder\)', na=False)) &
        (~covid_cons['DESCRIPTION'].str.contains(
            r'pneumonia|hypoxemia|chronic|Hypertriglyceridemia|Hyperlipidemia|Hypercholesterolemia|Metabolic|osteoporosis|Alzheimer|Anemia|sinusitis|thrombosis|embolism|pharyngitis|bronchitis', case=False, na=False))
        ]

    # Initialize the result table
    table_rows = []

    for outcome in disorder_conditions['DESCRIPTION'].unique():
        # Filter patients with the specific outcome
        outcome_patients = disorder_conditions[disorder_conditions['DESCRIPTION']
                                               == outcome]['PATIENT'].unique()

        # Calculate metrics
        total = len(outcome_patients)
        inpatient = len(set(outcome_patients) & set(inpatient_ids))
        survivors = len(set(outcome_patients) & set(survivor_ids))
        non_survivors = len(set(outcome_patients) & set(deceased_ids))

        percent_inpatient = (inpatient / len(inpatient_ids)) * \
            100 if len(inpatient_ids) > 0 else 0
        percent_survivors = (survivors / len(survivor_ids)) * \
            100 if len(survivor_ids) > 0 else 0
        percent_non_survivors = (
            non_survivors / len(deceased_ids)) * 100 if len(deceased_ids) > 0 else 0

        # Append row to the table
        table_rows.append({
            'outcome': outcome,
            'total': total,
            'percent of inpatient': round(percent_inpatient, 2),
            'survivors': survivors,
            'percent survivors': round(percent_survivors, 2),
            'non survivors': non_survivors,
            'percent non survivors': round(percent_non_survivors, 2)
        })

    # Convert to DataFrame
    outcome_table = pd.DataFrame(table_rows).head(7)
    outcome_table = outcome_table.sort_values(by='total', ascending=False)
    outcome_table['outcome'] = outcome_table['outcome'].str.replace(
    r'\(disorder\)', '', regex=True).str.strip()

    return outcome_table

outcome_table(inpatient_ids, survivor_ids, deceased_ids, covid_cons)
outcome total percent of inpatient survivors percent survivors non survivors percent non survivors
1 Acute respiratory failure 8701 47.87 5237 5.99 3473 14.38
0 Sepsis caused by virus 6941 38.19 3419 3.91 3526 14.60
6 Acute respiratory distress syndrome 2400 13.20 85 0.10 2315 9.59
2 Septic shock 1746 9.61 0 0.00 1746 7.23
5 Heart failure 1431 7.87 122 0.14 1309 5.42
4 Injury of heart 1288 7.09 20 0.02 1268 5.25
3 Injury of kidney 1252 6.89 8 0.01 1244 5.15

9.0.5 Outcomes for all COVID-19 Patients

This code builds a new DataFrame for the purposes of display. The DataFrame contains the percentages of patients that experience a particular outcome. Percentages are then provided for only hospitalized patients, ICU admitted patients and ventilated patients.

covid_conditions = conditions[conditions.CODE == 840539006]

cp = covid_conditions.merge(patients, how='left', left_on='PATIENT', right_on='Id')
isolation_ids = care_plans[(care_plans.CODE == 736376001) & (care_plans.REASONCODE == 840539006)].PATIENT
cp['isolation'] = cp.Id.isin(isolation_ids)
cp['admit'] = cp.Id.isin(inpatient_ids)
cp['recovered'] = cp.Id.isin(survivor_ids)
cp['death'] = cp.DEATHDATE.notna()
icu_ids = encounters[encounters.CODE == 305351004].PATIENT
cp['icu_admit'] = cp.Id.isin(icu_ids)
vent_ids = procedures[procedures.CODE == 26763009].PATIENT
cp['ventilated'] = cp.Id.isin(vent_ids)
# Recode the boolean columns from True/False to text to read easily
hospitalized = (cp.admit == True)
icu = (cp.icu_admit == True)
vent = (cp.ventilated == True)
covid_count = cp.Id.size
row_filters = {'Home Isolation': (cp.isolation == True), 
               'Hospital Admission': hospitalized, 
               'ICU Admission': icu, 
               'Ventilated': vent, 
               'Recovered': (cp.recovered == True), 
               'Death': (cp.death == True)}

table_rows = []
for category, row_filter in row_filters.items():
    row = {'Outcome': category}
    row['All Patients'] = round(cp[row_filter].Id.size / covid_count, 2)
    row['Hospitalized'] = round(cp[row_filter & hospitalized].Id.size / hospitalized.value_counts()[True], 2)
    row['ICU Admitted'] = round(cp[row_filter & icu].Id.size / icu.value_counts()[True], 2)
    row['Ventilated'] = round(cp[row_filter & vent].Id.size / vent.value_counts()[True], 2)
    table_rows.append(row)
    
pd.DataFrame.from_records(table_rows)
Outcome All Patients Hospitalized ICU Admitted Ventilated
0 Home Isolation 0.80 0.03 0.03 0.03
1 Hospital Admission 0.21 1.00 1.00 1.00
2 ICU Admission 0.04 0.20 1.00 1.00
3 Ventilated 0.03 0.16 0.79 1.00
4 Recovered 0.96 0.81 0.32 0.15
5 Death 0.04 0.20 0.68 0.85

9.0.6 Outcomes for ICU Admitted Patients

Essentially a sub table from above, looking only at ICU patients.

icu_only = cp[cp.icu_admit == True]
vent = (icu_only.ventilated == True)
covid_count = icu_only.Id.size
row_filters = {'Ventilated': vent, 
               'Recovered': (icu_only.recovered == True), 
               'Death': (icu_only.death == True)}

table_rows = []
for category, row_filter in row_filters.items():
    row = {'Outcome': category}
    row['ICU Admitted'] = round(icu_only[row_filter].Id.size / covid_count, 2)
    row['Ventilated'] = round(icu_only[row_filter & vent].Id.size / vent.value_counts()[True], 2)
   
    table_rows.append(row)
    
pd.DataFrame.from_records(table_rows)
Outcome ICU Admitted Ventilated
0 Ventilated 0.79 1.00
1 Recovered 0.32 0.15
2 Death 0.68 0.85

Start to build a DataFrame that we can use to look at other conditions in relation to COVID-19

covid_info = cp[['PATIENT', 'recovered', 'death', 'START', 'DEATHDATE', 'BIRTHDATE', 'GENDER', 'admit', 'icu_admit']]
covid_info = covid_info.rename(columns={'START': 'covid_start'})
covid_info.head(2)
PATIENT recovered death covid_start DEATHDATE BIRTHDATE GENDER admit icu_admit
0 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False
1 9bcf6ed5-d808-44af-98a0-7d78a29ede72 True False 2020-03-13 NaN 2016-08-01 F False False

Grab all of the conditions starting after January 20, 2020. This is a hack to get only conditions that are related to COVID-19. We will end up merging these with the COVID patients.

covid_related_conditions = conditions[pd.to_datetime(conditions.START) > pd.to_datetime('2020-01-20')]

This DataFrame will contain all conditions for COVID-19 patients, where START can be compared to covid_start to see how long after the COVID-19 diagnosis something happened.

covid_patient_conditions = covid_info.merge(covid_related_conditions, on='PATIENT')
covid_patient_conditions.head(2)
PATIENT recovered death covid_start DEATHDATE BIRTHDATE GENDER admit icu_admit START STOP ENCOUNTER CODE DESCRIPTION
0 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False 2020-03-01 2020-03-30 60584c6a-e26b-4176-a660-86ad84c7ceca 386661006 Fever (finding)
1 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False 2020-03-01 2020-03-01 60584c6a-e26b-4176-a660-86ad84c7ceca 840544004 Suspected COVID-19

9.0.7 Lab values for COVID-19 patients

The following code presents lab values taken for COVID-19 patients. Values are separated into survivors and non survivors. The first block of code selects lab values of interest from all observations in the simulation.

lab_obs = observations[(observations.CODE == '48065-7') | (observations.CODE == '26881-3') | 
                          (observations.CODE == '2276-4') | (observations.CODE == '89579-7') |
                           (observations.CODE == '731-0') | (observations.CODE == '14804-9')
                      ]
lab_obs.head(3)
DATE PATIENT ENCOUNTER CODE DESCRIPTION VALUE UNITS TYPE
254 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 731-0 Lymphocytes [#/volume] in Blood by Automated c... 1.1 10*3/uL numeric
273 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 48065-7 Fibrin D-dimer FEU [Mass/volume] in Platelet p... 0.4 ug/mL numeric
274 2020-02-19 bd1c4ffc-7f1d-4590-adbb-1d6533fb623e b7455838-3607-47f4-aaa5-fd89abea7d29 2276-4 Ferritin [Mass/volume] in Serum or Plasma 332.4 ug/L numeric

Select COVID-19 conditions out of all conditions in the simulation

covid_conditions = conditions[conditions.CODE == 840539006]
covid_patients = covid_conditions.merge(patients, how='left', left_on='PATIENT', right_on='Id')
covid_patients['survivor'] = covid_patients.PATIENT.isin(survivor_ids)
covid_patients = covid_patients[['START', 'PATIENT', 'survivor', 'CODE']]

Calculate attributes needed to support the plot. Also coerce all lab values into a numeric data type.

covid_patients_obs = covid_patients.merge(lab_obs, on='PATIENT')
covid_patients_obs['START'] = pd.to_datetime(covid_patients_obs.START)
covid_patients_obs['DATE'] = pd.to_datetime(covid_patients_obs.DATE)
covid_patients_obs['lab_days'] = covid_patients_obs.DATE - covid_patients_obs.START
covid_patients_obs['days'] = covid_patients_obs.lab_days / np.timedelta64(1, 'D')
covid_patients_obs['VALUE'] = pd.to_numeric(covid_patients_obs['VALUE'], errors='coerce')
loinc_to_display = {'CODE_y = 48065-7': 'D-dimer', 'CODE_y = 2276-4': 'Serum Ferritin',
                    'CODE_y = 89579-7': 'High Sensitivity Cardiac Troponin I',
                    'CODE_y = 26881-3': 'IL-6', 'CODE_y = 731-0': 'Lymphocytes',
                    'CODE_y = 14804-9': 'Lactate dehydrogenase'}
catplt = sns.catplot(x="days", y="VALUE", hue="survivor", kind="box", col='CODE_y', 
            col_wrap=2, sharey=False, sharex=False, data=covid_patients_obs, palette=["C1", "C0"])

# Set the title for the entire figure
catplt.fig.suptitle("Labs Values of Covid-19 Patients", y=1.02)

for axis in catplt.fig.axes:
    axis.xaxis.set_major_formatter(ticker.FormatStrFormatter('%d'))
    axis.xaxis.set_major_locator(ticker.MultipleLocator(base=4))
    axis.set_title(loinc_to_display[axis.get_title()])
        
plt.show()

Line plot

loinc_to_display = {'CODE_y = 48065-7': 'D-dimer', 'CODE_y = 2276-4': 'Serum Ferritin',
                    'CODE_y = 89579-7': 'High Sensitivity Cardiac Troponin I',
                    'CODE_y = 26881-3': 'IL-6', 'CODE_y = 731-0': 'Lymphocytes',
                    'CODE_y = 14804-9': 'Lactate dehydrogenase'}
g = sns.FacetGrid(covid_patients_obs, col="CODE_y", col_wrap=2, sharey=False, height=3, aspect=2)
g = g.map_dataframe(sns.lineplot, x="days", y="VALUE", marker=".", hue="survivor", palette=["C1", "C0"], alpha=0.5)
g.fig.suptitle("Labs Values of Covid-19 Patients", y=1.02)

for axis in g.axes.flat:
    axis.xaxis.set_major_formatter(ticker.FormatStrFormatter('%d'))
    axis.xaxis.set_major_locator(ticker.MultipleLocator(base=4))
    title = axis.get_title().replace("DESCRIPTION = ", "").strip()
    if title in loinc_to_display:
        axis.set_title(loinc_to_display[title])
    else:
        axis.set_title(title)  
  
plt.show()

Line plot with a different style

loinc_to_display = {'CODE_y = 48065-7': 'D-dimer', 'CODE_y = 2276-4': 'Serum Ferritin',
                    'CODE_y = 89579-7': 'High Sensitivity Cardiac Troponin I',
                    'CODE_y = 26881-3': 'IL-6', 'CODE_y = 731-0': 'Lymphocytes',
                    'CODE_y = 14804-9': 'Lactate dehydrogenase'}
catplt = sns.catplot(x="days", y="VALUE", hue="survivor", kind="point", col='CODE_y', 
            col_wrap=2, sharey=False, sharex=False, data=covid_patients_obs, palette=["C1", "C0"])


# Set the title for the entire figure
catplt.fig.suptitle("Labs Values of Covid-19 Patients", y=1.02)

for axis in catplt.fig.axes:
    axis.xaxis.set_major_formatter(ticker.FormatStrFormatter('%d'))
    axis.xaxis.set_major_locator(ticker.MultipleLocator(base=4))
    axis.set_title(loinc_to_display[axis.title.get_text()])

plt.show()

Set up a new DataFrame with boolean columns representing various outcomes, like admit, recovery or death

First, create patient Ids for ‘isolation’, ‘admitted’, ‘icu_admitted’, ‘ventilated’, ‘recovered’ and ‘death’, then create related columns in the covid_patient_ids table.

cp = covid_conditions.merge(patients, how='left', left_on='PATIENT', right_on='Id')
isolation_ids = care_plans[(care_plans.CODE == 736376001) & (care_plans.REASONCODE == 840539006)].PATIENT

cp['isolation'] = cp.Id.isin(isolation_ids)
cp['admit'] = cp.Id.isin(inpatient_ids)
cp['recovered'] = cp.Id.isin(survivor_ids)
cp['death'] = cp.DEATHDATE.notna()
icu_ids = encounters[encounters.CODE == 305351004].PATIENT
cp['icu_admit'] = cp.Id.isin(icu_ids)
vent_ids = procedures[procedures.CODE == 26763009].PATIENT
cp['ventilated'] = cp.Id.isin(vent_ids)
cp.head(2)
START STOP PATIENT ENCOUNTER CODE DESCRIPTION Id BIRTHDATE DEATHDATE SSN DRIVERS PASSPORT PREFIX FIRST LAST SUFFIX MAIDEN MARITAL RACE ETHNICITY GENDER BIRTHPLACE ADDRESS CITY STATE COUNTY ZIP LAT LON HEALTHCARE_EXPENSES HEALTHCARE_COVERAGE isolation admit recovered death icu_admit ventilated
0 2020-03-01 2020-03-30 1ff7f10f-a204-4bb1-aa72-dd763fa99482 60584c6a-e26b-4176-a660-86ad84c7ceca 840539006 COVID-19 1ff7f10f-a204-4bb1-aa72-dd763fa99482 2017-08-24 NaN 999-68-6630 NaN NaN NaN Jacinto644 Kris249 NaN NaN NaN white nonhispanic M Beverly Massachusetts US 888 Hickle Ferry Suite 38 Springfield Massachusetts Hampden County 1106.0 42.151961 -72.598959 8446.49 1499.08 True False True False False False
1 2020-03-13 2020-04-14 9bcf6ed5-d808-44af-98a0-7d78a29ede72 a78e78d9-33bb-40bc-9e42-e47ff7e910cc 840539006 COVID-19 9bcf6ed5-d808-44af-98a0-7d78a29ede72 2016-08-01 NaN 999-15-5895 NaN NaN NaN Alva958 Krajcik437 NaN NaN NaN white nonhispanic F Boston Massachusetts US 1048 Skiles Trailer Walpole Massachusetts Norfolk County 2081.0 42.177370 -71.281353 94568.40 1870.72 True False True False False False

9.0.8 Symptoms for all COVID-19 Patients

Generates a DataFrame with percentages of co-occurring conditions

icu_patient_ids = encounters[(encounters.REASONCODE == 840539006) & (encounters.CODE == 305351004)].PATIENT
icu_patient_ids.shape
(3677,)
def symptom_table(inpatient_ids, survivor_ids, deceased_ids, covid_cons):
    # Filter conditions containing "(disorder)" in the DESCRIPTION
    disorder_conditions = covid_cons[
        (covid_cons['DESCRIPTION'].str.contains(r'\(finding\)', na=False)) &
        (~covid_cons['DESCRIPTION'].str.contains(
            r'pneumonia|hypoxemia|chronic|Hypertriglyceridemia|Hyperlipidemia|Hypercholesterolemia|Metabolic|osteoporosis|Alzheimer|Anemia|sinusitis|thrombosis|embolism|pharyngitis|bronchitis', case=False, na=False))
        ]

    # Initialize the result table
    table_rows = []

    for outcome in disorder_conditions['DESCRIPTION'].unique():
        # Filter patients with the specific outcome
        outcome_patients = disorder_conditions[disorder_conditions['DESCRIPTION']
                                               == outcome]['PATIENT'].unique()

        # Calculate metrics
        total = len(outcome_patients)
        inpatient = len(set(outcome_patients) & set(inpatient_ids))
        survivors = len(set(outcome_patients) & set(survivor_ids))
        non_survivors = len(set(outcome_patients) & set(deceased_ids))

        percent_inpatient = (inpatient / len(inpatient_ids)) * \
            100 if len(inpatient_ids) > 0 else 0
        percent_survivors = (survivors / len(survivor_ids)) * \
            100 if len(survivor_ids) > 0 else 0
        percent_non_survivors = (
            non_survivors / len(deceased_ids)) * 100 if len(deceased_ids) > 0 else 0

        # Append row to the table
        table_rows.append({
            'outcome': outcome,
            'total': total,
            'percent of inpatient': round(percent_inpatient, 2),
            'survivors': survivors,
            'percent survivors': round(percent_survivors, 2),
            'non survivors': non_survivors,
            'percent non survivors': round(percent_non_survivors, 2)
        })

    # Convert to DataFrame
    outcome_table = pd.DataFrame(table_rows).head(10)
    outcome_table = outcome_table.sort_values(by='total', ascending=False)
    outcome_table['outcome'] = outcome_table['outcome'].str.replace(
    r'\(finding\)', '', regex=True).str.strip()

    return outcome_table

symptom_table(inpatient_ids, survivor_ids, deceased_ids, covid_cons)
outcome total percent of inpatient survivors percent survivors non survivors percent non survivors
0 Fever 80865 92.07 77629 88.73 3317 13.73
1 Cough 61710 70.81 59170 67.63 2601 10.77
4 Loss of taste 46121 51.25 44299 50.64 1868 7.73
8 Fatigue 34951 39.86 33551 38.35 1432 5.93
2 Sputum finding 30653 34.78 29418 33.63 1264 5.23
7 Respiratory distress 18177 100.00 14654 16.75 3548 14.69
5 Dyspnea 17939 37.86 16621 19.00 1339 5.54
6 Wheezing 17939 37.86 16621 19.00 1339 5.54
3 Diarrhea symptom 3521 5.80 3319 3.79 208 0.86
9 Hemoptysis 874 2.56 781 0.89 95 0.39

9.0.9 Symptoms for ICU-admitted COVID-19 Patients

covid_conditions = covid_cons
icu_patient_ids = encounters[(encounters.REASONCODE == 840539006) & (encounters.CODE == 305351004)].PATIENT
icu_patients = covid_conditions[covid_conditions.PATIENT.isin(icu_patient_ids)]
len(icu_patient_ids)
3677
def calculate_symptom_stats(icu_patient_ids, survivor_ids, deceased_ids, covid_conditions):
    # Filter conditions for "(finding)" only
    finding_conditions = covid_conditions[
        covid_conditions['DESCRIPTION'].str.contains(r'\(finding\)', na=False)
    ]
    
    # Ensure there are symptoms to process
    if finding_conditions.empty:
        print("No symptoms found with '(finding)' in the description.")
        return pd.DataFrame(columns=[
            'symptom', 'icu_count', 'percent_icu',
            'survivor_count', 'percent_survivor',
            'deceased_count', 'percent_deceased'
        ])
    
    # Convert inputs to sets for faster lookup
    icu_set = set(icu_patient_ids)
    survivor_set = set(survivor_ids)
    deceased_set = set(deceased_ids)

    table_rows = []

    for symptom in finding_conditions['DESCRIPTION'].unique():
        # Get all patients with the symptom
        symptom_patients = finding_conditions[
            finding_conditions['DESCRIPTION'] == symptom
        ]['PATIENT'].unique()

        symptom_patients_set = set(symptom_patients)

        # Intersections
        icu_with_symptom = symptom_patients_set & icu_set
        survivors_with_symptom = icu_with_symptom & survivor_set
        deceased_with_symptom = icu_with_symptom & deceased_set

        icu_count = len(icu_with_symptom)
        survivor_count = len(survivors_with_symptom)
        deceased_count = len(deceased_with_symptom)

        # Safe percentage calculations
        percent_icu = (icu_count / len(icu_patient_ids)) * 100 if len(icu_patient_ids) > 0 else 0
        percent_survivor = (survivor_count / len(survivor_ids)) * 100 if len(survivor_ids) > 0 else 0
        percent_deceased = (deceased_count / len(deceased_ids)) * 100 if len(deceased_ids) > 0 else 0

        table_rows.append({
            'symptom': symptom.replace('(finding)', '').strip(),
            'icu_count': icu_count,
            'percent_icu': round(percent_icu, 2),
            'survivor_count': survivor_count,
            'percent_survivor': round(percent_survivor, 2),
            'deceased_count': deceased_count,
            'percent_deceased': round(percent_deceased, 2)
        })

    # Create and sort the DataFrame
    df = pd.DataFrame(table_rows)
    if df.empty:
        print("No matching symptoms found for ICU patients.")
        return pd.DataFrame(columns=[
            'symptom', 'icu_count', 'percent_icu',
            'survivor_count', 'percent_survivor',
            'deceased_count', 'percent_deceased'
        ])
    
    df = df.sort_values(by='icu_count', ascending=False).head(10)

    return df

# Call the function
calculate_symptom_stats(icu_patient_ids, survivor_ids, deceased_ids, covid_conditions)
symptom icu_count percent_icu survivor_count percent_survivor deceased_count percent_deceased
7 Respiratory distress 3677 100.00 1180 1.35 2498 10.34
0 Fever 3345 90.97 1068 1.22 2278 9.43
1 Cough 2637 71.72 853 0.98 1785 7.39
4 Loss of taste 1940 52.76 637 0.73 1303 5.40
8 Fatigue 1482 40.30 498 0.57 985 4.08
5 Dyspnea 1381 37.56 466 0.53 916 3.79
6 Wheezing 1381 37.56 466 0.53 916 3.79
2 Sputum finding 1271 34.57 401 0.46 870 3.60
11 Muscle pain 656 17.84 205 0.23 451 1.87
12 Joint pain 656 17.84 205 0.23 451 1.87

Create a DataFrame with columns that show a condition’s start and end in days relative to COVID-19 diagnosis. Also create a column that calculates the number of days between COVID-19 diagnosis and a person’s death.

covid_patient_conditions['start_days'] = (pd.to_datetime(covid_patient_conditions.START) - pd.to_datetime(covid_patient_conditions.covid_start)) / np.timedelta64(1, 'D')
covid_patient_conditions['end_days'] = (pd.to_datetime(covid_patient_conditions.STOP) - pd.to_datetime(covid_patient_conditions.covid_start)) / np.timedelta64(1, 'D')
covid_patient_conditions['death_days'] = (pd.to_datetime(covid_patient_conditions.DEATHDATE) - pd.to_datetime(covid_patient_conditions.covid_start)) / np.timedelta64(1, 'D')

covid_patient_conditions = covid_patient_conditions[covid_patient_conditions['DESCRIPTION'].str.contains(r'\(finding\)', na=False)]
covid_patient_conditions.head(2)
PATIENT recovered death covid_start DEATHDATE BIRTHDATE GENDER admit icu_admit START STOP ENCOUNTER CODE DESCRIPTION start_days end_days death_days
0 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False 2020-03-01 2020-03-30 60584c6a-e26b-4176-a660-86ad84c7ceca 386661006 Fever (finding) 0.0 29.0 NaN
4 9bcf6ed5-d808-44af-98a0-7d78a29ede72 True False 2020-03-13 NaN 2016-08-01 F False False 2020-03-13 2020-04-14 a78e78d9-33bb-40bc-9e42-e47ff7e910cc 49727002 Cough (finding) 0.0 32.0 NaN
# Step 1 & 2: Filter and remove duplicate patient-condition pairs
filtered = covid_patient_conditions[
    (covid_patient_conditions['death'] == False) &
    (covid_patient_conditions['icu_admit'] == True)
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

# Step 3: Group by condition and count unique patients
df_survivor = filtered.groupby('DESCRIPTION') \
    .agg(survivor_count=('PATIENT', 'nunique')) \
    .reset_index()

# Step 4: Add percentage column
total_unique_patients = filtered['PATIENT'].nunique()
df_survivor['survivor_percent'] = round((df_survivor['survivor_count'] / total_unique_patients) * 100, 2)

# Step 5: Sort and take top 10
df_survivor = df_survivor.sort_values(by='survivor_count', ascending=False).head(15)
df_survivor
DESCRIPTION survivor_count survivor_percent
15 Respiratory distress (finding) 1179 100.00
6 Fever (finding) 1067 90.50
2 Cough (finding) 852 72.26
10 Loss of taste (finding) 637 54.03
5 Fatigue (finding) 497 42.15
19 Wheezing (finding) 465 39.44
4 Dyspnea (finding) 465 39.44
17 Sputum finding (finding) 401 34.01
9 Joint pain (finding) 205 17.39
11 Muscle pain (finding) 205 17.39
1 Chill (finding) 171 14.50
7 Headache (finding) 166 14.08
16 Sore throat symptom (finding) 139 11.79
13 Nausea (finding) 82 6.96
18 Vomiting symptom (finding) 82 6.96
# Step 1 & 2: Filter and remove duplicate patient-condition pairs
filtered = covid_patient_conditions[
    (covid_patient_conditions['death'] == True) &
    (covid_patient_conditions['icu_admit'] == True)
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

# Step 3: Group by condition and count unique patients
df_death = filtered.groupby('DESCRIPTION') \
    .agg(death_count=('PATIENT', 'nunique')) \
    .reset_index()

# Step 4: Add percentage column
total_unique_patients = filtered['PATIENT'].nunique()
df_death['death_percent'] = round((df_death['death_count'] / total_unique_patients) * 100, 2)

# Step 5: Sort and take top 10
df_death = df_death.sort_values(by='death_count', ascending=False).head(15)
df_death
DESCRIPTION death_count death_percent
15 Respiratory distress (finding) 2498 100.00
6 Fever (finding) 2278 91.19
2 Cough (finding) 1785 71.46
10 Loss of taste (finding) 1303 52.16
5 Fatigue (finding) 985 39.43
19 Wheezing (finding) 916 36.67
4 Dyspnea (finding) 916 36.67
17 Sputum finding (finding) 870 34.83
9 Joint pain (finding) 451 18.05
11 Muscle pain (finding) 451 18.05
7 Headache (finding) 412 16.49
1 Chill (finding) 370 14.81
16 Sore throat symptom (finding) 348 13.93
13 Nausea (finding) 158 6.33
18 Vomiting symptom (finding) 158 6.33
# Step 1 & 2: Filter and remove duplicate patient-condition pairs
filtered = covid_patient_conditions[
    (covid_patient_conditions['icu_admit'] == True)
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

# Step 3: Group by condition and count unique patients
df_total = filtered.groupby('DESCRIPTION') \
    .agg(total_count=('PATIENT', 'nunique')) \
    .reset_index()

# Step 4: Add percentage column
total_unique_patients = filtered['PATIENT'].nunique()
df_total['total_percent'] = round((df_total['total_count'] / total_unique_patients) * 100, 2)

# Step 5: Sort and take top 10
df_total = df_total.sort_values(by='total_count', ascending=False).head(15)
df_total
DESCRIPTION total_count total_percent
15 Respiratory distress (finding) 3677 100.00
6 Fever (finding) 3345 90.97
2 Cough (finding) 2637 71.72
10 Loss of taste (finding) 1940 52.76
5 Fatigue (finding) 1482 40.30
19 Wheezing (finding) 1381 37.56
4 Dyspnea (finding) 1381 37.56
17 Sputum finding (finding) 1271 34.57
9 Joint pain (finding) 656 17.84
11 Muscle pain (finding) 656 17.84
7 Headache (finding) 578 15.72
1 Chill (finding) 541 14.71
16 Sore throat symptom (finding) 487 13.24
13 Nausea (finding) 240 6.53
18 Vomiting symptom (finding) 240 6.53
# Merge df_total and df_survivor on 'DESCRIPTION'
merged_df = df_total.merge(df_survivor, on='DESCRIPTION', how='left')

# Merge the result with df_death on 'DESCRIPTION'
merged_df = merged_df.merge(df_death, on='DESCRIPTION', how='left')
merged_df['DESCRIPTION'] = merged_df['DESCRIPTION'].str.replace(
        r'\(finding\)', '', regex=True).str.strip()
# Display the merged DataFrame
merged_df.head(10)
DESCRIPTION total_count total_percent survivor_count survivor_percent death_count death_percent
0 Respiratory distress 3677 100.00 1179 100.00 2498 100.00
1 Fever 3345 90.97 1067 90.50 2278 91.19
2 Cough 2637 71.72 852 72.26 1785 71.46
3 Loss of taste 1940 52.76 637 54.03 1303 52.16
4 Fatigue 1482 40.30 497 42.15 985 39.43
5 Wheezing 1381 37.56 465 39.44 916 36.67
6 Dyspnea 1381 37.56 465 39.44 916 36.67
7 Sputum finding 1271 34.57 401 34.01 870 34.83
8 Joint pain 656 17.84 205 17.39 451 18.05
9 Muscle pain 656 17.84 205 17.39 451 18.05

Combined table

from functools import reduce

# --- TOTAL ICU PATIENTS ---
filtered_total = covid_patient_conditions[
    covid_patient_conditions['icu_admit'] == True
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

df_total = (
    filtered_total
    .groupby('DESCRIPTION')
    .agg(total_count=('PATIENT', 'nunique'))
    .reset_index()
)

total_unique_patients = filtered_total['PATIENT'].nunique()
df_total['total_percent'] = round((df_total['total_count'] / total_unique_patients) * 100, 2)

# --- ICU SURVIVORS ---
filtered_survivor = covid_patient_conditions[
    (covid_patient_conditions['icu_admit'] == True) &
    (covid_patient_conditions['death'] == False)
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

df_survivor = (
    filtered_survivor
    .groupby('DESCRIPTION')
    .agg(survivor_count=('PATIENT', 'nunique'))
    .reset_index()
)

survivor_unique_patients = filtered_survivor['PATIENT'].nunique()
df_survivor['survivor_percent'] = round((df_survivor['survivor_count'] / survivor_unique_patients) * 100, 2)

# --- ICU DEATHS ---
filtered_death = covid_patient_conditions[
    (covid_patient_conditions['icu_admit'] == True) &
    (covid_patient_conditions['death'] == True)
][['PATIENT', 'DESCRIPTION']].drop_duplicates()

df_death = (
    filtered_death
    .groupby('DESCRIPTION')
    .agg(death_count=('PATIENT', 'nunique'))
    .reset_index()
)

death_unique_patients = filtered_death['PATIENT'].nunique()
df_death['death_percent'] = round((df_death['death_count'] / death_unique_patients) * 100, 2)

# --- MERGE ALL ---
dfs = [df_total, df_survivor, df_death]
df_combined = reduce(lambda left, right: pd.merge(left, right, on='DESCRIPTION', how='outer'), dfs)

# Fill missing values with 0
df_combined = df_combined.fillna(0)

# Clean DESCRIPTION labels
df_combined['DESCRIPTION'] = (
    df_combined['DESCRIPTION']
    .str.replace(r'\(finding\)', '', regex=True)
    .str.strip()
)

# Sort and limit output
df_combined = df_combined.sort_values(by='total_count', ascending=False).head(10)

# Final result
df_combined.head(10)
DESCRIPTION total_count total_percent survivor_count survivor_percent death_count death_percent
15 Respiratory distress 3677 100.00 1179 100.00 2498 100.00
6 Fever 3345 90.97 1067 90.50 2278 91.19
2 Cough 2637 71.72 852 72.26 1785 71.46
10 Loss of taste 1940 52.76 637 54.03 1303 52.16
5 Fatigue 1482 40.30 497 42.15 985 39.43
19 Wheezing 1381 37.56 465 39.44 916 36.67
4 Dyspnea 1381 37.56 465 39.44 916 36.67
17 Sputum finding 1271 34.57 401 34.01 870 34.83
9 Joint pain 656 17.84 205 17.39 451 18.05
11 Muscle pain 656 17.84 205 17.39 451 18.05

9.0.10 Symptom timelines for hospitalized patients

These plots show the progression of COVID-19 related complications in hospitalized patients. The bars represent the average start and end time for the particular item.

covid_patient_conditions.head(2)
PATIENT recovered death covid_start DEATHDATE BIRTHDATE GENDER admit icu_admit START STOP ENCOUNTER CODE DESCRIPTION start_days end_days death_days
0 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False 2020-03-01 2020-03-30 60584c6a-e26b-4176-a660-86ad84c7ceca 386661006 Fever (finding) 0.0 29.0 NaN
4 9bcf6ed5-d808-44af-98a0-7d78a29ede72 True False 2020-03-13 NaN 2016-08-01 F False False 2020-03-13 2020-04-14 a78e78d9-33bb-40bc-9e42-e47ff7e910cc 49727002 Cough (finding) 0.0 32.0 NaN
symptoms = ['Fever (finding)', 'Cough (finding)', 'Dyspnea (finding)']
filtered = covid_patient_conditions[(covid_patient_conditions['DESCRIPTION'].isin(symptoms)) &
    (covid_patient_conditions['icu_admit'] == True)]
filtered.groupby('DESCRIPTION')['end_days'].mean().round(2)
DESCRIPTION
Cough (finding)      16.29
Dyspnea (finding)    16.43
Fever (finding)      16.33
Name: end_days, dtype: float64
# Create Boolean mask for "(finding)" and exclusion terms
condition_mask = (
    covid_patient_conditions['DESCRIPTION'].str.contains(r'\(finding\)', na=False) &
    ~covid_patient_conditions['DESCRIPTION'].str.contains(
        r'pneumonia|hypoxemia|chronic|Hypertriglyceridemia|Hyperlipidemia|Hypercholesterolemia|Metabolic|osteoporosis|Alzheimer|Anemia|sinusitis|thrombosis|embolism|pharyngitis|bronchitis',
        case=False, na=False)
)

# Define function with additional filtering for specific descriptions
def select_condition_averages(df, mask):
    mask = mask.reindex(df.index, fill_value=False)
    filtered_df = df[mask]
    # Filter for specific conditions only
    symptoms = ['Fever (finding)', 'Cough (finding)', 'Dyspnea (finding)']
    filtered_df = filtered_df[filtered_df['DESCRIPTION'].isin(symptoms)]
    
    averages = filtered_df.groupby('DESCRIPTION').agg(
        avg_start_days= ('start_days', 'mean'),
        avg_end_days= ('end_days', 'mean'),
        count=('PATIENT', 'size')
    ).reset_index()
    return averages

# Apply function
averages = select_condition_averages(covid_patient_conditions, condition_mask)

# View result
averages.head(3)
DESCRIPTION avg_start_days avg_end_days count
0 Cough (finding) -0.041698 22.529629 59763
1 Dyspnea (finding) -0.041395 20.302278 17514
2 Fever (finding) -0.041540 22.541085 78334
averages = select_condition_averages(
    covid_patient_conditions[
        (covid_patient_conditions['recovered'] == True) & 
        (covid_patient_conditions['icu_admit'] == True)
    ],
    condition_mask
)
averages.head(3)
DESCRIPTION avg_start_days avg_end_days count
0 Cough (finding) -0.046893 16.293083 853
1 Dyspnea (finding) -0.049356 16.431330 466
2 Fever (finding) -0.048689 16.327715 1068
averages = select_condition_averages(covid_patient_conditions[(covid_patient_conditions['death'] == True) & (covid_patient_conditions['icu_admit'] == True )], condition_mask)
averages.head(3)
DESCRIPTION avg_start_days avg_end_days count
0 Cough (finding) -0.042017 21.0 1785
1 Dyspnea (finding) -0.036026 21.0 916
2 Fever (finding) -0.046971 21.0 2278
# Deaths of icu patients
np.intersect1d(icu_patient_ids, deceased_ids).shape[0]
2498

Add an age column to the DataFrame for rows where the patient has died

covid_info.loc[covid_info.death == True, 'age'] = (pd.to_datetime(covid_info.DEATHDATE) - pd.to_datetime(covid_info.BIRTHDATE)).dt.days / 365.25

Populate ages for survivors based on the current date

covid_info.loc[covid_info.recovered == True, 'age'] = (datetime.datetime.now() - pd.to_datetime(covid_info.BIRTHDATE)).dt.days / 365.25

Create an age_range column that places individuals into 10 year age ranges, such as 0 - 10, 10 - 20, etc.

bins = list(range(0, 120, 10))
covid_info['age_range'] = pd.cut(covid_info.age, bins=bins)
covid_info.head(3)
PATIENT recovered death covid_start DEATHDATE BIRTHDATE GENDER admit icu_admit age age_range
0 1ff7f10f-a204-4bb1-aa72-dd763fa99482 True False 2020-03-01 NaN 2017-08-24 M False False 7.843943 (0, 10]
1 9bcf6ed5-d808-44af-98a0-7d78a29ede72 True False 2020-03-13 NaN 2016-08-01 F False False 8.906229 (0, 10]
2 5163c501-353c-4a82-b863-a3f1df2d6cf1 True False 2020-03-10 NaN 2004-01-09 F False False 21.467488 (20, 30]

9.0.11 Mortality by Age and Sex

A plot of deaths grouped by age range and gender.

# Mortality rate
rate = (np.intersect1d(covid_patient_ids, deceased_ids).shape[0]/len(covid_patient_ids))* 100
print(round(rate, 2),'% for all covid-19 patients')
4.13 % for all covid-19 patients
chart = sns.catplot(x="age_range", kind="count", hue="GENDER", data=covid_info[covid_info.death == True])
for axes in chart.axes.flat:
    axes.tick_params(axis='x', rotation=90)

A table view of the same information from above

covid_info[covid_info.death==True].groupby(['age_range', 'GENDER'], observed=True).count()[['PATIENT']]
PATIENT
age_range GENDER
(0, 10] F 1
M 1
(10, 20] F 2
M 4
(20, 30] F 16
M 102
(30, 40] F 35
M 61
(40, 50] F 32
M 128
(50, 60] F 200
M 299
(60, 70] F 250
M 425
(70, 80] F 430
M 479
(80, 90] F 290
M 437
(90, 100] F 112
M 117
(100, 110] F 103
M 106

Another table view of the mortality data, this time just grouped by age range

covid_info[covid_info.death == True].groupby(['age_range'], observed=True).count()[['PATIENT']]
PATIENT
age_range
(0, 10] 2
(10, 20] 6
(20, 30] 118
(30, 40] 96
(40, 50] 160
(50, 60] 499
(60, 70] 675
(70, 80] 909
(80, 90] 727
(90, 100] 229
(100, 110] 209

9.0.12 Supply use

supplies.head(3)
grouped_supplies = supplies.groupby(['DESCRIPTION', 'DATE']).sum()
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Prepare data
gs = grouped_supplies.reset_index()
gs['DATE'] = pd.to_datetime(gs['DATE'], errors='coerce')  # Handle invalid dates safely

# Create FacetGrid
g = sns.FacetGrid(
    data=gs,
    col="DESCRIPTION",
    col_wrap=3,
    sharey=False,
    height=3,
    aspect=2
)
g.map_dataframe(sns.lineplot, x="DATE", y="QUANTITY", marker="o")

# Format titles and x-axis
for ax in g.axes.flat:
    title = ax.get_title()

    # Simplify and clean title
    if 'glove' in title.lower():
        ax.set_title('Gloves')
    else:
        cleaned_title = (
            title.replace("DESCRIPTION = ", "")
                 .replace(" (physical object)", "")
                 .strip()
        )
        ax.set_title(cleaned_title)

    # Rotate x-axis labels
    for tick in ax.get_xticklabels():
        tick.set_rotation(90)

plt.tight_layout()
plt.show()

9.0.13 Cumulative Case Count

A table showing total supplies used over the entire simulation

Build a DataFrame that has cumulative case counts over time

case_counts = conditions[conditions.CODE == 840539006].groupby('START').count()[['PATIENT']]
case_counts['total'] = case_counts['PATIENT'].cumsum()
case_counts = case_counts.rename(columns={'PATIENT': 'daily'})
case_counts = case_counts.reset_index()
case_counts['START'] = pd.to_datetime(case_counts.START)
axes = sns.lineplot(x='START', y='total', data=case_counts)
plt.xticks(rotation=90)
plt.show()

9.0.14 Hospital Day Stats

For patients with COVID-19, calculate the average hospital length of stay as well as total hospital days for all COVID-19 patients. Provide the same information for ICU patients

covid_hosp = covid_info[covid_info['admit'] == True].merge(encounters, on='PATIENT')
covid_hosp = covid_hosp[covid_hosp['REASONCODE'] == 840539006]
covid_hosp['START'] = pd.to_datetime(covid_hosp.START)
covid_hosp['STOP'] = pd.to_datetime(covid_hosp.STOP)
covid_hosp['days'] = (covid_hosp.STOP - covid_hosp.START) / np.timedelta64(1, 'D')
covid_hosp = covid_hosp[['PATIENT', 'icu_admit', 'START', 'STOP', 'days']]
covid_hosp = covid_hosp.rename(columns={'START': 'hospital_start', 'STOP': 'hospital_stop'})
covid_hosp = covid_hosp.drop_duplicates(subset=['PATIENT', 'hospital_start', 'hospital_stop'])
covid_hosp = covid_hosp.sort_values(by='hospital_start')
covid_hosp = covid_hosp.reset_index(drop=True)
covid_hosp.head(3)
PATIENT icu_admit hospital_start hospital_stop days
0 641699c3-03c0-40fc-9f14-38247e0a8cfa False 2020-01-18 20:55:04+00:00 2020-02-02 05:14:04+00:00 14.346528
1 d19da9db-c5a7-468f-aadf-3c7cd5b5a8d2 False 2020-01-20 12:24:28+00:00 2020-02-08 22:59:28+00:00 19.440972
2 a654fdb6-0796-4fb5-9ad8-435e38ed5168 False 2020-01-22 23:57:08+00:00 2020-01-25 00:27:08+00:00 2.020833
# Group by hospital admission and ICU admission
grouped_data = covid_hosp.groupby(['icu_admit']).agg(
    mean_days=('days', 'mean'),
    total_days=('days', 'sum'),
    patient_count=('PATIENT', 'nunique')
).reset_index()

# Rename the columns for better readability
grouped_data = grouped_data.rename(columns={
    'icu_admit': 'Admission'
})
grouped_data['Admission'] = grouped_data['Admission'].map({False: 'Non-ICU Inpatients', True: 'ICU Admission'})

# Calculate the total row
total_row = {
    'Admission': 'Total',
    'mean_days': grouped_data['mean_days'].mean(),
    'total_days': grouped_data['total_days'].sum(),
    'patient_count': grouped_data['patient_count'].sum()
}

# Append the total row to the DataFrame using pd.concat
grouped_total = pd.concat(
    [grouped_data, pd.DataFrame([total_row])],
    ignore_index=True
)

# Display the updated DataFrame
grouped_total
Admission mean_days total_days patient_count
0 Non-ICU Inpatients 13.817881 214412.061111 14498
1 ICU Admission 5.376755 59305.611806 3677
2 Total 9.597318 273717.672917 18175

9.0.15 Device Usage

Show the number of devices used to treat COVID-19 over time.

device_codes = [448907002, 449071006, 36965003]
grouped_dev = devices[devices.CODE.isin(device_codes)].groupby(['DESCRIPTION', 'START']).count()
grouped_dev = grouped_dev.reset_index()
grouped_dev['START'] = pd.to_datetime(grouped_dev.START)
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Prepare data
gd = grouped_dev.reset_index()
gd['START'] = pd.to_datetime(gd['START'], errors='coerce')  # Ensure dates are valid

# Create FacetGrid
g = sns.FacetGrid(
    data=gd,
    col="DESCRIPTION",
    col_wrap=3,
    sharey=False,
    height=3,
    aspect=2
)
g.map_dataframe(sns.lineplot, x="START", y="PATIENT", marker="o")  # Use marker="o" instead of "."

# Clean up axes titles and rotate tick labels
for ax in g.axes.flat:
    title = ax.get_title()
    cleaned_title = title.replace("DESCRIPTION = ", "").replace(" (physical object)", "").strip()
    ax.set_title(cleaned_title)

    for tick in ax.get_xticklabels():
        tick.set_rotation(90)

plt.tight_layout()
plt.show()

9.0.16 Medication Dispenses

This table shows medications dispensed to patients with COVID-19 since January 20, 2020.

covid_meds = medications[pd.to_datetime(medications.START) > pd.to_datetime('2020-01-20')]
covid_meds = covid_info.merge(covid_meds, on='PATIENT')
# Use only numeric columns for aggregation
covid_meds.groupby(['DESCRIPTION'])[['DISPENSES']].sum().sort_values('DISPENSES', ascending=False).head(10)
DISPENSES
DESCRIPTION
Acetaminophen 500 MG Oral Tablet 35509
0.4 ML Enoxaparin sodium 100 MG/ML Prefilled Syringe 34733
1 ML Enoxaparin sodium 150 MG/ML Prefilled Syringe 20332
NDA020503 200 ACTUAT Albuterol 0.09 MG/ACTUAT Metered Dose Inhaler 17163
Hydrochlorothiazide 25 MG Oral Tablet 6832
Simvastatin 10 MG Oral Tablet 5779
insulin human isophane 70 UNT/ML / Regular Insulin Human 30 UNT/ML Injectable Suspension [Humulin] 5392
1 ML Epoetin Alfa 4000 UNT/ML Injection [Epogen] 4817
amLODIPine 5 MG / Hydrochlorothiazide 12.5 MG / Olmesartan medoxomil 20 MG Oral Tablet 4799
Atenolol 50 MG / Chlorthalidone 25 MG Oral Tablet 4550

9.0.17 Conclusion

In this project, I explored the Synthea COVID-19 dataset to analyze various health outcomes and symptoms experienced by COVID-19 patients. I focused on inpatients, ICU-admitted patients, and ventilated individuals, examining their conditions, treatments, and outcomes.

I learned how to:

extract data for specific analyses, such as filtering, merging, and recoding data based on fields like ‘CODE’, ‘REASONCODE’, and ‘DESCRIPTION’, for different groups of patients including inpatients, COVID patients, ICU-admitted, and ventilated individuals.