Covid-19 Risk Factors

The US Centers for Disease Control and Prevention (CDC) released data on the clinical history of 22M patients (source: CDC).

Using this data, we can estimate how the likelihood of hospitalization and death depends on age, ethnicity, sex and underlying medical conditions. The technique used is logistic regression. Out of 26.6M data points overall, around 14.5M have all the required information, i.e. hospitalization status, possible death, sex, age, race and underlying medical conditions.
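A minimal R sketch of that filtering step, run on a tiny hypothetical table rather than the real file. It assumes raw columns named `hosp_yn` and `death_yn` (alongside `sex`, `age_group`, `race_ethnicity_combined` and `medcond_yn` from the model formula), assumes missing information is coded as "Unknown", "Missing" or NA, and the `death_binary` column name is hypothetical.

```r
# Hypothetical miniature of the raw table (NOT real CDC records).
raw <- data.frame(
  sex = c("Male", "Female", "Unknown", "Female", "Male"),
  age_group = c("0 - 9 Years", "80+ Years", "20 - 29 Years", "Missing", "70 - 79 Years"),
  race_ethnicity_combined = c("White, Non-Hispanic", "Asian, Non-Hispanic",
                              "Hispanic/Latino", "Black, Non-Hispanic", "Unknown"),
  hosp_yn = c("No", "Yes", "No", "Yes", "Yes"),
  death_yn = c("No", "Yes", "No", "No", "Missing"),
  medcond_yn = c("No", "Yes", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Codes assumed to mean "information not available".
unknown <- c("Unknown", "Missing", NA)

# Keep only rows where every variable needed by the models is known.
keep <- with(raw,
  hosp_yn %in% c("Yes", "No") &
    death_yn %in% c("Yes", "No") &
    medcond_yn %in% c("Yes", "No") &
    sex %in% c("Male", "Female") &
    !(age_group %in% unknown) &
    !(race_ethnicity_combined %in% unknown)
)
cleaned_data <- raw[keep, ]

# 0/1 outcomes for the logistic models (1 = "Yes").
cleaned_data$hosp_binary  <- as.integer(cleaned_data$hosp_yn == "Yes")
cleaned_data$death_binary <- as.integer(cleaned_data$death_yn == "Yes")
cleaned_data
```

Only the first two rows survive here; the others each have at least one unknown value.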

To calculate your own risk, check out my COVID risk calculator.

Hospitalization Risks

The model for hospitalization looks as follows:
glm(hosp_binary ~ sex + age_group + race_ethnicity_combined + medcond_yn, data = cleaned_data, family = "binomial")

We can see that all factors are statistically significant.
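To illustrate how such a model treats categorical predictors, here is a minimal R sketch that fits the same kind of `glm()` on synthetic data. The data, the reduced set of age groups and the effect sizes are all made up; the point is that one level of each factor (here "0 - 9 Years" for age) becomes the reference level absorbed into the intercept, and every other level is reported as a log-odds difference from it.

```r
set.seed(1)
n <- 20000

# Synthetic data only: made-up levels and effect sizes, NOT the CDC dataset.
sim <- data.frame(
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  age_group = factor(sample(c("0 - 9 Years", "40 - 49 Years", "80+ Years"), n, replace = TRUE)),
  medcond_yn = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
# Make the reference level explicit instead of relying on alphabetical order.
sim$age_group <- relevel(sim$age_group, ref = "0 - 9 Years")

# Assumed "true" log-odds used to simulate hospitalizations.
eta <- -4 + 0.4 * (sim$sex == "Male") +
  1.5 * (sim$age_group == "40 - 49 Years") +
  3.5 * (sim$age_group == "80+ Years") +
  1.2 * (sim$medcond_yn == "Yes")
sim$hosp_binary <- rbinom(n, 1, plogis(eta))

fit <- glm(hosp_binary ~ sex + age_group + medcond_yn, data = sim, family = "binomial")

# No coefficient appears for "0 - 9 Years": that group lives in the intercept.
round(coef(fit), 2)
```

Because the outcomes are simulated from known log-odds, the estimates should land close to the values used to generate them (for example, near 3.5 for `age_group80+ Years`).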

Race Factor

When we look at the coefficients for race, we see that, contrary to the popular narrative, it is not the Black race that has the highest risk of hospitalization due to COVID-19. The coefficients (log-odds relative to the reference group), in descending order, are:

  1. Asian, Non-Hispanic: 0.449738
  2. Black, Non-Hispanic: 0.389612
  3. Hispanic/Latino: 0.215915
  4. Native Hawaiian/Other Pacific Islander, Non-Hispanic: 0.200651
  5. Multiple/Other, Non-Hispanic: 0.091612
  6. White, Non-Hispanic: -0.506029

As we can see, the Black race comes in second place. Also contrary to the popular narrative, all other races have a much higher risk than the White race, which has a significantly negative coefficient. This means that, with the other factors held equal, being White is associated with a substantially lower risk of hospitalization overall.

The American Indian/Alaska Native group is not visible because it is the reference level: it is included in the intercept, and every coefficient above is measured relative to it.
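Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios relative to the reference group. A small R sketch using the race coefficients listed above:

```r
# Hospitalization coefficients for race, as listed above
# (log-odds relative to the American Indian/Alaska Native reference group).
race_coef <- c(
  "Asian, Non-Hispanic" = 0.449738,
  "Black, Non-Hispanic" = 0.389612,
  "Hispanic/Latino" = 0.215915,
  "Native Hawaiian/Other Pacific Islander, Non-Hispanic" = 0.200651,
  "Multiple/Other, Non-Hispanic" = 0.091612,
  "White, Non-Hispanic" = -0.506029
)

# exp() converts each log-odds coefficient into an odds ratio vs. the reference group.
odds_ratio <- exp(race_coef)
round(odds_ratio, 2)
```

An odds ratio of about 1.57 for Asian, Non-Hispanic means roughly 57% higher odds of hospitalization than the reference group with the other factors held equal, while about 0.60 for White, Non-Hispanic means roughly 40% lower odds. Odds ratios are not the same as relative risks, although the two are close when the outcome is rare.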

Age Factor

As expected, age plays a crucial role. Being in the 10-19 age group considerably reduces your risk of hospitalization relative to the 0-9 reference group, while being 80+ has exactly the opposite effect. Starting from 70 years of age, the risk is already near its highest value.


  1. 10 - 19 Years: -0.437082
  2. 70 - 79 Years: 3.006802
  3. 80+ Years: 3.553732
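Because every age coefficient is measured against the 0-9 reference group, two non-reference groups can be compared by taking the difference of their coefficients (the intercept cancels) and exponentiating it. A small R sketch using the coefficients listed above; the helper `contrast_or` is hypothetical:

```r
# Hospitalization coefficients for age (log-odds vs. "0 - 9 Years"), from the list above.
age_coef <- c("10 - 19 Years" = -0.437082,
              "70 - 79 Years" = 3.006802,
              "80+ Years"     = 3.553732)

# Hypothetical helper: odds ratio of group a vs. group b.
# Both coefficients share the same intercept, so only their difference matters.
contrast_or <- function(coefs, a, b) exp(coefs[[a]] - coefs[[b]])

contrast_or(age_coef, "80+ Years", "70 - 79 Years")      # about 1.73
contrast_or(age_coef, "70 - 79 Years", "10 - 19 Years")  # about 31.3
```

This gives only a point estimate; a confidence interval for such a contrast would also need the coefficients' covariance matrix, e.g. from `vcov()` on the fitted model.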

The age group 0-9 Years is not visible because it is the reference level included in the intercept. To give you an idea of how low the risk is for that group, here are some basic statistics:

  1. Out of 22.6M cases, 1.26M were aged 0-9 ~ 5% of all cases
  2. 31043 had underlying medical conditions ~ 2.5%
  3. 12305 were hospitalized ~ 0.97% (less than 1%)
  4. 735 were admitted to the ICU (Intensive Care Unit) ~ 0.06% (one in 1714 cases)
  5. 225 died ~ 0.02% (one in 5619 cases)


Sex Factor

Being male significantly elevates your risk of hospitalization, with a positive coefficient of 0.372641.

Medical Conditions

Unfortunately, the dataset is imprecise about underlying medical conditions: it records only a binary Yes/No. Nevertheless, we can see that having underlying medical conditions significantly increases the risk of hospitalization.

Hospitalization ANOVA

We have seen how much each factor increases or decreases the risk of hospitalization due to COVID-19. But which of these factors is the best (strongest) predictor? To answer that question, we can conduct an analysis of variance on the previously fitted model (for a logistic regression, this is technically an analysis of deviance).

We see that age explains by far the largest share of the variance. To make this more readable, we can convert the numbers to percentages.
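One plausible way to produce such percentages, sketched in R on synthetic data. The data and effect sizes are made up, and the percentages are assumed to be shares of the deviance the model explains, which fits the reported figures summing to about 100%. Note that `anova()` on a `glm` is sequential: each term's deviance is measured after the terms listed before it, so with correlated predictors the split can depend on the order of terms in the formula.

```r
set.seed(42)
n <- 20000

# Synthetic stand-in for cleaned_data (made-up effects, NOT the CDC results).
sim <- data.frame(
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  age_group = factor(sample(c("0 - 9 Years", "40 - 49 Years", "80+ Years"), n, replace = TRUE)),
  medcond_yn = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
eta <- -4 + 0.4 * (sim$sex == "Male") +
  1.5 * (sim$age_group == "40 - 49 Years") +
  3.5 * (sim$age_group == "80+ Years") +
  1.2 * (sim$medcond_yn == "Yes")
sim$hosp_binary <- rbinom(n, 1, plogis(eta))

fit <- glm(hosp_binary ~ sex + age_group + medcond_yn, data = sim, family = "binomial")

# Sequential analysis of deviance; the first row is the NULL model (Deviance = NA).
dev_table <- anova(fit, test = "Chisq")
explained <- dev_table$Deviance[-1]
names(explained) <- rownames(dev_table)[-1]

# Each term's share of the deviance explained by the whole model, in percent.
pct <- 100 * explained / sum(explained)
round(pct, 1)
```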

We see that age explains ~79% of the variance, while underlying medical conditions explain only ~12%, race ~8% and sex ~1.2%. In real-life terms, your age tells you a great deal about your risk, while the other factors leave much more uncertainty. So while it is true that, e.g., underlying medical conditions generally increase the risk, outcomes vary a great deal among people who have them.

Death Risks

The model for death risk is very similar to the hospitalization one. I will focus mainly on the differences.

Race Factor

  1. Asian, Non-Hispanic: 0.737796 (up from 0.449738)
  2. Hispanic/Latino: 0.441492 (up from 3rd place: 0.215915)
  3. Black, Non-Hispanic: 0.383417 (down from 2nd place: 0.389612)

Asian, Non-Hispanic still occupies first place, with a dramatic increase in death risk compared to hospitalization. But 2nd and 3rd places switched: Hispanic/Latino people are at greater risk of death than Black people.

Age Factor

The overall relationship between age and risk remains the same; only the contrast has sharpened.


  1. 10 - 19 Years: -0.665883 (-0.437082 for hospitalization)
  2. 70 - 79 Years: 5.255782 (3.006802 for hospitalization)
  3. 80+ Years: 6.403765 (3.553732 for hospitalization)

We see that for death, the risk rises much more steeply with age than it does for hospitalization.
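To put numbers on "much more steeply", here is a small R sketch that lines up the reported age coefficients of the two models and converts them to odds ratios against the 0-9 reference group. The `ratio` column (how many times larger the death coefficient is in absolute value) is an illustrative measure, not a formal test.

```r
# Age coefficients reported above (log-odds vs. the "0 - 9 Years" reference group).
age_coefs <- data.frame(
  group = c("10 - 19 Years", "70 - 79 Years", "80+ Years"),
  hosp  = c(-0.437082, 3.006802, 3.553732),
  death = c(-0.665883, 5.255782, 6.403765)
)

# How many times more extreme each death coefficient is than its hospitalization counterpart.
age_coefs$ratio <- abs(age_coefs$death) / abs(age_coefs$hosp)

# Odds ratios vs. the 0-9 reference group for each outcome.
age_coefs$hosp_or  <- exp(age_coefs$hosp)
age_coefs$death_or <- exp(age_coefs$death)

print(age_coefs, digits = 3)
```

The odds ratio for 80+ versus 0-9 goes from about 35 for hospitalization to about 600 for death. Comparing coefficients across two separately fitted logistic models is only a rough guide, since the models have different outcomes and baselines.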


Sex Factor

Being male significantly elevates your risk of death, with a positive coefficient of 0.546255 (up from 0.372641 for hospitalization).

Medical Conditions

The coefficient for underlying medical conditions was 1.239966 for hospitalization. For death, it rises to 1.654682: a noticeable, though not dramatic, increase.

Deaths ANOVA



Just like for hospitalization, age plays the major role. Its importance increased even further, from already high to extremely high. Race and medical conditions decreased in importance, while sex remained about as important as before.


The above data show that age is by far the biggest risk factor for hospitalization or death due to COVID-19. When it comes to race, it is Asians, not Blacks, who are at the highest risk of serious medical consequences due to COVID-19. Men are at more risk than women, and underlying medical conditions obviously play their role, too.