COVID-19 Epidemic Analysis using Machine Learning and Deep Learning Algorithms

Publication details

The paper has been available on the medRxiv preprint server since 2020. For more details click here.


  • The article investigates the COVID-19 outbreak by analyzing the exponential growth in confirmed cases.
  • We present a timeline of major events during the COVID-19 crisis to highlight its epidemic nature.
  • We aim to predict the likely number of confirmed cases using machine learning and deep learning models, so that essential resources such as beds and medicine can be arranged as early as possible.
  • We adopted support vector regression (SVR), a deep neural network (DNN), a long short-term memory (LSTM) network, and polynomial regression (PR).
  • The source code is available here.

COVID-19 transmission stages

Fig. 1 COVID-19 transmission stages.

The first stage begins with cases reported among people who traveled to already affected regions. In the second stage, cases are reported locally among family, friends, and others who came into contact with a person arriving from the affected regions; at this point the affected people are still traceable. The third stage makes the situation worse, as the transmission source becomes untraceable and the disease spreads among individuals who neither have any travel history nor came into contact with a known affected person. Worst of all, stage four begins when the transmission becomes endemic and uncontrollable.

Dataset details

The data were collected from the official repository of Johns Hopkins University. It contains the number of confirmed, recovered, and death cases worldwide and by country, and is updated every 24 hours. Here is the GitHub repository of the data.

You can read the confirmed, recovered and deaths files as follows:

import pandas as pd

# Pass the path or URL of each CSV file from the repository above.
confirmed_df = pd.read_csv('')
deaths_df = pd.read_csv('')
recoveries_df = pd.read_csv('')
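
Once a file is loaded, its per-region rows can be collapsed into a single worldwide series. The sketch below assumes the wide layout of the Johns Hopkins time-series files (four leading metadata columns followed by one column per date). A tiny hand-made frame with made-up numbers stands in for the real file, and the helper name `worldwide_totals` is hypothetical, not part of the original source code.

```python
import pandas as pd

# Toy stand-in for a loaded file; the numbers are made up for illustration.
confirmed_df = pd.DataFrame({
    "Province/State": [None, "Hubei", None],
    "Country/Region": ["Italy", "China", "India"],
    "Lat": [41.9, 30.9, 20.6],
    "Long": [12.6, 112.3, 79.0],
    "1/22/20": [0, 444, 0],
    "1/23/20": [0, 444, 0],
    "1/24/20": [2, 549, 1],
})

# Assumed metadata columns; every other column is treated as a date.
META_COLS = ["Province/State", "Country/Region", "Lat", "Long"]


def worldwide_totals(df: pd.DataFrame) -> pd.Series:
    """Sum the cumulative counts of all regions for each date (hypothetical helper)."""
    totals = df.drop(columns=META_COLS).sum(axis=0)
    # Convert column labels such as "1/22/20" into real timestamps.
    totals.index = pd.to_datetime(totals.index, format="%m/%d/%y")
    return totals.sort_index()


world_confirmed = worldwide_totals(confirmed_df)
print(world_confirmed)
```

The same helper can be applied to the deaths and recoveries frames, provided they share this layout.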

Epidemic analysis

The spread of COVID-19 has brought the world to the brink of a massive loss of human lives, which makes it of utmost importance to analyze the transmission growth as early as possible and to forecast how the transmission may develop. With this objective, we adopt state-of-the-art machine learning models, namely support vector regression (SVR) and polynomial regression (PR), and deep learning regression models, namely a standard deep neural network (DNN) and a recurrent neural network using long short-term memory (LSTM) cells. The machine learning and deep learning approaches are implemented using the Python libraries “sklearn” and “keras” respectively, to predict the total number of confirmed, recovered, and death cases worldwide. The predictions support decisions that depend on transmission growth, such as extending the lockdown period, carrying out sanitation procedures, and providing everyday essential resources.
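
To make the machine learning part concrete, here is a minimal sketch of fitting PR and SVR with sklearn on a day-index feature. It is not the paper's exact configuration: the synthetic case counts, the polynomial degree, the SVR kernel and hyperparameters, and the seven-day horizon are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR

# Synthetic cumulative counts (made up): day index -> total cases.
days = np.arange(30).reshape(-1, 1)
cases = 100.0 * np.exp(0.15 * days.ravel())

# Polynomial regression: scaled day index -> polynomial features -> least squares.
# Degree 4 is an arbitrary choice for this sketch.
pr_model = make_pipeline(
    StandardScaler(), PolynomialFeatures(degree=4), LinearRegression()
)
pr_model.fit(days, cases)

# SVR is sensitive to scale, so both the feature and the target are standardized.
# Kernel, C, and epsilon are illustrative values, not tuned ones.
svr_model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.01)),
    transformer=StandardScaler(),
)
svr_model.fit(days, cases)

# Forecast the next seven days with both models.
future_days = np.arange(30, 37).reshape(-1, 1)
pr_forecast = pr_model.predict(future_days)
svr_forecast = svr_model.predict(future_days)

print("PR in-sample R^2:", pr_model.score(days, cases))
print("PR forecast:", pr_forecast.round(1))
print("SVR forecast:", svr_forecast.round(1))
```

In-sample fit alone says little about forecast quality, so any comparison between models should be made on held-out days.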


Predicted trend of cases
Fig. 2 COVID-19 worldwide epidemic analysis using SVR, DNN, LSTM, and PR.

Fig. 2 presents the predicted trend of COVID-19 cases using SVR, PR, DNN, and LSTM on worldwide data. Among these approaches, PR produces the best fit to the growing trend. However, if the spread follows the trend predicted by the PR model, it would lead to a huge loss of lives, as the model shows exponential growth of the transmission worldwide. As observed in China, this growth can be slowed and eventually quenched by keeping susceptible individuals away from infected individuals. This is achievable by practicing social distancing and following the lockdown initiative with discipline. The study can be further extended to other machine learning and deep learning models.
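
As a rough illustration of what exponential growth implies, the sketch below estimates a daily growth rate r from a case series by fitting a straight line to the logarithm of the counts, then converts it to a doubling time ln(2) / r. The series is synthetic and built to double every four days; this is a generic technique, not a result or method taken from the paper.

```python
import numpy as np

# Synthetic series (made up): starts at 50 cases and doubles every 4 days.
days = np.arange(10)
cases = 50.0 * 2.0 ** (days / 4.0)

# If cases = C0 * exp(r * t), then log(cases) is linear in t with slope r.
growth_rate, log_c0 = np.polyfit(days, np.log(cases), deg=1)

# Time needed for the count to double at this growth rate.
doubling_time = np.log(2) / growth_rate

print(f"daily growth rate: {growth_rate:.4f}")
print(f"doubling time: {doubling_time:.2f} days")
```

A shrinking growth rate over successive time windows would indicate that containment measures are slowing the spread.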

For more details, please refer to my paper here.
