Machine learning is a technique that teaches computers to do what comes naturally to humans and animals: learning from experience. With machine learning you can analyse large amounts of data and get reasonably accurate results.
In this machine learning project, you will get an idea of how to use machine learning to predict the coronavirus (COVID-19) outbreak. You can use the same method to predict other epidemic diseases such as malaria, dengue, swine flu and SARS. Coronavirus has recently had a huge impact worldwide, infecting hundreds of thousands of people and killing thousands.
There are two main types of machine learning: supervised learning and unsupervised learning. Supervised learning builds a model that makes predictions from known inputs and their corresponding outputs. Unsupervised learning builds a model from the input data alone.
Outline: Here we use a Support Vector Machine (SVM) model and linear regression to predict the coronavirus outbreak for the upcoming 10 days, and use charts and graphs to visualise the cases across different regions.
Coronavirus cannot move from one host to another on its own, but it can multiply once it gets into a host. From this we can conclude that it spreads through the physical network of contact between people.
In our case, two factors create the physical network:
1. Population density
2. Hotspots
Population density: Areas with a high population density, such as metro cities, create more contact between people and increase the risk of disease transmission from one person to another.
Hotspot: A place with many attractions that draw crowds, such as shopping malls, theatres, amusement parks and airports.
- Import the necessary libraries into the Jupyter notebook, such as:
numpy, pandas – for numerical computation and data manipulation
matplotlib – for visualising the data
math, time, sklearn, datetime, operator, etc.
- Import the datasets – confirmed_cases, deaths_reported, recovered_cases (you can download the datasets from www.kaggle.com)
- Extract all the column names from the confirmed_cases dataset using the .keys() method.
- Using the .loc indexer, extract only the date columns from all three datasets.
- Find the total confirmed, death and recovered cases and calculate the total mortality rate.
- Now convert all the dates and cases into NumPy arrays using the np.array function.
- Since we are going to predict the cases for the next 10 days, add 10 future days to the total number of days that we have.
- Visualise the data using different charts and graphs to check the impact of coronavirus. For the visualisation, extract the values of the last column (the most recent date) from all three datasets.
- Find the list of unique countries using .unique() function and calculate the total number of confirmed cases country wise.
- Now find the number of confirmed cases for each province/state/city. Before that, remove the countries that have been entered as provinces or states by treating them as outliers.
- Plot the graph to see the total number of confirmed cases across different countries.
- You can also compare the confirmed cases in China with those outside China by plotting another graph.
- Now visualise the top 10 countries with the highest number of confirmed coronavirus cases using a bar graph or pie chart.
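The data preparation steps so far can be sketched roughly as follows. This is only a minimal sketch, not the project's actual notebook: the small tables use made-up counts in place of the Kaggle CSV files, the helper make_table is hypothetical, and the column layout (four metadata columns followed by one column per date) is an assumption about how the datasets are organised.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the downloaded CSVs (assumption): four metadata
# columns followed by one column per date.
dates = ["1/22/20", "1/23/20", "1/24/20", "1/25/20", "1/26/20"]
meta = pd.DataFrame({
    "Province/State": ["Hubei", "Beijing", None, None],
    "Country/Region": ["China", "China", "Italy", "India"],
    "Lat": [30.0, 40.0, 43.0, 21.0],
    "Long": [112.0, 116.0, 12.0, 78.0],
})


def make_table(counts):
    """Hypothetical helper: attach per-date counts to the metadata columns."""
    return pd.concat([meta, pd.DataFrame(counts, columns=dates)], axis=1)


confirmed_cases = make_table([
    [100, 150, 220, 300, 410],
    [10, 20, 35, 50, 70],
    [0, 0, 1, 2, 4],
    [0, 0, 0, 1, 2],
])
deaths_reported = make_table([
    [2, 3, 5, 8, 12],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
])

# Column names, then only the date columns (everything after the metadata).
cols = confirmed_cases.keys()
confirmed = confirmed_cases.loc[:, cols[4]:cols[-1]]
deaths = deaths_reported.loc[:, cols[4]:cols[-1]]

# Totals per day across all rows, as column vectors, and the mortality rate.
world_cases = confirmed.sum(axis=0).to_numpy().reshape(-1, 1)
total_deaths = deaths.sum(axis=0).to_numpy().reshape(-1, 1)
mortality_rate = total_deaths / world_cases

# Day indices for the observed period, plus 10 extra days to forecast.
days_in_future = 10
days_since_start = np.arange(len(dates)).reshape(-1, 1)
future_forecast = np.arange(len(dates) + days_in_future).reshape(-1, 1)

# Latest confirmed count per country, largest first.
unique_countries = confirmed_cases["Country/Region"].unique()
latest = confirmed_cases[cols[-1]]
country_totals = (
    latest.groupby(confirmed_cases["Country/Region"]).sum()
    .sort_values(ascending=False)
)
top_10 = country_totals.head(10)
```

With the real files, the two hand-built tables would be replaced by pd.read_csv calls on the downloaded CSVs, and country_totals or top_10 could then be passed to a Matplotlib bar or pie chart.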
- Let’s start building the model using the Support Vector Machine algorithm. The model is tuned through parameters such as kernel, C, gamma, epsilon and shrinking, and the candidate values for these are collected in a parameter grid called svm_grid.
kernel – specifies the kernel type to be used in the algorithm (linear, poly, rbf, sigmoid)
C – regularisation parameter
gamma – kernel coefficient for the rbf, poly and sigmoid kernels
- Using RandomizedSearchCV, build the model by passing the necessary parameters. Finally, fit the data using the svm_search.fit() method.
- You can find the best estimator using the svm_search.best_estimator_ attribute and predict the future forecast using the svm_confirmed.predict method.
- Check the predictions on the testing data by plotting svm_test_pred against y_test_confirmed, then print the mean absolute error and mean squared error.
- Then plot the total number of coronavirus cases over time. Next, plot a graph comparing the total confirmed coronavirus cases with the cases predicted by the SVM.
- Using the SVM algorithm predict the number of cases for the upcoming 10 days.
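The SVM steps above might look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's exact code: the case counts are synthetic, the candidate values in svm_grid are arbitrary, only the linear and rbf kernels are searched to keep the example quick, and the cv, n_iter, test_size and max_iter settings are my own choices.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVR

# Synthetic cumulative case counts standing in for the real totals (assumption).
days_since_start = np.arange(40).reshape(-1, 1)
world_cases = np.round(50 * np.exp(0.1 * days_since_start.ravel()))
future_forecast = np.arange(40 + 10).reshape(-1, 1)

# shuffle=False keeps the data in time order: train on the early days,
# test on the most recent ones.
x_train_confirmed, x_test_confirmed, y_train_confirmed, y_test_confirmed = (
    train_test_split(days_since_start, world_cases,
                     test_size=0.15, shuffle=False)
)

# Candidate values for each parameter; these particular values are arbitrary.
svm_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.01, 0.1, 1, 10],
    "gamma": [0.01, 0.1, 1],
    "epsilon": [0.01, 0.1, 1],
    "shrinking": [True, False],
}

# Randomly sample 10 parameter combinations and score each with 3-fold CV.
# max_iter caps the solver so that no single fit can run indefinitely.
svm_search = RandomizedSearchCV(
    SVR(max_iter=100_000), svm_grid,
    scoring="neg_mean_squared_error", cv=3, n_iter=10, random_state=0,
)
svm_search.fit(x_train_confirmed, y_train_confirmed)

# Best model found, its predictions on the held-out days, and the forecast.
svm_confirmed = svm_search.best_estimator_
svm_test_pred = svm_confirmed.predict(x_test_confirmed)
svm_pred = svm_confirmed.predict(future_forecast)

mae = mean_absolute_error(y_test_confirmed, svm_test_pred)
mse = mean_squared_error(y_test_confirmed, svm_test_pred)
print("MAE:", mae)
print("MSE:", mse)

# The last 10 entries of svm_pred are the days beyond the observed data.
next_10_days = svm_pred[-10:]
print(next_10_days)
```

Plotting svm_test_pred against y_test_confirmed, or svm_pred against world_cases, with Matplotlib gives the comparison graphs described in the steps.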
Using the linear regression model
- Import the LinearRegression class from the scikit-learn library and fit the model using the x_train_confirmed and y_train_confirmed data.
- For prediction, use the x_test_confirmed data and the future_forecast data.
- Print the MAE (Mean Absolute Error) and MSE (Mean Squared Error).
- Plot the confirmed values from y_test_confirmed against the predictions in test_linear_pred.
- Now predict the number of coronavirus cases for the next 10 days.
- Try plotting graphs for recovered cases over time, the mortality rate over time and the number of deaths over time.
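As a rough illustration of the linear regression steps, the sketch below fits the model and produces a 10-day forecast. It again uses synthetic case counts instead of the real dataset, and the names linear_model and linear_pred, along with the test_size value, are assumptions introduced for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic cumulative case counts standing in for the real totals (assumption).
days_since_start = np.arange(40).reshape(-1, 1)
world_cases = np.round(50 * np.exp(0.1 * days_since_start.ravel()))
future_forecast = np.arange(40 + 10).reshape(-1, 1)

# Keep the time order: train on the early days, test on the most recent ones.
x_train_confirmed, x_test_confirmed, y_train_confirmed, y_test_confirmed = (
    train_test_split(days_since_start, world_cases,
                     test_size=0.15, shuffle=False)
)

# Fit a straight line to cases as a function of the day index.
linear_model = LinearRegression()
linear_model.fit(x_train_confirmed, y_train_confirmed)

# Predictions for the held-out days and for the full range plus 10 future days.
test_linear_pred = linear_model.predict(x_test_confirmed)
linear_pred = linear_model.predict(future_forecast)

print("MAE:", mean_absolute_error(y_test_confirmed, test_linear_pred))
print("MSE:", mean_squared_error(y_test_confirmed, test_linear_pred))

# The last 10 entries are the forecast for the days beyond the observed data.
next_10_days = linear_pred[-10:]
print(next_10_days)
```

Because a straight line cannot follow accelerating growth, comparing these error values with the SVM's on the same test days is a simple way to see which model fits the data better.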
Machine learning lets you visualise data and forecast many things, but its disadvantage here is a high susceptibility to error, which leads to inaccurate predictions.