A new machine learning model, adapted to the Brazilian reality, can predict dengue outbreaks up to three months in advance in specific neighborhoods of cities such as Rio de Janeiro. The article “Predicting Dengue Outbreaks with Explainable Machine Learning” received the best paper award at the AI4Health international workshop, held in Italy in May of this year.
The researchers used open data from several databases covering the metropolis of Rio de Janeiro. To calculate the probability of a possible epidemic, the algorithm uses indicators such as the number of dengue cases in a neighborhood and in neighboring areas, information from the Quick Survey of Indices for Aedes aegypti (LIRAa), environmental data (temperature and precipitation), and demographic and spatial data. In the future, the idea is to adapt this machine learning model for predicting dengue outbreaks to other cities in the country.
Although partially overshadowed by the COVID-19 pandemic, seasonal infectious diseases continue to challenge Brazil. By mid-June 2022, deaths from dengue in the country had more than doubled compared to the whole of 2021. The goal of the research is therefore to provide a model that speeds up data analysis and helps health authorities understand the reasons behind the outbreak predictions, so that they can better plan their actions with the help of artificial intelligence.
The research was carried out by Robson Aleixo during his master’s degree in computer science at the University of São Paulo (USP), under the guidance of Professor Raphael Yokoingawa de Camargo of the Federal University of ABC (UFABC), within the scope of the thematic projects “Future Internet applied to smart cities” and “INCT 2014: of the Internet of the Future”, coordinated by Professor Fabio Kon of USP and supported by FAPESP. Marcela Santos Camargo and Rudi Rocha, from the Institute of Health Policy Studies (IEPS) at the São Paulo School of Business Administration (FGV), also participated in this work.
Aleixo listed the questions behind the study: “Why did the model point to that prediction and not another? What is the probability of a larger or smaller scale outbreak? Why and how did each variable participate in this calculation until reaching this result?” According to him, understanding how each variable contributed to the forecast “brings greater confidence, allowing the manager to know which are the critical situations that increase the chances of outbreaks”.
For the researchers, the model can be a valuable tool for public managers to rethink their strategies and redirect the necessary resources to the most affected areas.
“The great advantage of artificial intelligence is precisely that it identifies behaviors and patterns in historical data and gives visibility to what is relevant for analyzing and designing preventive actions. For example, focusing on actions that deal with dengue outbreaks can bring more benefits than building a new health facility in that region”, adds Aleixo.
“One of the main distinguishing features of our project is that we increased the number of variables and went beyond LIRAa and climate information. We have included indices for other diseases, such as zika and chikungunya, as well as spatial information such as the number of cases in each neighborhood”, says Camargo.
The study analyzed data from 160 neighborhoods in the city of Rio de Janeiro, covering the period from the beginning of 2015 to October 2020. The data are available in several databases: the National Information System on Notifiable Diseases (Sinan), the National Registry of Health Establishments (CNES), the Brazilian Institute of Geography and Statistics (IBGE) and the National Institute of Meteorology (Inmet), among others.
The researchers explain that several earlier studies in Brazil and in other tropical countries, mainly in Asia (Indonesia, Thailand, and Malaysia, for example), had already used climate data (rainfall, temperature, and air humidity) to predict dengue cases. According to the researchers, however, those studies delivered their more accurate results only one or two months ahead. They did not carry out detailed assessments by neighborhood or month by month and, above all, did not provide explanations for the predictions.
“These studies, in general, had little variety of data and few analyses to validate the results, and the models, for the most part, used linear regression, with a small number of metrics. There were also not many analyses of the interpretability of these models. Therefore, the objective of this work was to deal with non-linear correlations and to build a model that could be applied in different scenarios and evaluated from different perspectives, considering four performance metrics and explaining its predictions”, stresses Aleixo.
Camargo says that the model in effect does what a city hall employee would do manually when analyzing data from multiple sources, but at a much higher speed and with a much more attentive and systematic eye, finding patterns that would be difficult for a person working through spreadsheets to notice. In addition, the model now explains each of its predictions.
“These are techniques that come from artificial intelligence. The model generates a set of decision trees, each with a sequence of possibilities built from this public data. From the combination of these hundreds of trees, we create a forest that makes it possible to reach a complex decision. The generated explanation is based on a mathematical technique derived from game theory, in which a set of actors competes to generate a prediction. This technique indicates the contribution of each of these actors [temperature, rain, etc.]”, explains the study’s supervisor.
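The idea Camargo describes, a forest of trees whose prediction is split into per-variable contributions using a game-theoretic technique, can be illustrated with a small sketch. The article does not say which library or settings the study used, so everything below is an assumption made for illustration: the three hand-written “trees”, the feature names, the thresholds and the baseline values are hypothetical, and the contributions are exact Shapley values computed by brute force rather than with any particular package.

```python
from itertools import combinations
from math import factorial

# Hypothetical hand-written "trees"; a real forest would be learned from data.
def tree_a(x):
    return 1.0 if x["cases_last_month"] > 20 else 0.0

def tree_b(x):
    if x["rain_mm"] > 100:
        return 1.0 if x["temp_c"] > 25 else 0.5
    return 0.0

def tree_c(x):
    return 1.0 if x["cases_last_month"] > 10 and x["rain_mm"] > 50 else 0.0

FOREST = [tree_a, tree_b, tree_c]

def forest_predict(x):
    """Average the outputs of all trees, as a forest does."""
    return sum(tree(x) for tree in FOREST) / len(FOREST)

def shapley_values(predict, x, baseline):
    """Exact Shapley value of each feature (the "actors" in the game).

    A feature outside the coalition takes its baseline value (an assumption).
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi

# Hypothetical neighborhood and a hypothetical "typical" baseline.
neighborhood = {"cases_last_month": 35, "rain_mm": 140, "temp_c": 28}
baseline = {"cases_last_month": 5, "rain_mm": 30, "temp_c": 22}

contributions = shapley_values(forest_predict, neighborhood, baseline)
for name, phi in contributions.items():
    print(f"{name}: {phi:+.3f}")
```

By construction, the contributions add up to the difference between the prediction for the neighborhood and the prediction for the baseline, which is what makes it possible to say how much each variable pushed the forecast up or down.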
The model showed that the number of cases recorded in the previous month is an essential factor in determining whether a neighborhood is more or less likely to have an outbreak in a given period. The second factor is the history of dengue cases in that neighborhood compared to the rest of the city. The third is the weather: whether or not it rained a lot in the region and what the temperature was, because this influences the development of Aedes aegypti larvae. “Finally, we also saw that assessing the conditions of neighboring neighborhoods is important, that is, whether there are very close neighborhoods where there were many cases. All of this will be used to generate a set of predictions for each neighborhood”, says Camargo.
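As a rough sketch of how factors like these could be turned into model inputs, the snippet below builds three features for one neighborhood: its own cases in the latest month, its share of the city's cases, and the cases in adjacent neighborhoods. The neighborhood names, case counts, adjacency map and feature names are all hypothetical, and the study's actual feature definitions may differ.

```python
# Hypothetical monthly case counts per neighborhood (oldest month first).
cases = {
    "A": [4, 10, 30],
    "B": [2, 3, 5],
    "C": [8, 20, 45],
}

# Hypothetical adjacency map: which neighborhoods border each other.
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

def build_features(name, month):
    """Features known at `month`, usable to predict the following months."""
    city_total = sum(series[month] for series in cases.values())
    own = cases[name][month]
    return {
        # Cases recorded in the neighborhood in the latest month.
        "cases_last_month": own,
        # The neighborhood's cases relative to the whole city.
        "share_of_city_cases": own / city_total if city_total else 0.0,
        # Cases recorded in adjacent neighborhoods.
        "neighbor_cases": sum(cases[n][month] for n in neighbors[name]),
    }

features = build_features("A", 2)
print(features)
```

Climate variables such as rainfall and temperature would be added to the same dictionary in the same way, one value per neighborhood and month.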
To be used by the city hall of Rio de Janeiro or any other municipality, the model still needs to be developed into a tool within a larger system and to gain an interface that lets users without programming skills easily find the information they need.
It also needs to be fed with new, more recent data, keeping up to date the public databases that are of interest to health teams. New partnerships are needed, however, for this technology to develop and go beyond the walls of the university.
“We would need to improve the model with better features, for example by considering how dengue serotypes and other indicators of the disease could influence the results, in addition to combining advanced time series techniques with the decision tree model and including data from new regions”, emphasizes Camargo.
Data and code are available here.
Click here to see the report produced by tvBrasil.