Data Science & Policy Research Group

Individual decisions can have a large impact on society as a whole. This is obvious for political decisions, but it is equally true for the small, daily decisions made by ordinary citizens.

Who we are

We are a research group at Nova SBE that uses large-scale computational tools to study societal challenges, especially in disease forecasting and public policy. This multidisciplinary group takes advantage of the so-called “Big-Data Revolution” and works together to understand how individual behaviour affects society. We also focus on the risks that these technologies might entail, and we help establish guidelines for the ethical use of data science and artificial intelligence.

What we do

Individual decisions can have a large impact on society as a whole. This is obvious for political decisions, but it is equally true for the small, daily decisions made by ordinary citizens. Individuals decide how to vote, whether or not to stay at home when they feel sick, and whether to drive or take the bus. In isolation, each of these decisions has a negligible social effect, but collectively they determine the result of an election or the start of an epidemic. For many years, studying these processes was limited to observing the outcomes or to analysing small samples. New data sources and data analysis tools have made it possible to start studying the behaviour of large numbers of individuals, enabling the emergence of large-scale quantitative social research.

At the Data Science and Policy (DS&P) research group, we are interested in understanding these decision-making events, expecting that this deeper knowledge will lead to a better understanding of human nature and to better public decisions.
So far, we have focused mainly on three types of problems that depend strongly both on the behaviour of individuals (what we call bottom-up collective processes) and on that of decision-makers (top-down decisions).

The first is related to what we usually call political debate and deliberation, and we try to answer questions such as:

In parallel, and recognizing that these tools might also have a very negative impact on society, we try to raise public awareness of these risks and involve citizens in the definition of appropriate ethical guidelines and legislation.


Fake News Investigation receives Grant of 1.5 million
European Research Council awards Starting Grant to Joana Gonçalves de Sá
Joana, the PI of the Data Science and Policy Lab and currently a professor at Nova SBE, received an “Excellence in Teaching Award” today at Instituto Superior Técnico. Improving teaching and education has been a major goal of Joana’s work, and this award makes us very proud.

Research Projects

Using Big Data to Improve Health

Disease control is a complex problem, requiring knowledge not only of the biology of the disease but also of many other factors, from population immunity to living conditions, public policies, or human behaviour. Integrating this information in a meaningful way is not trivial, so we develop new methods and approaches at the interface of theoretical, computational, and experimental sciences.

For example, if people consult “Dr. Google” before seeing a doctor, changes in collective search behaviour can be used to nowcast outbreaks and improve surveillance systems.

We use different data sources (from traditional surveys to “big data”) and combine different methods (mathematical modelling, machine learning, and others) to extract consistent behavioural patterns. Despite the complex nature of such research, we strive to make predictions that can inform decision-makers and society.

Disease Monitoring and Flu Nowcasting

Seasonal flu places a heavy burden on human populations and healthcare systems and therefore requires permanent surveillance. Current surveillance methods are robust yet slow. In collaboration with national and international public health institutions, we are developing models that can predict flu levels in a timely manner by combining offline and online data (such as search trends and social media shares).
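To make the idea of combining offline and online data concrete, here is a minimal sketch that assumes a plain linear model: this week's flu rate is regressed on last week's officially reported rate (offline, published with a delay) and this week's search volume (online, available in real time). This is not the method of the publication listed below; the function names and the data are hypothetical.

```python
# Minimal nowcasting sketch (hypothetical names and synthetic data):
# rate_t ~ b0 + b1 * rate_{t-1} (offline, delayed) + b2 * search_t (online, real time)


def solve_linear_system(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_nowcast(lagged_rate, search_volume, current_rate):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    rows = [(1.0, lag, s) for lag, s in zip(lagged_rate, search_volume)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, current_rate)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def nowcast(coefs, last_reported_rate, search_now):
    """Estimate the current rate before the official figure is published."""
    b0, b1, b2 = coefs
    return b0 + b1 * last_reported_rate + b2 * search_now


# Synthetic weekly data built as rate = 1 + 0.5 * lagged + 2 * search
lagged = [10, 12, 15, 20, 26, 30]
search = [3, 4, 6, 9, 11, 12]
current = [1 + 0.5 * lag + 2 * s for lag, s in zip(lagged, search)]
coefs = fit_nowcast(lagged, search, current)
print([round(c, 3) for c in coefs])
print(round(nowcast(coefs, 30, 13), 1))
```

With real data the fit would not be exact, and search signals are known to drift over time, so such a model would need periodic re-fitting and validation against official surveillance figures.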


Involved researchers


M. Won, C. Louro, M.M. Pita, J. Gonçalves-Sá (2017). “Early and Real-Time Detection of Seasonal Influenza Onset”. PLoS Comput Biol 13(2).


Fundação para a Ciência e a Tecnologia
PTDC IVC ESCT 5337 2012

Identifying Antibiotic Over and Under-prescription

Antibiotics (Ab) are one of the most important classes of medical drugs. Thanks to their discovery and widespread use, bacterial infections that used to be fatal are now treated in a few days. However, these advances came at a cost: when exposed to Ab, bacterial populations can quickly become resistant to them. Indeed, infections caused by antibiotic-resistant bacteria are now a serious health problem.

The best way to prevent the evolution of new resistance is to use Ab only when they are necessary. To assess how antibiotics are prescribed in Portugal and what can be done to reduce their inappropriate use, we have started a collaboration with the Ministry of Health. Using their database of medical prescriptions, we will: 1) characterize the distribution of antibiotic prescriptions across medical doctors and identify causes of over-prescription; 2) identify the gold standard for antibiotic prescription for the Portuguese population; and 3) propose interventions to reduce Ab prescription.
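As an illustration of the first step, characterizing how prescription rates are distributed across doctors, here is a minimal sketch that flags unusually high prescribers with a robust modified z-score (median and median absolute deviation, with the common 3.5 cut-off). The doctor IDs, rates, and threshold are hypothetical, and this is not necessarily the analysis the project will use.

```python
from statistics import median


def flag_high_prescribers(rates, threshold=3.5):
    """Flag doctors whose antibiotic prescription rate is unusually high.

    rates: dict mapping a (hypothetical) doctor ID to the share of that
    doctor's consultations that ended with an antibiotic prescription.
    The median/MAD scale is used instead of mean/sd so that a few extreme
    prescribers do not inflate the spread and mask each other.
    """
    values = list(rates.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])  # median absolute deviation
    if mad == 0:
        return []  # no spread at all: the score is undefined
    flagged = []
    for doctor, rate in rates.items():
        score = 0.6745 * (rate - center) / mad  # modified z-score
        if score > threshold:
            flagged.append((doctor, round(score, 2)))
    return sorted(flagged, key=lambda item: -item[1])


rates = {"D01": 0.20, "D02": 0.22, "D03": 0.18, "D04": 0.21, "D05": 0.19, "D06": 0.55}
print(flag_high_prescribers(rates))
```

A high rate alone does not prove inappropriate prescribing: differences in patient case mix (age, conditions, season) would have to be accounted for before a flag could point to a cause of over-prescription.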


Involved researchers


In preparation


Fundação para a Ciência e a Tecnologia

Projeto Piloto em Ciência dos Dados e Inteligência Artificial na Administração Pública (Pilot Project in Data Science and Artificial Intelligence in Public Administration)

Emergency Nowcasting

Emergency Care Units (ECUs) are medical facilities that deal with unplanned patient arrivals for a very wide range of conditions, often urgent or acute and frequently life-threatening. ECUs therefore need to strike a difficult balance between having enough resources (human and otherwise) to cope with an unexpected surge in patients and avoiding the waste of sustaining more resources than required. Timely information about possible variations in patient inflow is thus fundamental for proper planning and quality of service. However, because a broad spectrum of reasons lead people to ECUs, admissions vary significantly. From acute events to a lack of alternatives, or simply out of concern, different reasons have different underlying dynamics and are driven by different factors, timings, and motivations. This combination of uncertainty and large variability makes emergency forecasting a very complex challenge, with great impact on quality of care.

We focus on the main drivers of ECU-seeking behavior and use a Data Science and Machine Learning (ML) approach to study variations in emergency peaks and the factors that might predict them. We expect to offer a simple prediction that decision-makers can use to reduce uncertainty in ECU patient inflow.
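Any ML model for patient inflow needs a simple baseline to beat. The sketch below, built on synthetic data, assumes that the day of the week is one of the drivers and forecasts arrivals as the historical mean for that weekday, with an upper band (mean + k standard deviations, with k = 2 chosen arbitrarily) that could be used to flag possible peaks. The function names and numbers are hypothetical; this is not the project's model.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean, stdev


def weekday_baseline(history):
    """history: list of (date, arrivals). Returns {weekday: (mean, sd)}; Monday is 0."""
    by_weekday = defaultdict(list)
    for day, arrivals in history:
        by_weekday[day.weekday()].append(arrivals)
    return {
        wd: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for wd, values in by_weekday.items()
    }


def forecast(baseline, day, k=2.0):
    """Expected arrivals on `day` plus an upper band (mean + k * sd) for peak alerts."""
    expected, sd = baseline[day.weekday()]
    return expected, expected + k * sd


# Synthetic history: eight weeks starting Monday 1 January 2024, with busier Mondays.
start = date(2024, 1, 1)
history = []
for i in range(56):
    day = start + timedelta(days=i)
    history.append((day, (300 if day.weekday() == 0 else 220) + i % 5))

baseline = weekday_baseline(history)
print(forecast(baseline, date(2024, 3, 4)))  # 4 March 2024 is a Monday
```

An ML approach would add further predictors (weather, holidays, epidemic activity) and be judged by how much it improves on this kind of seasonal baseline.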


Involved researchers


Cláudia Soares, IST, Portugal


In preparation


Fundação para a Ciência e a Tecnologia


Policy and Politics Analytics

In representative democracies, few decide in the name of the many. Understanding how political decisions are made is fundamental to improving transparency, accountability, and public engagement.

Using a combination of tools, including text mining, media analytics, and natural language processing, we have been analysing long periods of Portuguese democracy. From gender bias to media coverage or differences between discourse and voting patterns, you can find more about this work below.

Electoral Dynamics in Portugal

Opinion polls published in the media are very often misinterpreted, leading to claims of polling errors or bias. When most polls indicate a very narrow victory for one side and the other side then wins by a very narrow margin, this does not necessarily mean the polls were wrong; the perception of error frequently stems from people reading only the topline results and taking them as a deterministic prediction. To overcome this, some models have been developed, mainly in the United States, that use poll data to assign specific probabilities to each possible outcome. Besides offering a more honest way of presenting predictions based on opinion polls, these models systematically process polling data, past election results, and other data sources, which provides insight into the main factors driving shifts in voting patterns. Since very few studies of this kind exist for European countries, we aim to develop an electoral model for the Portuguese parliamentary elections that may, along the way, reveal interesting national and regional voting patterns in the country over the last few decades.
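The difference between a topline number and a probability can be shown with a deliberately simplified sketch: pool several polls weighted by sample size and convert the pooled two-party share into an approximate probability of winning with a normal approximation. The extra `systematic_error` term is a hypothetical parameter standing for errors shared by all polls, which do not shrink with sample size. This is not the group's model.

```python
from math import sqrt
from statistics import NormalDist


def pooled_share(polls):
    """polls: list of (party A's two-party vote share, sample size).

    Returns the sample-size-weighted average share and the combined sample size.
    """
    total = sum(n for _, n in polls)
    share = sum(p * n for p, n in polls) / total
    return share, total


def win_probability(share, sample_size, systematic_error=0.0):
    """Approximate probability that party A's true two-party share exceeds 50%.

    Sampling variance shrinks with sample size; `systematic_error` (hypothetical,
    in share units) models a common polling error that does not.
    """
    sampling_var = share * (1 - share) / sample_size
    sd = sqrt(sampling_var + systematic_error ** 2)
    return 1 - NormalDist(share, sd).cdf(0.5)


polls = [(0.51, 1000), (0.52, 800), (0.50, 1200)]
share, n = pooled_share(polls)
print(round(win_probability(share, n), 3))
print(round(win_probability(share, n, systematic_error=0.02), 3))
```

Even with every poll favouring party A, the probability stays well below certainty, and it drops further once a shared error is allowed for. A model for Portugal would also need to handle many parties and simulate seat allocation by district under the D'Hondt method, rather than a single two-party share.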

Political References in the Portuguese Media

News media are a primary interface between the public and politics, and it is unclear to what extent they influence the political agenda, and vice versa. We are conducting a large-scale analysis of news media to gain insight into the dynamics between the media, political entities, and the decision-making process.

Gender Differences in Political Discourse

The United Nations recognizes gender equality as one of the Sustainable Development Goals to be achieved by 2030. Indeed, in most countries of the so-called developed world, namely in Europe, men and women have equal rights and opportunities under the law. Yet few women hold leadership positions, despite having the same or even higher levels of education. Although many hypotheses have been proposed, the reasons behind this “glass ceiling” remain unclear.

Political discourse in parliament provides a unique case study of gender differences, both in the behavior of women and in the behavior towards women. We use natural language processing techniques, as well as basic statistics, to identify differences between male and female behavior and discourse in the Portuguese parliament. Our goal is to reveal subtle differences so that both the public and the politicians themselves become aware of them.
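One basic statistic for spotting words that one group uses relatively more than another is the smoothed log-odds ratio of word frequencies. The sketch below is a minimal version with add-alpha smoothing; the token lists are hypothetical, and the group's actual methods are not specified here.

```python
from collections import Counter
from math import log


def log_odds_ratios(tokens_a, tokens_b, alpha=1.0):
    """Smoothed log-odds ratio of each word in group A versus group B.

    Positive scores: the word is relatively more frequent in A; negative: in B.
    `alpha` (add-alpha smoothing) keeps unseen words from producing log(0).
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    vocab = set(counts_a) | set(counts_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    v = len(vocab)
    scores = {}
    for word in vocab:
        ya = counts_a[word] + alpha
        yb = counts_b[word] + alpha
        odds_a = ya / (n_a + alpha * v - ya)  # this word vs all other words in A
        odds_b = yb / (n_b + alpha * v - yb)
        scores[word] = log(odds_a) - log(odds_b)
    return scores


# Hypothetical tokenized speeches by female (A) and male (B) members of parliament
female_tokens = "saúde educação saúde família orçamento".split()
male_tokens = "orçamento orçamento défice saúde".split()
scores = log_odds_ratios(female_tokens, male_tokens)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:12s} {score:+.2f}")
```

Raw scores are noisy for rare words; a common refinement uses an informative Dirichlet prior and divides by the estimated variance, so that differences are ranked by how reliable they are and not only by their size.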

Sustainability in the Political Agenda

Whether we talk about the environment or public debt, the topic of sustainability seems pervasive in public discussions. But exactly how pervasive is it? For how long do we care about the future impacts of our current decisions, particularly decisions that affect future generations? Should we build football stadiums and leave the bill to be paid by our grandchildren? What about schools: who should pay for them?

While we cannot objectively answer the last two questions, we can assess how politicians and the media have treated the topic of sustainability over the years. We use the transcripts of the Portuguese parliament, as well as online media records and Twitter and Facebook posts, to observe the dynamics of the topic in recent years. Namely, we ask: In what contexts is sustainability discussed? Who talks about sustainability? How does its presence in the debate change over time?
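A simple way to measure the first pieces of "who" and "when" is the share of speeches mentioning the topic, grouped by year or by speaker or party. The sketch below assumes keyword matching on two hypothetical Portuguese stems; it normalizes Unicode so accented forms match reliably. It is an illustration, not the group's pipeline.

```python
import unicodedata
from collections import defaultdict

# Hypothetical Portuguese stems: "sustentab" matches "sustentabilidade";
# "sustentáv" matches "sustentável", "sustentáveis" and "insustentável".
STEMS = ("sustentab", "sustentáv")


def normalize(text):
    """NFC-normalize and case-fold so accented stems match reliably."""
    return unicodedata.normalize("NFC", text).casefold()


def mention_share(speeches, stems=STEMS):
    """speeches: iterable of (key, text), where key is e.g. a year or a party.

    Returns {key: share of that key's speeches mentioning any stem}.
    """
    total = defaultdict(int)
    hits = defaultdict(int)
    norm_stems = [normalize(s) for s in stems]
    for key, text in speeches:
        total[key] += 1
        if any(stem in normalize(text) for stem in norm_stems):
            hits[key] += 1
    return {key: hits[key] / total[key] for key in sorted(total)}


speeches = [
    (1990, "O Orçamento do Estado para o próximo ano"),
    (1990, "Pescas e agricultura"),
    (2010, "Uma economia Sustentável"),
    (2010, "A sustentabilidade da dívida pública"),
    (2010, "Obras públicas e estádios"),
]
print(mention_share(speeches))
```

Using the party or the speaker as the key instead of the year gives a first answer to "who talks about sustainability"; the contexts in which it is discussed would need richer methods than keyword matching.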

Monitoring Political Debate – The Portuguese Parliament

Parliaments are an important forum for political decision-making. Studying their processes and output, from voting patterns to speech debates, as well as biographical data, conflicts of interest, and composition, can help us understand how they function and how they can better represent the will of the citizens. We have collected all parliamentary debate transcripts from 1976 to the present, and we are now analyzing the text to detect temporal trends, identify major topics, and cross-reference that information with other variables, such as votes and the biographies of the members of Parliament.
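As a lightweight stand-in for topic identification (the text does not say which method is used; topic models such as LDA are a common choice), the sketch below treats all speeches in a period as one document and ranks terms by TF-IDF. Terms that are frequent in one period but rare in others rise to the top. The period labels and tokens are hypothetical.

```python
from collections import Counter
from math import log


def distinctive_terms(docs_by_period, top_n=3):
    """docs_by_period: {period: list of token lists}.

    Pools each period's speeches into one document, scores every term by
    TF-IDF across periods and returns the top_n terms with a positive score.
    """
    period_counts = {
        period: Counter(tok for doc in docs for tok in doc)
        for period, docs in docs_by_period.items()
    }
    n_periods = len(period_counts)
    # Number of periods in which each term appears (iterating a Counter yields its keys)
    doc_freq = Counter(term for counts in period_counts.values() for term in counts)
    result = {}
    for period, counts in period_counts.items():
        total = sum(counts.values())
        scores = {t: (c / total) * log(n_periods / doc_freq[t]) for t, c in counts.items()}
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[period] = [t for t, s in ranked[:top_n] if s > 0]
    return result


docs = {
    "1976-1980": [["constituição", "reforma", "agrária"], ["constituição", "orçamento"]],
    "2011-2015": [["troika", "orçamento"], ["troika", "austeridade"]],
}
print(distinctive_terms(docs))
```

A term used in every period (here "orçamento") gets an inverse document frequency of zero and is dropped, which is what makes period-specific vocabulary stand out.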

From Human Behavior to Data (and Back)

For many years, studying human behaviour and the decision-making process was limited to observing the outcomes or to analysing small samples. The so-called digital revolution is offering us new data sources (such as posts on social media or health apps) and data analysis tools that make it possible to start studying the behaviour of large numbers of individuals, enabling the emergence of large-scale quantitative social research.

We are using such data and developing new methods to improve our understanding of human actions from a theoretical and first-principles perspective, with applications to science communication and public policy.

Understanding Patterns in Human Sexual Cycles

Human reproduction does not happen uniformly throughout the year, and what drives human sexual cycles is a long-standing question. We found that online interest in sex peaks sharply during major cultural and religious celebrations, regardless of hemisphere. This online interest, when shifted by nine months, corresponds to documented human births, even after adjusting for numerous factors such as language and the amount of free time due to holidays.

We further showed that mood, measured independently on Twitter, reveals distinct collective emotions associated with those cultural celebrations. Our results provided converging evidence that the cyclic sexual and reproductive behavior of human populations is mostly driven by culture and that this interest in sex is associated with specific emotions, characteristic of major cultural and religious celebrations.
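The "shifted by nine months" comparison can be illustrated with a lagged correlation: correlate monthly online interest at month t with births at month t + lag and see which lag fits best. The sketch below uses synthetic series built so that births echo interest nine months later. It omits the adjustment for confounders described above and is not the published analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(interest, births, lag):
    """Correlate online interest in month t with births in month t + lag."""
    if not 0 <= lag < len(interest) - 1:
        raise ValueError("lag must leave at least two overlapping months")
    return pearson(interest[: len(interest) - lag], births[lag:])


# Synthetic monthly series: interest spikes every December (month index % 12 == 11)
# plus a small within-year trend; births repeat the interest series nine months later.
interest = [10 + (8 if m % 12 == 11 else 0) + 0.1 * (m % 12) for m in range(48)]
births = [10.0] * 9 + interest[:-9]

best_lag = max(range(13), key=lambda lag: lagged_correlation(interest, births, lag))
print(best_lag, round(lagged_correlation(interest, births, best_lag), 3))
```

On real data, seasonality shared by both series can produce spurious correlations at many lags, which is why adjusting for factors such as language and holidays matters.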

Measuring Anxiety During Public Health Crisis

Understanding how the public behaves during a health crisis provides very valuable information for public health institutions. We found that during a health crisis, the 2009 flu pandemic, certain search trends served as a proxy for the population’s anxiety levels and that these were more closely associated with media reports. We are now extending these techniques to better understand how anxiety and fear spread.

Empirical Models to Predict Rare High-impact Events

Generalized linear models (GLMs) are widely used in the “big data” revolution. They extend linear regression to outcomes that are not normally distributed, such as counts or binary events, and have proven quite adaptable. Importantly, they allow us to predict how a variable of interest is expected to change when explanatory variables change.
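To show what a GLM estimates, here is a minimal sketch of a Poisson regression with a log link, log(E[y]) = b0 + b1·x, fitted by Newton-Raphson (for this canonical link it coincides with iteratively reweighted least squares). The synthetic data are noise-free expected counts, so the fit should recover b0 = 0.5 and b1 = 0.3.

```python
from math import exp, log


def fit_poisson_glm(x, y, iterations=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = log(sum(y) / len(y)), 0.0  # start from an intercept-only model
    for _ in range(iterations):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (equal to the negative Hessian for this link)
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H @ d = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


x = list(range(10))
y = [exp(0.5 + 0.3 * xi) for xi in x]  # expected counts, no sampling noise
b0, b1 = fit_poisson_glm(x, y)
print(round(b0, 4), round(b1, 4), round(exp(b1), 4))
```

Here exp(b1) is the multiplicative change in the expected count per unit increase of x. The fitted model describes the conditional mean of the outcome, not the probability of extreme values.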

For several decades, scientists from different fields have recognized that many features of the natural and human world do not follow Gaussian distributions, i.e., they do not cluster neatly around a mean. On the contrary, quantities such as the magnitude of earthquakes, the income of individuals, the number of Facebook friends, or word frequencies have “heavy-tailed” distributions. This means that, while there are many weak earthquakes and many poor people, from time to time there are a few extremely devastating earthquakes and billionaires. It is unclear how informative GLMs are for these phenomena: GLMs are very useful for understanding the mean or median behavior of a distribution, but they tell us little or nothing about its tails.

We want to tackle this problem by understanding which human-based activities have heavy tails; assess the impact of these rare events; and modify existing empirical models to give us information about them.
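A standard first check for whether a quantity is heavy-tailed is to estimate its tail index. The sketch below uses the Hill estimator on the k largest observations of a synthetic Pareto sample with tail index 2, so the estimate should come out close to 2. The sample and the choice of k are assumptions for illustration; this is not necessarily the approach the project will take.

```python
import random
from math import log


def hill_estimator(sample, k):
    """Hill estimate of the tail index alpha from the k largest observations.

    Smaller alpha means a heavier tail; for alpha <= 2 the variance is infinite.
    Assumes positive data whose upper tail decays like a power law.
    """
    if not 0 < k < len(sample):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    top = sorted(sample, reverse=True)[: k + 1]
    threshold = top[k]  # the (k+1)-th largest value acts as the tail cut-off
    return k / sum(log(x / threshold) for x in top[:k])


random.seed(42)
sample = [random.paretovariate(2.0) for _ in range(100_000)]  # true alpha = 2
print(round(hill_estimator(sample, 1000), 2))
```

The main judgment call is k: too small and the estimate is noisy, too large and observations from the body of the distribution bias it. In practice the estimate is plotted over a range of k (a "Hill plot") to look for a stable region.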

Public Attitudes Towards Science

Scientific knowledge has been accepted as the main driver of development, allowing for longer, healthier, and more comfortable lives. Still, public support for scientific research is wavering, with large numbers of people uninterested in or even hostile towards science. This is having serious social consequences, from the anti-vaccination community to the recent “post-truth” movement. Such lack of trust in and appreciation for science was first attributed to a lack of knowledge, leading to the “Deficit Model”. As more scientific information did not necessarily lead to greater appreciation, this model was largely rejected, giving rise to “Public Engagement Models”. These try to offer more nuanced, two-way communication channels between experts and the general public that strongly respect non-expert knowledge, possibly even to the point of undervaluing science. We therefore still lack an encompassing theory that can explain public understanding of science and allow for more targeted and informed approaches.

We are using a large dataset from the Science and Technology Eurobarometer surveys, over 25 years in 34 countries, to try to better understand what influences people’s attitudes towards science.






Conferences & Proceedings


Next to you

We often speak publicly about ethics and the risks of Data Science and AI. We would be very happy to give a seminar or workshop on responsible data science at your company or organization.