Hans-Bredow-Institut - Hans-Bredow-Institut für Medienforschung

Clearing the data fog: German far-right research’s requirements for the DSA research data access

04.05.2023

Following the EU Digital Services Act, large online platforms are now required to share their data with researchers. This is a potential milestone for researchers examining the digital far-right. However, some factors need to be considered so that the act does not become a paper tiger. Jan Rau summarizes the key points.

A German version of this article has been published on the blog of the Forschungsinstitut Gesellschaftlicher Zusammenhalt (FGZ).

Making online platforms more transparent is one of the central goals of the Digital Services Act (DSA), the EU’s most ambitious piece of platform regulation to date. Researchers in particular are awaiting the act, whose rules are scheduled to apply in all EU member states from February 2024. The DSA makes access to data for research purposes a central obligation of the providers of very large online platforms or very large online search engines. This obligation could become a milestone in research on important phenomena in the context of digital media, such as disinformation, societal polarization, or the digital far-right.

Until now, research on such topics has relied on individual national laws such as § 5a of the Network Enforcement Act (NetzDG), on voluntary commitments by the platforms, or on data collected with researchers’ own web scrapers. Each of these routes brought its own challenges, such as inadequate scope and depth of data access, a lack of willingness to cooperate on the part of the platforms, and substantial legal uncertainty for researchers. In many cases, the public and the scientific community only became aware of problematic phenomena on digital platforms through leaks from within the platforms. An accurate and systematic assessment of potential challenges and harms in the context of digital platforms has thus been nearly impossible. As a result, regulators and other actors working on these issues lack the foundations necessary for appropriate regulatory, political, and societal responses to such problematic developments. The DSA seeks to change this. It aims to legally bind platforms to provide access to data for research purposes.

The exact form of this data access has not yet been specified. The European Commission will present concrete approaches in the coming months, based on Paragraph 13 of Article 40 of the DSA. To advance this process, the Commission published a call for evidence asking researchers for their concrete needs. Last year, the HBI and the RISC compiled a selection of such needs for the field of digital right-wing extremism research in the working paper “Far-right online communication in times of crisis. Challenges and intervention opportunities from the perspective of far-right and platform governance research” (German-language publication).

We take the further specification process of the DSA as an opportunity to once again emphasize some of the needs identified in that working paper:

Access to transparent information regarding available data. Researchers are only able to use the data access provided by the DSA in a meaningful way if they are aware of the types of data collected by the platform. A first step towards meeting this prerequisite could be the establishment of a directory for interested researchers that catalogs previous queries made by other researchers, as well as the types of data points made available.
Access to a sufficient scope of data. Platform APIs, which function as platform-provided interfaces for data requests, usually regulate the queries received and data returned per minute/hour/day with rate limits. Low – and thus constraining – rate limits impede research projects, potentially even hindering them from succeeding. High – and thus generous – rate limits are therefore an essential prerequisite to the success of the DSA data access initiative.
Access to relevant private and/or locked accounts and groups. “Locked” or private accounts and communities make up a significant segment of digital far-right counterpublics and are able to attract several thousand members despite their private status. At the same time, researchers are unable to access them via established data access mechanisms, such as commercial APIs, because of privacy protections. While the protection of users’ privacy is to be encouraged, these limitations pose considerable challenges for the field of extremism research. It is therefore important for these barriers to be reviewed and potentially adapted in the case of particularly relevant accounts and groups, while taking data privacy guidelines as well as other ethical challenges into account. The importance of such access grows with the reach and audience of the accounts and groups concerned.
Access to data points to determine the reach of content. This could be in the form of views, impressions, engagement, clicks, and similar metrics. Many established means of access do not provide these data points, despite the essential role they play in the measurement of content and actor reach.
Access to data points regarding the artificial amplification of the reach of content by platforms. This requirement revolves around the question of how platform design decisions and related affordances (such as automated recommendations or algorithmic curation of the timeline) may have artificially amplified the reach of certain content or the growth of specific communities.
Access to data points relating to the effectiveness of interventions. Governance interventions can have unintended side effects. For example, if content is flagged by the platform, the flag itself may draw more user attention to the content than it would have received without being flagged. An independent assessment of the effectiveness of implemented governance interventions is only possible if access to such data points is granted.
Discussions regarding whether and how research experiments, and cooperation on such experiments, are possible. Experiments can be a highly important resource for exploring possible interventions and enable researchers to directly verify the effectiveness of those interventions. While such experimental research access needs to be embedded in strict ethical and legal frameworks, enabling this type of research in the context of digital platforms can be a major building block in tackling potentially harmful phenomena.
Access to blocked or deleted content. In the context of the digital far-right, moderation interventions often happen too late. Researchers therefore need to be granted retroactive access to the necessary data to fully understand the potentially problematic developments leading up to the intervention.
Access to data points to trace moderation decisions. External researchers need to be able to understand moderation decisions. To do so, platforms must grant them access to the necessary data and the underlying criteria on which these decisions are made.
Methodological transparency. Transparent information on the methods used to generate the above data points is of the utmost importance. Researchers need to be able to trace and understand the methods and potential differences in the generation of data points such as views or impressions (e.g. the number of seconds used to count a view) to estimate and assess their significance.
Subsequent use of data points. The obligation to delete certain data points may clash with the rules of good scientific practice, which stipulate the storage of data points for, e.g., 3, 5, or 10 years. As a result, the requirements of scientific quality criteria, such as the independent scientific validation of the reproducibility of the research presented, need to be considered within the context of the DSA. Furthermore, the possibility of additional use cases needs to be discussed, such as using research data to train automated methods of analysis, e.g. through machine learning.
Concrete contacts and support when working with and on platforms. Within the context of far-right research, specific needs arise that require direct contact with the platform, for example when a researcher’s account is suspended due to repeated engagement with problematic content. Researchers need concrete contacts on the platform to explain such hindrances and challenges, as well as to solve them.
Legal certainty for collections of research data containing content deleted by users or platforms. Current versions of platforms’ terms of service require researchers to delete data from their own external (research) data collections if it has been removed by users or the platform itself. This requirement is both difficult to implement on a technical level and fundamentally at odds with researchers’ interests. Within the context of access to research data as outlined by the DSA, the continued use of such data points needs to be ensured, all while considering the relevant ethical and legal standards.
Sufficient protection of researchers and research data from government access. A general challenge of extremism research is the potential special knowledge acquired by the researchers regarding illegal and potentially criminal activities. The forced participation of researchers in criminal proceedings in conjunction with such activities should thus be opposed, as this may lead to considerable limitations and self-restrictions on the part of the researchers. Within the context of research with such high societal relevance as the fields of extremism and prevention, such consequences are considered thoroughly problematic and counterproductive.
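
To make the point about rate limits in the list above more tangible, the following minimal sketch estimates how long a data collection would take under two different limits. The function name and all figures (posts per request, requests per window, window length) are hypothetical assumptions chosen for illustration; they are not taken from any platform's actual API or from the DSA.

```python
import math


def collection_time_hours(total_items: int,
                          items_per_request: int,
                          requests_per_window: int,
                          window_minutes: float) -> float:
    """Conservative estimate of the hours needed to collect `total_items`.

    Assumes a fixed-window rate limit: at most `requests_per_window`
    requests are allowed per window of `window_minutes` minutes, and each
    request returns at most `items_per_request` items. Every window,
    including the last one, is counted in full.
    """
    requests_needed = math.ceil(total_items / items_per_request)
    windows_needed = math.ceil(requests_needed / requests_per_window)
    return windows_needed * window_minutes / 60


# Hypothetical study: one million posts, 100 posts per request.
generous = collection_time_hours(1_000_000, 100, 300, 15)   # 300 requests per 15 min
restrictive = collection_time_hours(1_000_000, 100, 15, 15)  # 15 requests per 15 min

print(generous)     # 8.5 hours
print(restrictive)  # 166.75 hours, i.e. roughly a week of continuous collection
```

Under these assumed numbers, the same data set takes a working day to collect with the generous limit and about a week with the restrictive one, which illustrates why rate limits can decide whether a research project is feasible at all.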

The compulsory provision of access to data generated by large online platforms for research purposes, as outlined in the DSA, could prove to be a revolutionary milestone in the systematic research of problematic phenomena such as the digital far-right. Furthermore, it could prove to be a decisive tool in tackling the challenges posed by such movements. To ensure that this initiative is not reduced to a paper tiger, the aforementioned points need to be considered in the further specification of the act, so that research data access can pave the way for high-quality and comprehensive research that sheds light on purposefully dark spaces.

Illustration: Mathias Rodatz & midjourney