INF

Information infrastructure project

The information infrastructure project (INF) supports all projects in collecting, preprocessing and sharing their research data, and in developing and publishing efficient implementations of their statistical methods in popular open-source software environments. INF ensures that TRR 391 can meet the highest standards with respect to the FAIR data principles, for example in the reproducibility of research results and the re-use of research data, and it provides training on these principles.

Project Leaders

Prof. Dr. Paul-Christian Bürkner
Department of Statistics - Chair of Computational Statistics
TU Dortmund University

Prof. Dr. Andreas Groll
Department of Statistics - Chair of Statistical Methods for Big Data
TU Dortmund University

Dr. Sandra Schaffner
Research Data Center Ruhr
RWI Leibniz Institute for Economic Research

Summary

The goal of INF is to ensure that all data processes run smoothly across the participating teams from different universities and disciplines. In addition to overall research data management, this includes the interoperability of data and software between the participating disciplines, and the support and training of researchers in the requirements of interdisciplinary workflows and FAIR data. Research data often come from a variety of sources and in a variety of formats, and different disciplines and researchers use different data structures, which makes it difficult to standardize processes and ensure consistency. In addition to incoming data, project products such as coding schemes and newly developed software must also be integrated into the collaboration. This is further complicated by the different "understandings" of data and information among the disciplines involved in TRR 391.

The data come from different types of sources: simulations, experiments, surveys, open access data, and industry collaborations, the last of which require special efforts to transfer data and results. The interdisciplinary composition and the broad set of data sources also give the INF project the opportunity to develop a common language that conveys a shared understanding of data, data processing and data provision across disciplines. Such a successful translation would benefit future interdisciplinary projects beyond TRR 391, and especially the National Research Data Infrastructure (NFDI), whose goal is a common research data infrastructure.

Faced with the challenges described above, INF aims to manage all data processes efficiently, to provide data according to the FAIR principles and to train scientists in these specific tasks. INF will promote the exchange and merging of research data and of efficient software implementations of the statistical methods investigated within our TRR, as well as the availability of the research data and output, both inside and outside of TRR 391. The support and infrastructure provided by this project will ensure that TRR 391 can meet the highest standards with respect to the reproducibility of research results and the re-use of research data. To this end, the following principles will be followed in all projects:

  • Open data: Data created and re-used in our TRR shall be handled according to the FAIR principles and the research data management guidelines of the participating institutions. All data sets will be brought to comparable standards by the researchers. The data will be enriched with metadata according to a uniform standard and accompanied by a data description. The data sets will be shared as openly as possible and kept as closed as necessary.
  • Open source: The various research projects have in common that their results will be based on extensive source code. For quality assurance and replication, it is particularly important that this source code can be re-run by other researchers and follows the guidelines developed within the INF project. The source code of all methods developed in TRR 391 will be published as open source under a suitable license.
  • Reproducible research: Research articles from TRR 391 shall be published jointly with the computational tools necessary to reproduce the results. In particular, this includes the software and a precise description of how the results were obtained. In combination with open data and open source code, we strive for the best possible reproducibility of all results.
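
As an illustration of what "metadata according to a uniform standard" could look like in practice, the following is a minimal sketch in Python. It is not the schema used by TRR 391: the field names, the allowed values and the helper functions (DatasetRecord, validate, to_json) are all assumptions made for this example. Only the list of source types is taken from the summary above.

```python
import json
from dataclasses import dataclass, asdict, field

# Hypothetical required fields; a real project would follow an agreed schema.
REQUIRED_FIELDS = ("title", "creators", "source_type", "license", "description")

# Source types named in the summary above (labels are illustrative).
SOURCE_TYPES = {"simulation", "experiment", "survey", "open_access", "industry"}

# "As open as possible, as closed as necessary": two illustrative access levels.
ACCESS_LEVELS = {"open", "restricted"}


@dataclass
class DatasetRecord:
    """Hypothetical uniform metadata record for one data set."""
    title: str
    creators: list
    source_type: str
    license: str
    description: str
    access: str = "open"
    keywords: list = field(default_factory=list)


def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not getattr(record, name):
            problems.append(f"missing {name}")
    if record.source_type not in SOURCE_TYPES:
        problems.append(f"unknown source_type: {record.source_type}")
    if record.access not in ACCESS_LEVELS:
        problems.append(f"unknown access level: {record.access}")
    return problems


def to_json(record):
    """Serialize the record so it can be stored next to the data files."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


if __name__ == "__main__":
    record = DatasetRecord(
        title="Example survey data",          # placeholder values only
        creators=["A. Researcher"],
        source_type="survey",
        license="CC-BY-4.0",
        description="Toy record used to illustrate the metadata check.",
    )
    print(validate(record))
    print(to_json(record))
```

A check of this kind could run before a data set is shared, so that incomplete records are caught early. Which fields are mandatory and which access levels exist would be decided by the project's own guidelines.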

Comprehensive training measures will ensure that the researchers can follow these principles. Special emphasis is placed on solutions that allow all projects in TRR 391 to participate with a low barrier to entry. The INF project will maintain extensive and direct communication with NFDI sections and consortia. This is intended to let NFDI developments flow directly into TRR 391 and, conversely, to feed findings from TRR 391 back to the NFDI.