Our scope of services
Data Science enables the extraction of knowledge and value from data. It not only yields new insights and supports decision-making processes, but also helps optimize existing processes and develop innovative new applications.
At Fraunhofer ISST, we develop a broad range of Data Science solutions. In this context, we research and develop AI/ML pipelines (i.e., chained processing steps) for healthcare, logistics, and data science. Depending on the use case, these pipelines can build on different data sources, such as biosignals (e.g., measurements from ECG or 3D acceleration sensors), audio, images, videos, and texts, or on a combination of several sources. We support our partners along the entire pipeline, from the pre-processing of (raw) data through the selection and training of suitable models to their evaluation against application-specific performance criteria.

A special focus lies on the definition, measurement, and improvement of data quality. To this end, we combine technologies and algorithms from data profiling, data cleaning, data validation, and data orchestration to enable a holistic view of data quality across the data lifecycle as part of "DataOps".
Figure: Data quality control in data lake architectures
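As a toy illustration of such chained processing steps, the following sketch combines a pre-processing step and an interchangeable model into a single scikit-learn Pipeline and evaluates it with cross-validation. The data, steps, and metric are placeholder assumptions, not a specific Fraunhofer ISST implementation.

```python
# Minimal sketch of "chained processing steps" as one pipeline.
# All data and parameter choices here are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))      # placeholder for pre-processed sensor features
y = rng.integers(0, 2, size=200)   # placeholder binary labels

pipeline = Pipeline([
    ("scale", StandardScaler()),          # pre-processing step
    ("model", RandomForestClassifier()),  # interchangeable model step
])

# Evaluation against an application-specific performance criterion (here: F1)
scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1")
print(f"mean F1 across folds: {scores.mean():.3f}")
```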
The range of services in the Data Science competence area extends from requirements elicitation and gap analysis to identify potential for improvement, through architecture and process development, to the realization of prototypes for extracting knowledge and value from existing data or data yet to be collected.
Training and evaluation of Machine Learning (ML) models
- Design of ML-based applications.
- Computation of time- and frequency-domain features from biosignal data (e.g., 3D acceleration, ECG, audio).
- Selection from different learning approaches, e.g., classical classification methods, deep learning, association analysis, or clustering.
- Hyperparameter optimization and evaluation using application-specific performance metrics (see the sketch after this list).
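The following sketch illustrates how feature computation and hyperparameter optimization can fit together on a synthetic 3D-acceleration signal. The window size, sampling rate, feature set, and classifier are illustrative assumptions rather than a fixed methodology.

```python
# Hedged sketch: time- and frequency-domain features from (synthetic)
# 3D-acceleration windows, followed by hyperparameter optimization.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(42)

def window_features(window: np.ndarray, fs: float = 50.0) -> np.ndarray:
    """Compute simple time- and frequency-domain features per axis."""
    feats = []
    for axis in window.T:                      # iterate x/y/z axes
        spectrum = np.abs(np.fft.rfft(axis))
        freqs = np.fft.rfftfreq(len(axis), d=1.0 / fs)
        feats += [
            axis.mean(),                       # time domain: mean level
            axis.std(),                        # time domain: variability
            freqs[spectrum.argmax()],          # frequency domain: dominant frequency
            spectrum.sum(),                    # frequency domain: spectral energy
        ]
    return np.array(feats)

# 100 synthetic windows of 2 s at 50 Hz, 3 axes, with placeholder labels
windows = rng.normal(size=(100, 100, 3))
labels = rng.integers(0, 2, size=100)
X = np.stack([window_features(w) for w in windows])

# Hyperparameter optimization against an application-specific metric (here: F1)
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
                      scoring="f1", cv=5)
search.fit(X, labels)
print(search.best_params_, round(search.best_score_, 3))
```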
Data Profiling
- Automated derivation of metadata from relational datasets using descriptive statistics, correlation analysis, functional dependencies, or cluster analysis (see the sketch after this list).
- Automated inference of metadata from non-relational datasets using Dynamic Topic Models (and related Natural Language Processing techniques), concept drift detection, outlier detection with Isolation Forest algorithms, and further Artificial Intelligence (AI) methods.
- Storage and management of metadata in a centralized, microservice-oriented Data Catalog.
- Description, management, and orchestration of data engineering processes.
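A minimal sketch of what automated metadata derivation can look like for a small relational dataset, combining descriptive statistics, pairwise correlations, and outlier flags into one metadata record. The example table, column names, and metadata layout are assumptions for illustration, and the Isolation Forest (mentioned above for non-relational data) is applied here to numeric columns only for simplicity.

```python
# Sketch: derive a simple metadata record from a (hypothetical) table.
import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.DataFrame({
    "shipment_id": [1, 2, 3, 4, 5],
    "weight_kg":   [12.0, 7.5, 7.5, 30.2, 11.9],
    "distance_km": [120, 80, 75, 410, 118],
})

metadata = {
    "row_count": len(df),
    "columns": {
        col: {
            "dtype": str(df[col].dtype),
            "null_fraction": float(df[col].isna().mean()),
            "distinct_values": int(df[col].nunique()),
        }
        for col in df.columns
    },
    # Pairwise correlations hint at dependencies between numeric columns.
    "correlations": df.corr(numeric_only=True).round(2).to_dict(),
}

# Outlier flags via Isolation Forest (-1 marks a suspected outlier).
iso = IsolationForest(random_state=0).fit(df[["weight_kg", "distance_km"]])
metadata["outlier_flags"] = iso.predict(df[["weight_kg", "distance_km"]]).tolist()

print(metadata)  # a real setup would register this record in the Data Catalog
```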
Data Cleaning and Validation
- Assistance in detecting data errors such as duplicates, outliers, format violations, or rule violations (illustrated in the sketch after this list).
- Automatic data validation through data quality rules derived from association analysis.
- Management of identified errors in a dedicated tool and integration via open interfaces (APIs).
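The sketch below shows the kind of rule-based error detection meant here: duplicates on a key column, format violations via a regular expression, and a simple domain rule. The column names and rules are illustrative assumptions; the resulting error list could be handed to an error-management tool via its API.

```python
# Sketch of rule-based error detection on a (hypothetical) tabular dataset.
import re
import pandas as pd

df = pd.DataFrame({
    "order_id": ["A-001", "A-002", "A-002", "A-0X3"],
    "quantity": [3, -1, 5, 2],
})

errors = []

# Duplicate detection on the key column
for idx in df.index[df.duplicated(subset="order_id", keep=False)]:
    errors.append((int(idx), "duplicate order_id"))

# Format violation: order_id must match 'A-' followed by three digits
pattern = re.compile(r"^A-\d{3}$")
for idx, value in df["order_id"].items():
    if not pattern.match(value):
        errors.append((int(idx), f"format violation: {value!r}"))

# Rule violation: quantities must be positive
for idx in df.index[df["quantity"] <= 0]:
    errors.append((int(idx), "rule violation: quantity <= 0"))

print(errors)  # could be posted to an error-management tool via its API
```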
Data Quality Management
- Further development of existing data engineering processes by integrating data quality considerations.
- Integration of data quality as a component in modern system architectures (such as Data Lakes).
- Development of quality metrics for different data sets and application areas (see the sketch below).
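As a sketch of what such metrics can look like, the following function computes three common, dataset-agnostic scores (completeness, uniqueness, validity). The definitions are standard textbook choices and the plausibility range is an assumption; real projects define their metrics per application area.

```python
# Sketch of simple data quality metrics; definitions are illustrative.
import pandas as pd

def quality_metrics(df: pd.DataFrame, key: str, validity_mask: pd.Series) -> dict:
    return {
        # Share of non-missing cells across the whole table
        "completeness": float(df.notna().mean().mean()),
        # Share of distinct values in the key column
        "uniqueness": float(df[key].nunique() / len(df)),
        # Share of rows passing an application-specific validity check
        "validity": float(validity_mask.mean()),
    }

df = pd.DataFrame({"id": [1, 2, 2, 4], "temp_c": [21.5, None, 19.0, 88.0]})
mask = df["temp_c"].between(-30, 50)   # assumed plausible temperature range
print(quality_metrics(df, key="id", validity_mask=mask))
```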
Available software/applications
Industries
Data Science contributes to solving demanding challenges in various industries. Whether for urban data management, automated quality control in logistics, disease diagnosis, clinical trials in pharmaceuticals, or information extraction from documents, the possibilities are limited only by the availability of data.