Data quality

Using data as a basis for decision-making and as a strategic resource.

Our scope of services

Data quality is a core component of modern data processing and a prerequisite for automated decision-making. High data quality makes data a more accurate digital representation of reality and ensures an optimal data foundation for the efficient use of machine learning and artificial intelligence.


Fraunhofer ISST is developing various solutions under the heading “Augmented Data Quality”. These offer companies intelligent support so that they can rely on a high-quality data foundation in the long term. Specifically, Fraunhofer ISST helps with the definition, measurement, and improvement of data quality. To do this, we combine technologies and algorithms from the areas of data profiling, data cleaning, data validation, and data orchestration to enable a holistic view of data quality across the data life cycle as part of DataOps. Existing quality problems are eliminated, and automated data validation significantly reduces the introduction of new errors. In this way, for example, the effort of data pre-processing in data science projects can be reduced, potential data bias in AI projects can be avoided, and the informative value of analyses used as a basis for decision-making can be strengthened.


Figure: Data quality control in data lake architectures


The services offered in the area of data quality range from requirements and gap analyses for identifying potential improvements, through architecture and process development, to the implementation of prototypes for data quality optimization.


Data profiling

  • Automated derivation of metadata from relational databases through descriptive statistics, correlation analyses, functional dependencies, and cluster analyses (see the sketch after this list).
  • Automated derivation of metadata from non-relational databases using dynamic topic models (and related methods from natural language processing), the detection of concept drift, and the detection of outliers with isolation forest algorithms and other artificial intelligence techniques.
  • Storage and management of the metadata in a central, microservice-oriented data catalog.
  • Describing, managing and orchestrating data engineering processes.
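
As an illustration of this kind of relational profiling, here is a minimal sketch in Python using pandas. The table, column names, and the naive functional dependency check are assumptions for the example, not part of an actual Fraunhofer ISST tool.

```python
import pandas as pd

def profile_table(df: pd.DataFrame) -> dict:
    """Derive simple profiling metadata from a relational table."""
    metadata = {
        # Descriptive statistics per column (count, mean, std, quartiles, ...)
        "statistics": df.describe(include="all").to_dict(),
        # Pairwise correlations between numeric columns
        "correlations": df.corr(numeric_only=True).to_dict(),
        # Candidate keys: columns whose values are unique across all rows
        "candidate_keys": [c for c in df.columns if df[c].is_unique],
    }
    # Naive check for functional dependencies A -> B:
    # every value of A must map to exactly one value of B
    deps = []
    for a in df.columns:
        for b in df.columns:
            if a != b and (df.groupby(a)[b].nunique(dropna=False) <= 1).all():
                deps.append((a, b))
    metadata["functional_dependencies"] = deps
    return metadata

# Example usage with a small, made-up customer table
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "zip_code": ["44227", "44227", "10115", "10115"],
    "city": ["Dortmund", "Dortmund", "Berlin", "Berlin"],
})
print(profile_table(df))  # detects, among others, zip_code -> city
```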


Data cleaning and validation

  • Support in the detection of data errors by identifying duplicates, outliers, format violations, and rule violations.
  • Enabling automated data validation through data quality rules based on association analyses (a sketch follows this list).
  • Management of the identified errors in a corresponding tool and integration via open interfaces (APIs).
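
To make rule-based validation concrete, here is a minimal sketch in Python. The rules, column names, and thresholds are illustrative assumptions; in practice, such rules would be derived automatically, for example from association analyses.

```python
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Check a table against simple data quality rules and report violations."""
    errors = []
    # Duplicate detection on an assumed key column
    dupes = df[df.duplicated(subset=["customer_id"], keep=False)]
    errors += [f"duplicate key: row {i}" for i in dupes.index]
    # Format rule: German zip codes are exactly five digits
    bad_zip = df[~df["zip_code"].astype(str).str.fullmatch(r"\d{5}")]
    errors += [f"format violation in zip_code: row {i}" for i in bad_zip.index]
    # Outlier rule: flag ages outside 1.5 * IQR around the quartiles
    q1, q3 = df["age"].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]
    errors += [f"outlier in age: row {i}" for i in outliers.index]
    return errors

# Example usage: one duplicate key, one malformed zip code, one outlier
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "zip_code": ["44227", "1011", "10115", "44227"],
    "age": [34, 29, 31, 290],
})
for error in validate(df):
    print(error)
```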


Data quality management

  • Further development of existing data engineering processes by integrating data quality considerations.
  • Integration of data quality as a component of modern system architectures (such as data lakes).
  • Development of quality metrics for different data sets and application areas (see the sketch below).
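
The following sketch in Python shows what such quality metrics can look like. Completeness, uniqueness, and validity are common quality dimensions; the concrete formulas, column names, and rules here are simplified assumptions for illustration.

```python
import pandas as pd

def quality_metrics(df: pd.DataFrame, key: str, rules: dict) -> dict:
    """Compute simple data quality metrics, each normalized to [0, 1]."""
    return {
        # Completeness: share of non-missing cells
        "completeness": 1.0 - df.isna().sum().sum() / df.size,
        # Uniqueness: share of distinct values in the key column
        "uniqueness": df[key].nunique() / len(df),
        # Validity: share of rows passing every column-level rule
        "validity": pd.concat(
            [df[col].map(check) for col, check in rules.items()], axis=1
        ).all(axis=1).mean(),
    }

# Example usage: a table with a duplicate key and two invalid zip codes
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "zip_code": ["44227", None, "10115", "1011"],
})
rules = {"zip_code": lambda v: isinstance(v, str) and len(v) == 5}
print(quality_metrics(df, key="customer_id", rules=rules))
```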

Available software/applications

  • DIVA – Data Catalog
  • Automated data quality mining


Industries

Data quality contributes to solving demanding challenges in various industries. Whether for managing urban data, for recording the quality of continuous data streams in production environments, for checking clinical studies in the pharmaceutical sector, or for optimizing data migrations, the “Data quality” area of expertise offers solutions for a wide range of applications.


Here you will find a selection of published application examples from the “Data quality” area of expertise from the past few years. Are you looking for more information? Just get in touch with us; our contacts will be happy to answer your questions and talk to you.


Example 1:

Data Quality Mining (as part of the Boehringer Ingelheim Lab)

In the “Data Quality Mining” project, we are working with Boehringer Ingelheim to investigate how the quality analysis of master data can be supported and automated over the long term using data quality rules. We combine statistical and machine learning methods to reduce the manual effort of quality control and to achieve a higher level of data quality in the master data.

Internal project page
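
To illustrate the underlying idea of deriving quality rules from the data itself, the following Python sketch mines simple “if column A has value x, then column B has value y” rules from categorical master data and uses them to flag suspicious records. The data, thresholds, and column names are made up for illustration and do not reflect the project's actual method.

```python
import pandas as pd
from itertools import permutations

def mine_rules(df: pd.DataFrame, min_support=0.1, min_confidence=0.9):
    """Mine simple conditional rules of the form (A = x) -> (B = y)."""
    rules = []
    for a, b in permutations(df.columns, 2):
        pair_counts = df.groupby([a, b]).size()
        value_counts = df[a].value_counts()
        for (x, y), count in pair_counts.items():
            confidence = count / value_counts[x]
            if count / len(df) >= min_support and confidence >= min_confidence:
                rules.append((a, x, b, y, confidence))
    return rules

def flag_violations(df: pd.DataFrame, rules) -> list[int]:
    """Return the indices of rows that violate a mined high-confidence rule."""
    flagged = set()
    for a, x, b, y, _ in rules:
        flagged.update(df[(df[a] == x) & (df[b] != y)].index)
    return sorted(flagged)

# Toy master data: the last record has an inconsistent country/currency pair
df = pd.DataFrame({
    "country": ["DE"] * 10,
    "currency": ["EUR"] * 9 + ["USD"],
})
print(flag_violations(df, mine_rules(df)))  # -> [9]
```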


Example 2:

QU4LITY

In the “QU4LITY” project, the “Data quality” area of expertise is researching automated data quality analysis in production environments. We use International Data Spaces (IDS) technology and supplement it with suitable solutions for profiling continuous data streams in order to determine their quality. In this way, we contribute to the goals of autonomous quality and zero-defect production.

Internal project page

External project page

(qu4lity-project.eu)
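
As a rough illustration of profiling continuous data streams, the following Python sketch monitors a sensor stream with a sliding window and flags values that deviate strongly from the recent window statistics. The window size, threshold, and signal are assumptions for the example, not the project's actual IDS-based implementation.

```python
import random
import statistics
from collections import deque

def monitor_stream(stream, window=50, z_threshold=3.0):
    """Yield stream positions whose values deviate from the recent window."""
    recent = deque(maxlen=window)
    for i, value in enumerate(stream):
        if len(recent) == window:
            mean = statistics.fmean(recent)
            std = statistics.pstdev(recent)
            # A large z-score hints at a quality problem or concept drift
            if std > 0 and abs(value - mean) / std > z_threshold:
                yield i, value, mean
        recent.append(value)

# Example: a stable sensor signal with a sudden level shift at t = 200
random.seed(0)
signal = [random.gauss(20.0, 0.5) for _ in range(200)]
signal += [random.gauss(25.0, 0.5) for _ in range(50)]
for t, value, mean in monitor_stream(signal):
    print(f"t={t}: value {value:.2f} deviates from window mean {mean:.2f}")
```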


Example 3:

TMvsCovid19

In the “TMvsCovid19” project, the “Data quality” area of expertise is researching how metadata can be derived from publications on the subject of “Covid19” and visualized in the form of trends. In this way, we want to help research and existing knowledge graphs react faster to trends. For this, we rely on automated text analysis using dynamic topic models from the field of natural language processing (NLP).

Internal project page
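
As a simplified stand-in for a dynamic topic model, the following Python sketch fits a separate LDA model per time slice with gensim and prints the top words per slice, making topic trends visible over time. The toy corpus and parameters are assumptions for illustration; a real dynamic topic model couples the slices statistically instead of fitting them independently.

```python
from gensim import corpora
from gensim.models import LdaModel

# Toy corpus: tokenized abstracts grouped into two time slices
slices = {
    "2020-H1": [["covid19", "transmission", "symptoms"],
                ["covid19", "symptoms", "fever"]],
    "2020-H2": [["covid19", "vaccine", "trial"],
                ["vaccine", "trial", "efficacy"]],
}

for period, docs in slices.items():
    dictionary = corpora.Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]
    # One LDA model per slice; a true dynamic topic model links the slices
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=1, passes=10, random_state=0)
    print(period, lda.show_topics(num_words=3))
```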


List of scientific publications

ALTENDEITERING, Marcel; GUGGENBERGER, Tobias Moritz. Designing Data Quality Tools: Findings from an Action Design Research Project at Boehringer Ingelheim. In: European Conference on Information Systems (ECIS). 2021.

TEBERNUM, Daniel; ALTENDEITERING, Marcel; HOWAR, Falk. DERM: A Reference Model for Data Engineering. In: International Conference on Data Science, Technology and Applications (DATA). 2021.

ALTENDEITERING, Marcel; DÜBLER, Stephan. Scalable Detection of Concept Drift: A Learning Technique Based on Support Vector Machines. Procedia Manufacturing, 2020, vol. 51, pp. 400-407.

AMADORI, Antonello; ALTENDEITERING, Marcel; OTTO, Boris. Challenges of Data Management in Industry 4.0: A Single Case Study of the Material Retrieval Process. In: International Conference on Business Information Systems. Springer, Cham, 2020, pp. 379-390.