Our scope of services
Data quality is a core component of modern data processing and a prerequisite for automated decision-making. High-quality data represents reality more accurately in digital form and provides an optimal foundation for the efficient use of machine learning and artificial intelligence.
Fraunhofer ISST is developing various solutions under the heading “Augmented Data Quality”. These offer companies intelligent support so that they can rely on a high-quality data foundation in the long term. Specifically, Fraunhofer ISST helps with defining, measuring and improving data quality. To do this, we combine technologies and algorithms from data profiling, data cleaning, data validation and data orchestration to enable a holistic view of data quality across the data life cycle as part of “DataOps”. Existing quality problems are eliminated, and automated data validation significantly reduces the introduction of new errors. In this way, for example, the effort of data pre-processing in data science projects can be reduced, potential data bias in AI projects can be avoided, and the informative value of evaluations as a basis for decision-making can be strengthened.
Figure: Data quality control in data lake architectures
The services offered in the area of data quality range from requirements and gap analyses for identifying potential improvements, through architecture and process development, to the implementation of prototypes for data quality optimization.
Data profiling
- Automated derivation of metadata from relational databases using descriptive statistics, correlation analyses, functional dependencies or cluster analyses.
- Automated derivation of metadata from non-relational databases using dynamic topic models (and related natural language processing methods), the detection of concept drifts, and the detection of outliers with isolation forest algorithms and artificial intelligence.
- Storage and management of the metadata in a central, microservice-oriented data catalog.
- Describing, managing and orchestrating data engineering processes.
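The descriptive-statistics step of such metadata derivation can be sketched in a few lines. The following is a minimal, illustrative example in plain Python; the function name and the returned fields are our own choices, not part of any Fraunhofer tool:

```python
import statistics
from collections import Counter

def profile_column(values):
    """Derive simple descriptive metadata for one column of a relational table."""
    non_null = [v for v in values if v is not None]
    meta = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        # Numeric column: basic descriptive statistics
        meta["min"] = min(non_null)
        meta["max"] = max(non_null)
        meta["mean"] = statistics.fmean(non_null)
    else:
        # Categorical column: most frequent values
        meta["top_values"] = Counter(non_null).most_common(3)
    return meta

# Example: profile two columns of a small table
ages = [34, 45, None, 29, 45]
cities = ["Dortmund", "Berlin", "Dortmund", None, "Köln"]
print(profile_column(ages))
print(profile_column(cities))
```

In practice this kind of profile is computed per column and stored as metadata, for example in a data catalog, so that downstream validation rules can refer to it.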
Data cleaning and validation
- Support in the detection of data errors by identifying duplicates, outliers, format violations or rule violations.
- Enabling automatic data validation through data quality rules based on association analysis.
- Management of the identified errors in a corresponding tool and integration through open interfaces (APIs).
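Duplicate detection and rule-based validation can be illustrated with a small sketch. The rule set and function below are hypothetical examples of this approach, not an actual Fraunhofer API:

```python
import re

# Illustrative data quality rules: each field maps to a predicate
# that must hold for every record.
RULES = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v or "") is not None,
    "age":   lambda v: isinstance(v, int) and 0 <= v <= 120,
}

def validate(records, rules=RULES):
    """Return indices of duplicate records and per-record rule violations."""
    seen, duplicates, violations = set(), [], []
    for i, rec in enumerate(records):
        key = tuple(sorted(rec.items()))
        if key in seen:
            duplicates.append(i)          # exact duplicate of an earlier record
        seen.add(key)
        for field, ok in rules.items():
            if not ok(rec.get(field)):
                violations.append((i, field))
    return duplicates, violations

records = [
    {"email": "a@example.org", "age": 34},
    {"email": "a@example.org", "age": 34},   # exact duplicate
    {"email": "not-an-email", "age": 200},   # two rule violations
]
dups, viols = validate(records)
print(dups)   # indices of duplicate records
print(viols)  # (record index, violated field) pairs
```

The identified errors would then be handed over, via open interfaces, to a tool where they can be triaged and corrected.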
Data quality management
- Further development of existing data engineering processes through the integrative consideration of data quality.
- Integration of data quality as a component in modern system architectures (such as data lakes).
- Development of quality metrics for different data sets and application areas.
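Two of the most common quality metrics, completeness and uniqueness, can be defined concisely. The following definitions are a simple sketch of such metrics, not a specific Fraunhofer metric catalog:

```python
def completeness(records, field):
    """Share of records with a non-null value for the given field."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

def uniqueness(records, field):
    """Share of distinct values among the non-null values of a field."""
    values = [r[field] for r in records if r.get(field) is not None]
    return len(set(values)) / len(values) if values else 1.0

records = [
    {"id": 1, "city": "Dortmund"},
    {"id": 2, "city": None},
    {"id": 3, "city": "Dortmund"},
]
print(completeness(records, "city"))  # 2 of 3 records have a city
print(uniqueness(records, "id"))      # all ids are distinct
```

Tracking such metrics over time per data set makes quality degradation visible before it affects downstream analyses.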
Solutions
- DIVA – Data Catalog
- Automated data quality mining
Data quality contributes to solving demanding challenges in various industries. Whether as a tool for managing urban data, for quality monitoring of continuous data streams in production environments, for auditing clinical studies in the pharmaceutical sector or for optimizing data migrations, the data quality area of expertise offers solutions for a wide range of applications.