Our scope of services
Data Science enables the extraction of knowledge and value from data. It not only yields new insights and supports decision-making processes, but also helps optimize existing processes and develop innovative new applications.
At Fraunhofer ISST, we develop a broad range of Data Science solutions. In this context, we research and develop AI/ML pipelines (i.e., chained processing steps) for healthcare, logistics, and data science. Depending on the use case, these pipelines can build on different data sources, such as biosignals (e.g., measurements from ECG or 3D acceleration sensors), audio, images, videos, and texts, or on a combination of several sources. We support our partners along the entire pipeline, from the pre-processing of (raw) data through the selection and training of suitable models to their evaluation against application-specific performance criteria.

A special focus lies on the definition, measurement, and improvement of data quality. To this end, we combine technologies and algorithms from data profiling, data cleaning, data validation, and data orchestration to enable a holistic view of data quality across the data lifecycle as part of "DataOps".
Figure: Data quality control in data lake architectures
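As a toy illustration of such chained processing steps, the following sketch combines a pre-processing step and an interchangeable model into a single scikit-learn Pipeline and evaluates it with cross-validation. The data, steps, and metric are placeholder assumptions, not a specific Fraunhofer ISST implementation.

```python
# Minimal sketch of "chained processing steps" as one pipeline.
# All data and parameter choices here are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))      # placeholder for pre-processed sensor features
y = rng.integers(0, 2, size=200)   # placeholder binary labels

pipeline = Pipeline([
    ("scale", StandardScaler()),          # pre-processing step
    ("model", RandomForestClassifier()),  # interchangeable model step
])

# Evaluation against an application-specific performance criterion (here: F1)
scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1")
print(f"mean F1 across folds: {scores.mean():.3f}")
```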
The range of services in the Data Science competence area extends from requirements elicitation and gap analysis to identify potential for improvement, through architecture and process development, to the realization of prototypes for extracting knowledge and value from existing data or data yet to be collected.
Training and evaluation of Machine Learning (ML) models
- Design of ML-based applications.
- Computation of time- and frequency-domain features from biosignal data (e.g., 3D acceleration, ECG, audio).
- Selection from different learning approaches, e.g., classical classification methods, deep learning, association analysis, or clustering.
- Hyperparameter optimization and evaluation using application-specific performance metrics (see the sketch after this list).
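The following sketch illustrates how feature computation and hyperparameter optimization can fit together on a synthetic 3D-acceleration signal. The window size, sampling rate, feature set, and classifier are illustrative assumptions rather than a fixed methodology.

```python
# Hedged sketch: time- and frequency-domain features from (synthetic)
# 3D-acceleration windows, followed by hyperparameter optimization.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(42)

def window_features(window: np.ndarray, fs: float = 50.0) -> np.ndarray:
    """Compute simple time- and frequency-domain features per axis."""
    feats = []
    for axis in window.T:                      # iterate x/y/z axes
        spectrum = np.abs(np.fft.rfft(axis))
        freqs = np.fft.rfftfreq(len(axis), d=1.0 / fs)
        feats += [
            axis.mean(),                       # time domain: mean level
            axis.std(),                        # time domain: variability
            freqs[spectrum.argmax()],          # frequency domain: dominant frequency
            spectrum.sum(),                    # frequency domain: spectral energy
        ]
    return np.array(feats)

# 100 synthetic windows of 2 s at 50 Hz, 3 axes, with placeholder labels
windows = rng.normal(size=(100, 100, 3))
labels = rng.integers(0, 2, size=100)
X = np.stack([window_features(w) for w in windows])

# Hyperparameter optimization against an application-specific metric (here: F1)
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
                      scoring="f1", cv=5)
search.fit(X, labels)
print(search.best_params_, round(search.best_score_, 3))
```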
Data Profiling
- Automated derivation of metadata from relational datasets using descriptive statistics, correlation analysis, functional dependencies, or cluster analysis (see the sketch after this list).
- Automated inference of metadata from non-relational datasets using Dynamic Topic Models (and related Natural Language Processing techniques), concept drift detection, outlier detection with Isolation Forest algorithms, and further Artificial Intelligence (AI) methods.
- Storage and management of metadata in a centralized, microservice-oriented Data Catalog.
- Description, management, and orchestration of data engineering processes.
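A minimal sketch of what automated metadata derivation can look like for a small relational dataset, combining descriptive statistics, pairwise correlations, and outlier flags into one metadata record. The example table, column names, and metadata layout are assumptions for illustration, and the Isolation Forest (mentioned above for non-relational data) is applied here to numeric columns only for simplicity.

```python
# Sketch: derive a simple metadata record from a (hypothetical) table.
import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.DataFrame({
    "shipment_id": [1, 2, 3, 4, 5],
    "weight_kg":   [12.0, 7.5, 7.5, 30.2, 11.9],
    "distance_km": [120, 80, 75, 410, 118],
})

metadata = {
    "row_count": len(df),
    "columns": {
        col: {
            "dtype": str(df[col].dtype),
            "null_fraction": float(df[col].isna().mean()),
            "distinct_values": int(df[col].nunique()),
        }
        for col in df.columns
    },
    # Pairwise correlations hint at dependencies between numeric columns.
    "correlations": df.corr(numeric_only=True).round(2).to_dict(),
}

# Outlier flags via Isolation Forest (-1 marks a suspected outlier).
iso = IsolationForest(random_state=0).fit(df[["weight_kg", "distance_km"]])
metadata["outlier_flags"] = iso.predict(df[["weight_kg", "distance_km"]]).tolist()

print(metadata)  # a real setup would register this record in the Data Catalog
```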
Data Cleaning and Validation
- Assistance in detecting data errors such as duplicates, outliers, format violations, or rule violations (illustrated in the sketch after this list).
- Automatic data validation through data quality rules derived from association analysis.
- Management of identified errors in a dedicated tool and integration via open interfaces (APIs).
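The sketch below shows the kind of rule-based error detection meant here: duplicates on a key column, format violations via a regular expression, and a simple domain rule. The column names and rules are illustrative assumptions; the resulting error list could be handed to an error-management tool via its API.

```python
# Sketch of rule-based error detection on a (hypothetical) tabular dataset.
import re
import pandas as pd

df = pd.DataFrame({
    "order_id": ["A-001", "A-002", "A-002", "A-0X3"],
    "quantity": [3, -1, 5, 2],
})

errors = []

# Duplicate detection on the key column
for idx in df.index[df.duplicated(subset="order_id", keep=False)]:
    errors.append((int(idx), "duplicate order_id"))

# Format violation: order_id must match 'A-' followed by three digits
pattern = re.compile(r"^A-\d{3}$")
for idx, value in df["order_id"].items():
    if not pattern.match(value):
        errors.append((int(idx), f"format violation: {value!r}"))

# Rule violation: quantities must be positive
for idx in df.index[df["quantity"] <= 0]:
    errors.append((int(idx), "rule violation: quantity <= 0"))

print(errors)  # could be posted to an error-management tool via its API
```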
Data Quality Management
- Further development of existing data engineering processes by integrating data quality considerations.
- Integration of data quality as a component in modern system architectures (such as Data Lakes).
- Development of quality metrics for different data sets and application areas (see the sketch below).
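As a sketch of what such metrics can look like, the following function computes three common, dataset-agnostic scores (completeness, uniqueness, validity). The definitions are standard textbook choices and the plausibility range is an assumption; real projects define their metrics per application area.

```python
# Sketch of simple data quality metrics; definitions are illustrative.
import pandas as pd

def quality_metrics(df: pd.DataFrame, key: str, validity_mask: pd.Series) -> dict:
    return {
        # Share of non-missing cells across the whole table
        "completeness": float(df.notna().mean().mean()),
        # Share of distinct values in the key column
        "uniqueness": float(df[key].nunique() / len(df)),
        # Share of rows passing an application-specific validity check
        "validity": float(validity_mask.mean()),
    }

df = pd.DataFrame({"id": [1, 2, 2, 4], "temp_c": [21.5, None, 19.0, 88.0]})
mask = df["temp_c"].between(-30, 50)   # assumed plausible temperature range
print(quality_metrics(df, key="id", validity_mask=mask))
```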
Available software/applications
Industries
Data Science contributes to solving demanding challenges in various industries. Whether for urban data management, automated quality control in logistics, disease diagnosis, clinical trials in pharmaceuticals, or information extraction from documents, the possibilities are limited only by the availability of data.