FEC-ARRC

Optimization of the Fraunhofer Edge Cloud with automatic recommendations for resource configuration

ARRC optimizes the Fraunhofer Edge Cloud through seamless OpenStack integration, explainable AI, and automated rightsizing via GitOps. In the FEC-ARRC project, we are integrating the Automatic Recommender for Resource Configuration (ARRC) into the OpenStack ecosystem of the Fraunhofer Edge Cloud (FEC). The FEC is a productively operated reference system that shows companies how edge and cloud platforms work in everyday operation. ARRC combines explainable artificial intelligence, time series analysis, and multi-agent game theory. This allows the system to evaluate the utilization and configuration of the platform continuously and in a decentralized manner, to make policy-compliant recommendations on the right size of resources, and to implement approved changes automatically using the GitOps operating model.

The challenge

Without a guiding, policy-compliant system, oversized resources, "zombie" and idle resources, and hidden bottlenecks can accumulate and drive utilization up to 100 percent. The FEC is used institution-wide in self-service as a productive demonstrator: a reference system that companies can use to see how edge and cloud platforms are set up and operated in practice. Because users bring very different levels of prior knowledge, virtual machines end up oversized, limits and quotas are set incorrectly, and resources either keep running although nobody uses them or sit provisioned without doing any work. We call the first kind "zombie" instances and the second kind idle instances. In addition, some bottlenecks are hard to spot in everyday use. At Fraunhofer IPT, utilization is at times close to 100 percent, which causes capacity shortages, puts deadlines at risk, and drives up operating costs.

What is needed, therefore, is a transparent, policy-compliant system that guides users, supports governance, FinOps, and sustainability, grounds decisions methodically, and makes the complexity of OpenStack manageable. By methodical grounding we mean operations research, i.e., the targeted use of mathematical optimization to reach robust decisions under multiple objectives and constraints.
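The distinction between zombie, idle, and oversized instances can be made concrete with a small rule-based classifier over monitoring data. This is a minimal sketch for illustration only, not ARRC's actual method: the record fields, the 2 percent average-CPU threshold, the 30-day inactivity window, and the 30 percent peak threshold for "oversized" are all hypothetical values.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical thresholds, chosen only for illustration (not ARRC's values).
IDLE_AVG_CPU = 0.02     # below 2 % average CPU counts as "doing nothing"
ZOMBIE_DAYS = 30        # no user activity for 30 days counts as "no longer used"
OVERSIZED_PEAK = 0.30   # peak CPU below 30 % of the flavor suggests oversizing


@dataclass
class InstanceStats:
    name: str
    cpu_util: list[float]     # hypothetical: CPU utilization samples in [0, 1]
    days_since_activity: int  # hypothetical: days since last login or API call


def classify(s: InstanceStats) -> str:
    """Label one instance from its utilization history (illustrative rules)."""
    avg, peak = mean(s.cpu_util), max(s.cpu_util)
    if avg < IDLE_AVG_CPU and s.days_since_activity >= ZOMBIE_DAYS:
        return "zombie"     # keeps running although nobody uses it anymore
    if avg < IDLE_AVG_CPU:
        return "idle"       # provisioned, but currently inactive
    if peak < OVERSIZED_PEAK:
        return "oversized"  # does work, but never needs most of its vCPUs
    return "ok"
```

In practice, simple thresholds like these would only be a starting point; the section above describes ARRC as using time series analysis and explainable AI rather than fixed cut-offs.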

Our contribution

ARRC provides prioritized rightsizing and shutdown recommendations based on explainable AI, bounded by policies and rolled out automatically with GitOps. We adapt ARRC to the OpenStack environment of the Fraunhofer Edge Cloud and feed it both historical and current monitoring data. From this data, it generates easy-to-understand, prioritized recommendations for resizing resources and for shutting down unused workloads. We mirror the recommendations as issues in GitLab or Jira so that teams can review them directly; approved changes are then rolled out automatically via GitOps. Explainable AI shows which features influenced each recommendation. At the same time, clear policies ensure that limits are respected, including service level agreements, budget requirements, and security requirements. Operations research uses this information to build action plans that comply with capacity and policy requirements, applying integer and multi-objective optimization under hard constraints. The result is a plan that uses capacity efficiently and allows a sensibly balanced degree of oversubscription.

The result

In a proof of concept, ARRC increased available resources, reduced costs and energy consumption, and created capacity for new projects. In this practical feasibility study at Fraunhofer IPT, freeing up unused resources and targeted rightsizing significantly increased the available resources: up to 363 percent more CPUs and up to 336 percent more RAM became available. ARRC has reached Technology Readiness Level (TRL) 6, meaning that the technology has been successfully demonstrated in a relevant environment. The solution simplifies operations, reduces zombie and idle resources, and, through productive operation as a reference system, supports the transfer to standard enterprise edge and cloud platforms.

The partners

  • Fraunhofer IPT
    Contact: Dr.-Ing. Mario Pothen, Business Unit Digitalization and Networking