
RHOAI Metrics Dashboard for Single Serving Models

Enable RHOAI User Workload Metrics for Single Serving Models and deploy the Grafana Metrics Dashboard to monitor the performance of your Single Serving Models and the resources they consume.

Overview

With RHOAI User Workload Metrics enabled and the Grafana Metrics Dashboard deployed, you can see how each Single Model Serving instance performs and which resources it consumes.

By monitoring these metrics, you can identify bottlenecks, optimize resource allocation, and ensure efficient infrastructure utilization. This enables data-driven decisions to improve the overall performance and scalability of your AI applications.

Prerequisites

Installation

To enable RHOAI User Workload Metrics for Single Serving Models and deploy the Grafana Metrics Dashboard, perform the following steps:

Configure Monitoring for the Single Model Serving Platform

To configure monitoring for the Single Model Serving Platform, refer to the official documentation. The Single Model Serving Platform includes metrics for supported runtimes of the KServe component. KServe relies on the underlying model-serving runtimes to provide metrics and does not generate its own. The available metrics for a deployed model depend on its model-serving runtime.

Additionally, you can configure monitoring for OpenShift Service Mesh to understand dependencies and traffic flow between components in the mesh.

Once monitoring is configured for the Single Model Serving Platform, you can view the metrics in the OpenShift Web Console under Observe > Dashboards.
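The official documentation remains the authoritative procedure for this step. As a minimal sketch of one part it typically involves, assuming a standard OpenShift monitoring stack, user workload monitoring is switched on by setting enableUserWorkload: true in the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace. The local file name is illustrative, and applying this manifest replaces the config.yaml key of an existing ConfigMap, so merge it by hand if your cluster already has custom monitoring settings.

```shell
# Write the ConfigMap manifest to a local file so it can be reviewed first.
# The file name is illustrative, not prescribed by the documentation.
cat > cluster-monitoring-config.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF

# Requires cluster-admin rights. If the ConfigMap already exists, its
# config.yaml key is replaced, so merge any existing settings beforehand.
kubectl apply -f cluster-monitoring-config.yaml
```

The official documentation describes further settings for the Single Model Serving Platform that this sketch does not cover.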

Configure GPU Monitoring Dashboard

To configure the GPU Monitoring Dashboard, refer to the official documentation. The GPU Monitoring Dashboard provides a comprehensive view of GPU utilization, memory usage, and other metrics for your GPU nodes.

The GPU Operator exposes GPU telemetry for Prometheus using the NVIDIA DCGM Exporter. These metrics can be visualized in the OpenShift Web Console under Observe > Dashboards, in the NVIDIA DCGM Exporter Dashboard.

Note: This step is optional but very useful for monitoring the GPU resources consumed by your Single Serving Models. If you skip this step, the Grafana Dashboard will not display GPU metrics.
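Before relying on the GPU panels, it can help to confirm that the DCGM Exporter is actually running. The following is a rough check, not part of the official procedure. The namespace nvidia-gpu-operator and the label app=nvidia-dcgm-exporter are assumptions based on a default GPU Operator installation, so adjust both if your installation differs.

```shell
# Assumed defaults; override them to match your GPU Operator installation.
GPU_NAMESPACE="${GPU_NAMESPACE:-nvidia-gpu-operator}"
DCGM_SELECTOR="${DCGM_SELECTOR:-app=nvidia-dcgm-exporter}"

# List the DCGM Exporter pods and the nodes they run on.
# The exporter runs as a DaemonSet, so expect one pod per GPU node.
kubectl get pods -n "$GPU_NAMESPACE" -l "$DCGM_SELECTOR" -o wide
```

If the pods are running, a DCGM metric such as DCGM_FI_DEV_GPU_UTIL should be queryable from the OpenShift Web Console metrics view.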

Install the RHOAI Metrics Grafana and Dashboards for Single Serving Models

To install the RHOAI Metrics Grafana Dashboards for Single Serving Models (for both vLLM and OpenVino), refer to the RHOAI UWM repository. The Grafana Dashboard provides a comprehensive view of the performance and resource utilization of Single Serving Models.

kubectl apply -k overlays/grafana-uwm-user-app

Applying this overlay uses the Grafana Operator to deploy a Grafana instance with pre-configured dashboards for monitoring the performance of your Single Serving Models.
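Because the overlay creates several resources at once, you may want to inspect it before applying it. The sketch below assumes you are in the root of a local clone of the RHOAI UWM repository. The output file name rendered-grafana-uwm.yaml is illustrative.

```shell
# Path of the overlay used in this guide, relative to the repository root.
OVERLAY="overlays/grafana-uwm-user-app"

# Render the overlay to a local file without touching the cluster,
# so the generated manifests can be reviewed first.
kubectl kustomize "$OVERLAY" > rendered-grafana-uwm.yaml

# Apply the overlay once the rendered manifests look as expected.
kubectl apply -k "$OVERLAY"
```

Rendering first makes it easier to see which namespaces, operators, and dashboards the overlay will create in your cluster.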

The following dashboards are currently available:

  • vLLM Model Metrics Dashboard: Provides model metrics for vLLM Single Serving Models.

[Screenshot: vLLM Model Metrics Dashboard]

  • vLLM Service Performance Dashboard: Provides service performance metrics for vLLM Single Serving Models.

[Screenshot: vLLM Service Performance Dashboard]

  • OpenVino Service Model Metrics Dashboard: Provides metrics for OpenVino Single Serving Models.

[Screenshot: OpenVino Service Model Metrics Dashboard]

  • OpenVino Model Metrics Dashboard: Provides service performance metrics for OpenVino Single Serving Models.

[Screenshot: OpenVino Model Metrics Dashboard]

Conclusion

By enabling RHOAI User Workload Metrics and deploying the Grafana Dashboard, you can track how your Single Serving Models perform and which resources they consume. This helps you identify problems, tune resource allocation, and make informed decisions that keep your AI applications running smoothly.