Introduction

HPE Performance Cluster Manager (HPCM) is cluster management software for high-performance computing (HPC), developed by Hewlett Packard Enterprise (HPE). It is designed to manage, monitor, and optimize large-scale computing clusters used for scientific research, artificial intelligence (AI), machine learning (ML), and enterprise HPC workloads. It provides centralized management, allowing administrators to deploy, monitor, and maintain thousands of nodes efficiently while supporting high availability and performance.

The key discovery components typically found in an HPE Performance Cluster Manager (HPCM) environment are:

  1. Nodes: Automatically detects compute nodes, management nodes, and storage nodes in the cluster.
  2. Network: Identifies and configures network interfaces, switches, and fabric topology.
  3. Hardware: Scans for CPU, memory, disk, GPUs, and other hardware components in each node.
  4. Firmware & BIOS: Checks firmware versions and BIOS settings across all nodes.
  5. Groups: HPCM organizes system components into groups to simplify management, monitoring, and operations across the cluster, helping administrators manage nodes, resources, and workloads efficiently.
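To illustrate how the Groups and Firmware & BIOS components relate, the following Python sketch groups discovered nodes by role and flags nodes whose BIOS version differs from the most common version in their group. It is not the HPCM API or the integration's actual logic: the `DiscoveredNode` record, its fields, the helper functions, and the sample node names and version strings are all hypothetical, assumed only for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class DiscoveredNode:
    """Hypothetical record for one node found during discovery."""
    name: str
    role: str          # hypothetical roles, e.g. "compute", "management", "storage"
    bios_version: str  # hypothetical version string


def group_by_role(nodes):
    """Organize discovered nodes into groups keyed by role."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node.role].append(node)
    return dict(groups)


def find_firmware_drift(groups):
    """For each group, report nodes whose BIOS version differs from the
    group's most common version (ties go to the version seen first)."""
    drift = {}
    for role, members in groups.items():
        counts = Counter(n.bios_version for n in members)
        baseline, _ = counts.most_common(1)[0]
        outliers = [n.name for n in members if n.bios_version != baseline]
        if outliers:
            drift[role] = (baseline, outliers)
    return drift


if __name__ == "__main__":
    # Made-up sample data; not real HPCM output.
    nodes = [
        DiscoveredNode("n001", "compute", "v2.10"),
        DiscoveredNode("n002", "compute", "v2.10"),
        DiscoveredNode("n003", "compute", "v2.08"),
        DiscoveredNode("admin", "management", "v1.50"),
    ]
    groups = group_by_role(nodes)
    print(find_firmware_drift(groups))  # → {'compute': ('v2.10', ['n003'])}
```

Using the group's majority version as the baseline avoids hard-coding an expected firmware level; a real deployment would more likely compare against an administrator-defined target version.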

Here is the high-level architecture of HPE Performance Cluster Manager (HPCM):

Supported Target version

HPE Performance Cluster Manager (HPCM) v1.12

Resource Hierarchy of HPCM

  • HPE Performance Cluster Manager
      • HPC Nodes
          • HPC NICs
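The resource hierarchy can be modeled as a simple tree in which each resource keeps a list of its children, so the parent/child relationships that discovery reports can be listed by walking the tree. The following Python sketch is a hypothetical model, not the integration's data format; the `Resource` class, its fields, and the sample resource names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical discovered resource together with its child resources."""
    name: str
    kind: str  # "HPCM", "HPC Node" or "HPC NIC", following the hierarchy above
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child resource and return it for further nesting."""
        self.children.append(child)
        return child


def relationships(root):
    """Yield (parent, child) name pairs by walking the tree depth-first."""
    for child in root.children:
        yield (root.name, child.name)
        yield from relationships(child)


def render(root, depth=0):
    """Return the hierarchy as a list of indented text lines."""
    lines = ["  " * depth + f"{root.kind}: {root.name}"]
    for child in root.children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    hpcm = Resource("hpcm-admin", "HPCM")
    node = hpcm.add(Resource("n001", "HPC Node"))
    node.add(Resource("n001-eth0", "HPC NIC"))
    node.add(Resource("n001-ib0", "HPC NIC"))
    print("\n".join(render(hpcm)))
    print(list(relationships(hpcm)))
```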

Key Use cases

  • Automates cluster provisioning and management: reduces the complexity of deploying and scaling HPC systems.
  • Optimizes resource allocation: helps maximize performance for CPU-, GPU-, and memory-intensive workloads.
  • Enhances monitoring and predictive maintenance: helps prevent system failures through real-time health tracking.
  • Supports hybrid cloud environments: enables integration of on-premises and cloud HPC resources.

Discovery Use cases

Device discovery gives the customer a unified view of all the elements that make up an HPCM cluster manager, along with the relationships between them.

Monitoring Use cases

Device monitoring collects metric values over time and sends alerts to the intended customer team so that they can act immediately when a metric breaches a configured threshold or behaves unexpectedly. This helps keep the business running smoothly, with minimal or no downtime when infrastructure-related issues occur.

  • Provides metrics such as job scheduling time and job status.

  • Generates alerts for each metric to notify the administrator of issues with the resource.
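The threshold-based alerting described above can be sketched as follows. This minimal Python example assumes a static warning and critical threshold per metric and that higher values are worse; it does not cover detection of unexpected metric behaviour. The metric name `job_scheduling_time_seconds`, the `Threshold` class, and the severity labels are hypothetical and are not taken from HPCM or the monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical per-metric alert configuration."""
    warning: float
    critical: float


def evaluate(metric, value, threshold):
    """Return an alert dict if value breaches a threshold, else None.
    Assumes higher values are worse (e.g. a longer job scheduling time)."""
    if value >= threshold.critical:
        severity = "Critical"
    elif value >= threshold.warning:
        severity = "Warning"
    else:
        return None
    return {"metric": metric, "value": value, "severity": severity}


def check_samples(samples, thresholds):
    """Evaluate (metric, value) samples against the configured thresholds."""
    alerts = []
    for metric, value in samples:
        threshold = thresholds.get(metric)
        if threshold is None:
            continue  # metric is not configured for alerting
        alert = evaluate(metric, value, threshold)
        if alert is not None:
            alerts.append(alert)
    return alerts


if __name__ == "__main__":
    # Hypothetical metric name and threshold values.
    thresholds = {"job_scheduling_time_seconds": Threshold(warning=60, critical=300)}
    samples = [
        ("job_scheduling_time_seconds", 45),
        ("job_scheduling_time_seconds", 120),
        ("job_scheduling_time_seconds", 400),
    ]
    for alert in check_samples(samples, thresholds):
        print(alert)
```

With these sample values, the first reading stays below both thresholds, the second raises a Warning, and the third raises a Critical alert.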

Version History

Application Version | Bug Fixes / Enhancements
1.0.0 | Initial version with discovery and monitoring features.