Introducing the Interim National HPC Service

The Kay platform, which supported the National HPC Service, reached the end of its five-year service life in November 2023. Because investment for the replacement CASPIr platform has been delayed, and because Kay's operational lifespan could not be extended further for contractual and technical reasons, an interim solution was required to bridge the gap until the CASPIr system can be deployed.

This page describes the service being put in place to enable researchers to maintain access to HPC resources, as well as the high-level plan for migrating researchers to the new system.

 

New Platform for Interim National HPC Service

ICHEC first presented its plan to provide compute resources from foreign sites (to be procured on a commercial basis) at its Board meeting of September 2022. These arrangements were deemed essential to ensure continuity of service to the research community.

A funding proposal to purchase HPC compute services was submitted to DFHERIS in May 2023, and a tender was published in July seeking compute resources with an environment and user interface similar to those provided by Kay. The contract was awarded in November to LuxProvide for its Meluxina HPC platform.

All National HPC Service projects will eventually be hosted on this platform, with existing projects migrating gradually from Kay through the first half of 2024. The migration of Class A projects began in mid-December, with Class B to follow in January 2024 and Class C in February 2024. Kay will continue to run on an at-risk basis until June 2024 with a reduced number of compute nodes available. Here, at-risk means that the hardware and software vendors no longer provide any warranty or technical support for the system, so certain hardware or software failures could result in a sudden and permanent loss of service, and potentially the loss of all data stored on the system.

Some notable benefits of the Meluxina platform include:

  1. Newer, larger platform with significantly faster CPUs as well as newer and more numerous GPU resources (4 x NVIDIA A100 per node).
  2. Ability to run larger scale simulations as well as Big Data and AI workloads.
  3. Familiar SSH-based login, plus Slurm for workload management and environment modules for software, enable easy migration from Kay.
  4. Pre-installed software packages for common scientific applications.

 

Migration Timeline

At a high level, the following are the main phases and considerations involved in migrating projects from Kay to Meluxina:

  • All technical support queries and requests continue to be handled through the existing ICHEC Helpdesk.
  • The migration of Class A projects began in December 2023.
  • Existing Class B projects will begin migration in January 2024.
  • Existing Class C projects will begin migration in February 2024.
  • The Principal Investigator of each project will be contacted to arrange project migration.
  • No data will be migrated by ICHEC staff; it is the responsibility of users to copy any important data off Kay.
  • New project applications will be paused from 1 February 2024 for a period of two months to enable the migration of existing projects.
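
Because data is not migrated automatically, users may want to confirm that their own copies are complete before Kay is retired. The following is a minimal Python sketch, not an ICHEC-provided tool: it builds a SHA-256 checksum manifest for a directory so that the original on Kay and the copy held elsewhere can be compared. The function names (sha256_of, build_manifest, compare_manifests) are illustrative assumptions, and the sketch assumes Python 3 is available wherever it is run.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(source, copy):
    """Return relative paths that are missing from the copy or differ in content."""
    return sorted(
        name for name, digest in source.items() if copy.get(name) != digest
    )
```

One possible use: run build_manifest on the source directory on Kay and again on the copied directory at the destination, save each result (for example as JSON), and pass both to compare_manifests. An empty list suggests that every source file arrived with identical content.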

 

Supported By

Department of Further and Higher Education, Research, Innovation and Science (DFHERIS)
University of Galway
Higher Education Authority (HEA)