The Art & Science of Scaling HPC

On premises or in the cloud, the ability to scale is one of the most important considerations in HPC. Too often scaling is an afterthought, but smart HPC users know how to factor scaling, both temporary and ongoing, into their plans so that there are no interruptions, no downtime and no surprises.


When you need to upgrade a system, can it be done without completely upending legacy systems? Are there project deadlines or peak usage times that require temporary additional capacity, and how well does that dovetail with your existing systems? Do you foresee growing your team soon, or an increase in the complexity of the problems you are trying to solve? Are specific results required more quickly? Is one part of the process holding up others? Do you require greater design exploration to secure that all-important competitive advantage?

Come to think of it, many HPC questions are really questions about how to scale HPC. Whether you use an on-premises, cloud or hybrid system, the ability of that system to scale seamlessly, efficiently and cost-effectively makes all the difference.

It may not be as complicated as it sounds. Scaling essentially involves two components: the hardware to run the simulations, and the software licenses. Scaling HPC is an exercise in increasing both in tandem.

Hardware scaling

Temporary scaling is easy enough, and can be achieved through cloud-based infrastructure. The key is to set up cloud systems that work with your usual infrastructure. For example, you'll need to ensure that a job scheduler is in place and that it is configured to be aware of the temporary expansion. Careful consideration will also need to be paid to the movement of data.
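
As an illustration, a scheduler such as Slurm can be told about temporary cloud capacity through its cloud-node mechanism, so that nodes are provisioned only when jobs need them. The node names, partition name and helper-script paths below are hypothetical placeholders, not a working configuration:

```
# slurm.conf fragment (sketch) -- cloud nodes exist only while work is queued.
# The resume/suspend scripts are site-specific and hypothetical here.
ResumeProgram=/opt/site/bin/provision_cloud_nodes.sh
SuspendProgram=/opt/site/bin/terminate_cloud_nodes.sh
SuspendTime=600                      # idle seconds before a cloud node is released

NodeName=cloud[001-016] State=CLOUD CPUs=32 RealMemory=128000
PartitionName=burst Nodes=cloud[001-016] MaxTime=24:00:00 State=UP
```

With a setup along these lines, users submit to the burst partition exactly as they would to the on-premises one, and the temporary expansion stays invisible to their workflow.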

Virtual machines can be set up for the time they are needed, and then removed once the work is complete. This approach applies whether you have on-premises hardware or a cloud-based system.

Temporary bursting is where the cloud excels. Keeping your entire infrastructure in the cloud, however, can get very expensive very quickly. Certain data-security requirements may also rule out a completely cloud-based system.

CrunchYard's Office HPC Systems are modular, so they can grow with your company simply by adding more systems. This enables you to start with a small system and expand it as required. New servers do not even have to match the original servers, as the scheduler is set up to be aware of the differences and to optimise the workload accordingly. The result is an easy-to-use, easy-to-scale, cost-effective, plug-and-play approach.

Software scaling

Software providers use a number of different license schemes, which complicates scaling somewhat: each scheme scales differently.

The first model is a license-per-core approach: the more CPU cores you run on, the more licenses you need. This is a linear relationship.
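
As a sketch of that linear relationship, using a made-up per-core price rather than any real vendor's figure:

```python
# Per-core licensing: cost grows linearly with core count.
# The price below is an illustrative assumption, not a real vendor figure.
PRICE_PER_CORE_LICENSE = 500  # hypothetical annual cost per core


def per_core_license_cost(cores: int) -> int:
    """Total license cost for running on `cores` CPU cores."""
    return cores * PRICE_PER_CORE_LICENSE


# Doubling the cores doubles the license bill:
print(per_core_license_cost(64))   # 32000
print(per_core_license_cost(128))  # 64000
```

Under this model, adding hardware without budgeting for the matching licenses leaves the extra cores idle.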

Other license schemes are unlimited with respect to the number of cores, but time limited. For example, you may be able to run on an unlimited number of cores, but only for 10 hours. The STAR-CCM+ Power-on-Demand (POD) license follows this approach.
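
The economics of a time-limited scheme invert the per-core model: for a fixed amount of compute work, running wider finishes sooner and so consumes fewer licensed hours. A minimal sketch, using made-up numbers and the optimistic assumption of perfect linear scaling:

```python
# Time-limited licensing: you buy wall-clock hours, not cores.
# Assumes (optimistically) perfect linear scaling; real jobs scale sublinearly.

def licensed_hours_used(total_core_hours: float, cores: int) -> float:
    """Wall-clock hours drawn from a time-limited license for a fixed workload."""
    return total_core_hours / cores


WORK = 960.0  # total core-hours the simulation needs (illustrative figure)

print(licensed_hours_used(WORK, 32))  # 30.0 hours on 32 cores
print(licensed_hours_used(WORK, 96))  # 10.0 hours on 96 cores
```

This is why time-limited licenses pair naturally with cloud bursting: temporary extra cores directly reduce the licensed hours a job consumes.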

Licenses are usually the most expensive component of scaling, and they need to be carefully optimised so that the hardware can be used as efficiently as possible.

When it comes to scaling, no one-size-fits-all approach truly works in traditional HPC (our Office HPC solutions aside). Many of our clients have legacy systems that need to be taken into account alongside their plans for expansion, migration and optimisation. The team at CrunchYard is experienced in helping organisations of all sizes understand the various pros and cons, and in architecting solutions that enable them to scale optimally.

If you have more complex scaling problems you need help with, get in touch and we can help you grow where you need to.

info@crunchyard.com
