AMD Cloud Platform Makes It Easy To Try Out AMD's Latest CPUs, Accelerators & ROCm Software
Last week at Intel's Innovation conference the Intel Developer Cloud "DevCloud" was announced, while on the AMD side there is already something similar: the AMD Cloud Platform. At the tail end of 2021, AMD announced the Accelerator Cloud as a way to try out the latest EPYC CPUs and Instinct accelerators complete with a pre-configured ROCm compute software stack. The AMD Cloud Platform is currently a parallel effort to the Accelerator Cloud, with the former intended more for developers while the latter is more customer-oriented. After trying out the AMD Cloud Platform, I found it is indeed an easy way to evaluate the latest AMD data center wares with an easy-to-deploy, pre-configured software environment.
Getting access to the AMD Cloud Platform (or the similar AMD Accelerator Cloud) for testing has long been on my TODO list, but that list is perpetually long, and it was Intel's DevCloud announcement that reminded me of this long overdue evaluation. While Intel's Developer Cloud is focused on trying pre-production hardware like the upcoming Sapphire Rapids, Data Center GPUs, and other products, the AMD Cloud Platform is for now about already-released, current-generation hardware: AMD EPYC "Milan" CPUs and an assortment of different AMD Instinct accelerators are the current offerings.
Right now the AMD Accelerator Cloud and AMD Cloud Platform are two similar clouds within AMD that are managed by different groups, but hopefully with time they will merge into a single cloud. It seems a bit odd that AMD has two different but similar clouds with completely different branding, and many people have likely never heard of either. The AMD Cloud Platform is intended more for developer testing, whereas the AMD Accelerator Cloud was established with marketing/customers in mind. I'm sure with time we will see this expanded to also include Xilinx products and the like. We'll see if they also end up offering pre-production hardware access in the future, taking a cue from Intel's Developer Cloud.
Compared to public cloud providers, the AMD Cloud Platform offers a lot of documentation on running different applications/workloads to ensure you are properly leveraging the Instinct accelerators.
AMD's Cloud Platform is currently spread across AMD's facilities in Frankfurt and Munich.
For the purposes of this testing, AMD kindly provided free access to the AMD Cloud Platform (ACP).
The AMD Cloud Platform is very structured, and its web-based interface makes it very easy to run a number of common GPU compute focused workloads like PyTorch, DeepSpeed, MLflow, and various benchmarks from HPL-AI to MLPerf and others. AMD says they are continuing to bring more of the benchmarks and HPC workloads currently available from the AMD Infinity Hub over to the AMD Cloud Platform as well.
Even with these stock applications, there is still the ability to supply custom inputs/models and make other configuration changes to tailor the evaluation to your needs.
The AMD Cloud Platform also offers an interactive SSH session for those just wanting to remotely connect to an AMD node to run their own custom workload or explore. In that interactive SSH session the ROCm compute stack comes pre-configured, just as with all the other application instances, making things very quick and convenient if you want to evaluate the AMD software/hardware support without first investing in the hardware and the time spent setting up the software stack.
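For those going the interactive SSH route, a quick sanity check that the ROCm stack is present on the node could look something like the minimal Python sketch below. It assumes the standard ROCm command-line tools `rocminfo` and `rocm-smi` are on the PATH, as they normally are with a ROCm install; I have not confirmed the exact layout of the AMD Cloud Platform images, and the function names here are purely illustrative.

```python
import shutil
import subprocess

# Standard ROCm CLI tools; assumed (not confirmed) to be on PATH on the node.
ROCM_TOOLS = ("rocminfo", "rocm-smi")


def find_rocm_tools(tools=ROCM_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


def run_tool(path, timeout=30):
    """Run a tool with no arguments and return (exit_code, stdout)."""
    result = subprocess.run(
        [path], capture_output=True, text=True, timeout=timeout
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    for name, path in find_rocm_tools().items():
        if path is None:
            print(f"{name}: not found on PATH")
            continue
        code, out = run_tool(path)
        line_count = len(out.splitlines())
        print(f"{name}: {path} (exit code {code}, {line_count} lines of output)")
```

If both tools resolve and exit cleanly, the node at least has the ROCm userspace in place; the actual output of `rocminfo` and `rocm-smi` is what lists the detected agents and accelerators.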
The AMD Cloud Platform also allows specifying the number of nodes, the number of GPUs (the interface goes up to 16, but 8 is the current limit), and any other application-specific configuration changes.
Overall this was a nice and efficient way to try out the AMD Instinct accelerators in particular with a pre-configured software stack. Via the various public clouds it's already very easy to evaluate AMD EPYC Milan/Milan-X CPU performance, while the AMD Cloud Platform allows trying Instinct accelerators up through the MI250 with up to 8 GPUs (the web interface has a knob for 16 GPUs, but I am told they do not yet have servers with 16 GPUs) and, most importantly, has the pre-configured software stack ready to go. The structure of the AMD Cloud Platform avoids having to set up ROCm manually, and the web-based interface allows evaluating a variety of common AI/ML-focused workloads or opening your own interactive SSH session for more custom evaluations. Hopefully this is extended in the future to include AMD-Xilinx products. If it is not already being done selectively, it would also be nice, taking a cue from the Intel DevCloud, to see it used for more pre-production evaluation/enablement work with AMD partners and customers.