Google Cloud has announced the general availability of its G4 virtual machines, powered by NVIDIA Corp.’s RTX PRO 6000 Blackwell Server Edition GPUs, in a significant expansion of its GPU offerings.

The new instances are available across a broader set of Google Cloud regions, targeting applications with strict latency or regulatory requirements.

The G4 VMs represent a substantial performance leap, delivering up to nine times the throughput of previous G2 instances, according to Google and NVIDIA. The enhancement positions the platform for demanding workloads including multimodal AI inference, photorealistic design, and robotics simulation. Available configurations range from one to eight GPUs, with fractional GPU options planned for future release.

The platform offers up to 768 GB of GDDR7 memory, NVIDIA Tensor Cores, and fourth-generation ray-tracing cores. These features enable G4 instances to handle large language models (LLMs) ranging from fewer than 30 billion to more than 100 billion parameters through advanced quantization techniques and multi-GPU configurations.
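To see why quantization and multi-GPU configurations matter at these model sizes, a back-of-the-envelope sketch of weight-memory arithmetic helps. The numbers below are illustrative assumptions (FP8 quantization at one byte per parameter, weights only, ignoring KV-cache and runtime overhead), not figures from Google or NVIDIA:

```python
# Rough weight-memory footprint for LLM serving under quantization.
# Illustrative only: real serving also needs KV-cache, activations,
# and framework overhead on top of the raw weights.

GDDR7_PER_GPU_GB = 96  # 8 GPUs x 96 GB = 768 GB, per the G4 specs above

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (using 1 GB = 1e9 bytes)."""
    return params_billions * bytes_per_param

# A ~30B-parameter model quantized to FP8 (1 byte/param) fits on one GPU:
single_gpu = weights_gb(30, 1.0)        # ~30 GB, well under 96 GB

# A ~100B-parameter model in FP8 exceeds a single GPU; sharded across
# 4 GPUs with tensor parallelism, each holds only ~25 GB of weights:
per_gpu_shard = weights_gb(100, 1.0) / 4

print(single_gpu, per_gpu_shard)
```

The same arithmetic shows why the full 8-GPU, 768 GB configuration leaves headroom for even larger models or longer context windows.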

A key innovation is Multi-Instance GPU support, which allows a single GPU to be partitioned into up to four isolated instances. Each partition receives dedicated memory, compute cores, and media engines, maximizing price-performance by running multiple workloads concurrently with guaranteed resource isolation, Google said.
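NVIDIA's standard tooling for MIG partitioning is the `nvidia-smi mig` command family. The sketch below shows that generic workflow; the specific partition profiles exposed by the RTX PRO 6000 on G4 are an assumption, so list them with `-lgip` on real hardware rather than relying on this outline:

```shell
# Generic nvidia-smi MIG workflow (requires a MIG-capable GPU and
# driver); exact profiles on G4 hardware are an assumption.

nvidia-smi -i 0 -mig 1                    # enable MIG mode on GPU 0
nvidia-smi mig -i 0 -lgip                 # list supported GPU-instance profiles
nvidia-smi mig -i 0 -cgi <profile-id> -C  # create a GPU instance + compute instance
nvidia-smi -L                             # MIG devices now appear as separate entries
```

Each resulting MIG device can then be handed to a separate container or workload, which is what delivers the guaranteed isolation Google describes.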

Google has also made NVIDIA Omniverse available as a virtual machine image on its Cloud Marketplace. When paired with G4 instances, this integration facilitates the development of industrial digital twins and physical AI simulations, supporting scenarios with billions of cells in complex environments.

Google has implemented an enhanced PCIe-based peer-to-peer data path that significantly boosts multi-GPU performance. The company reports up to 168% higher throughput and 41% lower inter-token latency when using tensor parallelism for model serving, compared with standard configurations that lack the optimized path.
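Tensor parallelism, the technique behind those numbers, splits a layer's weight matrix across GPUs so each device computes a slice of the output, with the slices then gathered over the interconnect — which is why peer-to-peer bandwidth matters. A minimal pure-Python sketch of the column-split arithmetic (no GPUs involved, just the math):

```python
# Column-wise tensor parallelism in miniature: the weight matrix is
# split across "devices", each computes a slice of the output, and the
# slices are concatenated (the all-gather step). Pure arithmetic demo.

def matmul(x, w):
    """x, w as lists of rows -> x @ w."""
    cols = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]

def split_columns(w, parts):
    """Split w into `parts` column blocks (assumes even division)."""
    n = len(w[0]) // parts
    return [[row[i * n:(i + 1) * n] for row in w] for i in range(parts)]

x = [[1, 2], [3, 4]]                  # activations
w = [[1, 0, 2, 1], [0, 1, 1, 2]]      # full 2x4 weight matrix

shards = split_columns(w, 2)          # each "GPU" holds a 2x2 shard
partials = [matmul(x, s) for s in shards]

# Concatenate partial outputs along columns -- the all-gather step
# whose cost the peer-to-peer data path reduces.
out = [sum((p[i] for p in partials), []) for i in range(len(x))]

assert out == matmul(x, w)            # matches the single-device result
```

Every token generated requires this gather, so lowering its latency directly lowers inter-token latency.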

The G4 VMs integrate seamlessly with Google’s existing cloud services, including Google Kubernetes Engine, Vertex AI, Dataproc, and Cloud Run. This integration extends GPU capabilities to serverless platforms, enabling real-time AI inference with pay-per-use pricing.
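On GKE, GPU-backed instances are consumed through the standard Kubernetes device-plugin resource request. A minimal pod spec sketch follows; the node-selector label value and container image are assumptions for illustration, so check Google's GKE documentation for the exact accelerator name G4 nodes advertise:

```yaml
# Minimal GKE pod requesting one GPU via the standard nvidia.com/gpu
# device-plugin resource. The nodeSelector value and image are assumed,
# not confirmed G4 identifiers.
apiVersion: v1
kind: Pod
metadata:
  name: g4-inference
spec:
  containers:
  - name: inference
    image: us-docker.pkg.dev/my-project/inference:latest  # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: 1
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-rtx-pro-6000  # assumed label value
```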

For storage needs beyond local capacity, G4 instances can connect to Hyperdisk ML for low-latency operations, Managed Lustre for high-performance file storage, or Cloud Storage for globally scalable capacity. The addition of G4 VMs complements Google’s existing A-series VMs and cost-efficient G2 instances, providing customers with a comprehensive GPU portfolio for diverse computational needs.
