Source – https://insidebigdata.com/
Scientists, researchers, and engineers are solving the world’s most important scientific, industrial, and big data challenges with AI and high-performance computing (HPC). Businesses, even entire industries, harness the power of AI to extract new insights from massive data sets, both on-premises and in the cloud. NVIDIA Ampere architecture-based products, like the NVIDIA A100 or the NVIDIA RTX A6000, designed for the age of elastic computing, deliver the next giant leap by providing unmatched acceleration at every scale, enabling innovators to push the boundaries of human knowledge and creativity forward.
NVIDIA Ampere architecture-based products implement ground-breaking innovations. Third-generation Tensor Cores deliver dramatic speedups to AI, reducing training times from weeks to hours and providing massive inference acceleration. Two new precisions – Tensor Float 32 (TF32) and Floating Point 64 (FP64, NVIDIA A100 only) – accelerate AI adoption and extend the power of Tensor Cores to HPC.
TF32 works just like FP32 while delivering speedups of up to 10x for AI, with sparsity, and without requiring any code changes. For further performance optimization, automatic mixed precision and FP16 can be invoked by adding just a couple of lines of code. With support for bfloat16, INT8, and INT4, NVIDIA's third-generation Tensor Cores are an incredibly versatile accelerator for AI training and inference. By bringing the power of Tensor Cores to HPC, the NVIDIA A100 enables matrix operations in up to full, IEEE-certified FP64 precision.
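To give a sense of how little code is involved, here is a minimal PyTorch sketch (the model, synthetic data, and hyperparameters are illustrative placeholders, not taken from the source) that opts in to TF32 and wraps a training step in automatic mixed precision:

import torch

# Opt in to TF32 for matrix multiplies and cuDNN convolutions on Ampere GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

device = "cuda"
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 10)
).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # keeps FP16 gradients numerically stable

for _ in range(10):                                    # stand-in training loop
    x = torch.randn(256, 1024, device=device)          # synthetic batch
    y = torch.randint(0, 10, (256,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                    # the "couple of lines" for AMP
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()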
Every AI, data science, and HPC application can benefit from acceleration, but not every application needs the performance of a full Ampere architecture-based GPU. With Multi-Instance GPU (MIG), supported by the A100, the GPU can be partitioned into up to seven GPU instances, fully isolated and secured at the hardware level with their own high-bandwidth memory, cache, and compute cores. This brings breakthrough acceleration to all applications, big and small, and delivers guaranteed quality of service. IT administrators can offer right-sized GPU acceleration for optimal utilization and expand access to every user and application across bare-metal and virtualized environments.
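As an illustration of the MIG workflow, the sketch below drives the standard nvidia-smi MIG commands from Python. The GPU index and the 1g.5gb profile name are assumptions for an A100 40GB; the profiles actually available, and whether a reset is needed after enabling MIG mode, depend on the GPU and driver version.

import subprocess

def run(cmd):
    """Run an nvidia-smi command and print whatever it reports."""
    print(subprocess.run(cmd, capture_output=True, text=True).stdout)

# Enable MIG mode on GPU 0 (requires administrator privileges).
run(["nvidia-smi", "-i", "0", "-mig", "1"])

# List the GPU instance profiles the driver offers (e.g. 1g.5gb, 3g.20gb on A100 40GB).
run(["nvidia-smi", "mig", "-lgip"])

# Carve GPU 0 into seven 1g.5gb instances and create matching compute instances (-C).
run(["nvidia-smi", "mig", "-i", "0", "-cgi", ",".join(["1g.5gb"] * 7), "-C"])

# Confirm the MIG devices that applications can now target.
run(["nvidia-smi", "-L"])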
The A100 SXM4 configuration with 40 GB of GPU memory brings massive amounts of compute performance to data centers. To keep these compute engines fully utilized, the A100 provides class-leading 1.6 terabytes per second (TB/sec) of memory bandwidth, a 67 percent increase over the previous generation. The A100 also has significantly more on-chip memory, including a 40 megabyte (MB) level 2 cache – 7x larger than the previous generation – to maximize compute performance. The PCIe board version retains the 40 GB of HBM2 GPU memory, with a memory bus width of 5120 bits and a peak memory bandwidth of up to 1555 GB/sec, easily taking the performance crown from the prior-generation Tesla V100.
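A quick back-of-the-envelope check ties those numbers together. The per-pin HBM2 data rate of roughly 2.43 Gbps is an assumption (the commonly quoted figure for the 40 GB A100), not stated above; the 5120-bit bus width is from the text.

# Peak memory bandwidth = (bus width in bits / 8) * per-pin data rate.
bus_width_bits = 5120            # from the text
data_rate_gbps_per_pin = 2.43    # assumed HBM2 effective data rate

peak_bandwidth_gb_per_s = bus_width_bits / 8 * data_rate_gbps_per_pin
print(f"{peak_bandwidth_gb_per_s:.0f} GB/s")   # ~1555 GB/s, matching the figure above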
Scaling applications across multiple GPUs requires extremely fast movement of data. Third-generation NVIDIA NVLink in the A100 SXM4 doubles the GPU-to-GPU direct bandwidth to 600 gigabytes per second (GB/sec), almost 10x higher than PCIe Gen 4. The PCIe 4.0 A100 implementation also features a total maximum NVLink bandwidth of 600 GB/sec. NVIDIA DGX A100 servers can take advantage of NVLink and NVSwitch technology via NVIDIA HGX A100 baseboards to deliver greater scalability for HPC and AI workloads. For those who prefer to deploy PCIe motherboards, the NVIDIA A100 PCIe option fully supports NVLink.
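For context on where the 600 GB/sec figure comes from, and how multi-GPU code can verify at run time that direct GPU-to-GPU access is available, here is a short sketch. The per-link numbers are the commonly cited third-generation NVLink figures (an assumption, not stated above).

import torch

# Third-generation NVLink gives the A100 twelve links, each moving roughly
# 25 GB/sec in each direction (50 GB/sec total per link).
links = 12
gb_per_sec_per_direction = 25
print(f"NVLink aggregate: {links * gb_per_sec_per_direction * 2} GB/sec")  # 600 GB/sec

# Multi-GPU code only benefits if peer-to-peer access is actually available;
# a quick runtime check before enabling direct GPU-to-GPU transfers:
if torch.cuda.device_count() >= 2:
    print("P2P between GPU 0 and 1:", torch.cuda.can_device_access_peer(0, 1))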
Contemporary AI networks are big and getting bigger, with millions and in some cases billions of parameters. Not all of these are necessary for accurate predictions and inference, and some can be converted to zeros to make models “sparse” without compromising accuracy. Ampere architecture-based Tensor Cores in the NVIDIA A100 or RTX A6000 provide up to 10x higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also be used to improve the performance of model training.
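The sparsity the hardware exploits is a 2:4 structured pattern: in every group of four consecutive weights, at most two are non-zero. A minimal PyTorch sketch of pruning a weight matrix into that pattern (keeping the two largest-magnitude values per group) could look like the following; production workflows would normally rely on NVIDIA's sparsity tooling rather than hand-rolled code like this.

import torch

def prune_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero out the two smallest-magnitude values in every group of four,
    producing the 2:4 structured-sparse pattern Ampere Tensor Cores accelerate."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "row length must be a multiple of 4"
    groups = weight.reshape(out_features, in_features // 4, 4)
    # Indices of the two largest-magnitude entries within each group of four.
    keep = groups.abs().topk(k=2, dim=-1).indices
    mask = torch.zeros_like(groups).scatter_(-1, keep, 1.0)
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(128, 256)
w_sparse = prune_2_to_4(w)
# Half of the weights are now zero, arranged in hardware-friendly 2:4 groups.
print((w_sparse == 0).float().mean().item())   # ~0.5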
NVIDIA Ampere architecture-based second-generation RT Cores in the NVIDIA RTX A6000 and NVIDIA A40 GPUs deliver massive speedups for big data analytics, data science, AI, and HPC use cases where seeing (visualizing) the problem is essential to solving the problem. RT Cores enable real-time ray tracing for photorealistic results and work synergistically with Tensor Cores to deliver AI denoising and other productivity-enhancing features.