A Blog by Jonathan Low

 

May 14, 2020

3rd Gen AI With 5 Petaflops of Computing Power Helps National Lab Study Covid

The partnering of AI systems with the massive computing power of US national laboratories will help researchers understand the spread of the virus and identify treatments more quickly. JL

Matt Hamblen reports in FierceElectronics:

A third-generation artificial intelligence (AI) system with 5 petaflops of compute power is being used by Argonne National Lab to study the spread of the COVID-19 virus and to explore treatments and vaccines. A single rack of the system could replace an entire data center doing AI training and inference for less than 10% of the cost. The system uses just 4% of the data center's space and 5% of its electricity. The new systems will help researchers at Argonne do “years’ worth of AI accelerated work in months or days.”
Nvidia on Thursday announced a third-generation artificial intelligence (AI) system with 5 petaflops of compute power that is initially being used by Argonne National Lab to study the spread of the COVID-19 virus and to explore treatments and vaccines.
The performance of the new DGX A100 systems will help researchers at Argonne do “years’ worth of AI accelerated work in months or days,” said Rick Stevens, associate lab director for computing at Argonne, in a statement. The first DGX A100s were delivered to Argonne in early May.
Also, the University of Florida will receive DGX A100 systems to use across its entire curriculum, according to a statement from university president Kent Fuchs. Other early adopters include the Center for Biomedical AI in Germany, Chulalongkorn University in Thailand, Element AI in Montreal and several others.
Nvidia said thousands of previous-generation DGX systems are already in use around the globe for autonomous vehicle AI, natural speech research and the recommendation AI used by retailers and online search engines.
The DGX A100 has eight A100 Tensor Core GPUs that together provide 5 petaflops of compute power, with 320 GB of total GPU memory and 12.4 terabytes per second of aggregate memory bandwidth.
[Image: Nvidia DGX A100, exploded view]
There are also six Nvidia NVSwitch interconnect fabrics and nine Nvidia Mellanox ConnectX-6 network interfaces for a total of 3.6 terabytes per second of bandwidth. Nvidia, which completed its $7 billion purchase of Mellanox in April, also uses the Mellanox in-network acceleration engines for high performance in the DGX A100.
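
For readers who want to sanity-check the aggregate figures, the short arithmetic sketch below reproduces them from per-GPU numbers; the 40 GB and 1.555 TB/s per-GPU values are assumptions based on the launch A100 specification and are not stated in the article.

    # Rough check of the DGX A100 aggregate figures quoted above.
    # Assumed (not from the article): each A100 carries 40 GB of HBM2
    # memory at roughly 1.555 TB/s of bandwidth.
    num_gpus = 8
    mem_per_gpu_gb = 40        # assumed per-GPU memory
    bw_per_gpu_tbs = 1.555     # assumed per-GPU memory bandwidth

    total_mem_gb = num_gpus * mem_per_gpu_gb      # 320 GB, as reported
    total_bw_tbs = num_gpus * bw_per_gpu_tbs      # ~12.4 TB/s, as reported
    print(total_mem_gb, round(total_bw_tbs, 1))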
Nvidia also provides the software stack for AI and data science workloads.
In a briefing with reporters, CEO Jensen Huang noted that the DGX is the first system to provide all the elements of machine learning, from data analytics to training to inference work. Many small workloads can also be accelerated by partitioning the DGX A100 into as many as 56 computing instances, allowing enterprises and researchers to run different AI functions on demand for diverse workloads.
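The 56-instance figure lines up with splitting each of the eight GPUs into up to seven isolated slices. The Python sketch below is purely illustrative bookkeeping, not Nvidia's tooling; the instance and job names are hypothetical.

    # Illustrative only: how 8 GPUs x 7 slices each yields 56 compute
    # instances, and how independent jobs might be spread across them.
    NUM_GPUS = 8
    SLICES_PER_GPU = 7   # assumed maximum slices per A100 GPU

    instances = [f"gpu{g}-slice{s}"
                 for g in range(NUM_GPUS) for s in range(SLICES_PER_GPU)]
    print(len(instances))            # 56 independent compute instances

    # Hypothetical job list assigned round-robin across the instances.
    jobs = [f"inference-job-{i}" for i in range(100)]
    assignment = {job: instances[i % len(instances)]
                  for i, job in enumerate(jobs)}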
Huang said the entire system starts at $199,000, noting that Dell Technologies, IBM and three other storage providers plan to integrate the DGX A100 into their products.  
Huang illustrated how a single rack of DGX A100 systems costing $1 million could replace an entire data center doing AI training and inference that costs $11 million, or less than 10% of the cost. The DGX A100 rack would also use just 4% of the data center's space and 5% of its electricity.
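As a quick check on the savings claim, using only the numbers quoted in the article:

    # Savings math from the figures above.
    dgx_rack_cost = 1_000_000        # one rack of DGX A100 systems
    data_center_cost = 11_000_000    # conventional AI data center

    cost_ratio = dgx_rack_cost / data_center_cost
    print(f"{cost_ratio:.1%}")       # 9.1% -- "less than 10% of the cost"

    space_ratio = 0.04               # 4% of the floor space, per the article
    power_ratio = 0.05               # 5% of the electricity, per the article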
“This is unquestionably the first time that the unified workload of an entire data center has been brought into one rack for video analytics, voice, and data processing,” Huang said.  “The amount of savings is actually off the charts.”
