Jensen Huang
https://www.nvidia.com/en-us/on-demand/session/supercomputing2024-keynote/

Supercomputers are among humanity's most vital instruments, driving scientific breakthroughs and expanding the frontiers of knowledge.
At NVIDIA, our journey has been profoundly shaped by our work on supercomputers.
In 2006, we announced CUDA and launched the world's first GPU for scientific computing.
In 2008, Tokyo Tech's Tsubame in Japan became the world's first NVIDIA-accelerated supercomputer.
Four years later, NVIDIA powered the world's fastest supercomputer, the Oak Ridge National Lab's Titan.
Then, in 2016, NVIDIA introduced the first AI supercomputer, DGX-1, which I hand-delivered to OpenAI.
01:39
From the world's first GPU for supercomputers to now building AI supercomputers for the world, our journey in supercomputing over the past 18 years has shaped the NVIDIA of today. Since CUDA's inception, NVIDIA has driven down the cost of computing by a million-fold.
For some, NVIDIA is a computational microscope, allowing them to see the impossibly small.
For others, it's a telescope, exploring the unimaginably distant.
And for many, it's a time machine, letting them do their life's work within their lifetime.
02:19
NVIDIA CUDA has become one of the very few ubiquitous computing platforms.
But the real stars are the CUDA-X libraries.
They're the engines of accelerated computing. Just as OpenGL is the API that connects computer graphics to accelerators, the CUDA-X libraries are domain-specific libraries that connect new applications to NVIDIA acceleration. CUDA-X opens new markets and industries to NVIDIA, from healthcare and telecommunications to manufacturing and transportation.
02:54
In chip manufacturing, cuLitho accelerates computational lithography.
03:00
In telecommunications, Aerial processes wireless radio signals on CUDA.
And in healthcare and genomics, Parabricks accelerates gene sequence alignment and variant calling.
For data science and analytics, cuDF supercharges data processing by accelerating popular tools like SQL, pandas, Polars, and Spark.
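For pandas specifically, cuDF offers a zero-code-change accelerator mode; a minimal sketch (assuming the RAPIDS cudf package is installed) looks like this:

```python
import cudf.pandas
cudf.pandas.install()   # route supported pandas operations to the GPU,
                        # with automatic CPU fallback for the rest

import pandas as pd     # user code below is unchanged

df = pd.DataFrame({"key": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})
print(df.groupby("key")["value"].sum())  # executes on the GPU via cuDF
```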
03:23
cuVS accelerates vector database indexing and retrieval, central to building AI agents. The NVIDIA cuQuantum library performs quantum circuit simulations on CUDA.
Omniverse is a suite of libraries that realize and operate digital twins for robotics, manufacturing, and logistics.
In January 2025, we announced a major new library, cuPyNumeric, a GPU-accelerated implementation of NumPy, the most widely used library for data science, machine learning, and numerical computing.
With over 400 CUDA-X libraries, NVIDIA accelerates important applications in nearly every field of science and industry, fueling a virtuous cycle of increasing GPU adoption, increasing ecosystem partners, and increasing developers.
One of our most impactful libraries is cuDNN, which processes deep learning and neural network operations. cuDNN accelerates deep learning frameworks, enabling an incredible one-million-fold scaling of large language models over the past decade, and led to the creation of ChatGPT.
04:41
AI has arrived, and a new computing era has begun.
Every layer of the computing stack has been reinvented: from coding software with rules and logic to machine learning of patterns and relationships, and from code that runs on CPUs to neural networks processed by GPUs. AI emerged from our innovations in scientific computing. And now, science is leveraging AI to supercharge the scientific method.
I spoke about this fusion in my Supercomputing 2018 address. Since then, AI and machine learning have been integrated into nearly every field of science. AI is helping to analyze data at incredible scales, accelerate simulations, control experiments in real time, and build predictive models, revolutionizing fields from drug discovery and genomics to quantum computing. Using AI, we can emulate physical processes at a scale that was previously computationally prohibitive. This transformative impact has been recognized at the highest levels. Geoffrey Hinton and John Hopfield received the Nobel Prize in Physics for their pioneering work on neural networks. Demis Hassabis, John Jumper, and David Baker received the Nobel Prize in Chemistry for groundbreaking advancements in protein prediction.
This is just the beginning. Scaling laws have shown predictable improvements in AI model performance with model size, data, and computing power. The industry's current trajectory scales computing power four-fold annually, projecting a million-fold increase over a decade. For comparison, Moore's Law achieved a hundred-fold increase per decade. These scaling laws apply not only to LLM training but, with the advent of OpenAI Strawberry, also to inference.
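The compounding arithmetic behind those figures is worth making explicit; a quick sanity check (mine, not from the talk) in Python:

```python
# 4x compute per year, compounded over a decade, versus a
# Moore's-law-style doubling roughly every 18 months.
ai_scaling = 4 ** 10          # 1,048,576 -> "a million-fold increase over a decade"
moores_law = 2 ** (10 / 1.5)  # ~102      -> "a hundred-fold increase per decade"
print(f"{ai_scaling:,} vs {moores_law:,.0f}")
```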
Over the next decade, we will accelerate our roadmap to keep pace with training and inference scaling demands and to discover the next plateaus of intelligence. With NVIDIA's Blackwell chip, today's AI computers are unlike anything built before. Every stage of AI computing, from data processing to training to inference, challenges every component, from GPUs to memory to networking and switches.
The significant investment in AI factories makes every detail crucial: time to first train, reliability, utilization, power efficiency, token generation throughput, and responsiveness. NVIDIA embraces extreme co-design, optimizing every layer, from chips and systems to software and algorithms.
The Blackwell system integrates seven different types of chips. Each liquid-cooled rack is 120 kilowatts and 3,000 pounds, and consists of 18 compute trays and nine NVLink switches connecting 144 Blackwell chips, over two miles of NVLink copper cabling, into one giant virtual GPU with 1.44 AI exaflops. Blackwell is in full production. Taiwan's Foxconn is building new Blackwell production and testing facilities in the US, Mexico, and Taiwan, using NVIDIA Omniverse to bring up the factories as fast as possible. Twenty-five years after creating the first GPU, we have reinvented computing and sparked a new industrial revolution. An entirely new industry is emerging: AI factories producing digital intelligence, manufacturing AI at scale. AI will accelerate scientific discovery.
Researchers will have AI-powered assistants to generate and explore promising ideas. In business, AI will work alongside teams across every function: marketing, sales, supply chain, chip design, software development, and beyond. Eventually, every company will harness digital AI agents, boosting productivity, fueling growth, and creating new jobs. And in the physical world, AI will soon power humanoid robots capable of adapting and performing a variety of tasks with minimal demonstration.
Manufacturing, logistics, and service industries stand to benefit from an AI-powered productivity growth that will reshape the global economy. It's extraordinary to be on the edge of such transformation. We are thrilled to bring to life the incredible computers we're designing today and to see how AI and computing will revolutionize each of the world's $100 trillion industries in the coming decade.

| What is OpenAI Strawberry? Strawberry is an advanced AI model developed by OpenAI, designed to enhance reasoning capabilities beyond traditional large language models (LLMs). Unlike most LLMs trained via "outcome supervision," Strawberry uses process supervision, rewarding the model for correctly reasoning through each step of a problem rather than just producing the right answer. This approach enables Strawberry to solve complex logic and math problems, generate step-by-step solutions, and create high-quality synthetic training data for future models like OpenAI's Orion. Strawberry's ability to "reason" makes it a game-changer for AI applications, including education, as it can simulate human-like planning and problem-solving. It also aims to reduce hallucinations in AI outputs by focusing on logical consistency.

How inference goes beyond AI training: AI inference is the process where a trained model applies its learned knowledge to new, unseen data to make predictions or decisions. It is the "execution phase" of AI, while training is the "learning phase." Inference extends beyond training in several ways.

Real-time applications: Inference allows AI systems to operate in real-world scenarios, such as autonomous vehicles recognizing stop signs or industrial robots detecting defects on production lines. For LLMs like Strawberry, inference enables reasoning and planning in response to user prompts, going beyond simple text generation.

Efficiency and scalability: Inference optimizes pre-trained models for practical use, requiring less computational power than training large models from scratch. This makes advanced AI accessible for broader applications.

Active problem-solving: Models like Strawberry demonstrate how inference can simulate "thinking" by performing background calculations and planning before outputting results, enabling more accurate and context-aware responses.

Potential impact on education and climate research. Education: Strawberry's reasoning capabilities can support personalized learning tools that teach problem-solving skills step by step, and it could generate educational content tailored to specific curricula, enhancing advanced STEM education. Climate research: NVIDIA's Earth-2 platform (leveraging AI inference) and partnerships with AWS could complement Strawberry's reasoning abilities by simulating high-resolution climate models. These tools align with the mission of institutions like the Stanford Doerr School of Sustainability, providing actionable insights for climate mitigation strategies and serving as interactive educational resources for students and policymakers.

In summary, Strawberry exemplifies how advanced reasoning in AI can transform both education and research by enabling deeper problem-solving and practical applications through inference. |
09:49
Let's build the future together. This year at SC24, Professor Arie Kaufman, Distinguished Professor at Stony Brook University, is being honored with the Test of Time Award for his landmark 2004 paper, "GPU Cluster for High-Performance Computing." Using fluid dynamics equations, Professor Kaufman simulated airborne contaminant dispersion in New York City's Times Square on the first large-scale GPU cluster.
His research laid the groundwork for today's accelerated computing, proving the power of GPUs for large-scale simulations. From everyone at NVIDIA, congratulations,
10:32
Professor Kaufman, on this well-deserved recognition.
Your pioneering contributions exemplify the spirit of progress that drives the field forward. As a result of this groundbreaking work, accelerated computing has become the technology of choice for supercomputing. This chart shows the history of accelerated and unaccelerated supercomputers among the top 100 fastest systems in the world.

In the last five years, accelerated systems have grown by about eight systems per year, rising from 33% to over 70% of the top 100. Our goal is to help the world accelerate every workload. It is certainly ambitious, given the wide range of programming languages, the breadth of developer experience and needs, and the ongoing emergence of new algorithms and techniques.
To support our developer community and meet them where they are, today we offer over 450 libraries, many of them tailored to specific domains and to the ever-evolving developer landscape. Take Python, for example. Warp is our Pythonic framework for building and accelerating data generation and spatial computing. Most physics-based simulators in HPC engage in some form of spatial computing, whether in quantum chemistry, climate modeling, or fluid dynamics, all of which operate in 3D space. And by expressing these calculations in Warp, they can be automatically differentiated and integrated into AI workflows, as the sketch below shows.
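To make that concrete, here is a minimal Warp kernel (my sketch, with illustrative names and sizes, not code from the talk); arrays allocated with requires_grad=True can additionally be differentiated through with Warp's tape:

```python
import warp as wp

wp.init()

@wp.kernel
def advect(points: wp.array(dtype=wp.vec3), velocity: wp.vec3, dt: float):
    tid = wp.tid()                             # one GPU thread per point
    points[tid] = points[tid] + velocity * dt  # explicit Euler step in 3D space

n = 1 << 20
points = wp.zeros(n, dtype=wp.vec3)            # GPU array of 3D points
wp.launch(advect, dim=n, inputs=[points, wp.vec3(1.0, 0.0, 0.0), 0.01])
wp.synchronize()
```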
12:01
One of the most popular Python libraries that exists today is NumPy. NumPy is the foundational library for mathematical computing for Python developers. It's used by over 5 million scientific and industrial developers, with 300 million downloads just last month. Over 32,000 GitHub applications use NumPy in important science domains like astronomy, physics, signal processing, and many others.
However, as scientists look to scale their applications to large HPC clusters, they often need to use lower-level distributed computing libraries like OpenMPI. But what if you didn't? What if your NumPy program could automatically parallelize across a GPU cluster without being rewritten as a different supercomputing application? Now we're announcing cuPyNumeric, a drop-in replacement for the NumPy library. With cuPyNumeric, researchers can write in Python and easily scale their work without needing expertise in distributed computing. cuPyNumeric automatically distributes the data across the GPU cluster using standard NumPy data types, powered by NVIDIA's latest communication and math libraries.
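In practice, the change is typically a single import line (a minimal sketch; it assumes the package imports as cupynumeric and that multi-node runs are launched through the Legate driver):

```python
# Before: import numpy as np
import cupynumeric as np  # drop-in replacement; the rest of the program is unchanged

x = np.random.rand(8192, 8192)
y = (x @ x.T).sum(axis=1)  # matmul and reduction run on the GPU(s),
print(y.mean())            # with data partitioned across the cluster automatically
```

Launching the same script under the Legate driver with a node-count option is what spreads the arrays across multiple GPUs and nodes.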
The results are incredible. The research team at SLAC is using cuPyNumeric to analyze terabytes of data from the LCLS X-ray laser, which fires 120 shots per second. During a 60-hour beam time, they achieved a 6x speedup in data analysis, allowing them to make real-time decisions and uncover material properties. This acceleration has reduced their analysis time before publication from three years to only six months. Early adopters of cuPyNumeric include Stanford University's Center for Turbulence Research, working on computational fluid dynamics solvers; Los Alamos National Laboratory, scaling ML algorithms for the Venado supercomputer; and the University of Massachusetts Boston, studying the entropy production rate in a microscopy imaging experiment.
One of the highest honors at supercomputing is the Gordon Bell Prize, recognizing exceptional achievements in high-performance computing. Today, we celebrate five finalist teams whose groundbreaking research leverages NVIDIA's accelerated systems across diverse fields, including molecular dynamics, protein design, genomics, and climate modeling. Giuseppe Barca and his team from the University of Canberra and the University of Melbourne scaled an alternative method for calculating atomic energies, achieving a 3x speedup over other GPU-accelerated methods and super-linear scaling on multiple GPUs.
We'll hear from Dr. David Keyes at KAUST about their pioneering work on genomic epistasis and climate emulation.
But first, let's turn to Dr. Arvind Ramanathan from Argonne National Laboratory to discuss their advancements in protein design.
One of the key tasks in protein design is to come up with novel proteins that have the same function but are different from what we have seen over the course of four billion years of evolution. Experimental data happens to come at much slower paces than what you would expect from a computational workflow like a simulation. Protein design is quite a happening area in AI right now.
It's undergoing this huge revolution in terms of how AI models are being deployed and developed. This is one of the first attempts at building something that's multi-modal: we have descriptions provided in natural language, and we have protein sequences represented alongside them, and we use these to train a large language model that enables us to interact with it and get new designs. And one of the key things we learned from this paper was that, yes, it is possible for us to stand up this workflow not just on one platform, but across several different platforms at the same time. We used almost all of NVIDIA's architectures, from A100s
16:05
all the way to Grace Hopper chips. But one of the cool things we observed was that, in pre-training this model, we could achieve nearly three exaflops in mixed-precision runs on the system. And that was at the scale of, I think, about half of the system; to achieve that sort of performance is really unbelievable.
One of the key benefits and motivations for accelerating workloads is to reduce energy consumption. Nowhere is this better understood than in supercomputing, where the end of Moore's law was first predicted. Supercomputing centers like Oak Ridge National Laboratory recognized, as far back as 2010, that next-gen supercomputers built with CPUs would consume more power than a major US city. This truth also applies to applications themselves. Even though an accelerated server may consume more power than a standard CPU system, the significant reduction in time to solution results in a huge reduction in the total energy needed to compute a solution. For example, the Texas Advanced Computing Center and Ansys achieved a 110x speedup and 6x better energy efficiency on a 2.5-billion-cell problem using Grace Hopper. At NREL, H100 GPUs improved energy efficiency by 4x for a wind farm simulation. TSMC reduced energy use by 9x using cuLitho for semiconductor manufacturing. And the University of Tokyo's Earthquake Research Institute, in partnership with JAMSTEC and RIKEN, achieved an 86x speedup and 32x better energy efficiency for earthquake simulations using the EuroHPC ALPS system.
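The arithmetic behind these claims is simple: energy is power multiplied by time, so a sufficiently large speedup outweighs a higher power draw. A small illustration with hypothetical numbers (not measurements from the talk):

```python
def energy_ratio(speedup: float, power_ratio: float) -> float:
    """E = P * t, so Energy(CPU) / Energy(GPU) = speedup / power_ratio."""
    return speedup / power_ratio

# Hypothetical: a GPU node drawing 5x the power but solving 50x faster
# still uses 10x less total energy to reach the same solution.
print(energy_ratio(speedup=50.0, power_ratio=5.0))  # -> 10.0
```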
We are continuously pushing for higher performance and efficiency in AI as well. Large language models like Llama 3.1 require multiple GPUs working together for optimal performance.
To fully utilize these GPUs, our inference software stack provides optimized parallelism techniques relying on fast data transfers between GPUs. NVIDIA's NVSwitch technology equips Hopper with superior GPU-to-GPU throughput and, when integrated with our TRT-LLM
18:27
software stack, delivers continuous performance improvements. This ensures Hopper achieves higher performance and lower cost per token for models like Llama 3.1 405B. In just two months, we've seen over a 1.6x improvement in performance thanks to innovations in speculative execution, NVLink communication, and specialized AI kernels. And we're not stopping with Hopper. We're actively innovating to harness the power of Blackwell to build next-generation AI factories. We're excited to collaborate and support customer successes within our growing solution ecosystem. Our partners offer a wide range of systems, from Hopper to Blackwell. The H200 NVL is specifically designed for air-cooled, flexible HPC solutions, featuring a 4-GPU NVLink domain in a standard PCIe form factor. We're also working with partners to bring Grace Blackwell configurations to market. These include the GB200 Grace Blackwell NVL4 Superchip, which integrates a 4-GPU NVLink domain with dual Grace CPUs for liquid-cooled scientific computing.
The rollout of Blackwell solutions is progressing smoothly thanks to our reference architecture, enabling partners to quickly bring products to market while adding their own customizations. Our goal is to accelerate every workload to drive discovery and maximize energy efficiency.
This includes both accelerated and partially accelerated applications that can take advantage of tightly coupled CPU and GPU products like Grace Hopper and Grace Blackwell. However, not everything takes advantage of acceleration just yet. For this long tail, using the most energy-efficient CPU in a power-constrained data center maximizes workload throughput. The Grace CPU is purpose-built for high performance and energy efficiency. Grace features 72 Arm Neoverse V2 cores and the NVIDIA Scalable Coherency Fabric, which delivers 3.2 terabytes per second of bandwidth, double that of a traditional CPU. Paired with LPDDR5X memory, it can achieve 500 gigabytes per second of memory bandwidth while consuming just 16 watts, one-fifth the power of conventional DDR memory. These innovations enable Grace to deliver up to 4x the performance of x86 systems for workloads like weather forecasting and geoscience, making it an ideal solution for energy-efficient, high-performance CPU computing.
Earlier this year at Computex, Jensen unveiled our next-generation Arm-based CPU, Vera, set to debut in 2026. It will be available both as a standalone product and as a tightly integrated solution with the Rubin GPU. With a focus on data movement, our next-generation CPU fabric and NVLink chip-to-chip technologies are designed to maximize system performance. Vera will be a versatile CPU capable of delivering performance and efficiency across a wide range of compute- and memory-intensive tasks. And it's not just about single-node computing. Networking plays a critical role in today's accelerated computing platforms. Traditional Ethernet was designed for enterprise data
centers, optimized for single-server workloads. NVIDIA's NVLink combined with InfiniBand or Spectrum-X Ethernet sets the gold standard for AI training and inference data centers. This combination enables extreme scalability and peak performance. NVLink switch systems allow GPUs to scale up and communicate seamlessly as a unified whole.
For east-west compute fabrics, where fast data exchange between GPUs is critical, NVIDIA Quantum InfiniBand and Spectrum-X Ethernet provide the low-latency, high-throughput infrastructure needed to scale beyond NVLink domains. For north-south traffic, BlueField DPUs optimize data flow between the data center and external networks, ensuring both efficiency and security. Together, these technologies create a powerful and resilient infrastructure for large-scale AI and HPC workloads. NVIDIA Quantum InfiniBand delivers unmatched high-speed data transfer with minimal latency, essential for parallel processing and distributed computing. The Quantum-X800 platform features a 144-port switch with 800 gigabits per port, powered by the ConnectX-8 SuperNIC. Together, they support MPI and NCCL offloads, enabling 14.4 teraflops of in-network computing with NVIDIA SHARP. Without SHARP, all-reduce operations require repeated point-to-point transfers. SHARP optimizes this by performing data reductions directly within the network switches, reducing data transfer and boosting efficiency. This results in 1.8x more effective bandwidth for AI applications.
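For context, this is the collective that SHARP offloads; a minimal mpi4py sketch of an all-reduce (SHARP itself is transparent to the application once enabled in the fabric):

```python
# run with: mpirun -np 4 python allreduce.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
local = np.full(4, comm.Get_rank(), dtype=np.float32)  # each rank's shard
total = np.empty_like(local)
comm.Allreduce(local, total, op=MPI.SUM)  # without SHARP: repeated point-to-point
                                          # transfers; with SHARP: summed in-switch
print(comm.Get_rank(), total)
```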
Microsoft Azure, a longtime user of InfiniBand for scientific simulations, will be among the first to adopt the advanced Quantum-X800 for developing cutting-edge trillion-parameter models.
Many customers want to use Ethernet instead of InfiniBand to simplify their operations. However, standard Ethernet was not built to meet the demands of AI. AI factories alternate between GPU compute periods and collective-operation data transfers. The network delays introduced can cause tail latencies, which slow overall workload performance.
This histogram comparison shows how Spectrum-X reduces tail latency compared to traditional Ethernet.
24:15
In multi-tenant deployments, Spectrum-X with noise isolation eliminates network hotspots and can deliver 2.2x better all-reduce performance. By dynamically rebalancing to avoid failed links, Spectrum-X increases point-to-point bandwidth by 1.3x. This results in superior performance and reliability for the largest AI data center deployments.
In December 2024, X announced the world's largest accelerated system, the Colossus supercomputer. Built by Dell and Supermicro and featuring 100,000 H100 GPUs, this system trains Grok 3, one of the world's most advanced large language models. We've worked diligently with our partners to deploy Colossus in record time, going from equipment delivery to training in just 19 days and to full-scale production within 122 days. Thus far, X has been thrilled with the system's performance. Spectrum-X Ethernet is achieving an impressive 95% of theoretical data throughput, compared to 60% with traditional Ethernet solutions. The system also maintains zero latency degradation and no packet loss from flow collisions across the three tiers of the network fabric. This deployment sets a new standard for AI at scale.
We're extremely excited to announce that Spectrum-X is making its debut on the TOP500 list. In fact, two Spectrum-X-powered systems are in the top 50. Both are Dell-based: one built by GMO Internet Group, and the other our very own NVIDIA Israel-1 supercomputer.
I'm sure these will be the first of many to come.
We build our platform on a one-year cadence, continually advancing each component to redefine performance and efficiency. But it's not just about the hardware. Continuous software optimization is key. With every cycle, we enhance our software stack to extract more from our GPUs, CPUs, and DPUs. This means users can consistently leverage cutting-edge advancements without disruption, leading to compounding improvements. Today, Blackwell is in full production. Next year, Blackwell Ultra will raise the bar even higher, followed by Rubin, ensuring that each generation builds on the last to deliver even greater breakthroughs in AI and HPC.
Over the past year, we've witnessed an explosion of new AI-driven use cases, datasets, and foundation models. A standout example is EvolutionaryScale's work in accelerating drug discovery with the release of the ESM3 generative model for protein design. Built on the NVIDIA accelerated computing platform, ESM3 was trained on over 2 billion protein sequences, using 60 times more data and 25 times more computing power than its predecessor, ESM2. Now, let's hear from Dr. David Keyes of KAUST to tell us
27:13
about their two Gordon Bell finalist submissions for genomics
27:16
and climate modeling. Genome-wide association studies explore the great dogma of biology: that genotype leads to phenotype. Genotype here includes not only genomic factors, but also environmental factors like demographics, diet, smoking habits, and so forth. The goal is to start with a large database of individuals (we used the 305,000-person UK Biobank), compare their genomes and generalized genotypes to one another, and then relate them to the prevalence of the diseases to which they are subject.
By scaling up, we were actually able to go from the 305,000 patients for which we have real data to a synthetic database generated for 13 million patients. And that number is enough for more than half the countries of the world to do a full genomic analysis of their populations. We had very little difficulty moving the code from one system to another: in particular, we ran on the V100s of Summit, we ran on the A100s of Leonardo, and we ran on the Hopper H100s in the GH configuration of ALPS. Node for node, Hopper is by far the most interesting in terms of performance, and also because it offers FP8. We're happy to run as close to the low-precision end as we can get away with, and we would encourage other domain scientists to try to take advantage of that. This is a very exciting prospect, because many future venues of smart health and smart agriculture will benefit greatly from democratized genome-wide association studies.
29:04
There is a 45-institution campaign, now in its sixth generation of generating future climates, called CMIP.
They are starting to become handicapped by the volume of data they produce, and each of these institutions is devoting hundreds of millions of core hours to generating these future climates. Climate emulation is a statistical model that tries to reproduce the statistics of the chaotic simulation.
We brought the queryable distance down to about 3.5 kilometers, from maybe 100 kilometers in earlier models. We were able to obtain 0.8 exaflops of mixed precision on the ALPS system, and that was on 2,000 of its nodes. I think of the digital twin as reproducing the statistics of the real world: making the full 2D surface of the earth visible at very high resolution, with data compression, with statistics that reproduce ensembles of PDE-based models.
30:11
We feel that we've democratized climate emulation. It is incredible to see all the great work being done by researchers to harness the power of AI for science.
While training AI models is important, the real value lies in deploying these models and using them in inference, where they can generate insights and predictions in real time.
To make it easier for users to scale AI models in production, we've introduced NVIDIA NIM inference microservices. We've collaborated with model builders worldwide to convert their models into high-performance, efficient runtimes called NIMs. These NIMs deliver two to five times higher token throughput than standard AI runtimes, offering the best total cost of ownership.
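Deploying against a NIM typically looks like calling any OpenAI-compatible endpoint; a hedged sketch (the port, API key, and model name below are placeholders, not specifics from the talk):

```python
from openai import OpenAI

# A NIM container serving a model locally exposes an OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

resp = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize the SC24 keynote in one line."}],
)
print(resp.choices[0].message.content)
```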
Weather and climate affect a wide range of industries: transportation, energy, agriculture, insurance, and many more. The frequency of extreme weather events costing over a billion dollars is increasing at an alarming rate. Since 1980, the financial impact of severe storm events in the U.S. has increased 25-fold. The importance of timely and accurate weather modeling data is at an all-time high. The Weather Company uses NVIDIA GPUs for kilometer-scale simulations with their GRAF model, achieving 10 times higher throughput and 15 times greater energy efficiency compared to traditional CPU-based simulations. To drive even greater speed and efficiency, they are also working with NVIDIA to adopt AI-based methods to generate high-resolution forecast data.
Now, in January 2025, we're announcing two new NIMs for Earth-2 to empower climate tech application providers with new AI capabilities. NVIDIA's Earth-2 CorrDiff is a generative AI model for kilometer-scale super-resolution. Earlier this year, we showed its ability to super-resolve typhoons over Taiwan.
Today, the Earth-2 NIM for CorrDiff is now available.
CorrDiff is 500 times faster and 10,000 times more energy-efficient than traditional high-resolution numerical weather prediction using CPUs.
We've also worked with U.S. weather forecasting agencies to develop a CorrDiff model for the entire continental U.S., an area 300 times larger than the original Taiwan-based model. However, not every use case requires high-resolution forecasts.
Some applications benefit instead from larger ensembles at coarser resolutions. State-of-the-art numerical models like the GFS are limited to around 20 ensemble members due to computational constraints. Today, we're also announcing the availability of the FourCastNet NIM. It can deliver global two-week forecasts 5,000 times faster than numerical weather models. This makes it possible to use ensembles with thousands of members (an ensemble can only resolve event probabilities down to roughly one over its member count, so thousands of members expose rare events a 20-member ensemble cannot), opening new opportunities for climate tech providers. Now they can estimate risks related to extreme weather and predict low-probability events that current computational methods might miss.
There is a new industrial revolution happening in biopharma, driven by AI. AI models shorten the time to therapy and increase the success rate of new medicines. The NVIDIA BioNeMo framework lets scientists choose from various AI templates to build customized models. BioNeMo is purpose-built for pharma applications and, as a result, delivers twice the training performance of other AI software used today. BioNeMo is accelerating computer-aided drug discovery in many of the world's pharmaceutical companies. Today, we're announcing that the BioNeMo framework is available as an open-source repository on GitHub. We're excited to see what AI can do for the future of the healthcare industry. Today, we're also announcing DiffDock 2.0, an NVIDIA NIM microservice for predicting how drugs interact with target proteins. DiffDock 2.0 is six times faster than version 1.0, published just one year ago.
One of the main drivers behind the performance boost is the new cuEquivariance library, which speeds up essential mathematical operations for molecular predictions. DiffDock has been retrained using the PLINDER database, the largest molecular protein structure database in the world, boosting DiffDock's accuracy. This new version is built to unlock a new scale of virtual screening and drug discovery, and we're excited to see what our ecosystem of researchers does with it next.
AI has transformed the study of proteins for drug discovery. And we think AI has the potential to make the same impact in digital chemistry. With an estimated 10 to the 60th possible materials in the universe and only 10 to the 8th currently known, there's huge potential to innovate.
Announcing NVIDIA ALCHEMI, a collection of chemistry-specific NIMs for the discovery of new compounds. Scientists start by defining the properties they want, like strength, conductivity, low toxicity, or even color. A generative model suggests thousands to millions of potential candidates with the desired properties. The ALCHEMI NIM can then sort the candidate compounds for stability by solving for their lowest energy states using NVIDIA Warp, resulting in a 100-times-faster design-space search, going from months to a day. Using the ALCHEMI workflow, the best candidates are identified before moving on to costly real-world testing.
Traditional engineering analysis workflows, from physics simulation to visualization, can take weeks or even months to complete.
Most analyses of physical systems, like planes, automobiles, and ships, use a set of loosely coupled applications, each generating information that must be interpreted by engineers at each step. A real-time digital twin enables an engineer to adjust design parameters, for example, changing the shape of a body panel and seeing how it impacts streamlines in real time.
Announcing Omniverse Blueprint for real-time digital twins. The Blueprint is a reference workflow that includes NVIDIA's acceleration libraries, physics-AI frameworks, and Omniverse to design, simulate, and visualize, all in real time. It can run on all cloud platforms as well as NVIDIA's own DGX Cloud. Altair, Cadence, Siemens, and others are exploring how to integrate the Blueprint into their own services and products for design acceleration. NVIDIA is also collaborating with Rescale to incorporate the Blueprint into their physics-AI platform.
Let's take a look at the Blueprint in action.
Everything manufactured is first simulated with advanced physics solvers. Computational fluid dynamics (CFD) simulations can take hours or even months, limiting the number of possible design explorations. With the NVIDIA Omniverse Blueprint for real-time physics digital twins, software makers can integrate NVIDIA acceleration libraries, PhysicsML, and RTX visualization into their existing tools, enabling a 1,200x speedup in design iteration time.
37:40
Here, Luminary Cloud has built a fully real-time virtual wind tunnel based on the Blueprint. First, Luminary uses the NVIDIA Modulus PhysicsML framework to train a simulation AI model using data generated from their NVIDIA CUDA-X accelerated CFD solver. The model understands the complex relationship between airflow fields and varying car geometries, generating results orders of magnitude faster than the solver alone. The AI output is visualized in real time using Omniverse APIs. Now an engineer can make geometry or scene modifications and see the effects in real time, and because of Omniverse's data interoperability, the engineer can even bring in new geometries and the simulation will adapt instantly. What took weeks or even months is now a matter of seconds.
Software developers everywhere can now bring unprecedented speed and flexibility to the world's industrial designers and engineers, helping save massive costs and shorten time to market. Ansys is adopting NVIDIA's technologies into its CAE platform: Ansys Fluent, accelerated by NVIDIA GPUs; EnSight, powered by Omniverse visualization; and SimAI, built on NVIDIA NIM microservices.
AI will not only transform simulation, it will accelerate scientific experimentation as well. An overwhelming amount of data is being generated by advanced instruments, such as radio telescopes, particle accelerators, X-ray light sources, and fusion reactor experiments. For example, the Square Kilometre Array is expected to be completed by the end of the decade. The SKA in Australia will produce an average of one terabyte per second, a thousand times more than the current state-of-the-art array. In particle physics, the LHCb detector at CERN produces five terabytes of data per second; following the 2030 upgrade, it could reach as high as 25 terabytes per second. Both the instruments and the researchers' time are incredibly valuable, making it essential to extract meaningful insights from all of this data as efficiently as possible. We are working with researchers at the SETI Institute and Breakthrough Listen to deploy the world's first AI search for fast radio bursts, or FRBs. While over 1,000 have been detected, only 15 have been traced to specific galaxies. We've implemented a real-time pipeline using NVIDIA Holoscan at the Allen Telescope Array, processing data from 28 dishes at 100 gigabits per second. This pipeline can process 100 times more data than conventional methods used today. This is the first direct feed of raw telescope data to an AI model for FRB detection.
Quantum hardware offers the opportunity to revolutionize computing in fundamental ways. Unfortunately, today's best quantum processors can only perform hundreds of operations before their fundamental unit of computation, their qubits, just become overwhelmed with noise. This makes scaling quantum hardware into useful computing devices impractical. Today, we're announcing a partnership with Google to apply NVIDIA's state-of-the-art AI supercomputing to solve this challenge and accelerate their quantum hardware development. To be useful, quantum computers need large numbers of qubits, operating with performance far beyond today's capabilities. AI supercomputing is the key to building higher-quality, error-corrected qubits that can meet these demands. Google Quantum AI is working with NVIDIA to explore how to accelerate digital representations of their superconducting qubits. Unlike circuit simulations, which focus on the high-level operation of an ideal quantum computer, dynamical simulations model the complex physics describing real, noisy, quantum hardware, fully accounting for how the qubits inside the quantum processor interact not only with each other but also with their surrounding environment. Dynamical simulations are essential to understanding and reducing qubit-specific sources of noise. Using NVIDIA hardware and software, Google Quantum AI researchers can accelerate these complex simulations. This enhances the ability of researchers to understand the noise in their systems, explore new designs, and increase hardware performance, all of which are essential for
scaling quantum processors.
And we're also announcing that dynamical simulation is available in CUDA-Q, our open-source quantum development platform. This means that through CUDA-Q, simulations can capture the full dynamics of every qubit comprehensively, unlike commonly performed circuit simulations. Comprehensive qubit simulations that would previously have taken a week can now run in just minutes. With CUDA-Q, developers of all quantum processors can perform larger simulations and explore more scalable qubit designs. Together, NVIDIA's growing network of quantum partners is driving toward the goal of achieving practical, large-scale quantum computing.
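For orientation, this is what basic CUDA-Q usage looks like in Python (a minimal circuit-simulation sketch; the dynamical-simulation capability announced above is a separate, richer interface than the gate-level kernel shown here):

```python
import cudaq

@cudaq.kernel
def bell():
    qubits = cudaq.qvector(2)
    h(qubits[0])                  # put qubit 0 in superposition
    x.ctrl(qubits[0], qubits[1])  # entangle qubit 1 with qubit 0
    mz(qubits)                    # measure both qubits

print(cudaq.sample(bell))         # expect roughly 50/50 counts of '00' and '11'
```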
As we conclude this exciting journey through NVIDIA's latest innovations, we invite you to come to the NVIDIA booth to see many of these technologies firsthand.
Interact with James, our digital human, and witness the future of AI-driven virtual interactions. Experience the world's first real-time interactive wind tunnel, built on NVIDIA Omniverse Blueprints. Explore the power of Earth-2 NIMs in climate modeling, and see how Holoscan is revolutionizing radio astronomy. You'll also hear from researchers sharing breakthroughs in fields like energy storage and seismic simulation in our theater. Have a great Supercomputing 2024.