IN SEARCH OF INTELLIGENCE unites families advancing the next generation's life on earth. Spring 2025, Washington DC, chris.macrae@yahoo.co.uk; LinkedIn UNwomens; 2025Reporter Club
Over 60% of people depend on Pacific trades. Because of the Western era of Empire, Pacific peoples' growth exponentials have depended on what von Neumann called designing development round above-zero-sum trading maps through 3 waves of "millions times more" tech brainpower: Moore's engineers of linking in silicon chip valley (1965-95); satellite data worldwide, 1G to 5G (1990-2015); partners of Jensen Huang's deep data computing and learning at Stanford Engineering Quadrangle (2010-2025). See the 2025report.com map of 75 years of intelligence versus ignorance - discussion welcome: chris.macrae@yahoo.co.uk
That's our open system foundations observation, scaling over 75 years since John von Neumann asked Economist journalists to mediate futures of brainworking through 3 million-fold hi-tech waves: Moore's Silicon Valley; satellites 1G to 5G and the Death of Distance mobilising data round the earth; Jensen's platforms for deep-learning data science aligned to Einstein's 1905 nano-science Earth revolution.

NB Miraculous transformations: in the last 5 quarters of human endeavor, may we commend projects emerging from 6 summits linked in by Taiwanese-Americans gravitated by Jensen Huang (Nvidia) and 4 summits living up to King Charles's wishes for humanity: Nov 2023 London Turing, latest from DeepMind; May 2024 Korea; Summer 2024 semi-private Japan State Visit to London (Charles's 60th Anglo-Japan reunion as 1964 delegate to the Tokyo Olympics); December 2024 India's Wadhwani AI in DC (with the next round of the King Charles series - Macron, Paris, Feb 2025). Jensen's health AI meta-collaborations: Hong Kong's digital-twin 2020s supercity health centres; Tokyo update with Masa Son and Japan's royal LLM everywhere; India's data sovereignty of the world's largest population with Ambani and Modi; NVIDIA in DC with, e.g., Lockheed Martin; Taiwan wins galore, e.g. Foxconn's extension to a foundry for the autonomous as well as the mobile world; San Jose March 2024, the tenth annual update of the most joyful partnership the tech world has ever generated.

Over the past year, key international organizations, like the G7, OECD, and Global Partnership on Artificial Intelligence (GPAI), have shaped the global AI governance conversation and focused on foundational principles, critical risks, and responsible AI development. Looking ahead to 2025, how are G7 countries and corporations planning to implement AI governance frameworks and address challenges, such as the growing energy demand for AI technologies? Join the Wadhwani AI Center for the International AI Policy: Outlook for 2025 conference. This full-day event will be held at CSIS headquarters on December 9, 2024, from 9:00 AM to 6:00 PM ET and will convene leading policymakers, industry experts, and thought leaders to explore the latest international efforts in AI governance. Featuring keynote speeches from distinguished figures, including Ambassador Shigeo Yamada of Japan to the United States, Ambassador Laurent Bili of France to the United States, and Sara Cohen, Deputy Head of Mission at the Embassy of Canada, this conference will highlight key international perspectives in AI governance.

Wednesday, February 12, 2025

Thank goodness for the intelligence of Jensen and NVIDIA coworkers.

Jensen Huang. https://www.nvidia.com/en-us/on-demand/session/supercomputing2024-keynote/  Supercomputers are among humanity's most vital instruments, driving scientific breakthroughs and expanding the frontiers of knowledge. At NVIDIA, our journey has been profoundly shaped by our work on supercomputers. 

 In 2006, we announced CUDA and launched the world's first GPU for scientific computing. 

Japan's Tokyo Tech's Tsubame, in 2008, was the world's first NVIDIA-accelerated supercomputer.

 Four years later, NVIDIA powered the world's fastest supercomputer, the Oak Ridge National Lab's Titan.

 Then, in 2016, NVIDIA introduced the first AI supercomputer, DGX-1, which I hand-delivered to OpenAI. 

01:39 From the world's first GPU for supercomputers to now building AI supercomputers for the world, our journey in supercomputing over the past 18 years has shaped the NVIDIA of today. Since CUDA's inception, NVIDIA has driven down the cost of computing by a million-fold.

 For some, NVIDIA is a computational microscope, allowing them to see the impossibly small. 
 For others, it's a telescope, exploring the unimaginably distant. 
 And for many, it's a time machine, letting them do their life's work within their lifetime.

 02:19 NVIDIA CUDA has become one of the very few ubiquitous computing platforms. 
But the real stars are the CUDA-X libraries.
They're the engines of accelerated computing. Just as OpenGL is the API that connects computer graphics to accelerators, the CUDA-X libraries are domain-specific libraries that connect new applications to NVIDIA acceleration. CUDA-X opens new markets and industries to NVIDIA, from healthcare and telecommunications to manufacturing and transportation.

02:54 In chip manufacturing, cuLitho accelerates  computational lithography. 
03:00 In telecommunications, Aerial processes wireless radio workloads on CUDA.
  And in healthcare and genomics, Parabricks accelerates  gene sequence alignment and variant calling. 

 For data science and analytics, cuDF supercharges data processing by accelerating popular libraries like SQL, Pandas, Polars, and Spark. 
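
As a rough illustration of the drop-in style cuDF aims for, here is a minimal sketch; it assumes the RAPIDS `cudf` package and a CUDA-capable GPU are available, and the dataset and column names are made up for the example.

```python
# Minimal sketch of GPU-accelerated dataframe work with cuDF (RAPIDS).
# Assumes the `cudf` package and a CUDA-capable GPU are available;
# the data and column names here are purely illustrative.
import cudf

df = cudf.DataFrame({
    "station": ["A", "B", "A", "C", "B"],
    "temp_c": [21.3, 19.8, 22.1, 18.4, 20.0],
})

# The pandas-style API runs on the GPU.
summary = df.groupby("station")["temp_c"].mean()
print(summary)
```

cuDF also ships a pandas accelerator mode (for example `python -m cudf.pandas your_script.py`), which aims to speed up existing pandas code without source changes; see the RAPIDS documentation for details.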

03:23 cuVS accelerates vector database indexing and retrieval, central to building AI agents. The NVIDIA cuQuantum library performs quantum circuit simulations on CUDA.
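
For readers unfamiliar with the operation cuVS accelerates, here is a framework-agnostic, brute-force NumPy sketch of nearest-neighbour retrieval over embeddings; real vector databases (and cuVS) build approximate indexes instead of scanning every vector.

```python
# Framework-agnostic sketch of the nearest-neighbour retrieval that
# libraries like cuVS accelerate; brute force, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 384))   # stored embeddings
query = rng.standard_normal(384)              # incoming query embedding

# Cosine similarity against every stored vector, then take the top 5.
corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
query_norm = query / np.linalg.norm(query)
scores = corpus_norm @ query_norm
top5 = np.argsort(scores)[-5:][::-1]
print(top5, scores[top5])
```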

 Omniverse is a suite of libraries that realize and  operate digital twins for robotics, manufacturing, and logistics. 
In Jan 2025, we announced a major new library, cuPyNumeric, a GPU-accelerated implementation of NumPy, the most widely used library for data science, machine learning, and numerical computing.

With over 400 CUDA-X libraries, NVIDIA accelerates important applications in nearly every field of science and industry, fueling a virtuous cycle of increasing GPU adoption, increasing ecosystem partners, and increasing developers.

One of our most impactful libraries is cuDNN, which processes deep learning and neural network operations. cuDNN accelerates deep learning frameworks, enabling an incredible one-million-fold scaling of large language models over the past decade, which led to the creation of ChatGPT. 04:41 AI has arrived, and a new computing era has begun.

Every layer of the computing stack has been reinvented: from coding software with rules and logic to machine learning of patterns and relationships, and from code that runs on CPUs to neural networks processed by GPUs. AI emerged from our innovations in scientific computing. And now, science is leveraging AI to supercharge the scientific method.

I spoke about this fusion in my Supercomputing 2018 address. Since then, AI and machine learning have been integrated into nearly every field of science. AI is helping to analyze data at incredible scales, accelerate simulations, control experiments in real time, and build predictive models. These examples are revolutionizing fields from drug discovery and genomics to quantum computing. Using AI, we can emulate physical processes at a scale that was previously computationally prohibitive. This transformative impact has been recognized at the highest levels. Geoffrey Hinton and John Hopfield received the Nobel Prize in Physics for their pioneering work on neural networks. Demis Hassabis, John Jumper, and David Baker received the Nobel Prize in Chemistry for groundbreaking advancements in protein prediction.

This is just the beginning. Scaling laws have shown predictable improvements  in AI model performance as they scale with model size, data, and computing power.  The industry's current trajectory scales computing power four-fold  annually, projecting a million-fold increase over a decade. For comparison, Moore's Law achieved a hundred-fold increase per decade. These scaling laws apply not only to LLM training, but with the advent of OpenAI Strawberry, also to inference. 
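
The arithmetic behind the two growth rates quoted above is easy to check; the snippet below simply evaluates the stated factors and is not tied to any NVIDIA-published model.

```python
# Quick check of the growth rates quoted above.
# A 4x annual increase compounds to roughly a million-fold per decade,
# while the ~100x-per-decade figure quoted for Moore's Law works out
# to about 1.6x per year.
decade = 10
ai_compute_growth = 4 ** decade            # 4x per year, compounded for 10 years
moores_law_per_year = 100 ** (1 / decade)  # annual rate implied by 100x/decade

print(f"4x/year over a decade: {ai_compute_growth:,.0f}x")        # ~1,048,576x
print(f"100x/decade is about {moores_law_per_year:.2f}x per year")
```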

 Over the next decade, we will accelerate our roadmap to keep pace with training and inference scaling demands, and to discover the next plateaus of intelligence. Nvidia's Blackwell chip means today's AI computers are unlike anything built before.  Every stage of AI computing, from data processing to training to inference, challenges every component, from GPUs to memory  to networking and switches. 

The significant investment in AI factories makes every detail crucial: time to first train, reliability, utilization, power efficiency, token generation throughput, and responsiveness. NVIDIA embraces extreme co-design, optimizing every layer, from chips and systems to software and algorithms.

The Blackwell system integrates seven different types of chips. Each liquid-cooled rack is 120 kilowatts and 3,000 pounds, and consists of 18 compute trays and nine NVLink switches connecting 144 Blackwell chips, over two miles of NVLink copper cabling, into one giant virtual GPU with 1.44 AI exaflops. Blackwell is in full production. Taiwan's Foxconn is building new Blackwell production and testing facilities in the US, Mexico, and Taiwan, using NVIDIA Omniverse to bring up the factories as fast as possible. 25 years after creating the first GPU, we have reinvented computing and sparked a new industrial revolution. An entirely new industry is emerging: AI factories producing digital intelligence, manufacturing AI at scale. AI will accelerate scientific discovery.

Researchers will have AI-powered assistance to generate and explore promising ideas.  In business, AI will work alongside teams across every function—marketing, sales, supply chain, chip design, software development, and beyond. Eventually, every company will harness digital AI agents boosting productivity, fueling growth, and creating new jobs.  And in the physical world, AI will soon power humanoid robots capable of adapting and performing a variety of tasks with minimal demonstration. 

Manufacturing, logistics, and service industries stand 09:27 to benefit from an AI-powered productivity growth that will reshape the global economy. It's extraordinary to be on the edge of such transformation. We are thrilled to bring to life the incredible computers we're designing today and to see how AI and computing will revolutionize each of the world's $100 trillion industries in the coming decade.
...

What is OpenAI Strawberry?

Strawberry is an advanced AI model developed by OpenAI, designed to enhance reasoning capabilities beyond traditional large language models (LLMs). Unlike most LLMs trained via "outcome supervision," Strawberry uses process supervision, rewarding the model for correctly reasoning through each step of a problem rather than just producing the right answer. This approach enables Strawberry to solve complex logic and math problems, generate step-by-step solutions, and create high-quality synthetic training data for future models like OpenAI's Orion.
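
To make the distinction concrete, here is a toy sketch of the two reward schemes; this is not OpenAI's implementation, only an illustration of rewarding each reasoning step versus rewarding only the final answer.

```python
# Toy illustration of outcome supervision vs. process supervision.
# Not OpenAI's method: just a sketch of the two reward schemes described above.
from typing import Callable, List

def outcome_reward(final_answer: str, correct_answer: str) -> float:
    """Reward only the final answer, regardless of the reasoning path."""
    return 1.0 if final_answer == correct_answer else 0.0

def process_reward(steps: List[str], step_is_valid: Callable[[str], bool]) -> float:
    """Reward the fraction of reasoning steps judged valid by a verifier."""
    if not steps:
        return 0.0
    return sum(step_is_valid(s) for s in steps) / len(steps)

# Hypothetical chain of thought for 12 * (3 + 4):
steps = ["3 + 4 = 7", "12 * 7 = 84"]
print(outcome_reward("84", "84"))                 # 1.0: only the answer matters
print(process_reward(steps, lambda s: "=" in s))  # 1.0 with a (trivial) step verifier
```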

Strawberry's ability to "reason" makes it a game-changer for AI applications, including education, as it can simulate human-like planning and problem-solving. It also aims to reduce hallucinations in AI outputs by focusing on logical consistency.

How Inference Goes Beyond AI Training

AI inference refers to the process where a trained model applies its learned knowledge to new, unseen data to make predictions or decisions. It represents the "execution phase" of AI, while training is the "learning phase." Here’s how inference extends beyond training:

  1. Real-Time Applications:

    • Inference allows AI systems to operate in real-world scenarios, such as autonomous vehicles recognizing stop signs or industrial robots detecting defects on production lines.

    • For LLMs like Strawberry, inference enables reasoning and planning in response to user prompts, going beyond simple text generation.

  2. Efficiency and Scalability:

    • Inference optimizes pre-trained models for practical use, requiring less computational power compared to training large models from scratch. This makes advanced AI accessible for broader applications.

  3. Active Problem-Solving:

    • Models like Strawberry demonstrate how inference can simulate "thinking" by performing background calculations and planning before outputting results, enabling more accurate and context-aware responses (see the brief sketch after this list).
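
A minimal PyTorch-style sketch of the two phases described above; the tiny linear model and random data are made up purely for illustration.

```python
# Minimal sketch of the training ("learning") phase vs. the inference
# ("execution") phase, using a tiny made-up PyTorch model.
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Training: weights are updated from labelled data.
x_train, y_train = torch.randn(64, 4), torch.randn(64, 1)
for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

# Inference: the frozen model is applied to new, unseen data.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 4))
print(prediction)
```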

Potential Impact on Education and Climate Research

  1. Education:

    • Strawberry's reasoning capabilities can support personalized learning tools that teach problem-solving skills step-by-step.

    • It could also generate educational content tailored to specific curricula, enhancing advanced STEM education.

  2. Climate Research:

    • Nvidia’s Earth-2 platform (leveraging AI inference) and partnerships with AWS could complement Strawberry’s reasoning abilities by simulating high-resolution climate models.

    • These tools align with the mission of institutions like the Stanford Doerr School of Sustainability, providing actionable insights for climate mitigation strategies and serving as interactive educational resources for students and policymakers.

In summary, Strawberry exemplifies how advanced reasoning in AI can transform both education and research by enabling deeper problem-solving and practical applications through inference...

 

  

09:49 Let's build the future together. This year at SC24, Professor Ari Kaufman, Distinguished Professor at Stony Brook University, is being honored with the Test of Time Award for his landmark 2004 paper, GPU Cluster for High-Performance Computing. Using fluid dynamics equations, Professor Kaufman simulated airborne contaminant dispersion in New York City's Times Square on the first large-scale GPU cluster. His research laid the groundwork for today's accelerated computing, proving the power of GPUs for large-scale simulations. From everyone at NVIDIA, congratulations, 10:32 Professor Kaufman, on this well-deserved recognition. Your pioneering contributions exemplify the spirit of progress that drives the field forward. As a result of this groundbreaking work, accelerated computing has become the technology of choice for supercomputing. This chart shows the history of accelerated and unaccelerated supercomputers among the top 100 fastest systems in the world.




In the last five years, the number of accelerated systems has grown by about eight systems per year, rising from 33% to over 70% of the top 100. Our goal is to help the world accelerate every workload. It is certainly ambitious, given the wide range of programming languages, the breadth of developer experience and needs, and the ever-emerging algorithms and techniques.

To support our developer community and meet them where they are, today we offer over 450 libraries, many of which are tailored to specific domains and an ever-evolving developer landscape. Take Python, for example. Warp is our Pythonic framework for building and accelerating data generation and spatial computing. Most physics-based simulators in HPC engage in some form of spatial computing, whether in quantum chemistry, climate modeling, or fluid dynamics, all of which operate in 3D space. And by expressing these calculations in Warp, they can be automatically differentiated and integrated into AI workflows. One of the most popular Python libraries that 12:01 exists today is NumPy. NumPy is the foundation library for mathematical computing for Python developers. It's used by over 5 million scientific and industrial developers, with 300 million downloads just last month. Over 32,000 GitHub applications use NumPy in important science domains like astronomy, physics, signal processing, and many others.
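
As a side note on the Warp framework mentioned above, here is a minimal sketch of what a Warp kernel looks like; it assumes the `warp-lang` package is installed, and the computation (squared distance of 3D points from the origin) is a made-up example rather than one of NVIDIA's shipped samples.

```python
# Minimal sketch of an NVIDIA Warp kernel; assumes the `warp-lang`
# package is installed. The computation itself is a made-up example.
import warp as wp

wp.init()

@wp.kernel
def squared_distance(points: wp.array(dtype=wp.vec3), out: wp.array(dtype=float)):
    i = wp.tid()                          # one thread per point
    out[i] = wp.dot(points[i], points[i])

n = 1024
points = wp.array([(float(i), 0.0, 0.0) for i in range(n)], dtype=wp.vec3)
out = wp.zeros(n, dtype=float)

wp.launch(squared_distance, dim=n, inputs=[points, out])
print(out.numpy()[:5])
```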

However, as scientists look to scale their applications to use large HPC clusters, they often need to use lower-level distributed computing libraries like OpenMPI. But what if you didn't? What if your NumPy program could just automatically parallelize across a GPU cluster without having to rewrite the Python program into a different supercomputing application? Now we're announcing cuPyNumeric, a drop-in replacement for the NumPy library. With cuPyNumeric, researchers can write in Python and easily scale their work without needing expertise in distributed computing. cuPyNumeric automatically distributes the data across the GPU cluster using standard NumPy data types, powered by NVIDIA's latest communication and math libraries.
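
The drop-in idea can be sketched in a few lines; the module name below is assumed from the cuPyNumeric package documentation, and the computation is an arbitrary example, not a benchmark.

```python
# Sketch of the drop-in replacement idea: keep NumPy-style code,
# swap the import. Assumes the cuPyNumeric package is installed and
# the script is launched as described in its docs (e.g. via Legate);
# the computation itself is arbitrary.
import cupynumeric as np   # instead of: import numpy as np

a = np.random.rand(4096, 4096)
b = np.random.rand(4096, 4096)

# Standard NumPy operations; cuPyNumeric partitions and distributes
# the work across available GPUs automatically.
c = (a @ b).sum(axis=0)
print(c.shape, float(c.mean()))
```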

The results are incredible. The research team at SLAC is using cuPyNumeric to analyze terabytes of data from the LCLS X-ray laser, which fires 120 shots per second. During a 60-hour beam time, they achieved a 6x speedup in data analysis, allowing them to make real-time decisions and uncover material properties. This acceleration has reduced their analysis time before publication from three years to only six months. Early adopters of cuPyNumeric include Stanford University's Center for Turbulence Research, working on computational fluid dynamics solvers; Los Alamos National Laboratory, scaling ML algorithms for the Venado supercomputer; and the University of Massachusetts Boston, studying the entropy production rate in a microscopy imaging experiment.

One of the highest honors at supercomputing is the Gordon Bell Prize, recognizing exceptional achievements in high-performance computing. Today, we celebrate five finalist teams whose groundbreaking research leverages NVIDIA's accelerated systems across diverse fields, including molecular dynamics, protein design, genomics, and climate modeling. Giuseppe Barca and his team from the University of Canberra and Melbourne University scaled an alternative method for calculating atomic energies, achieving a 3x speedup over other GPU-accelerated methods and super-linear scaling on multiple GPUs. We'll hear from Dr. David Keyes at KAUST about their pioneering work on genomic epistasis and climate emulation. But first, let's turn to Dr. Arvind Ramanathan from Argonne National Laboratory to discuss their advancements in protein design.

One of the key tasks in protein design is to come up with novel proteins that have the same function but are different from what we have seen over the course of four billion years of evolution. Experimental data happens to come at a much slower pace than what you would expect from a computational workflow like a simulation. Protein design is quite a happening area in AI right now. It's undergoing this huge revolution in terms of how AI models are being deployed and developed. This is one of the first attempts at building something that's multi-modal. We actually have descriptions that are provided in natural language, and then you have the protein sequences that are represented, and we basically use that to train a large language model that will enable us to interact with it and get new designs. And one of the key things that we also learned from this paper was the fact that, yes, it is possible for us to stand up this workflow not on just one platform, but across several different platforms at the same time. We happened to use almost all of NVIDIA's architectures, from A100s 16:05 all the way to Grace Hopper chips. But one of the cool things that we observed was that, in terms of pre-training this model, we could really achieve nearly three exaflops in mixed-precision runs on the system. Again, this sort of run was at the scale of, I think, half of the system. To achieve that sort of performance is really unbelievable.

One of the key benefits and motivations for accelerating workloads is to reduce energy consumption. Nowhere is this better understood than in supercomputing, where the end of Moore's Law was first predicted. Supercomputing centers like Oak Ridge National Labs recognized as far back as 2010 that next-gen supercomputers built with CPUs would consume more power than a major US city. This truth also applies to applications themselves. Even though an accelerated server may consume more power than a standard CPU system, the significant reduction in time to solution results in a huge reduction in the total energy needed to compute a solution. For example, the Texas Advanced Computing Center and ANSYS achieved a 110x speedup and 6x better energy efficiency on a 2.5-billion-cell problem using Grace Hopper. At NREL, H100 GPUs improved energy efficiency by 4x for a wind farm simulation. TSMC reduced energy use by 9x using cuLitho for semiconductor manufacturing. The University of Tokyo's Earthquake Research Institute, in partnership with JAMSTEC and RIKEN, also achieved an 86x speedup and 32x better energy efficiency for earthquake simulations using the EuroHPC ALPS system.

We are continuously pushing for higher performance and efficiency in AI as well. Large language models like Llama 3.1 require multiple GPUs working together for optimal performance. To fully utilize these GPUs, our inference software stack provides optimized parallelism techniques relying on fast data transfers between GPUs. NVIDIA's NVSwitch technology equips Hopper with superior GPU-to-GPU throughput and, when integrated with our TRT-LLM 18:27 software stack, delivers continuous performance improvements. This ensures Hopper achieves higher performance and lower cost per token for models like the 405B. In just two months, we've seen over a 1.6x improvement in performance thanks to innovations in speculative execution, NVLink communication, and specialized AI kernels. And we're not stopping with Hopper. We're actively innovating to harness the power of Blackwell to build next-generation AI factories. We're excited to collaborate and support customer successes within our growing solution ecosystem. Our partners offer a wide range of systems, from Hopper to Blackwell. The H200 NVL is specifically designed for air-cooled, flexible HPC solutions, featuring a 4-GPU NVLink domain in a standard PCIe form factor. We're also working with partners to bring Grace Blackwell configurations to market. These include the GB200 Grace Blackwell NVL4 Superchip, which integrates a 4-GPU NVLink domain with dual Grace CPUs for liquid-cooled scientific computing.

The rollout of Blackwell solutions is progressing smoothly thanks to our reference architecture, enabling partners to quickly bring products to market while adding their own customizations. Our goal is to accelerate every workload to drive discovery and maximize energy efficiency. This includes both accelerated and partially accelerated applications that can take advantage of tightly coupled CPU and GPU products like Grace Hopper and Grace Blackwell. However, not everything takes advantage of acceleration just yet. For this long tail, using the most energy-efficient CPU in a power-constrained data center environment maximizes workload throughput. The Grace CPU is purpose-built for high performance and energy efficiency. Grace features 72 Arm Neoverse V2 cores and the NVIDIA Scalable Coherency Fabric, which delivers 3.2 terabytes a second of bandwidth, double that of a traditional CPU. Paired with LPDDR5X memory, it can achieve 500 gigabytes a second of memory bandwidth while consuming just 16 watts. That's one-fifth the power of conventional DDR memory. These innovations enable Grace to deliver up to 4x the performance for workloads like weather forecasting and geoscience when compared to x86 systems, making it an ideal solution for energy-efficient, high-performance CPU computing.

Earlier this year at Computex, Jensen unveiled our next-generation Arm-based CPU, Vera, set to debut in 2026. It will be available both as a standalone product and as a tightly integrated solution with the Rubin GPU. With a focus on data movement, our next-generation CPU fabric and NVLink chip-to-chip technologies are designed to maximize system performance. Vera will be a versatile CPU capable of delivering performance and efficiency across a wide range of compute- and memory-intensive tasks. And it's not just about single-node computing. Networking plays a critical role in today's accelerated computing platforms. Traditional Ethernet was designed for enterprise data
centers, optimized for single-server workloads. NVIDIA's NVLink combined with InfiniBand or Spectrum-X Ethernet sets the gold standard for AI training and inference data centers. This combination enables extreme scalability and peak performance. NVLink switch systems allow GPUs to scale up and communicate seamlessly as a unified whole.

For east-west compute fabrics, where fast data exchange between GPUs is critical, NVIDIA Quantum InfiniBand and Spectrum-X Ethernet provide the low-latency, high-throughput infrastructure needed to scale beyond NVLink domains. For north-south traffic, BlueField DPUs optimize data flow between the data center and external networks, ensuring both efficiency and security. Together, these technologies create a powerful and resilient infrastructure for large-scale AI and HPC workloads. NVIDIA Quantum InfiniBand delivers unmatched high-speed data transfer with minimal latency, essential for parallel processing and distributed computing. The Quantum-X800 platform features a 144-port switch with 800 gigabits per port, powered by the ConnectX-8 SuperNIC. Together, they support MPI and NCCL offloads, enabling 14.4 teraflops of in-network computing with NVIDIA SHARP. Without SHARP, all-reduce operations require repeated point-to-point transfers. SHARP optimizes this by performing data reductions directly within the network switches, reducing data transfer and boosting efficiency. This results in 1.8x higher effective bandwidth for AI applications.

Microsoft Azure, a longtime user of InfiniBand for scientific simulations, will be among the first to adopt the advanced Quantum-X800 for developing cutting-edge trillion-parameter models. Many customers want to use Ethernet instead of InfiniBand to simplify their operations. However, standard Ethernet was not built to meet the demands of AI. AI factories alternate between GPU compute periods and collective-operation data transfers. The network delays introduced can cause tail latencies, which slow the overall workload performance. This histogram comparison shows how Spectrum-X reduces tail latency compared to traditional Ethernet. 24:15 In multi-tenant deployments, Spectrum-X with noise isolation eliminates network hotspots and can deliver 2.2x better all-reduce performance. By dynamically rebalancing to avoid failed links, Spectrum-X increases point-to-point bandwidth by 1.3x. This results in superior performance and reliability for the largest AI data center deployments.

In Dec 2024, X announced the world's largest accelerated system, the Colossus supercomputer. This system is built by Dell and Supermicro, featuring 100,000 H100 GPUs, to train Grok 3, one of the world's most advanced large language models. We've worked diligently with our partners to deploy Colossus in record time, going from equipment delivery to training in just 19 days and full-scale production within 122 days. Thus far, X has been thrilled with the system performance. Spectrum-X Ethernet is achieving an impressive 95% of theoretical data throughput, compared to 60% with traditional Ethernet solutions. The system also maintains zero latency degradation and no packet loss due to flow collisions across the three tiers of the network fabric. This deployment sets a new standard for AI at scale. We're extremely excited to announce that Spectrum-X is making its debut on the TOP500 list. In fact, two Spectrum-X-powered systems find themselves in the top 50. Both are Dell-based: one built by GMO Internet Group and the other our very own NVIDIA Israel-1 supercomputer. I'm sure these will be the first of many to come.

We build our platform on a one-year cadence, continually advancing each component to redefine performance and efficiency. But it's not just about the hardware. Continuous software optimization is key. With every cycle, we enhance our software stack to extract more from our GPUs, CPUs, and DPUs. This means users can consistently leverage cutting-edge advancements without disruption, leading to compounding improvements. Today, Blackwell is in full production. Next year, Blackwell Ultra will raise the bar even higher, followed by Rubin, ensuring that each generation builds on the last to deliver even greater breakthroughs in AI and HPC.

Over the past year, we've witnessed an explosion of new AI-driven use cases, datasets, and foundational models. A standout example is EvolutionaryScale's work in accelerating drug discovery with the release of the ESM3 generative model for protein design. Built on the NVIDIA accelerated computing platform, ESM3 was trained on over 2 billion protein sequences, using 60 times more data and 25 times more computing power than its predecessor, ESM2. Now, let's hear from Dr. David Keyes of KAUST to tell us 27:13

about their two Gordon Bell finalist submissions for genomics 27:16 and climate modeling. Genome-wide association studies explore the great dogma of biology that genotype leads to phenotype. Genotype here includes not only genomic factors, but also environmental factors like demographics, diet, smoking habits, and so forth. And the goal is to start with a large database of individuals. We used the 305,000-person UK Biobank and then compared their genomes and their generalized genotypes to one another and then to the prevalence of the diseases to which they are subject. We actually were able, by means of scaling up, to go from the 305,000 patients for which we have real data to a synthetic database generated from 13 million patients. And that number is actually enough for more than half the countries of the world to do a full genomic analysis of their populations. We had very little difficulty moving the code from one system to another, so in particular we ran on the V100s of Summit, we ran on the A100s of Leonardo, and we ran on the Hopper 100s, the GH configuration of ALPS. Node for node, Hopper is by far the most interesting in terms of performance and also because it offers FP8. And we're happy to use it as close to the low-precision end as we can get away with, and we would encourage other domain scientists to try to take advantage of that. This is a very exciting prospect because many future venues of smart health and smart agriculture will benefit greatly from democratized genome-wide association studies. 29:04

There is a 45-institution campaign, now in its sixth generation of generating future climates, called CMIP. And they are starting to become handicapped by the volume of data that they produce. Each of these institutions is devoting hundreds of millions of core hours to generating these future climates. Climate emulation is a statistical model that tries to reproduce the statistics of the chaotic simulation. We actually brought the queryable distance down to about 3.5 kilometers, from maybe 100 kilometers on earlier models. We were able to obtain 0.8 exaflops of mixed precision on the ALPS system, and that was on 2,000 of those nodes. I think a digital twin is reproducing the statistics of the real world by making the full 2D surface of the earth visible at very high resolution with data compression, with statistics that reproduce ensembles of PDE-based models. 30:11 We feel that we've democratized climate emulation. It is incredible to see all the great work being done by researchers to harness the power of AI for science.

While training AI models is important, the real value lies in deploying these models and using them in inference, where they can generate insights and predictions in real time. To make it easier for users to scale AI models in production, we've introduced NVIDIA NIM inference microservices. We've collaborated with model builders worldwide to convert their models into high-performance, efficient runtimes as NIMs. These NIMs deliver two to five times faster token throughput than standard AI runtimes, offering the best total cost of ownership. Weather and climate affect a wide range of industries – transportation, energy, agriculture, insurance, and many more. The frequency of extreme weather events costing over a billion dollars is increasing at an alarming rate. Since 1980, the financial impact of severe storm events in the U.S. has increased 25-fold. The importance of timely and accurate weather modeling data is at an all-time high. The Weather Company uses NVIDIA GPUs for kilometer-scale simulations of their GRAF model, achieving 10 times higher throughput and 15 times greater energy efficiency compared to traditional CPU-based simulations. To drive even greater speed and efficiency, they are also working with NVIDIA to adopt AI-based methods to generate high-resolution forecast data.

Now, in Jan 2025, we're announcing two new NIMs for Earth-2 to empower climate tech application providers with new AI capabilities. NVIDIA's Earth-2 CorrDiff is a generative AI model for kilometer-scale super-resolution. Earlier this year, we showed its ability to super-resolve typhoons over Taiwan. Today, the Earth-2 NIM for CorrDiff is now available. CorrDiff is 500 times faster and 10,000 times more energy efficient than traditional high-resolution numerical weather prediction using CPUs. We've also worked with U.S. weather forecasting agencies to develop a CorrDiff model for the entire continental U.S., an area 300 times larger than the original Taiwan-based model. However, not every use case requires high-resolution forecasts. Some applications can benefit from larger ensembles at coarser resolutions. State-of-the-art numerical models like the GFS are limited to 20 ensemble members due to computational constraints. Today, we're also announcing the availability of the FourCastNet NIM. It can deliver global two-week forecasts 5,000 times faster than numerical weather models. This makes it possible to use ensembles with thousands of members, opening new opportunities for climate tech providers. Now they can estimate risks related to extreme weather and predict low-probability events that current computational methods might miss.

There is a new industrial revolution happening in biopharma, driven by AI. AI models shorten the time to therapy and increase the success rate of new medicines. The NVIDIA BioNeMo framework lets scientists choose from various AI templates to build customized models. BioNeMo is purpose-built for pharma applications and, as a result, delivers twice the training performance of other AI software used today. BioNeMo is accelerating computer-aided drug discovery in many of the world's pharmaceutical companies. Today, we're announcing that the BioNeMo framework is available as an open-source repository on GitHub. We're excited to see what AI can do for the future of the healthcare industry. Today, we're also announcing DiffDock 2.0, an NVIDIA NIM microservice for predicting how drugs interact with target proteins. DiffDock 2.0 is six times faster than 1.0, published just one year ago.

One of the main drivers behind our performance boost is the new cuEquivariance library, which speeds up essential mathematical operations for molecular predictions. DiffDock has been retrained using the PLINDER database, the largest molecular protein-structure database in the world, which is boosting DiffDock's accuracy. This new version is built to unlock a new scale of virtual screening and drug discovery, and we're excited to see what our ecosystem of researchers does with it next.

AI has transformed the study of proteins for drug discovery. And we think AI has the potential to make the same impact in digital chemistry. With an estimated 10 to the 60th possible materials in the universe and only 10 to the 8th currently known, there's huge potential to innovate. Announcing NVIDIA ALCHEMI, a collection of chemistry-specific NIMs for the discovery of new compounds. Scientists start by defining the properties they want, like strength, conductivity, low toxicity, or even color. A generative model suggests thousands to millions of potential candidates with the desired properties. Then the ALCHEMI NIM can sort the candidate compounds for stability by solving for their lowest energy states using NVIDIA Warp, resulting in a 100-times-faster design-space search, going from months to a day. Using the ALCHEMI workflow, the best candidates are identified before moving forward to costly real-world testing.

Traditional engineering analysis workflows, from physics simulation to visualization, can take weeks or even months to complete. Most analyses of physical systems, like planes, automobiles, and ships, use a set of loosely coupled applications, each generating information which must be interpreted by engineers at each step. A real-time digital twin enables an engineer to adjust design parameters. For example, you could change the shape of a body panel and see how it impacts streamlines in real time.

Announcing the Omniverse Blueprint for real-time digital twins. The Blueprint is a reference workflow that includes NVIDIA's acceleration libraries, physics AI frameworks, and Omniverse to design, simulate, and visualize all in real time. It can run on all cloud platforms as well as NVIDIA's own DGX Cloud. Altair, Cadence, Siemens, and others are exploring how to integrate the Blueprint into their own services and products for design acceleration. NVIDIA is also collaborating with Rescale to incorporate the Blueprint into their physics AI platform.

Let's take a look at the Blueprint in action. Everything manufactured is first simulated with advanced physics solvers. Computational fluid dynamics simulations, or CFD, can take hours or even months, limiting the number of possible design explorations. With an NVIDIA Omniverse Blueprint for real-time physics digital twins, software makers can integrate NVIDIA acceleration libraries, physics ML, and RTX visualization into their existing tools, enabling a 1,200x speedup in design iteration time.

37:40 Here, Luminary Cloud builds a fully real-time virtual wind tunnel based on the Blueprint. First, Luminary uses the NVIDIA Modulus physics ML framework to train a simulation AI model using data generated from their NVIDIA CUDA-X-accelerated CFD solver. The model understands the complex relationship between airflow fields and varying car geometries, generating results orders of magnitude faster than with the solver alone. The AI output is visualized in real time using Omniverse APIs. Now an engineer can make geometry or scene modifications, seeing the effects in real time, and because of Omniverse's data interoperability, the engineer can even bring in new geometries, and the simulation will adapt instantly. What took weeks or even months is now a matter of seconds.

Software developers everywhere can now bring unprecedented speed and flexibility to the world's industrial designers and engineers, helping save massive costs and shorten time to market. Ansys is adopting NVIDIA's technologies into its CAE platform: Ansys Fluent, accelerated by NVIDIA GPUs; EnSight, powered by Omniverse visualization; and SimAI, built on NVIDIA NIM microservices.

AI will not only transform simulation, it will accelerate scientific experimentation as well. An overwhelming amount of data is being generated by advanced instruments, such as radio telescopes, particle accelerators, X-ray light sources, and fusion reactor experiments. For example, the Square Kilometer Array is expected to be completed by the end of the decade. The SKA in Australia will produce an average of one terabyte a second, a thousand times more than the current state-of-the-art array. In particle physics, the LHCb detector at CERN produces five terabytes of data per second. Following the 2030 upgrade, it could reach as high as 25 terabytes per second. Both the instruments and the researchers' time are incredibly valuable, making it essential to extract meaningful insights from all of this data as efficiently as possible. We are working with researchers at the SETI Institute and Breakthrough Listen to deploy the world's first AI search for fast radio bursts, or FRBs. While over 1,000 have been detected, only 15 have been traced to specific galaxies. We've implemented a real-time pipeline using NVIDIA Holoscan at the Allen Telescope Array, processing data from 28 dishes at 100 gigabits per second. This pipeline can process 100 times more data than conventional methods used today. This is the first direct feed of raw telescope data to an AI model for FRB detection.

Quantum hardware offers the opportunity to revolutionize computing in fundamental ways. Unfortunately, today's best quantum processors can only perform hundreds of operations before their fundamental unit of computation, their qubits, becomes overwhelmed with noise. This makes scaling quantum hardware into useful computing devices impractical. Today, we're announcing a partnership with Google to apply NVIDIA's state-of-the-art AI supercomputing to solve this challenge and accelerate their quantum hardware development. To be useful, quantum computers need large numbers of qubits operating with performance far beyond today's capabilities. AI supercomputing is the key to building higher-quality, error-corrected qubits that can meet these demands. Google Quantum AI is working with NVIDIA to explore how to accelerate digital representations of their superconducting qubits. Unlike circuit simulations, which focus on the high-level operation of an ideal quantum computer, dynamical simulations model the complex physics describing real, noisy quantum hardware, fully accounting for how the qubits inside the quantum processor interact not only with each other but also with their surrounding environment. Dynamical simulations are essential to understanding and reducing qubit-specific sources of noise. Using NVIDIA hardware and software, Google Quantum AI researchers can accelerate these complex simulations. This enhances the ability of researchers to understand the noise in their systems, explore new designs, and increase hardware performance, all of which are essential for scaling quantum processors. And we're also announcing that dynamical simulation is available in CUDA-Q, our open-source quantum development platform. This means that through CUDA-Q, simulations capture the full dynamics of every qubit comprehensively, unlike commonly performed quantum simulations. These types of comprehensive qubit simulations that would previously have taken a week can now run in just minutes. With CUDA-Q, developers of all quantum processors can perform larger simulations and explore more scalable qubit designs. Together, NVIDIA's growing network of quantum partners is driving toward the goal of achieving practical, large-scale quantum computing.

As we conclude this exciting journey through NVIDIA's latest innovations, we invite you to come to the NVIDIA booth to see many of these technologies firsthand. Interact with James, our digital human, and witness the future of AI-driven virtual interactions. Experience the world's first real-time interactive wind tunnel built on NVIDIA Omniverse Blueprints. Explore the power of Earth-2 NIMs in climate modeling and see how Holoscan is revolutionizing radio astronomy. You'll also hear from researchers sharing breakthroughs in fields like energy storage and seismic simulation in our theater. Have a great Supercomputing 2024.
