Corey Nolet’s Post

Corey Nolet

Principal Engineer | Big-Data Science, ML and Graph Analytics | High-Performance & Distributed Computing

Device-initiated reads from NVMe that don’t need to wait on CPU scheduling? Yes, please! This is massively useful tech, especially for access patterns that are already optimized for fast disk I/O and for workloads whose compute can already be moved to the GPU.

Ryan Meredith

Director, Storage Solutions Architecture at Micron Technology

NVIDIA GTC'24: Micron's new Gen5 NVMe SSD gets 2x the performance on BaM, and the NVIDIA H100 gets 5x faster GNN training time! Check out our blog here: https://lnkd.in/gms3DjuB

Micron Technology worked with teams at Dell Technologies and NVIDIA to produce industry-leading research on offloading AI training models to NVMe, which it showcased at the NVIDIA GTC global AI conference. We tested Big Accelerator Memory (BaM) with GPU-initiated direct storage (GIDS) on the NVIDIA H100 Tensor Core GPU in a Dell PowerEdge R7625 server with Micron’s upcoming high-performance Gen5 E3.S NVMe SSD.

Huge thanks to the following folks at Micron, Dell, and NVIDIA for making this possible:
Micron: John Mazzie, Jeff Armstrong
Dell: Seamus Jones, Jeremy Johnson, Mohan Rokkam
NVIDIA: Vikram Sharma Mailthody, Chris (CJ) Newburn, Brian Park, Zaid Qureshi, Wen-mei Hwu

#MicronTechnology #DellTechnologies #NVIDIA #AI #NVMe #GPU #storage #research #innovation #GTC2024
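To make the mechanism concrete, below is a minimal CUDA sketch of the idea behind GPU-initiated direct storage. It is a simulation, not the actual BaM/GIDS API: the "SSD" here is an ordinary device buffer, and every name (device_initiated_read, gather_pages, PAGE_WORDS) is hypothetical. In the real system, the marked read path would instead have GPU threads build NVMe read commands in submission queues resident in GPU memory and poll the completion queues directly, which is exactly what takes CPU scheduling off the read path.

// Conceptual sketch only: the real BaM/GIDS runtime places NVMe submission
// and completion queues in GPU memory and maps SSD doorbells into the GPU's
// address space, so device threads submit reads and poll completions
// themselves. Here the "SSD" is simulated with a device buffer so the
// example compiles and runs anywhere; all names are hypothetical.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

constexpr int PAGE_WORDS = 1024;  // 4 KiB pages of 32-bit floats
constexpr int NUM_REQS   = 8;     // pages requested by this "batch"

// Simulated device-initiated storage read. In BaM, this is where a GPU
// thread group would write an NVMe read command, ring the doorbell, and
// spin on the completion queue entry -- with no CPU on the critical path.
__device__ void device_initiated_read(const float* ssd, int page, float* dst) {
    for (int i = threadIdx.x; i < PAGE_WORDS; i += blockDim.x)
        dst[i] = ssd[page * PAGE_WORDS + i];
    __syncthreads();  // whole page resident before consumers touch it
}

// One block per requested page: the GPU pulls its own input pages
// (e.g., GNN feature vectors) straight from "storage" into the output.
__global__ void gather_pages(const float* ssd, const int* wanted, float* out) {
    __shared__ float page_buf[PAGE_WORDS];       // stand-in for a cache slot
    int page = wanted[blockIdx.x];

    device_initiated_read(ssd, page, page_buf);  // GPU issues the I/O itself

    for (int i = threadIdx.x; i < PAGE_WORDS; i += blockDim.x)
        out[blockIdx.x * PAGE_WORDS + i] = page_buf[i];
}

int main() {
    const int total_pages = 64;
    const size_t ssd_bytes = size_t(total_pages) * PAGE_WORDS * sizeof(float);

    // Fill the simulated SSD so page p holds the value p.
    float* h_ssd = (float*)malloc(ssd_bytes);
    for (int p = 0; p < total_pages; ++p)
        for (int i = 0; i < PAGE_WORDS; ++i)
            h_ssd[p * PAGE_WORDS + i] = float(p);

    int h_wanted[NUM_REQS] = {3, 17, 42, 5, 60, 11, 29, 8};

    float *d_ssd, *d_out;
    int* d_wanted;
    cudaMalloc(&d_ssd, ssd_bytes);
    cudaMalloc(&d_out, NUM_REQS * PAGE_WORDS * sizeof(float));
    cudaMalloc(&d_wanted, sizeof(h_wanted));
    cudaMemcpy(d_ssd, h_ssd, ssd_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_wanted, h_wanted, sizeof(h_wanted), cudaMemcpyHostToDevice);

    gather_pages<<<NUM_REQS, 256>>>(d_ssd, d_wanted, d_out);
    cudaDeviceSynchronize();

    float sample;
    cudaMemcpy(&sample, d_out + 2 * PAGE_WORDS, sizeof(float),
               cudaMemcpyDeviceToHost);
    printf("page %d -> %.0f (expected 42)\n", h_wanted[2], sample);

    cudaFree(d_ssd); cudaFree(d_out); cudaFree(d_wanted);
    free(h_ssd);
    return 0;
}

The design point this illustrates: with thousands of GPU threads each able to issue small reads concurrently, the GPU can keep a Gen5 SSD's deep queues saturated in a way a CPU-driven dataloader struggles to, and that fine-grained, high-concurrency access pattern is exactly what the BaM/GIDS work targets for GNN feature fetches.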
