News
Running Large-Scale GPU Workloads on Kubernetes with Slurm
2 hours, 16 minutes ago (539 words) Slinky, an open source project developed by SchedMD (now part of NVIDIA), takes two approaches to this integration: the Slinky slurm-operator represents each Slurm component (slurmctld for scheduling, slurmdbd for accounting, slurmd for compute workers, slurmrestd for API access) as…
Cut Checkpoint Costs with About 30 Lines of Python and NVIDIA nvCOMP
2 hours, 23 minutes ago (913 words) Hardware interruptions at 1000+ GPU scale aren't rare. Meta reported 419 unexpected interruptions across 54 days of Llama 3 training on 16,384 NVIDIA H100 GPUs (~one every 3 hours). This is why most teams checkpoint every 15-30 minutes; it's load-bearing infrastructure, not optional overhead. This breakdown surprises people…
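The 15-30 minute cadence the teaser cites can be sanity-checked against the interruption rate it reports using Young's classic checkpoint-interval approximation. A minimal sketch; the ~60 s checkpoint write cost is my assumption, not a figure from the article:

```python
import math

def young_optimal_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's approximation: optimal interval ~= sqrt(2 * checkpoint_cost * MTBF)."""
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

# Meta's Llama 3 run: 419 interruptions over 54 days of training
mtbf = 54 * 24 * 3600 / 419            # ~3.1 hours between failures
interval = young_optimal_interval(60, mtbf)  # assume ~60 s to write a checkpoint
print(f"MTBF ~ {mtbf / 3600:.1f} h, suggested interval ~ {interval / 60:.1f} min")
```

With those numbers the suggested interval lands around 19 minutes, squarely inside the 15-30 minute range the article describes as standard practice.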
How to Accelerate Protein Structure Prediction at Proteome-Scale
4 hours, 16 minutes ago (543 words) Proteins rarely function in isolation as individual monomers. Most biological processes are governed by proteins interacting with other proteins, forming protein complexes whose structures are described in the hierarchy of protein structure as the quaternary representation. This represents one level…
Integrate Physical AI Capabilities into Existing Apps with NVIDIA Omniverse Libraries
1 day, 3 hours ago (768 words) Physical AI, meaning AI systems that perceive, reason, and act in physically grounded simulated environments, is changing how teams design and validate robots and industrial systems long before anything ships to the factory floor. At GTC 2026, NVIDIA highlighted physical AI as a key…
Running AI Workloads on Rack-Scale Supercomputers: From Hardware to Topology-Aware Scheduling
2 days, 25 minutes ago (890 words) The NVIDIA GB200 NVL72 and NVIDIA GB300 NVL72 systems, featuring NVIDIA Blackwell architecture, are rack-scale supercomputers. They're designed with 18 tightly coupled compute trays, massive GPU fabrics, and high-bandwidth networking packaged as a unit. This post demonstrates how Mission Control, Slurm, and NVIDIA Run:ai…
NVIDIA Platform Delivers Lowest Token Cost Enabled by Extreme Co-Design
1 week, 1 day ago (670 words) Co-designed hardware, software, and models are key to delivering the highest AI factory throughput and lowest token cost. Measuring this goes far beyond peak chip specifications. Rigorous AI inference performance benchmarks are critical to understanding real-world token output, which drives…
Accelerating Vision AI Pipelines with Batch Mode VC-6 and NVIDIA Nsight
6 days, 22 hours ago (840 words) In vision AI systems, model throughput continues to improve, and the surrounding pipeline stages must keep pace, including decode, preprocessing, and GPU scheduling. In the previous post, Build High-Performance Vision AI Pipelines with NVIDIA CUDA-Accelerated VC-6, this was described as the data-to-tensor…
Bringing AI Closer to the Edge and On-Device with Gemma 4
1 week, 10 hours ago (242 words) The bundle includes four models, including Gemma's first MoE model, which can all fit on a single NVIDIA H100 GPU and support over 140 languages. The 31B and 26B A4B variants are high-performing reasoning models suitable for both local and data…
Achieving Single-Digit Microsecond Latency Inference for Capital Markets
1 week, 2 hours ago (1177 words) NVIDIA GH200 Grace Hopper Superchip sets a record in the STAC-ML benchmark: the NVIDIA GH200 Grace Hopper Superchip in the Supermicro ARS-111GL-NHR server has achieved single-digit microsecond latencies in the STAC-ML Markets (Inference) benchmark, Tacana suite (audited by STAC), providing performance comparable to…
Accelerate Token Production in AI Factories Using Unified Services and Real-Time AI
1 week, 1 day ago (356 words) Operations teams and administrators need more than dashboards; they need flexibility and foresight. In one example where NVIDIA had a MAX-Q profile in operation, the domain power service allowed the data center to run at 85% power with only a 7% throughput loss. It was…
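The quoted figures imply a net efficiency gain, which a quick back-of-the-envelope calculation makes explicit. A minimal sketch; the tokens-per-watt framing is my interpretation of the teaser's numbers, not a claim from the article:

```python
# Running at 85% power with a 7% throughput loss (per the teaser above)
power_fraction = 0.85       # fraction of full rated power
throughput_fraction = 0.93  # fraction of full throughput (100% - 7% loss)

# Relative change in throughput per unit of power versus full-power operation
efficiency_gain = throughput_fraction / power_fraction - 1
print(f"tokens-per-watt improvement ~ {efficiency_gain:.1%}")
```

In other words, giving up 7% of throughput to shed 15% of power works out to roughly a 9% improvement in work done per watt.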