Try our newest merchandise
After months of rumors, hypothesis and leaks, NVIDIA CEO Jensen Huang formally unveiled the GeForce RTX 50 collection primarily based on the RTX Blackwell graphics structure, throughout a mega-keynote tackle on the Michelob Extremely Area in Las Vegas throughout CES 2025. For normal readers, the preliminary bulletins weren’t a shock, however among the inner-workings and new options in RTX Blackwell that Jensen revealed left many in attendance (and viewing on-line) slack-jawed, for a number of causes.
The RTX Blackwell graphics structure on the basis of the GeForce RTX 50 collection takes every thing from the corporate’s earlier gen Ada structure – which powers the RTX 40 collection – and cranks it as much as 11. RTX Blackwell additionally introduces the idea of neural rendering, which can seemingly mark a paradigm shift in the best way PC video games are rendered sooner or later.
Evidently, there’s so much to get to. However earlier than we dive in, we have to level you to some earlier protection. You may watch NVIDIA’s CES Keynote right here for particulars straight from the horse’s mouth, and take a look at the playing cards and the GeForce RTX 5090’s tiny PCB too. We will save the deep dive on the precise {hardware} for our upcoming critiques, and deal with the precise RTX Blackwell structure and what it permits right here…
We’re going to dive deeper on the pages forward, however this fast abstract evaluating the previous few GeForce generations lays a lot of the groundwork. The GeForce RTX 50 collection options up to date shader cores with assist for neural shaders, along with 4th gen RT (ray tracing) cores and fifth gen Tensor cores, which add assist for FP4. DLSS 4 debuts with the RTX 50 collection, a brand new AI Administration – or AMP – processor is built-in into RTX Blackwell GPUs, and the media engine has been beefed-up with extra, extra succesful encoders and decoders. The RTX 50 collection additionally contains a native PCIe gen 5 interface, assist for DisplayPort 2.1b (as much as UHBR20), and GPUs are fed by the most recent excessive pace GDDR7 reminiscence.
Nearly each side of the RTX 50 collection is upgraded over earlier generations, which ends up in important efficiency uplifts in nearly each kind of workload, from generative AI, to media transcoding, rasterization, and every thing in between.
NVIDIA RTX Blackwell Structure Overview
NVIDIA made adjustments and additions to the entire numerous cores and IP employed within the GeForce RTX 50 collection. The shader cores, RT cores, and Tensor cores are all achieve new options and capabilities.
Beginning with the shader cores, NVIDIA enabled full FP32 and INT32 assist in the entire shader cores in RTX Blackwell. Within the earlier era Ada graphics structure, half of the shaders in every SM supported FP32 / INT32, whereas the opposite half solely supported FP32. This successfully doubles the INT32 bandwidth. Shader Execution Reordering, or SER, throughput within the SM has additionally doubled in Blackwell.
Neural shaders additionally debut within the RTX 50 collection. Beforehand, it wasn’t attainable to entry the tensor cores by way of a compute shader in a graphics API. For the RTX 50 collection, nonetheless, NVIDIA has labored with Microsoft to create one thing referred to as Cooperative Vectors inside DirectX, which unlocks the tensor cores and provides builders the power to run AI fashions on the tensor cores, in recreation. Extra on what that permits shortly.
RTX Blackwell’s RT cores have additionally been considerably redesigned. The function comparable Field Intersection and Opacity Micromap engines to the previous-gen Ada structure, however the Triangle Intersection engine current in Blackwell has been upgraded to what NVIDIA calls a Triangle Cluster Intersection engine, and Triangle Cluster Compression and assist for Linear Swept Spheres have been added. The brand new capabilities within the RT cores enabled what NVIDIA calls Mega Geometry.
The Tensor cores in RTX Blackwell achieve assist for FP4, over and above the FP8 and FP16 assist in Ada. Assist for FP4 successfully doubles the throughput over FP8, whereas concurrently decreasing reminiscence necessities for a selected mannequin. FP4 additionally technically ends in lowered precision versus FP8 or FP16, however by optimally quantizing fashions for the info kind and structure, the impression in lots of shopper use instances must be negligible.
To assist the GeForce RTX 50 collection higher handle the numerous AI workloads that might be run on the GPUs alongside conventional recreation engine code, NVIDIA has additionally integrated an AI Administration Processor, or AMP, into the design. AMP is a programmable processor that sits on the entrance of the GPU that may work together carefully with the entire completely different cores. AMP evaluates rendering standards to optimally dispatch and schedule AI and graphics workloads throughout the assorted cores.
To maintain the GPUs fed with knowledge, NVIDIA is utilizing the most recent GDDR7 reminiscence on the GeForce RTX 50 collection. GDD7 affords double the info price of GDDR6 (not 6X), with considerably higher effectivity. That interprets to larger reminiscence bandwidth, and lowered vitality consumption. The highest finish GeForce RTX 5090 will supply as much as 1.8TB/s of peak reminiscence bandwidth, which is about 80% larger than the RTX 4090.
The show and media engines within the RTX 50 collection have additionally been upgraded versus Ada. The RTX 50 collection assist DisplayPort 2.1 with a UHBR20 knowledge price of as much as 20Gbps, to allow excessive refresh charges at ultra-high resolutions, with HDR visuals. The RTX 50 collection additionally options high-performance, {hardware} primarily based Flip Metering, which shifts the body pacing logic to the show engine, to permit the GPU to exactly handle show timing and precisely tempo frames at excessive framerates and when multi body era is used.
RTX 50 Sequence Max-Q Energy Administration
NVIDIA’s purpose with the RTX 50 collection was to extract the utmost quantity of efficiency attainable inside a given platform’s energy finances. Like earlier gen chips, when elements or the entire GPU are idle, they rapidly enter into deeper energy states, or shut off altogether. However NVIDIA enhanced Blackwell’s capabilities on this regard too.
RTX 50 collection GPUs now assist clock, energy and rail gating, and NVIDIA additionally improved the power to dynamically modify frequencies and voltages.
The effectivity optimization begins with clock gating, but when total engines go idle, the logic and SRAMS can reap the benefits of progressively deeper energy states, till getting into the deepest sleep states, or be shut down altogether. A second energy rail for the GPUs has additionally been added and every will be gated as vital. NVIDIA claims that rail gating particularly helps battery life on the cellular variants.
None of that’s actually new, however the RTX 50 collection can enter deeper energy states and exit them extra rapidly than Ada. Actually, NVIDIA claims Blackwell reduces the time to enter a deep sleep by an element of 10. The brand new GPUs additionally supply accelerated frequency switching, and responsiveness has reportedly been improved by an element of 1000. With The RTX 50 collection, frequencies will be adjusted with a single body primarily based on the workload, which helps extract most effectivity inside the SMs.
Introducing RTX Neural Rendering
As we talked about earlier, RTX Blackwell additionally introduces the idea of neural rendering and what the corporate calls the “NVIDIA RTX Package”. The NVIDIA RTX Package is mainly an umbrella time period for RTX Neural Shaders, Hair and Pores and skin, RTX Mega Geometry, DLSS 4, Reflex 2, and RTX Remix – which is nearly to hit its 1 yr anniversary.
Observe as you’re digesting many of those new applied sciences, that they’re being enabled for builders now, however will arrive in precise video games at numerous factors sooner or later, aside from DLSS 4, which might be enabled on day 0 in roughly 75 titles.
RTX Neural Supplies permits recreation builders and artists to create larger high quality, extra lifelike supplies that require fewer reminiscence assets. RTX Neural Supplies takes the shader code and the collective of layers (textures, and so on.) for a mannequin, builds them out, after which makes use of AI to compresses them at as much as a 7:1 compression ratio, which is considerably higher than conventional block compression strategies. Materials processing can be as much as 5x quicker than earlier generations. In a single instance proven by NVIDIA, customary supplies required about 47MB of reminiscence, however with RTX Neural Materials the reminiscence requirement was introduced all the way down to 16MB, with higher visible constancy.
Subsequent up is RTX Neural Radiance Cache. RTX Neural Radiance Cache is basically an AI-based strategy to precisely rending oblique gentle in a scene. It really works by doing coaching in run time utilizing the GPU to create a mannequin in actual time, after which caching the lighting within the scene geospatially. When you play, the small neural networks are coaching on the sport knowledge. What this does is permit only one lookup into the cache to deal with many gentle bounces. It successfully traces the ray path at a shorter distance and lets the AI infer the remainder. RTX Neural Radiance Cache is at present accessible within the RTX World Illumination SDK, and might be accessible in Portal with RTX, and is coming to RTX Remix in a number of months.
NVIDIA additionally launched RTX Neural Faces, that are high-quality, generative AI faces. RTX Neural Faces takes a normal 3D face and replaces it in actual time with a extra photo-real AI generated face. The demo proven was of considerably larger high quality than the historically rendered face.
Additionally coming with RTX Blackwell is accelerated Ray Traced Strand Primarily based Hair. Ray tracing hair is computationally heavy, attributable to the entire geometry / triangles required for every strand. Blackwell, nonetheless, can ray hint hair utilizing linear swept spheres. Solely two spheres are required per line section of hair, which is extra concise (as much as 3x much less knowledge) in how it’s saved within the BVH (bounding quantity hierarchy).
Which brings us to RTX Mega Geometry. RTX Mega Geometry accelerates BVH updates for cluster-based methods like Unreal Engine 5’s Nanite. This provides builders the power to make use of a lot larger decision meshes inside ray traced scenes and eliminates the necessity for proxy meshes, which don’t seize almost the quantity of geometry.
RTX Blackwell Introduces DLSS 4 And Multi Body Era
DLSS can be getting a whole overhaul with the introduction of the RTX 50 collection. DLSS 4 introduces multi body era to additional enhance framerates and strikes to a transformer AI mannequin to enhance picture high quality and visible constancy.
Transformer fashions will be educated on a lot bigger knowledge units and infer way more knowledge from the coaching. Transformers require roughly 4x as a lot compute as earlier DLSS fashions, however in the end permit NVIDIA (and finish customers) to higher tweak and modify the tradeoffs between smoothness, framerate and picture high quality that include utilizing a expertise like DLSS. Earlier generations of DLSS used convolutional neural networks (CNN), which produced good outcomes, however may typically end in flickering, shimmering and different artifacts, like ghosting. Many artifacts seen with earlier variations of DLSS are mounted with this new transformer mannequin strategy.
The “smarter” transformer mannequin additionally enhances tremendous decision, by with the ability to get well and deduce extra particulars, and in the end producing a greater upscaled picture.
DLSS 4, when utilizing multi body era, might be working 5 AI fashions per body – the SR and Ray Reconstruction mannequin, the Body-Gen mannequin, and smaller fashions evaluating pairs of rendered frames. The top result’s that 15 out of each 16 pixels you see when multi body gen is enabled are generated by AI.
Observe that solely multi body era is unique to the RTX 50 collection. All different DLSS options might be enhanced by way of the transformer mannequin, so previous-gen RTX playing cards will reap the advantages as properly, with the options they assist.
We also needs to point out that NVIDIA is introducing new DLSS override performance inside the NVIDIA app ass properly. DLSS overrides will give customers the power to enabled DLSS multi body era in 75 DLSS FG titles, check out the most recent transformer fashions in DLSS SR titles, and override DLAA and Extremely Efficiency modes for DLSS SR titles (even in video games that don’t present a UI toggle for manipulating these options).
NVIDIA Reflex 2 To Additional Cut back Latency With Body Warping
One of many points with body era is the potential impression on recreation responsiveness, as a result of person enter did not impact the generated body. To additional improve responsiveness in video games, NVIDIA can be introducing Reflex 2. Reflex takes the entire goodness of Reflex and incorporates assist for body warp. Body warping, nonetheless, could reveal holes within the knowledge – for instance, what’s behind a pillar or wall. To handle that situation, NVIDIA can be introducing inpainting predictive rendering to fill in these elements of the scene. This may not be a win for each recreation, however it’s a good choice for video games the place somebody desires optimum latency. With Reflex 2, NVIDIA is claiming as much as 75% quicker responsiveness. Reflex 2 is coming to The Finals and Valorant quickly, and can come to extra titles at later dates.
So, what does all of this AI and expertise do for framerates and latency? Effectively, it ends in considerably lowered latency and as much as 8x larger framerates versus native rendering.
Let’s take a look at extra options and projected efficiency of NVIDIA’s GeForce RTX 50 Sequence, subsequent…